WO2024041318A1 - Method, apparatus, device and computer-readable storage medium for generating an image set - Google Patents

Method, apparatus, device and computer-readable storage medium for generating an image set

Info

Publication number
WO2024041318A1
WO2024041318A1 · PCT/CN2023/110271 · CN2023110271W
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
abnormal
creature
background
Prior art date
Application number
PCT/CN2023/110271
Other languages
English (en)
French (fr)
Inventor
石瑞姣
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Publication of WO2024041318A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Definitions

  • the present disclosure relates to the field of image processing, and specifically to a method, device, equipment and computer-readable storage medium for generating an image set.
  • target detection algorithms based on deep learning networks are used to detect whether abnormal organisms appear in restricted areas.
  • deep learning networks usually require a large number of data sets to train and generate.
  • most of the images captured do not contain abnormal creatures. Therefore, even if the above target detection algorithm has high accuracy, training a deep learning network with a large number of images without abnormal targets (abnormal creatures) yields a trained network whose detection accuracy cannot meet the detection requirements.
  • Embodiments of the present disclosure provide a method, device, equipment and computer-readable storage medium for generating an image set.
  • embodiments of the present disclosure provide a method for generating an image set.
  • the image set is used to train a detection model for abnormal organisms in a restricted area.
  • the image set includes a plurality of sample images; the method includes generating each sample image according to the following steps:
  • the first image set includes a plurality of first images
  • the target image and the background image are synthesized to obtain the sample image, wherein the background image is obtained by photographing the restricted area.
  • At least one target image is acquired based on the pre-acquired first image set, including:
  • the target image set includes at least one target image.
  • performing instance segmentation processing on each first image in the first image set to obtain a target image set corresponding to each first image includes:
  • the first image is input to the Mask R-CNN instance segmentation network for processing to obtain the target image set.
  • the background image is obtained by photographing the restricted area using a target device
  • determining the target size parameter for synthesizing the target image into the background image according to the corresponding coordinates of the position to be pasted, the device parameters of the target device, the image size of the background image, and the preset size of the abnormal creature;
  • the position to be pasted is the position in the background image where the target image is placed when the target image and the background image are synthesized;
  • the adjusted target image and the background image are image synthesized to obtain the sample image.
  • before determining the target size parameter, the method further includes:
  • performing semantic segmentation processing on the background image and determining multiple image areas in the background image includes:
  • the background image is input to the U-Net semantic segmentation network for processing, and multiple image regions are obtained.
  • the device parameters of the target device include at least: the installation height of the target device, the focal length of the target device, and the angle between the optical axis of the target device and the vertical direction,
  • Determining a target size parameter for synthesizing the target image into the background image based on the device parameters of the target device, the image size of the background image and the preset size of the abnormal creature includes:
  • the first angle is determined according to the corresponding coordinates of the position to be pasted and the focal length of the target device; the first angle is the angle between the line connecting the position of the target device and the bottom position of the abnormal creature, and the optical axis of the target device.
  • the second angle is determined according to the first angle, the installation height of the target device, the angle between the optical axis of the target device and the vertical direction, and the preset size of the abnormal creature; the preset size of the abnormal creature is determined according to the type of the abnormal creature; the second angle is the angle between a first line and a second line, the first line connecting the position of the target device and the bottom position of the abnormal creature, and the second line connecting the position of the target device and the top position of the abnormal creature;
  • the target size parameter is determined based on the first angle, the second angle and the image size.
  • the step of resizing the target image according to the size parameter to obtain an adjusted target image includes:
  • an adjustment ratio of the target image is determined according to the image size and the target size parameter; according to the adjustment ratio, the width and height of the target image are adjusted respectively to obtain the adjusted target image.
  • image synthesis of the adjusted target image and the background image to obtain the sample image includes:
  • Color adjustment is performed on the first image to obtain the sample image, and the color adjustment includes brightness adjustment and/or chroma adjustment.
  • performing color adjustment on the first image to obtain the sample image includes:
  • the first image and the second image are input to a color neural network to perform color adjustment on the first image to obtain the sample image.
  • embodiments of the present disclosure provide a device for generating an image set.
  • the image set is used to train a detection model for abnormal organisms in restricted areas.
  • the image set includes a plurality of sample images; the device includes:
  • the acquisition module is configured to acquire at least one target image based on a pre-acquired first image set, where the target image is an image of the abnormal creature segmented from a first image in the first image set;
  • the first image set includes a plurality of first images
  • the processing module is configured to synthesize the target image and a background image to obtain the sample image, wherein the background image is obtained by photographing the restricted area.
  • an embodiment of the present disclosure provides a device for generating an image set, including a memory and a processor, a computer program is stored on the memory, and when the computer program is executed by the processor, the method described in the first aspect is implemented.
  • embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described in the first aspect is implemented.
  • FIG. 1 is a schematic flowchart of a method for generating an image set provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic flowchart of another method for generating an image set provided by an embodiment of the present disclosure.
  • Figure 3 is a schematic framework diagram of a Mask R-CNN instance segmentation network provided by an embodiment of the present disclosure.
  • Figure 4a is an original image provided by an embodiment of the present disclosure.
  • Figure 4b is an image after instance segmentation processing of Figure 4a.
  • Figure 5 is a schematic diagram of a target image set provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic framework diagram of a U-Net semantic segmentation network provided by an embodiment of the present disclosure.
  • Figure 7 is an image after semantic segmentation processing of Figure 4a.
  • FIG. 8 is a schematic diagram of the camera imaging principle provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic flowchart of another method for generating an image set provided by an embodiment of the present disclosure.
  • Figure 10 is a schematic framework diagram of a RainNet neural network provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a device for generating an image set provided by an embodiment of the present disclosure.
  • FIG. 12 is a schematic structural diagram of an image set generating device provided by an embodiment of the present disclosure.
  • Figure 13 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
  • target detection algorithms based on deep learning networks are used to detect whether abnormal organisms appear in restricted areas.
  • deep learning networks usually require a large number of data sets to train, and for restricted areas, most of the captured images contain no abnormal creatures. Therefore, even if the above target detection algorithm has high accuracy, training the deep learning network with a large number of images without abnormal creatures yields a trained network whose detection accuracy cannot meet the detection requirements.
  • image sets can be generated through image synthesis.
  • target images including abnormal creatures and background images including restricted area environments are synthesized to generate image sets to train deep learning networks and improve Detection accuracy of deep learning networks.
  • the above image synthesis approach has the problems that an inconsistent proportion of the target image makes the target image look too abrupt, and inconsistent chromaticity and/or brightness between the target image and the background image make the image unrealistic.
  • embodiments of the present disclosure provide a method for generating an image set.
  • the above image set is used to train a detection model for abnormal organisms in a restricted area.
  • Figure 1 is a schematic flowchart of a method for generating an image set provided by an embodiment of the present disclosure.
  • the image set includes multiple sample images.
  • the method for generating an image set includes generating each sample image according to the following steps:
  • S1 Acquire at least one target image based on a pre-acquired first image set.
  • the target image is an image of an abnormal creature segmented from a first image in the first image set.
  • the first image set includes a plurality of first images.
  • the first image set is selected from an existing image set, and each first image in the first image set is an image with an abnormal creature.
  • the abnormal creatures here refer to creatures that are prohibited from entering the restricted area, such as humans or other animals.
  • the method for generating an image set acquires a target image based on a first image set, and synthesizes the target image containing an abnormal creature with a background image containing the restricted-area environment to obtain a sample image. Because an abnormal creature is present in each sample image, a detection model trained using an image set formed from multiple such sample images achieves improved accuracy when detecting abnormal creatures in the restricted area.
  • FIG. 2 is a schematic flowchart of another method for generating an image set provided by an embodiment of the present disclosure.
  • step S1 may include:
  • S11 Perform instance segmentation processing on each first image in the first image set to obtain a target image set corresponding to each first image; the target image set includes at least one target image.
  • the above-mentioned instance segmentation refers to framing different instances in the image according to the target detection method, and labeling the different instance areas pixel by pixel through the semantic segmentation algorithm to segment at least one target.
  • a two-stage network with higher segmentation accuracy is used for instance segmentation processing, such as Mask R-CNN instance segmentation network.
  • step S11 may include: inputting the first image to a pre-trained Mask R-CNN instance segmentation network for processing to obtain a target image set. Since the first image may include at least one abnormal creature, after the instance segmentation process, at least one target image may be segmented and a target image set may be formed.
  • Figure 3 is a schematic framework diagram of a Mask R-CNN instance segmentation network provided by an embodiment of the present disclosure.
  • the image is input into the region-of-interest alignment (ROI Align) network, where a pooling operation is performed using the bilinear interpolation algorithm to obtain a feature map.
  • the feature map is divided into multiple candidate boxes (class boxes) according to the size of the regions of interest in the feature map and the degree of pooling.
  • finally, a convolution (conv) operation is performed to achieve accurate segmentation of the input image.
  • the advantage of the Mask R-CNN instance segmentation network is that the ROI Align operation, i.e., the bilinear interpolation algorithm, introduces no quantization error, so that pixels in the input image and pixels in the feature map are completely aligned without deviation, improving detection accuracy.
  • Figure 4a is an original image provided by an embodiment of the present disclosure
  • Figure 4b is an image after instance segmentation processing of Figure 4a.
  • the instance segmentation technology can be used to segment the target person in the image, and at the same time, different target persons can be distinguished.
  • FIG. 5 is a schematic diagram of a target image set provided by an embodiment of the present disclosure.
  • Each target image in the target image set can be stored in PNG format with a transparent background.
  • the multiple target images in the target image set shown in Figure 5 may be segmented from one first image or from multiple first images;
  • the embodiments of the present disclosure do not limit this.
  • the image set synthesis method may also include:
  • S20 Perform semantic segmentation processing on the background image to determine multiple image areas in the background image; use one of the multiple image areas as the target area; determine any position in the target area as the location to be pasted.
  • the synthesis process of the target image and the background image can be simply regarded as pasting the target image onto the background image;
  • the position to be pasted refers to the position where the target image is placed on the background image during the synthesis process of the target image and the background image.
  • the placement position of the target image on the background image, i.e., the position to be pasted, needs to be selected reasonably to ensure the authenticity of the sample image.
  • semantic segmentation technology is used to divide different areas in the background image, and the area to be pasted is selected based on the division results. For example, for the lake water fall detection scene, the target image is pasted only in the lake area; for the lawn trampling detection scene, the target image is pasted only in the lawn area.
  • the U-Net semantic segmentation network is used to process the background image to ensure the accuracy of the segmented image areas.
  • performing semantic segmentation processing on the background image in S20 to determine multiple image areas in the background image may specifically include: inputting the background image into a pre-trained U-Net semantic segmentation network for processing to obtain multiple image areas.
  • FIG 6 is a schematic framework diagram of a U-Net semantic segmentation network provided by an embodiment of the present disclosure.
  • the U-Net semantic segmentation network includes a first module 1, a second module 2 and a third module 3,
  • wherein the first module 1 includes a plurality of first units 11, and each first unit 11 includes a plurality of first networks 11a and a pooling network (Pooling) 11b;
  • the second module 2 includes a plurality of second units 12; each second unit 12 includes an upsampling network (Upsampling) 12a and a plurality of first networks 11a, and each second unit 12 has a corresponding first unit 11;
  • the third module 3 includes a regression network 31 (Softmax).
  • the above first network 11a uses convolution (conv) and batch normalization operations combined with an activation function (ReLU) to transform a low-resolution image containing high-dimensional features into a high-resolution image while retaining the high-dimensional features.
  • the input of the first first unit 11 in the first module 1 is the original image, and the inputs of the other first units 11 are the output images of the preceding first unit 11; after extracting features through successive convolution and pooling, each first unit 11 inputs its feature image to the corresponding second unit 12.
  • the input of every second unit 12 in the second module 2, except the first one, also includes the feature image processed by the preceding second unit 12; that is, each such second unit 12 fuses the feature image input by the corresponding first unit 11 with the feature image input by the preceding second unit 12, then performs upsampling combined with the activation function, and the last second unit 12 inputs the processed feature image to the third module 3.
  • the loss function is calculated by the regression network 31 in the third module 3, and when the loss function meets the preset function requirements, the final region segmentation result is output.
  • Figure 7 is an image after semantic segmentation processing of Figure 4a.
  • the scene in Figure 4a is divided into different areas, and the segmentation result is shown in Figure 7, where different colors can represent the regions of the crowd, trees, grass and sky in the picture.
  • step S2 may include steps S21 to S23, wherein:
  • determining the target size parameter for synthesizing the target image into the background image according to the corresponding coordinates of the position to be pasted, the device parameters of the target device, the image size of the background image, and the preset size of the abnormal creature; the position to be pasted is the position in the background image where the target image is placed when the target image and the background image are synthesized.
  • the corresponding coordinates of the position to be pasted refer to the coordinates of the position to be pasted in the background image.
  • the target device is a device that shoots the restricted area to obtain the background image.
  • the above target size parameter is determined based on the following considerations: First, when the target device and shooting angle are fixed, an object located at different positions in the scene occupies different proportions of the imaging picture relative to the background environment. For example, under the same background, the closer the target object is to the lens, the higher the proportion of the imaging picture it occupies. Second, different devices have different device parameters, so even if the same scene is shot at the same location, the resulting picture effects will differ. Finally, the target size parameter is the size at which the target image is presented in the background image; therefore, to ensure the authenticity of the image, the preset size of the abnormal creature and the image size of the background image must also be considered, so as to be as close as possible to a really captured image.
  • the above-mentioned target size parameter may be a height parameter of the target image synthesized into the background image, or a width parameter of the target image synthesized into the background image, which is not limited in this embodiment of the disclosure.
  • the determination process of the target size parameter will be explained below with reference to the accompanying drawing, taking the target size parameter as the height parameter as an example.
  • Figure 8 is a schematic diagram of the camera imaging principle provided by the embodiment of the present disclosure. Combined with the above analysis, as shown in Figure 8, the corresponding coordinate O' of the position to be pasted, the device parameters of the target device, the image size image_h of the background image and the preset size h of the abnormal creature all influence the target size parameter.
  • the device parameters of the target device include at least: the installation height H of the target device, the focal length f of the target device, and the angle θ between the optical axis of the target device and the vertical direction.
  • step S21 may specifically include:
  • the first angle ⁇ is determined, which can be expressed specifically by Formula 1.
  • the above-mentioned first angle ⁇ refers to the angle between the line OC connecting the position O of the target device and the bottom position C of the abnormal creature, and the optical axis of the target device.
  • the second angle ⁇ is determined based on the first angle ⁇ , the installation height H of the target device, the angle ⁇ between the optical axis of the target device and the vertical direction, and the preset size h of the abnormal creature, which can be expressed specifically by Formula 2.
  • the above-mentioned second angle ⁇ refers to the angle between the first connection line and the second connection line.
  • the first connection line is the line OC connecting point O, where the target device is located, and the bottom position C of the abnormal creature,
  • and the second connection line is the line OD connecting point O and the top position D of the abnormal creature.
  • the preset size of the abnormal creature refers to a size that is related to the type of the abnormal creature and is close to the actual size of the abnormal creature (the size here may specifically refer to the height).
  • the preset size of the abnormal creature is determined according to the type of the abnormal creature, and the preset size of the same type of abnormal creature is the same.
  • the preset size of the abnormal creature is determined according to a pre-stored mapping relationship table between the preset size and the biological type; the mapping table may store preset sizes corresponding to multiple biological types.
  • for example, when the abnormal creature is a pedestrian, the preset size can be set to 1.6 m or 1.75 m; when the abnormal creature is a dog, the preset size can be set to 0.3 m or 0.5 m.
  • Formula 2: γ = 90° − θ − α; CP = h · cos(90° − γ); DP = CP · tan(90° − γ)
  • the target size parameter AB is determined according to the first angle α, the second angle β and the image size image_h, which can be expressed by Formula 3.
  • when the target size parameter is a height parameter, the image size is the height of the image;
  • when the target size parameter is a width parameter, the image size is the width of the image.
  • S22 Adjust the size of the target image according to the target size parameter to obtain the adjusted target image.
  • step S22 may specifically include:
  • the adjustment ratio of the target image is determined according to the image size and the target size parameter; according to the adjustment ratio, the width and height of the target image are adjusted respectively to obtain the adjusted target image.
  • in one example, the image size of the target image is a × b and the target size parameter is c. When the target size parameter is a height parameter, the adjustment ratio is c/a: the height of the target image is adjusted to c and the width to (c/a) × b, giving the adjusted target image. When the target size parameter is a width parameter, the adjustment ratio is c/b: the width is adjusted to c and the height to (c/b) × a, giving the adjusted target image.
  • in the method for generating an image set provided by the embodiment of the present disclosure, a target image set containing at least one target image is obtained by instance segmentation from the pre-acquired first image set; because the background image is captured by a camera device photographing the restricted area, the target size parameter of the target image on the background image is determined based on the imaging principle, and the image size of the target image is adjusted according to the target size parameter, improving the authenticity of the sample image obtained by synthesizing the target image with the background image.
  • FIG. 9 is a schematic flowchart of another method for generating an image set provided by an embodiment of the present disclosure.
  • step S23 may specifically include step S231 to step S233.
  • S231 Determine the calibration position in the adjusted target image according to the type of abnormal creature.
  • when the restricted area prohibits people from passing, the type of the abnormal creature is human, and the calibration position in the adjusted target image is the position of the person's feet in the image.
  • S232 Paste the adjusted target image into the background image to obtain a first image.
  • the calibration position is aligned with the position to be pasted in the background image.
  • S233 Perform color adjustment on the first image to obtain a sample image.
  • the color adjustment includes brightness adjustment and/or chroma adjustment.
  • because of differences in lighting and the like, the color of the first image formed after pasting is not harmonious. Therefore, taking the lighting of the background image as the benchmark, a color neural network is used to perform color adjustment on the first image.
  • step S233 may specifically include:
  • An area in the first image located outside the area where the target image is located is set as the first preset color to obtain a second image.
  • the first image and the second image are input to the color neural network to perform color adjustment on the first image to obtain a sample image.
  • the first preset color is black, that is, pixels in areas of the first image located outside the area where the target image is located are set to 0.
  • the color neural network can use the RainNet neural network to perform style transfer on the pasted target image based on the background image, so that the target image and the background image are more integrated.
  • Figure 10 is a schematic framework diagram of a RainNet neural network provided by an embodiment of the present disclosure. As shown in Figure 10, the RainNet neural network includes a first convolution module 4, a second convolution module 5, a third convolution module 6 and an inverse convolution module. Convolution module 7.
  • the first convolution module 4 includes one convolution network 41; the deconvolution module 7 includes one deconvolution network 71; the second convolution module 5 includes a plurality of second convolution networks 51 based on a leaky rectified activation function (LReLU); the third convolution module 6 includes a plurality of cascaded convolution units, specifically a plurality of first convolution units 61, a plurality of second convolution units 62 and one third convolution unit 63, wherein a first convolution unit 61 includes one second convolution network 51 and one activation-function-based deconvolution network 61a; a second convolution unit 62 includes one second convolution network 51, one activation-function-based deconvolution network 61a and one attention self-control network 62a; and the third convolution unit 63 includes one activation-function-based deconvolution network 61a, one convolution network 41 and one attention self-control network 62a.
  • the first image Ic is subjected to multi-layer convolution processing through the first convolution module 4 and the second convolution module 5, and high-dimensional features are extracted and input to the third convolution module 6.
  • the third convolution module 6 takes as input both the first image fed by each level's second convolution network 51 and the second image M of the same resolution.
  • Ic × (1 − M) gives the background region of Ic,
  • and Ic × M gives the foreground region of Ic.
  • after the deconvolution module 7, the statistical style parameters γi and βi are obtained.
  • the generated γi and βi are multiplied with and added to the normalized foreground features channel-wise to obtain the sample image Î, achieving color balance, making the picture content of the sample image Î more harmonious and improving the authenticity of the sample image.
  • FIG. 11 is a schematic structural diagram of an image set generating device provided by an embodiment of the present disclosure.
  • the device is used to execute the above image set generating method.
  • the device for generating an image set includes: an acquisition module 10 and a processing module 20 .
  • the acquisition module 10 is configured to acquire at least one target image based on a pre-acquired first image set, where the target image is an image of the abnormal creature segmented from a first image in the first image set.
  • the first image set includes a plurality of first images.
  • the processing module 20 is configured to synthesize the target image and a background image to obtain the sample image, wherein the background image is obtained by photographing the restricted area.
  • each module can be found in the description of the above image set generation method, and will not be described again here.
  • Figure 12 is a schematic structural diagram of an image set generating device provided by an embodiment of the present disclosure.
  • the electronic device 100 includes a memory 101 and a processor 102.
  • a computer program is stored on the memory 101, and when the computer program is executed by the processor 102, the above image set generating method is implemented, for example, steps S1 to S2 in FIG. 1.
  • the electronic device 100 may be a computing device such as a desktop computer, a notebook, a PDA, a cloud server, etc.
  • Electronic device 100 may include, but is not limited to, processor 102 and memory 101 .
  • FIG. 12 is only an example of the electronic device 100
  • and does not limit it; the electronic device 100 may include more or fewer components than shown in the figure, or a combination of certain components, or different components.
  • for example, the electronic device 100 may also include input and output devices, network access devices, buses, etc.
  • the processor 102 can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC),
  • a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 101 may be an internal storage unit of the electronic device 100, such as a hard disk or memory of the electronic device 100.
  • the memory 101 may also be an external storage device of the electronic device 100, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the electronic device 100.
  • the memory 101 may also include both an internal storage unit of the electronic device 100 and an external storage device.
  • the memory 101 is used to store the computer program and other programs and data required by the terminal device.
  • the memory 101 can also be used to temporarily store data that has been output or is to be output.
  • in practical applications, the above functions can be allocated to different functional units or modules as needed; that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above.
  • each functional unit and module in the embodiments can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented either in the form of hardware or in the form of software functional units.
  • the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the present application.
  • Figure 13 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
  • a computer program 201 is stored on the computer-readable storage medium 200.
  • when the computer program 201 is executed by a processor, the above method for generating an image set is implemented, for example, steps S1 to S2 in Figure 1.
  • Computer-readable storage medium 200 includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide a method, apparatus, device and computer-readable storage medium for generating an image set. The image set is used to train a detection model for abnormal creatures in a restricted area, and includes a plurality of sample images. The method includes generating each sample image according to the following steps: acquiring at least one target image according to a pre-acquired first image set, the target image being an image of an abnormal creature segmented from a first image in the first image set; and synthesizing the target image with a background image to obtain the sample image, wherein the background image is obtained by photographing the restricted area.

Description

Method, apparatus, device and computer-readable storage medium for generating an image set

Technical Field

The present disclosure relates to the field of image processing, and specifically to a method, apparatus, device and computer-readable storage medium for generating an image set.
Background Art

For the safety of life and property, pedestrians and animals are strictly prohibited in certain areas to avoid safety hazards. It is therefore necessary to perform abnormal-creature detection in restricted areas where people or animals are prohibited.

In the related art, a target detection algorithm based on a deep learning network is used to detect whether an abnormal creature appears in a restricted area. A deep learning network usually requires a large data set for training, but for a restricted area, most of the captured images contain no abnormal creatures. Therefore, even if the above target detection algorithm has high intrinsic accuracy, training the deep learning network with a large number of images without abnormal targets (abnormal creatures) yields a trained network whose detection accuracy cannot meet the detection requirements.
Summary

Embodiments of the present disclosure provide a method, apparatus, device and computer-readable storage medium for generating an image set.

In a first aspect, embodiments of the present disclosure provide a method for generating an image set, the image set being used to train a detection model for abnormal creatures in a restricted area and including a plurality of sample images; the method includes generating each sample image according to the following steps:

acquiring at least one target image according to a pre-acquired first image set, the target image being an image of the abnormal creature segmented from a first image in the first image set, the first image set including a plurality of first images;

synthesizing the target image with a background image to obtain the sample image, wherein the background image is obtained by photographing the restricted area.
In some embodiments, acquiring at least one target image according to the pre-acquired first image set includes:

performing instance segmentation processing on each first image in the first image set to obtain a target image set corresponding to each first image, the target image set including at least one target image.

In some embodiments, performing instance segmentation processing on each first image in the first image set to obtain a target image set corresponding to each first image includes:

inputting the first image into a Mask R-CNN instance segmentation network for processing to obtain the target image set.
In some embodiments, the background image is obtained by photographing the restricted area with a target device, and synthesizing the target image with the background image to obtain the sample image includes:

determining a target size parameter for synthesizing the target image into the background image according to the corresponding coordinates of a position to be pasted, the device parameters of the target device, the image size of the background image and a preset size of the abnormal creature, the position to be pasted being the position in the background image where the target image is placed when the target image and the background image are synthesized;

resizing the target image according to the target size parameter to obtain an adjusted target image;

performing image synthesis on the adjusted target image and the background image to obtain the sample image.
In some embodiments, before determining the target size parameter, the method further includes:

performing semantic segmentation processing on the background image to determine multiple image areas in the background image, and taking one of the multiple image areas as a target area;

determining any position in the target area as the position to be pasted.

In some embodiments, performing semantic segmentation processing on the background image to determine multiple image areas in the background image includes:

inputting the background image into a U-Net semantic segmentation network for processing to obtain the multiple image areas.
In some embodiments, the device parameters of the target device include at least: the installation height of the target device, the focal length of the target device, and the angle between the optical axis of the target device and the vertical direction, and determining the target size parameter for synthesizing the target image into the background image according to the device parameters of the target device, the image size of the background image and the preset size of the abnormal creature includes:

determining a first angle according to the corresponding coordinates of the position to be pasted and the focal length of the target device, the first angle being the angle between the line connecting the position of the target device and the bottom position of the abnormal creature, and the optical axis of the target device;

determining a second angle according to the first angle, the installation height of the target device, the angle between the optical axis of the target device and the vertical direction, and the preset size of the abnormal creature, wherein the preset size of the abnormal creature is determined according to the type of the abnormal creature; the second angle is the angle between a first line and a second line, the first line connecting the position of the target device and the bottom position of the abnormal creature, and the second line connecting the position of the target device and the top position of the abnormal creature;

determining the target size parameter according to the first angle, the second angle and the image size.
In some embodiments, resizing the target image according to the size parameter to obtain the adjusted target image includes:

determining an adjustment ratio of the target image according to the image size and the target size parameter;

adjusting the width and height of the target image respectively according to the adjustment ratio to obtain the adjusted target image.
In some embodiments, performing image synthesis on the adjusted target image and the background image to obtain the sample image includes:

determining a calibration position in the adjusted target image according to the type of the abnormal creature;

pasting the adjusted target image into the background image to obtain a first image, in which the calibration position is aligned with the position to be pasted in the background image;

performing color adjustment on the first image to obtain the sample image, the color adjustment including brightness adjustment and/or chroma adjustment.
In some embodiments, performing color adjustment on the first image to obtain the sample image includes:

setting the area of the first image outside the area where the target image is located to a first preset color to obtain a second image;

inputting the first image and the second image into a color neural network to perform color adjustment on the first image to obtain the sample image.
In a second aspect, embodiments of the present disclosure provide an apparatus for generating an image set, the image set being used to train a detection model for abnormal creatures in a restricted area and including a plurality of sample images; the apparatus includes:

an acquisition module configured to acquire at least one target image according to a pre-acquired first image set, the target image being an image of the abnormal creature segmented from a first image in the first image set, the first image set including a plurality of first images;

a processing module configured to synthesize the target image with a background image to obtain the sample image, wherein the background image is obtained by photographing the restricted area.
In a third aspect, embodiments of the present disclosure provide a device for generating an image set, including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the method of the first aspect.

In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect.
Brief Description of the Drawings

The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification; together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
Figure 1 is a schematic flowchart of a method for generating an image set provided by an embodiment of the present disclosure.
Figure 2 is a schematic flowchart of another method for generating an image set provided by an embodiment of the present disclosure.
Figure 3 is a schematic framework diagram of a Mask R-CNN instance segmentation network provided by an embodiment of the present disclosure.
Figure 4a is an original image provided by an embodiment of the present disclosure.
Figure 4b is the image of Figure 4a after instance segmentation processing.
Figure 5 is a schematic diagram of a target image set provided by an embodiment of the present disclosure.
Figure 6 is a schematic framework diagram of a U-Net semantic segmentation network provided by an embodiment of the present disclosure.
Figure 7 is the image of Figure 4a after semantic segmentation processing.
Figure 8 is a schematic diagram of the camera imaging principle provided by an embodiment of the present disclosure.
Figure 9 is a schematic flowchart of another method for generating an image set provided by an embodiment of the present disclosure.
Figure 10 is a schematic framework diagram of a RainNet neural network provided by an embodiment of the present disclosure.
Figure 11 is a schematic structural diagram of an apparatus for generating an image set provided by an embodiment of the present disclosure.
Figure 12 is a schematic structural diagram of a device for generating an image set provided by an embodiment of the present disclosure.
Figure 13 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
Detailed Description

Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present disclosure, and are not intended to limit it.

To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments without creative effort fall within the protection scope of the present disclosure.

Unless otherwise defined, the technical or scientific terms used in the embodiments of the present disclosure shall have the ordinary meaning understood by a person with ordinary skill in the field to which the present disclosure belongs. The words "first", "second" and similar terms used in the present disclosure do not denote any order, quantity or importance, but are only used to distinguish different components. Likewise, words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connect" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right" and the like are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
For the safety of life and property, pedestrians and animals are strictly prohibited in certain areas to avoid safety hazards. It is therefore necessary to perform abnormal-creature detection in restricted areas where people or animals are prohibited.

In the related art, a target detection algorithm based on a deep learning network is used to detect whether an abnormal creature appears in a restricted area. A deep learning network usually requires a large data set for training, but for a restricted area, most of the captured images contain no abnormal creatures. Therefore, even if the above target detection algorithm has high intrinsic accuracy, training the deep learning network with a large number of images without abnormal creatures yields a trained network whose detection accuracy cannot meet the detection requirements.

On this basis, the related art proposes that an image set can be generated by image synthesis; specifically, a target image including an abnormal creature and a background image including the restricted-area environment are synthesized to generate an image set for training the deep learning network and improving its detection accuracy. However, this image synthesis approach has the problems that an inconsistent proportion of the target image makes it look too abrupt, and inconsistent chromaticity and/or brightness between the target image and the background image make the image unrealistic.
To solve at least one of the above technical problems, embodiments of the present disclosure provide a method for generating an image set, the image set being used to train a detection model for abnormal creatures in a restricted area.

Figure 1 is a schematic flowchart of a method for generating an image set provided by an embodiment of the present disclosure. The image set includes a plurality of sample images. As shown in Figure 1, the method includes generating each sample image according to the following steps:
S1: Acquire at least one target image according to a pre-acquired first image set, the target image being an image of an abnormal creature segmented from a first image in the first image set, the first image set including a plurality of first images.

It should be noted that the first image set is selected from an existing image set, and each first image in the first image set is an image containing an abnormal creature. An abnormal creature here refers to a creature that is prohibited from entering the restricted area, for example, a human or another animal.

S2: Synthesize the target image with a background image to obtain a sample image, wherein the background image is obtained by photographing the restricted area.

In the method for generating an image set provided by the embodiments of the present disclosure, a target image is acquired from the first image set, and the target image containing an abnormal creature is synthesized with a background image containing the restricted-area environment to obtain a sample image. Because an abnormal creature is present in each sample image, a detection model trained on an image set formed from multiple such sample images achieves higher accuracy when detecting abnormal creatures in the restricted area.
Figure 2 is a schematic flowchart of another method for generating an image set provided by an embodiment of the present disclosure. In some embodiments, as shown in Figure 2, step S1 may include:

S11: Perform instance segmentation processing on each first image in the first image set to obtain a target image set corresponding to each first image; the target image set includes at least one target image. Instance segmentation here means framing different instances in the image with a target detection method and labeling each instance region pixel by pixel with a semantic segmentation algorithm, so as to segment out at least one target.

Because the generation of target images places certain requirements on the accuracy of instance segmentation while the speed of instance segmentation is not restricted, the embodiments of the present disclosure use a two-stage network with higher segmentation accuracy for instance segmentation processing, for example a Mask R-CNN instance segmentation network.

Optionally, step S11 may include: inputting the first image into a pre-trained Mask R-CNN instance segmentation network for processing to obtain the target image set. Because a first image may include at least one abnormal creature, at least one target image can be segmented out after the instance segmentation processing, forming a target image set.
Figure 3 is a schematic framework diagram of a Mask R-CNN instance segmentation network provided by an embodiment of the present disclosure. As shown in Figure 3, the image is input into the region-of-interest alignment (ROI Align) network, where a pooling operation is performed using the bilinear interpolation algorithm to obtain a feature map; the feature map is divided into multiple candidate boxes (class boxes) according to the size of the regions of interest in the feature map and the degree of pooling; finally, a convolution (conv) operation is performed to achieve accurate segmentation of the input image. The advantage of the Mask R-CNN instance segmentation network is that the ROI Align operation, i.e., the bilinear interpolation algorithm, introduces no quantization error, so that pixels in the input image and pixels in the feature map are completely aligned without deviation, improving detection accuracy.

Figure 4a is an original image provided by an embodiment of the present disclosure, and Figure 4b is the image of Figure 4a after instance segmentation processing. In one example, as shown in Figures 4a and 4b, based on the Mask R-CNN instance segmentation network, instance segmentation technology can segment the target persons out of the image while distinguishing different target persons.

Figure 5 is a schematic diagram of a target image set provided by an embodiment of the present disclosure; each target image in the target image set can be stored in PNG format with a transparent background. It should be noted that the multiple target images in the target image set shown in Figure 5 may be segmented from one first image or from multiple first images; the embodiments of the present disclosure do not limit this.
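For illustration only, the following is a minimal Python sketch of step S11 using the pretrained Mask R-CNN instance segmentation model from torchvision. The confidence threshold, the restriction to the person class (COCO label 1) and the file names are assumptions for the example, not part of the disclosure.

```python
import numpy as np
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Pre-trained Mask R-CNN instance segmentation network (step S11).
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def extract_targets(first_image_path, score_thr=0.7, person_label=1):
    """Segment abnormal creatures out of a first image and return them as
    RGBA crops with a transparent background (the target image set)."""
    rgb = Image.open(first_image_path).convert("RGB")
    tensor = torch.from_numpy(np.asarray(rgb)).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([tensor])[0]

    targets = []
    for label, score, mask, box in zip(out["labels"], out["scores"],
                                       out["masks"], out["boxes"]):
        if label.item() != person_label or score.item() < score_thr:
            continue  # keep only confident person instances (assumed class)
        alpha = (mask[0].numpy() > 0.5).astype(np.uint8) * 255
        rgba = np.dstack([np.asarray(rgb), alpha])  # transparent outside mask
        x0, y0, x1, y1 = (int(v) for v in box.tolist())
        targets.append(Image.fromarray(rgba[y0:y1, x0:x1]))
    return targets

# Each crop can be saved as a transparent-background PNG, as in Figure 5.
for i, t in enumerate(extract_targets("first_image.jpg")):
    t.save(f"target_{i}.png")
```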
In some embodiments, as shown in Figure 2, before step S21, the method for synthesizing the image set may further include:

S20: Perform semantic segmentation processing on the background image and determine multiple image areas in the background image; take one of the multiple image areas as the target area; determine any position in the target area as the position to be pasted.

The synthesis of the target image and the background image can be viewed simply as pasting the target image onto the background image; the position to be pasted refers to the position where the target image is placed on the background image during this synthesis. For the scenario of detecting abnormal-creature intrusion into a restricted area, the placement position of the target image on the background image, i.e., the position to be pasted, needs to be selected reasonably to ensure the authenticity of the sample image.

Specifically, semantic segmentation technology is used to divide the background image into different areas, and the area to be pasted is selected according to the division result. For example, for a lake drowning-detection scene, the target image is pasted only in the lake area; for a lawn-trampling detection scene, the target image is pasted only in the lawn area.
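As an illustrative sketch of this region selection in step S20, the following assumes a per-pixel label map produced by the semantic segmentation network and an assumed class id for the permitted paste area; the uniform random choice of a pixel inside that area is likewise an assumption.

```python
import numpy as np

def pick_paste_position(label_map: np.ndarray, target_class: int,
                        rng: np.random.Generator) -> tuple[int, int]:
    """Pick the position to be pasted: any pixel of the chosen target area.

    label_map: (H, W) per-pixel class ids from semantic segmentation.
    target_class: id of the permitted paste area (e.g. lawn) - assumed id.
    """
    ys, xs = np.nonzero(label_map == target_class)
    if len(xs) == 0:
        raise ValueError("target area not present in this background image")
    i = rng.integers(len(xs))
    return int(xs[i]), int(ys[i])  # (x, y) in background-image coordinates

# Example: paste only into the lawn region (class id 2 is an assumption).
rng = np.random.default_rng(0)
x, y = pick_paste_position(np.load("background_labels.npy"), 2, rng)
```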
The embodiments of the present disclosure use a U-Net semantic segmentation network to process the background image to ensure the accuracy of the segmented image areas. In some embodiments, performing semantic segmentation processing on the background image in S20 to determine multiple image areas in the background image may specifically include: inputting the background image into a pre-trained U-Net semantic segmentation network for processing to obtain multiple image areas.

Figure 6 is a schematic framework diagram of a U-Net semantic segmentation network provided by an embodiment of the present disclosure. As shown in Figure 6, the U-Net semantic segmentation network includes a first module 1, a second module 2 and a third module 3. The first module 1 includes a plurality of first units 11, each of which includes a plurality of first networks 11a and a pooling network (Pooling) 11b. The second module 2 includes a plurality of second units 12, each of which includes an upsampling network (Upsampling) 12a and a plurality of first networks 11a, and each second unit 12 has a corresponding first unit 11. The third module 3 includes a regression network 31 (Softmax). The first network 11a uses convolution (conv) and batch normalization operations combined with an activation function (ReLU) to turn a low-resolution image containing high-dimensional features into a high-resolution image while retaining the high-dimensional features.

Specifically, the input of the first first unit 11 in the first module 1 is the original image, and the input of every other first unit 11 is the output image of the preceding first unit 11; after extracting features through successive convolution and pooling, each first unit 11 feeds its feature image to the corresponding second unit 12. The input of every second unit 12 in the second module 2, except the first one, also includes the feature image processed by the preceding second unit 12; that is, each such second unit 12 fuses the feature image from the corresponding first unit 11 with the feature image from the preceding second unit 12, then performs upsampling combined with the activation function, and the last second unit 12 feeds the processed feature image to the third module 3. The regression network 31 in the third module 3 computes the loss function, and when the loss function meets the preset requirement, the final region segmentation result is output.
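For illustration, a compact PyTorch sketch of the encoder-decoder structure described above (conv + batch-normalization + ReLU blocks, pooling on the way down, upsampling with skip fusion on the way up, and a softmax head); the number of levels and the channel widths are assumptions, not values taken from Figure 6.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # "first network 11a": convolution + batch normalization + ReLU, twice
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    """Sketch of the Figure 6 layout; three levels and these widths are assumed."""
    def __init__(self, n_classes=4, widths=(32, 64, 128)):
        super().__init__()
        self.enc = nn.ModuleList()                 # first units 11
        c_prev = 3
        for w in widths:
            self.enc.append(conv_block(c_prev, w))
            c_prev = w
        self.pool = nn.MaxPool2d(2)                # pooling network 11b
        self.up = nn.ModuleList()                  # upsampling networks 12a
        self.dec = nn.ModuleList()                 # second units 12
        for w_hi, w_lo in zip(reversed(widths[:-1]), reversed(widths[1:])):
            self.up.append(nn.ConvTranspose2d(w_lo, w_hi, 2, stride=2))
            self.dec.append(conv_block(2 * w_hi, w_hi))
        self.head = nn.Conv2d(widths[0], n_classes, 1)  # module 3 (softmax)

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))  # skip-feature fusion
        return self.head(x).softmax(dim=1)            # per-pixel class scores

labels = MiniUNet()(torch.randn(1, 3, 256, 256)).argmax(dim=1)  # (1, 256, 256)
```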
Figure 7 is the image of Figure 4a after semantic segmentation processing. In one example, as shown in Figures 4a and 7, based on the U-Net semantic segmentation network, the scene in Figure 4a is divided into different areas; the segmentation result is shown in Figure 7, where different colors can represent the regions of the crowd, trees, grass and sky in the picture.

In some embodiments, as shown in Figure 2, step S2 may include steps S21 to S23, wherein:
S21: Determine a target size parameter for synthesizing the target image into the background image according to the corresponding coordinates of the position to be pasted, the device parameters of the target device, the image size of the background image and the preset size of the abnormal creature; the position to be pasted is the position in the background image where the target image is placed when the target image and the background image are synthesized. The corresponding coordinates of the position to be pasted are the coordinates of the position to be pasted in the background image.

The target device is the device that photographs the restricted area to obtain the background image.

The target size parameter is determined based on the following considerations. First, for a given target device and shooting angle, an object located at different positions in the scene occupies different proportions of the imaging picture relative to the background; for example, against the same background, the closer the target is to the lens, the larger the proportion of the picture it occupies. Second, different devices have different device parameters, so even the same scene shot at the same location produces different picture effects. Finally, the target size parameter is the size at which the target image appears in the background image; therefore, to ensure the authenticity of the image, the preset size of the abnormal creature and the image size of the background image must also be considered, so that the result is as close as possible to a really captured image.

The target size parameter may be a height parameter or a width parameter of the target image synthesized into the background image; the embodiments of the present disclosure do not limit this.

The determination of the target size parameter is explained below with reference to the accompanying drawing, taking the height parameter as an example.
Figure 8 is a schematic diagram of the camera imaging principle provided by an embodiment of the present disclosure. Combining the above analysis, as shown in Figure 8, the corresponding coordinate O' of the position to be pasted, the device parameters of the target device, the image size image_h of the background image, and the preset size h of the abnormal creature all influence the target size parameter. The device parameters of the target device include at least: the installation height H of the target device, the focal length f of the target device, and the angle θ between the optical axis of the target device and the vertical direction.

It should be noted that, as shown in Figure 8 and based on the camera imaging principle, when the abnormal creature is located at CD, it is mapped to position O'A in the image after being photographed by the target device placed at point O.
In some embodiments, step S21 may specifically include:

According to the corresponding coordinates of the position to be pasted and the focal length of the target device, the first angle α is determined, which can be expressed by Formula 1. The first angle α is the angle between the line OC, connecting the position O of the target device and the bottom position C of the abnormal creature, and the optical axis of the target device.

According to the first angle α, the installation height H of the target device, the angle θ between the optical axis of the target device and the vertical direction, and the preset size h of the abnormal creature, the second angle β is determined, which can be expressed by Formula 2. The second angle β is the angle between the first line and the second line, where the first line is the line OC connecting point O, where the target device is located, and the bottom position C of the abnormal creature, and the second line is the line OD connecting point O and the top position D of the abnormal creature.

In addition, the preset size of an abnormal creature is a size that is related to the type of the abnormal creature and close to its actual size (the size here may specifically be the height). The preset size is determined according to the type of the abnormal creature, and abnormal creatures of the same type share the same preset size. For example, the preset size of an abnormal creature is determined according to a pre-stored mapping table between preset sizes and creature types; the mapping table may store preset sizes corresponding to multiple creature types. For example, when the abnormal creature is a pedestrian, the preset size can be set to 1.6 m or 1.75 m; when the abnormal creature is a dog, it can be set to 0.3 m or 0.5 m.
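A minimal sketch of such a mapping table follows; the 1.75 m and 0.5 m entries use values quoted above, and any further creature types would be analogous assumptions.

```python
# Pre-stored mapping table between creature type and preset size (height in
# metres). The pedestrian and dog values come from the text above; any other
# entries would be analogous assumptions.
PRESET_SIZE_M = {
    "pedestrian": 1.75,
    "dog": 0.5,
}

def preset_size(creature_type: str) -> float:
    """Look up the preset size h used in Formula 2 for this creature type."""
    return PRESET_SIZE_M[creature_type]
```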
γ = 90° − θ − α
CP = h · cos(90° − γ)          (Formula 2)
DP = CP · tan(90° − γ)
According to the first angle α, the second angle β and the image size image_h, the target size parameter AB is determined, which can be expressed by Formula 3.

It should be noted that when the target size parameter is a height parameter, the image size is the height of the image; when the target size parameter is a width parameter, the image size is the width of the image.
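The following Python sketch estimates the height parameter AB from the quantities in Figure 8. Formula 2 is implemented as printed above; because the images of Formula 1 and Formula 3 are not reproduced in this text, the expressions used for the first angle α, for the distance OC, and for projecting back to pixels are assumed pinhole-camera geometry and should be read as placeholders.

```python
import math

def target_height_px(y_paste_px: float, image_h: int, f_mm: float,
                     pixel_mm: float, H_m: float, theta_deg: float,
                     h_m: float) -> float:
    """Estimate the target size parameter AB (pasted height in pixels).

    gamma, CP and DP follow Formula 2 as printed; the expressions for the
    first angle alpha (Formula 1), for OC, and for AB (Formula 3) are
    assumed pinhole-camera geometry, since those formula images are not
    reproduced in the text.
    """
    # Formula 1 (assumed): angle between the ray through O' and the optical axis.
    dy_mm = (y_paste_px - image_h / 2) * pixel_mm
    alpha = math.atan(dy_mm / f_mm)

    # Formula 2 as printed in the description.
    theta = math.radians(theta_deg)
    gamma = math.pi / 2 - theta - alpha        # elevation of line OC
    CP = h_m * math.cos(math.pi / 2 - gamma)   # component of h along OC
    DP = CP * math.tan(math.pi / 2 - gamma)    # component of h across OC

    # Second angle beta subtended by the creature at the camera (assumed).
    OC = H_m / math.sin(gamma)
    beta = math.atan(DP / (OC + CP))

    # Formula 3 (assumed): project the ray at alpha + beta back to pixels.
    AB = (math.tan(alpha + beta) * f_mm / pixel_mm + image_h / 2) - y_paste_px
    return abs(AB)

# Example: camera 5 m up, tilted 60 degrees from vertical, 4 mm lens, 3 um
# pixels, pasting a 1.75 m pedestrian at row 800 of a 1080-row background.
print(target_height_px(800, 1080, 4.0, 0.003, 5.0, 60.0, 1.75))
```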
S22: Resize the target image according to the target size parameter to obtain the adjusted target image.

In some embodiments, step S22 may specifically include:

determining the adjustment ratio of the target image according to the image size and the target size parameter, and adjusting the width and height of the target image respectively according to the adjustment ratio to obtain the adjusted target image.

In one example, the image size of the target image is a × b and the target size parameter is c. If the target size parameter is a height parameter, the adjustment ratio is c/a: the height of the target image is adjusted to c and the width to (c/a) × b, giving the adjusted target image. If the target size parameter is a width parameter, the adjustment ratio is c/b: the width is adjusted to c and the height to (c/b) × a, giving the adjusted target image.
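A minimal sketch of step S22 using Pillow, reusing the a × b / c notation above; the choice of LANCZOS resampling is an implementation assumption.

```python
from PIL import Image

def resize_target(target: Image.Image, c: float,
                  by_height: bool = True) -> Image.Image:
    """Scale the RGBA target so its height (or width) becomes c pixels,
    keeping the aspect ratio, as in the a x b example above."""
    b, a = target.size                 # PIL gives (width, height); a is height
    ratio = c / a if by_height else c / b
    return target.resize((round(b * ratio), round(a * ratio)), Image.LANCZOS)

adjusted = resize_target(Image.open("target_0.png").convert("RGBA"), c=147)
```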
S23: Perform image synthesis on the adjusted target image and the background image to obtain the sample image.

In the method for generating an image set provided by the embodiments of the present disclosure, a target image set containing at least one target image is obtained by instance segmentation from the pre-acquired first image set. Because the background image is captured by a camera device photographing the restricted area, the target size parameter of the target image on the background image is determined based on the imaging principle, and the image size of the target image is adjusted according to that parameter, improving the authenticity of the sample image obtained by synthesizing the target image with the background image.
Figure 9 is a schematic flowchart of another method for generating an image set provided by an embodiment of the present disclosure. In some embodiments, as shown in Figure 9, step S23 may specifically include steps S231 to S233.

S231: Determine the calibration position in the adjusted target image according to the type of the abnormal creature.

In one example, if people are prohibited from passing through the restricted area, the abnormal creature type is human, and the calibration position in the adjusted target image is the position of the person's feet in the image.

S232: Paste the adjusted target image into the background image to obtain a first image in which the calibration position is aligned with the position to be pasted in the background image.
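A minimal sketch of step S232 using Pillow alpha compositing, reusing names from the earlier sketches; taking the bottom-center of the RGBA crop as the feet calibration position for a human target is an assumption.

```python
from PIL import Image

def paste_target(background: Image.Image, target: Image.Image,
                 paste_xy: tuple[int, int]) -> Image.Image:
    """Paste the RGBA target so that its calibration position (assumed here
    to be the bottom-center of the crop, i.e. the feet) lands on paste_xy."""
    x, y = paste_xy
    w, h = target.size
    first = background.convert("RGBA").copy()
    first.paste(target, (x - w // 2, y - h), mask=target)  # alpha compositing
    return first

first_image = paste_target(Image.open("background.jpg"), adjusted, (x, y))
```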
S233: Perform color adjustment on the first image to obtain the sample image; the color adjustment includes brightness adjustment and/or chroma adjustment.

After the adjusted target image is pasted onto the background image, the colors of the resulting first image are not harmonious because of differences in lighting and the like. Therefore, taking the lighting of the background image as the benchmark, a color neural network is used to perform color adjustment on the first image.

In some embodiments, step S233 may specifically include:

setting the area of the first image outside the area where the target image is located to a first preset color to obtain a second image, and inputting the first image and the second image into the color neural network to perform color adjustment on the first image to obtain the sample image. In one example, the first preset color is black; that is, the pixels of the area of the first image outside the area where the target image is located are set to 0.
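A minimal sketch of constructing the second image (black outside the target region); deriving the target region from the pasted crop's alpha channel and its paste box is a bookkeeping assumption, not something the disclosure specifies.

```python
import numpy as np
from PIL import Image

def make_second_image(first: Image.Image, target: Image.Image,
                      paste_box: tuple[int, int, int, int]) -> Image.Image:
    """Build the second image: keep the target region of the first image and
    set everything outside it to the first preset color (black, value 0)."""
    mask = np.zeros(first.size[::-1], dtype=np.uint8)  # (H, W) of zeros
    x0, y0, x1, y1 = paste_box                         # where the crop sits
    alpha = np.asarray(target.split()[-1])             # alpha of the RGBA crop
    mask[y0:y1, x0:x1] = (alpha > 0).astype(np.uint8)
    second = np.asarray(first.convert("RGB")) * mask[..., None]
    return Image.fromarray(second.astype(np.uint8))
```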
Specifically, the color neural network may be a RainNet neural network, which takes the background image as the benchmark and performs style transfer on the pasted target image so that the target image blends better with the background image. Figure 10 is a schematic framework diagram of a RainNet neural network provided by an embodiment of the present disclosure. As shown in Figure 10, the RainNet neural network includes a first convolution module 4, a second convolution module 5, a third convolution module 6 and a deconvolution module 7. The first convolution module 4 includes one convolution network 41; the deconvolution module 7 includes one deconvolution network 71; the second convolution module 5 includes a plurality of second convolution networks 51 based on a leaky rectified activation function (LReLU); the third convolution module 6 includes a plurality of cascaded convolution units, specifically a plurality of first convolution units 61, a plurality of second convolution units 62 and one third convolution unit 63, where a first convolution unit 61 includes one second convolution network 51 and one activation-function-based deconvolution network 61a; a second convolution unit 62 includes one second convolution network 51, one activation-function-based deconvolution network 61a and one attention self-control network 62a; and the third convolution unit 63 includes one activation-function-based deconvolution network 61a, one convolution network 41 and one attention self-control network 62a.

Specifically, in the RainNet neural network, the first image Ic undergoes multi-layer convolution through the first convolution module 4 and the second convolution module 5 to extract high-dimensional features, which are input to the third convolution module 6. The third convolution module 6 takes as input both the first image fed by each level's second convolution network 51 and the second image M of the same resolution: Ic × (1 − M) gives the background region of Ic, and Ic × M gives the foreground region of Ic. After the deconvolution module 7, the statistical style parameters γi and βi are obtained; the generated γi and βi are multiplied with and added to the normalized foreground features channel-wise to obtain the sample image Î, achieving color balance, making the picture content of the sample image Î more harmonious and improving the authenticity of the sample image.
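For illustration, a sketch of the region-aware normalization step described above: foreground features are normalized and then re-styled with background statistics taken as the style parameters γi and βi. Reading γi and βi as the per-channel background standard deviation and mean is an assumption about Figure 10's internals, as are the tensor shapes.

```python
import torch

def region_aware_norm(feat: torch.Tensor, M: torch.Tensor,
                      eps: float = 1e-5) -> torch.Tensor:
    """Re-style normalized foreground features with background statistics.

    feat: (N, C, H, W) features of the first image Ic.
    M:    (N, 1, H, W) second image / mask, 1 on the pasted target.
    gamma_i / beta_i are taken as the per-channel background std / mean,
    an assumed reading of the "statistical style parameters".
    """
    fg, bg = feat * M, feat * (1 - M)                  # Ic x M and Ic x (1 - M)
    n_fg = M.sum((2, 3), keepdim=True).clamp(min=1.0)
    n_bg = (1 - M).sum((2, 3), keepdim=True).clamp(min=1.0)
    mu_fg = fg.sum((2, 3), keepdim=True) / n_fg
    var_fg = ((fg - mu_fg * M) ** 2 * M).sum((2, 3), keepdim=True) / n_fg
    beta_i = bg.sum((2, 3), keepdim=True) / n_bg       # background mean
    gamma_i = (((bg - beta_i * (1 - M)) ** 2 * (1 - M)).sum((2, 3), keepdim=True)
               / n_bg).sqrt()                          # background std
    normed = (feat - mu_fg) / (var_fg + eps).sqrt()    # normalize foreground
    return (gamma_i * normed + beta_i) * M + feat * (1 - M)
```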
Figure 11 is a schematic structural diagram of an apparatus for generating an image set provided by an embodiment of the present disclosure; the apparatus is used to execute the above method for generating an image set. As shown in Figure 11, the apparatus includes an acquisition module 10 and a processing module 20.

The acquisition module 10 is configured to acquire at least one target image according to a pre-acquired first image set, the target image being an image of the abnormal creature segmented from a first image in the first image set, the first image set including a plurality of first images.

The processing module 20 is configured to synthesize the target image with a background image to obtain the sample image, wherein the background image is obtained by photographing the restricted area.

For the functions of each module, refer to the description in the above method for generating an image set; details are not repeated here.
Figure 12 is a schematic structural diagram of a device for generating an image set provided by an embodiment of the present disclosure. As shown in Figure 12, the electronic device 100 includes a memory 101 and a processor 102; a computer program is stored on the memory 101, and when executed by the processor 102 the computer program implements the above method for generating an image set, for example steps S1 to S2 in Figure 1.

The electronic device 100 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The electronic device 100 may include, but is not limited to, the processor 102 and the memory 101. A person skilled in the art will understand that Figure 12 is only an example of the electronic device 100 and does not limit it; the device may include more or fewer components than shown, or combine certain components, or use different components; for example, the electronic device 100 may also include input and output devices, network access devices, buses and the like.

The processor 102 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 101 may be an internal storage unit of the electronic device 100, such as a hard disk or memory of the electronic device 100. The memory 101 may also be an external storage device of the electronic device 100, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the electronic device 100. Further, the memory 101 may include both an internal storage unit of the electronic device 100 and an external storage device. The memory 101 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is only used as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments; details are not repeated here.
Figure 13 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure. As shown in Figure 13, a computer program 201 is stored on the computer-readable storage medium 200, and when executed by a processor the computer program 201 implements the above method for generating an image set, for example steps S1 to S2 in Figure 1. The computer-readable storage medium 200 includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
It can be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principles of the present disclosure, and the present disclosure is not limited thereto. For a person of ordinary skill in the art, various modifications and improvements can be made without departing from the spirit and essence of the present disclosure, and these modifications and improvements are also regarded as falling within the protection scope of the present disclosure.

Claims (13)

  1. A method for generating an image set, characterized in that the image set is used to train a detection model for abnormal creatures in a restricted area, and the image set includes a plurality of sample images; the method includes generating each sample image according to the following steps:
    acquiring at least one target image according to a pre-acquired first image set, the target image being an image of the abnormal creature segmented from a first image in the first image set;
    synthesizing the target image with a background image to obtain the sample image;
    wherein the first image set includes a plurality of first images, and the background image is obtained by photographing the restricted area.
  2. The method according to claim 1, characterized in that acquiring at least one target image according to the pre-acquired first image set includes:
    performing instance segmentation processing on each first image in the first image set to obtain a target image set corresponding to each first image, the target image set including at least one target image.
  3. The method according to claim 2, characterized in that performing instance segmentation processing on each first image in the first image set to obtain a target image set corresponding to each first image includes:
    inputting the first image into a Mask R-CNN instance segmentation network for processing to obtain the target image set.
  4. The method according to claim 1, characterized in that the background image is obtained by photographing the restricted area with a target device, and synthesizing the target image with the background image to obtain the sample image includes:
    determining a target size parameter for synthesizing the target image into the background image according to the corresponding coordinates of a position to be pasted, the device parameters of the target device, the image size of the background image and a preset size of the abnormal creature, the position to be pasted being the position in the background image where the target image is placed when the target image and the background image are synthesized;
    resizing the target image according to the target size parameter to obtain an adjusted target image;
    performing image synthesis on the adjusted target image and the background image to obtain the sample image.
  5. The method according to claim 4, characterized in that before determining the target size parameter, the method further includes:
    performing semantic segmentation processing on the background image to determine multiple image areas in the background image, and taking one of the multiple image areas as a target area;
    determining any position in the target area as the position to be pasted.
  6. The method according to claim 5, characterized in that performing semantic segmentation processing on the background image to determine multiple image areas in the background image includes:
    inputting the background image into a U-Net semantic segmentation network for processing to obtain the multiple image areas.
  7. The method according to claim 4, characterized in that the device parameters of the target device include at least the installation height of the target device, the focal length of the target device, and the angle between the optical axis of the target device and the vertical direction, and determining the target size parameter for synthesizing the target image into the background image according to the device parameters of the target device, the image size of the background image and the preset size of the abnormal creature includes:
    determining a first angle according to the corresponding coordinates of the position to be pasted and the focal length of the target device, the first angle being the angle between the line connecting the position of the target device and the bottom position of the abnormal creature, and the optical axis of the target device;
    determining a second angle according to the first angle, the installation height of the target device, the angle between the optical axis of the target device and the vertical direction, and the preset size of the abnormal creature, wherein the preset size of the abnormal creature is determined according to the type of the abnormal creature, the second angle is the angle between a first line and a second line, the first line connects the position of the target device and the bottom position of the abnormal creature, and the second line connects the position of the target device and the top position of the abnormal creature;
    determining the target size parameter according to the first angle, the second angle and the image size.
  8. The method according to claim 4, characterized in that resizing the target image according to the size parameter to obtain the adjusted target image includes:
    determining an adjustment ratio of the target image according to the image size and the target size parameter;
    adjusting the width and height of the target image respectively according to the adjustment ratio to obtain the adjusted target image.
  9. The method according to claim 4, characterized in that performing image synthesis on the adjusted target image and the background image to obtain the sample image includes:
    determining a calibration position in the adjusted target image according to the type of the abnormal creature;
    pasting the adjusted target image into the background image to obtain a first image, in which the calibration position is aligned with the position to be pasted in the background image;
    performing color adjustment on the first image to obtain the sample image, the color adjustment including brightness adjustment and/or chroma adjustment.
  10. The method according to claim 9, characterized in that performing color adjustment on the first image to obtain the sample image includes:
    setting the area of the first image outside the area where the target image is located to a first preset color to obtain a second image;
    inputting the first image and the second image into a color neural network to perform color adjustment on the first image to obtain the sample image.
  11. An apparatus for generating an image set, characterized in that the image set is used to train a detection model for abnormal creatures in a restricted area, and the image set includes a plurality of sample images; the apparatus includes:
    an acquisition module configured to acquire at least one target image according to a pre-acquired first image set, the target image being an image of the abnormal creature segmented from a first image in the first image set, the first image set including a plurality of first images;
    a processing module configured to synthesize the target image with a background image to obtain the sample image, wherein the background image is obtained by photographing the restricted area.
  12. A device for generating an image set, including a memory and a processor, a computer program being stored on the memory, characterized in that, when executed by the processor, the computer program implements the method of any one of claims 1 to 10.
  13. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program implements the method of any one of claims 1 to 10.
PCT/CN2023/110271 2022-08-23 2023-07-31 Method, apparatus, device and computer-readable storage medium for generating an image set WO2024041318A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211017275.7A CN115359319A (zh) 2022-08-23 2022-08-23 Method, apparatus, device and computer-readable storage medium for generating an image set
CN202211017275.7 2022-08-23

Publications (1)

Publication Number Publication Date
WO2024041318A1 true WO2024041318A1 (zh) 2024-02-29

Family

ID=84002799

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110271 WO2024041318A1 (zh) 2022-08-23 2023-07-31 Method, apparatus, device and computer-readable storage medium for generating an image set

Country Status (2)

Country Link
CN (1) CN115359319A (zh)
WO (1) WO2024041318A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359319A (zh) * 2022-08-23 2022-11-18 京东方科技集团股份有限公司 图像集的生成方法、装置、设备和计算机可读存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053366A (zh) * 2019-06-06 2020-12-08 阿里巴巴集团控股有限公司 Model training and sample generation method, electronic device and storage medium
US20210118112A1 (en) * 2019-08-22 2021-04-22 Beijing Sensetime Technology Development Co., Ltd. Image processing method and device, and storage medium
CN113449538A (zh) * 2020-03-24 2021-09-28 顺丰科技有限公司 Training method, apparatus, device and storage medium for a vision model
CN113160231A (zh) * 2021-03-29 2021-07-23 深圳市优必选科技股份有限公司 Sample generation method, sample generation apparatus and electronic device
CN113537209A (zh) * 2021-06-02 2021-10-22 浙江吉利控股集团有限公司 Image processing method, apparatus, device and computer-readable storage medium
CN114581728A (zh) * 2022-02-22 2022-06-03 中国人民解放军军事科学院国防科技创新研究院 Training image set generation method, apparatus and device
CN115359319A (zh) * 2022-08-23 2022-11-18 京东方科技集团股份有限公司 Method, apparatus, device and computer-readable storage medium for generating an image set

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHI JINTAO; LI ZHE; GU CHAOYUE; SHENG GEHAO; JIANG XIUCHEN: "Research on Foreign Matter Monitoring of Power Grid with Faster R-CNN Based on Sample Expansion", Power System Technology, vol. 44, no. 1, 10 June 2019 (2019-06-10), pages 44-51, XP093142314, DOI: 10.13335/j.1000-3673.pst.2019.0433 *

Also Published As

Publication number Publication date
CN115359319A (zh) 2022-11-18

Similar Documents

Publication Publication Date Title
EP3454250B1 (en) Facial image processing method and apparatus and storage medium
US11037278B2 (en) Systems and methods for transforming raw sensor data captured in low-light conditions to well-exposed images using neural network architectures
WO2019227615A1 (zh) 校正发票图像的方法、装置、计算机设备和存储介质
US9355302B2 (en) Method and electronic equipment for identifying facial features
WO2019085792A1 (en) Image processing method and device, readable storage medium and electronic device
JP6688277B2 (ja) プログラム、学習処理方法、学習モデル、データ構造、学習装置、および物体認識装置
WO2024041318A1 (zh) 图像集的生成方法、装置、设备和计算机可读存储介质
KR102383129B1 (ko) 이미지에 포함된 오브젝트의 카테고리 및 인식률에 기반하여 이미지를 보정하는 방법 및 이를 구현한 전자 장치
CN112215255A (zh) 一种目标检测模型的训练方法、目标检测方法及终端设备
WO2021027692A1 (zh) 视觉特征库的构建方法、视觉定位方法、装置和存储介质
WO2022237153A1 (zh) 目标检测方法及其模型训练方法、相关装置、介质及程序产品
WO2021184302A1 (zh) 图像处理方法、装置、成像设备、可移动载体及存储介质
WO2022133382A1 (en) Semantic refinement of image regions
CN108717530A (zh) 图像处理方法、装置、计算机可读存储介质和电子设备
CN110400278A (zh) 一种图像颜色和几何畸变的全自动校正方法、装置及设备
US11810256B2 (en) Image modification techniques
WO2022206517A1 (zh) 一种目标检测方法及装置
CN110298829A (zh) 一种舌诊方法、装置、系统、计算机设备和存储介质
WO2022165722A1 (zh) 单目深度估计方法、装置及设备
US12015835B2 (en) Multi-sensor imaging color correction
US20210312200A1 (en) Systems and methods for video surveillance
AU2020294259B2 (en) Object association method, apparatus and system, electronic device, storage medium and computer program
US11797854B2 (en) Image processing device, image processing method and object recognition system
WO2021147316A1 (zh) 物体识别方法及装置
CN112258435A (zh) 图像处理方法和相关产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856408

Country of ref document: EP

Kind code of ref document: A1