WO2022021287A1 - Data enhancement method and training method for instance segmentation model, and related apparatus - Google Patents

Data enhancement method and training method for instance segmentation model, and related apparatus

Info

Publication number
WO2022021287A1
Authority
WO
WIPO (PCT)
Prior art keywords
instance
candidate
image
neighborhood
training
Prior art date
Application number
PCT/CN2020/106112
Other languages
French (fr)
Chinese (zh)
Inventor
张昕 (Zhang Xin)
胡杰 (Hu Jie)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202080006082.4A (published as CN114375460A)
Priority to PCT/CN2020/106112
Publication of WO2022021287A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Definitions

  • the present application relates to the field of computer vision, and more particularly, to data augmentation methods, training methods, and related apparatuses for instance segmentation models.
  • instance segmentation is a current research direction of deep learning in the field of computer vision; its task is to segment each instance in an image and use a mask to display the segmented instances.
  • Instance segmentation has been widely used in various tasks in the field of computer vision, such as automatic driving, robot control and other tasks.
  • in research on instance segmentation, it is found that current instance segmentation methods often suffer from poor segmentation accuracy.
  • one solution is to construct new training samples from the existing dataset of the instance segmentation model, so as to expand the existing dataset both quantitatively and qualitatively, and then use the new dataset for instance segmentation model training.
  • in this solution, the contour information of an instance in an image of the original dataset is obtained, and the instance is cut out of the image based on the contour information; the instance is then pasted into the region of the image whose pixels are most similar to the instance, and image inpainting is used to fill in the image content at the instance's original location as background, thereby obtaining a new image. The new image can be used as an enhanced image to expand the original dataset into a new dataset; finally, the new dataset is used to train the instance segmentation model, which can improve the segmentation accuracy of the instance segmentation model.
  • the present application provides a data enhancement method, a training method and a related device for an instance segmentation model, which can improve the segmentation accuracy of the instance segmentation model.
  • the present application provides a data enhancement method for an instance segmentation model.
  • the enhancement method includes: acquiring a first image; performing multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; selecting the best candidate instance from the multiple candidate instances, where the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and adding the best candidate instance to the first image to obtain a second image that is an enhancement of the first image.
  • in this way, the candidate instance with the largest grayscale difference from its designated neighborhood in the first image is selected from the multiple candidate instances of the first instance as the best candidate instance, and the best candidate instance is added to the first image.
  • because the best candidate instance is obtained by an affine transformation of a real instance, its contour information can be considered very reasonable contour information; the second annotation information of the second image is obtained based on this reasonable contour information, and a new training sample is obtained from the second image and the second annotation information.
  • the method may further include: acquiring first label information of the first image, where the first label information includes the first contour information of the first instance; and acquiring, according to the candidate instance corresponding to the maximum grayscale difference, second label information of the second image, where the second label information includes the second contour information of the candidate instance corresponding to the maximum grayscale difference.
  • in some implementations, the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined based on the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the candidate instance; w_0 represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood; u_1 represents the average gray level of the neighborhood; w_1 represents the ratio of the number of pixels of the neighborhood to the total number of pixels; u represents the average gray level of the candidate instance and its neighborhood together; and g represents the pixel value variance.
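  • note: since w_0 + w_1 = 1 and u = w_0·u_0 + w_1·u_1, this expression simplifies algebraically to g = w_0·w_1·(u_0 - u_1)^2, i.e., the classical between-class (Otsu) variance of the foreground and background gray levels; a larger g means a stronger contrast between a candidate instance and its neighborhood.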
  • the method further includes: performing contrast enhancement processing on a first region of interest in the second image to obtain a third image, where the first region of interest includes a second instance within the second image and a neighborhood of the second instance.
  • in this way, the inter-class variance of the local image formed by the instance and its neighborhood becomes larger, and the local image achieves a more pronounced contrast effect. This makes the instance segmentation model trained on the locally adaptive contrast-enhanced third image more robust and more resistant to interference.
  • the first region of interest is a partial region of the second image.
  • the second image may include multiple different regions of interest; different regions of interest may contain different instances, and each region of interest is a local area of the second image. Contrast enhancement processing is performed separately for each region of interest.
  • An example of the neighborhood of the second instance is the bounding box neighborhood of the second instance.
  • the fact that the first region of interest includes the neighborhood of the second instance can be understood to mean that the first region of interest is larger than the neighborhood of the second instance, where the margin can be preset.
  • for example, the first region of interest may be 10 pixels larger in both length and width than the neighborhood of the second instance.
  • the method further includes: training an instance segmentation model according to the third image.
  • the first image is an image in the first dataset.
  • the method further includes: training an instance segmentation model according to a fourth image in the first dataset, where this training is performed simultaneously with the aforementioned process of obtaining the second image based on the first image; and training the instance segmentation model using the second image or the third image.
  • in this implementation, image processing and model training are performed at the same time, so that when the first dataset changes, a processed image can still be obtained based on the latest first dataset and used to train the instance segmentation model. That is to say, this implementation can improve the performance of the instance segmentation model without adding additional delay.
  • the present application provides a training method for an instance segmentation model.
  • the method includes: acquiring a first dataset, the first dataset including multiple images; while training an instance segmentation model based on images in the multiple images, simultaneously performing multiple affine transformations on the first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; selecting the best candidate instance from the multiple candidate instances, where the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; adding the best candidate instance to the first image to obtain a second image that is an enhancement of the first image; and training the instance segmentation model according to the second image.
  • in this training method, image processing and model training are performed at the same time, so that when the first dataset changes, a processed image can still be obtained based on the latest first dataset and used to train the instance segmentation model. That is to say, this implementation can improve the performance of the instance segmentation model without adding additional delay.
  • in some implementations, the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined based on the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the candidate instance; w_0 represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood; u_1 represents the average gray level of the neighborhood; w_1 represents the ratio of the number of pixels of the neighborhood to the total number of pixels; u represents the average gray level of the candidate instance and its neighborhood together; and g represents the pixel value variance.
  • the training of the instance segmentation model according to the second image includes: performing contrast enhancement processing on the first region of interest in the second image to obtain a third image, where the first region of interest includes a second instance in the second image and a neighborhood of the second instance; and training the instance segmentation model according to the third image.
  • the present application provides an apparatus for data enhancement of an instance segmentation model.
  • the apparatus includes a module for executing the method in the first aspect or any one of the implementation manners.
  • the apparatus includes: an acquisition module for acquiring the first image.
  • a processing module configured to: perform multiple affine transformations on the first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; select the best candidate instance from the multiple candidate instances, where the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and add the best candidate instance to the first image to obtain a second image that is an enhancement of the first image.
  • the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the candidate instance; w_0 represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood; u_1 represents the average gray level of the neighborhood; w_1 represents the ratio of the number of pixels of the neighborhood to the total number of pixels; u represents the average gray level of the candidate instance and its neighborhood together; and g represents the pixel value variance.
  • the processing module is further configured to: perform contrast enhancement processing on the first region of interest in the second image to obtain a third image, where the first region of interest includes a second instance in the second image and a neighborhood of the second instance.
  • the apparatus further includes a training module configured to train an instance segmentation model according to the third image.
  • the first image is an image in the first dataset.
  • the device further includes a training module configured to: train the instance segmentation model according to the fourth image in the first dataset, where this training proceeds concurrently with the aforementioned processing module's obtaining of the second image based on the first image; and train the instance segmentation model using the second image or the third image.
  • the present application provides an apparatus for training an instance segmentation model, where the apparatus includes a module for executing the method in the second aspect or any one of the implementation manners.
  • the training device includes: an acquisition module configured to acquire a first dataset, where the first dataset includes multiple images; a training module configured to train an instance segmentation model based on images in the multiple images; and a processing module configured to, while the training module trains the instance segmentation model based on the images in the multiple images, perform multiple affine transformations on the first instance in the first image to obtain multiple candidate instances in one-to-one correspondence with the multiple affine transformations, select the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image, and add the best candidate instance to the first image to obtain a second image that is an enhancement of the first image.
  • the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the candidate instance; w_0 represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood; u_1 represents the average gray level of the neighborhood; w_1 represents the ratio of the number of pixels of the neighborhood to the total number of pixels; u represents the average gray level of the candidate instance and its neighborhood together; and g represents the pixel value variance.
  • the training module is specifically configured to: perform contrast enhancement processing on the first region of interest in the second image to obtain a third image, where the first region of interest includes the second instance in the second image and the neighborhood of the second instance; and train the instance segmentation model according to the third image.
  • the present application provides a data enhancement device for an instance segmentation model, the device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of the implementation manners.
  • the present application provides an apparatus for training an instance segmentation model, the apparatus comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any one of the implementation manners.
  • a computer-readable medium stores program code for device execution, where the program code is used to execute the method in the first aspect or any one of the implementation manners thereof.
  • a computer-readable medium stores program codes for device execution, where the program codes are used to execute the method in the second aspect or any one of the implementation manners thereof.
  • a ninth aspect provides a computer program product comprising instructions, which, when the computer program product is run on a computer, causes the computer to execute the method in the first aspect or any one of the implementation manners.
  • a tenth aspect provides a computer program product comprising instructions, which when the computer program product is run on a computer, causes the computer to execute the method of the second aspect or any one of the implementation manners.
  • an eleventh aspect provides a chip, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the method in the first aspect or any one of the implementation manners described above.
  • the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the first aspect or any one of the implementation manners thereof.
  • a twelfth aspect provides a chip, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the method in the second aspect or any one of the implementation manners.
  • the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the second aspect or any one of the implementations.
  • a thirteenth aspect provides a computing device, the computing device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of the implementation manners.
  • a fourteenth aspect provides a computing device, the computing device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any one of the implementation manners.
  • the present application provides an instance segmentation method, the method comprising: performing instance segmentation on an image by using the instance segmentation model trained in the first aspect or the second aspect.
  • the present application provides an instance segmentation apparatus, the apparatus including a module for performing the method in the fifteenth aspect or any one of the implementation manners.
  • the present application provides an instance segmentation device, the device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed,
  • the processor is configured to execute the method in the fifteenth aspect or any one of the implementation manners.
  • a computer-readable medium stores program code for execution by a device, where the program code is used for executing the method in the fifteenth aspect or any one of the implementation manners thereof.
  • a nineteenth aspect provides a computer program product containing instructions, which when the computer program product is run on a computer, causes the computer to execute the method in the fifteenth aspect or any one of the implementation manners.
  • a twentieth aspect provides a chip, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the method in the fifteenth aspect or any one of its implementations.
  • the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the fifteenth aspect or any one of the implementation manners thereof.
  • a twenty-first aspect provides a computing device, the computing device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the fifteenth aspect or any one of the implementation manners.
  • FIG. 1 is a schematic diagram of a related concept of an embodiment of the present application.
  • FIG. 2 is a schematic scene diagram of an instance segmentation model to which an embodiment of the present application can be applied.
  • FIG. 3 is a schematic architecture diagram of a system to which the methods of various embodiments of the present application can be applied.
  • FIG. 4 is a schematic flowchart of a data enhancement method according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a data enhancement method according to another embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a data enhancement apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a computer program product according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a region of interest in an embodiment of the present application.
  • as shown in FIG. 1, the picture contains instances such as 1 person, 2 sheep, and 1 dog. It should be understood that FIG. 1 is only an example and not a limitation.
  • image classification refers to determining the categories to which the instances in an image belong.
  • image classification is to get (or output) which categories are contained in a given picture.
  • the output of the image classification task is to label the classification in the picture: person, sheep, dog.
  • target detection is simply to find out what targets are in the picture and the positions of these targets (for example, frame the target with a rectangular frame, which can be called a detection frame).
  • the output of the target detection task is to label the bounding boxes of 1 person, 2 sheep, and 1 dog in the picture (the rectangular box in the upper right corner of Figure 1).
  • semantic segmentation refers to classifying every pixel in the image, rather than just framing the target with a rectangular frame; however, different instances of the same class do not need to be segmented separately.
  • the output of the semantic segmentation task is to label people, sheep, and dogs in the picture, but it is not necessary to distinguish sheep 1 from sheep 2.
  • Semantic segmentation is also object segmentation in the usual sense.
  • instance segmentation is the combination of target detection and semantic segmentation. Compared with the bounding box of target detection, instance segmentation can be accurate to the edges of the object; compared with semantic segmentation, instance segmentation needs to label different instances of the same class in the image. For example, in FIG. 1 there is 1 instance of person, 2 instances of sheep, and 1 instance of dog, and the instance segmentation task is to label all of these instances.
  • the predicted result of instance segmentation can be called segmentation mask.
  • the segmentation mask quality can characterize the quality of the prediction results of instance segmentation.
  • FIG. 2 is a frame diagram of an exemplary application scenario of the instance segmentation model of the present application.
  • the exemplary application scenario is the Changlian call service of a mobile phone.
  • the Changlian call service is implemented by the user layer, the application layer, and the computing layer. It should be understood that the following embodiments are introduced using a mobile phone as the application scenario; in fact, the solution is not limited to mobile phones and can also be applied to other types of electronic devices such as computers, servers, or wearable devices.
  • the user layer may include a Changlian call interface, through which the user of the mobile phone can access the Changlian call service;
  • the application layer can provide users with basic call services and featured services through the Changlian call interface; the basic call services can include services such as logging in to an account, initiating a call, ending a call, and/or switching between front and rear cameras.
  • Featured services can include services such as skin beautifying effects, dark light HD, time-space transformation, and protagonist locking;
  • protagonist locking means that the protagonist portrait specified by the user is retained, and other portraits and backgrounds are removed. At this time, only the pixels of the protagonist portrait can be retained in the video.
  • Space-time transformation is the operation of retaining the portrait part in the video and replacing the background during the video call, so as to achieve the effect of space-time transformation.
  • the instance segmentation model of the present application can be applied to the time-space transformation and protagonist locking applications in the mobile phone's Changlian call service. These two applications rely on the results of a high-precision portrait instance segmentation algorithm; especially when multiple portraits occlude one another or a portrait is occluded by other objects, the accuracy requirements on the instance segmentation algorithm are even higher.
  • FIG. 3 is an exemplary structural diagram of a system architecture 300 to which the instance segmentation model of the embodiment of the present application can be applied.
  • a data collection device 360 is used to collect training data.
  • the training data may include the training image and the contour information of the instance in the training image, wherein the contour information of the instance in the training image may be the result of manual pre-marking, and the contour information may be called Annotation information for training images.
  • after collecting the training data, the data collection device 360 stores the training data in the database 330, and the training device 320 trains the target model 301 based on the training data maintained in the database 330; the target model 301 may be an instance segmentation model.
  • the target model in this application can also be replaced with target rules.
  • the data processing device 370 may further process the training data to improve the performance of the target model 301 .
  • the data processing device 370 can perform data enhancement on the training images in the database 330 to expand the training images in the database 330, so that the training device 320 can train an instance segmentation model with higher segmentation accuracy based on the expanded database 330.
  • the training device 320 obtains the target model 301 based on the training data.
  • specifically, the training device 320 performs instance segmentation on an input original image, compares the instance contour result obtained by segmentation with the label information of the original image, and adjusts the parameters of the target model 301 according to the comparison result, until the difference between the contour information output by the training device 320 and the label information of the original image is less than a certain threshold, at which point the training of the target model 301 is completed.
  • the training data maintained in the database 330 are not necessarily all collected by the data collection device 360, and may also be received from other devices.
  • the training device 320 does not necessarily have to completely train the target model 301 based on the training data maintained by the database 330, and can also obtain training data from the cloud or other places for model training.
  • the above description should not be taken as a limitation to the embodiments of the present application.
  • the training device 320 and the data processing device 370 may be the same device.
  • the target model 301 trained according to the training device 320 can be applied to different systems or devices, such as being applied to the execution device 310 shown in FIG. 3 .
  • the execution device 310 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server, a resource-constrained cloud device, or the like.
  • an example of the execution device 310 may be a cell phone including the exemplary structure shown in FIG. 2 .
  • the execution device 310 configures an input/output (I/O) interface 312 for data interaction with external devices, and a user can input data to the I/O interface 312 through the client device 340 .
  • the input data may include: images collected by the camera of the client device 340 .
  • the execution device 310 and the client device 340 may be the same device.
  • when the execution device 310 preprocesses the input data, or when the calculation module 311 of the execution device 310 performs calculation and other related processing, the execution device 310 can call the data, code, and the like in the data storage system 350 for the corresponding processing, and the data and instructions obtained by the corresponding processing may also be stored in the data storage system 350.
  • finally, the I/O interface 312 returns the processing result; taking instance segmentation as an example, it returns the instance segmentation result of the image to be segmented to the client device 340 so as to provide it to the user.
  • the user can manually give input data, and the manual setting can be operated through the interface provided by the I/O interface 312 .
  • the client device 340 can automatically send the input data to the I/O interface 312 . If the user's authorization is required to request the client device 340 to automatically send the input data, the user can set the corresponding permission in the client device 340 .
  • the user can view the result output by the execution device 310 on the client device 340, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 340 can also be used as a data collection terminal to collect the input data of the input I/O interface 312 and the output result of the output I/O interface 312 as new sample data as shown in the figure, and store them in the database 330 .
  • FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • in FIG. 3, the data storage system 350 is an external memory relative to the execution device 310; in other cases, the data storage system 350 may also be placed in the execution device 310.
  • FIG. 4 is an exemplary flowchart of a data augmentation method for an instance segmentation model according to an embodiment of the present application. As shown in FIG. 4, the method may include S410 to S440. An example of an execution body of the method is the data processing device 370 in the system architecture shown in FIG. 3.
  • S410: Acquire a first image. The first image may be an image in an instance segmentation dataset such as the COCO dataset or the Cityscapes dataset.
  • annotation information of the first image may also be acquired.
  • the annotation information of the first image may be referred to as first annotation information.
  • Outline information of each instance in the first image may be recorded in the first annotation information.
  • An example of the first annotation information is a set of coordinate points of the outer contour of each instance.
  • acquiring the first image may be understood as reading the first image from the image storage device.
  • the data processing device 370 can read the first image and the first annotation information from the database 330 .
  • contour information at the pixel level of each instance in the first image may be extracted according to the first annotation information. This process may be referred to as preprocessing of the first image.
  • the outer contour information of an instance is converted from a set of coordinate points into a mask matrix that can be efficiently processed by a computer.
  • in the mask matrix, the pixels corresponding to the instance can be set to 1, and the pixels corresponding to the background can be set to 0.
  • an instance may also be referred to as a foreground or a foreground portion
  • a neighborhood of the instance may be referred to as a background or a background portion of the instance.
  • the neighborhood is an area adjacent to the instance, including but not limited to adjacent areas of various shapes, such as the circumscribed rectangle neighborhood described later.
  • for example, the data processing device 370 can use the "findContour()" function for extracting outer contours that comes with opencv to extract the pixel-level contour information of each instance in the first image; here, the first annotation information being JSON text is taken as an example.
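  • for illustration only, a minimal Python sketch of this preprocessing step is given below; the JSON layout (an "instances" list with "segmentation" polygons) and the file names are assumptions, not the patent's reference implementation:

```python
import json
import cv2
import numpy as np

def polygons_to_mask(polygons, height, width):
    """Convert an instance's outer-contour coordinate points into a 0/1 mask matrix."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for poly in polygons:  # each poly is a flat list [x0, y0, x1, y1, ...]
        pts = np.asarray(poly, dtype=np.int32).reshape(-1, 2)
        cv2.fillPoly(mask, [pts], 1)  # instance pixels -> 1, background stays 0
    return mask

# Read the first image and the first annotation information (paths are assumptions).
image = cv2.imread("first_image.jpg")
with open("first_annotation.json") as f:
    annotation = json.load(f)

masks = [polygons_to_mask(inst["segmentation"], image.shape[0], image.shape[1])
         for inst in annotation["instances"]]
```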
  • S420: Perform multiple affine transformations on the first instance in the first image to obtain multiple candidate instances, where the multiple candidate instances are in one-to-one correspondence with the multiple affine transformations.
  • the first instance may be any instance in the first image.
  • each instance can be regarded as the first instance, and then multiple affine transformations are performed on each instance to obtain multiple candidate instances of each instance.
  • each of the multiple candidate instances is an instance obtained by processing the first instance with the corresponding one of the multiple affine transformations.
  • specifically, each of the multiple affine transformations can be used to operate on the mask matrix of the first instance, so that a mask matrix corresponding to that affine transformation is obtained; the instance described by this mask matrix is the candidate instance corresponding to that affine transformation.
  • performing affine transformation on the instance may include performing one or more transformations on the instance in the first image by translation, rotation, scaling, reflection, shearing, and any combination of the foregoing transformations.
  • the above-mentioned various affine transformations may be pre-configured.
  • for example, a rule for affine transformation can be preset, and multiple affine transformation matrices can be generated based on the rule, with each of the multiple affine transformation matrices corresponding to one affine transformation; as another example, the multiple affine transformation matrices can be preset directly.
  • an exemplary representation of an affine transformation matrix, combining scaling, rotation, and translation, is as follows:
  • M = \begin{bmatrix} s\cos r & -s\sin r & t_x \\ s\sin r & s\cos r & t_y \end{bmatrix}
  • where t_x represents the offset in the horizontal direction; t_y represents the offset in the vertical direction; s represents the scaling scale; and r represents the rotation angle.
  • for example, the width w of the circumscribed rectangle frame of the first instance in the horizontal direction can be obtained, the value range of t_x set to -20%·w to +20%·w with a step size of 2, and t_y fixed to 0; the value range of the scaling scale s can be set to 0.8 to 1.2 with a step size of 0.05, that is, the transformed instance is 80% to 120% of the size of the original instance; and the value range of the rotation angle r can be set to -10 degrees to +10 degrees with a step size of 1 degree. By traversing these value ranges, multiple affine transformation matrices can be obtained, and these matrices form a set of affine transformation matrices.
  • fixing t_y to zero is only an example; compared with not fixing it, fixing t_y to zero prevents the enhanced image from differing too much from the pre-enhancement image, which avoids a situation in which training the instance segmentation model with the enhanced image harms the training effect. That is, it helps to improve the performance of the instance segmentation model. A sketch of generating such a set of transformation matrices is shown below.
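  • as a hedged illustration of generating this set of affine transformation matrices (the 2x3 matrix layout is an assumption consistent with the exemplary matrix above):

```python
import numpy as np

def build_affine_matrices(w):
    """Enumerate 2x3 affine matrices over the parameter grid described above:
    t_x in [-20%*w, +20%*w] step 2, t_y fixed to 0,
    s in [0.8, 1.2] step 0.05, r in [-10, +10] degrees step 1."""
    matrices = []
    for tx in np.arange(-0.2 * w, 0.2 * w + 1e-9, 2.0):
        for s in np.arange(0.8, 1.2 + 1e-9, 0.05):
            for r_deg in np.arange(-10.0, 10.0 + 1e-9, 1.0):
                r = np.deg2rad(r_deg)
                matrices.append(np.array([
                    [s * np.cos(r), -s * np.sin(r), tx],
                    [s * np.sin(r),  s * np.cos(r), 0.0],  # t_y fixed to 0
                ]))
    return matrices

# Each matrix can be applied to the first instance's mask, e.g. with cv2.warpAffine.
```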
  • S430: Select the best candidate instance of the first instance in the first image from the multiple candidate instances.
  • the best candidate instance refers to the candidate instance among the multiple candidate instances with the largest grayscale difference from its neighborhood in the first image; that is, the grayscale difference between the best candidate instance and its neighborhood in the first image is larger than the grayscale difference between any other candidate instance and that candidate instance's neighborhood in the first image.
  • in an implementation manner, the grayscale difference between each of the multiple candidate instances and the designated neighborhood of that candidate instance in the first image can be obtained first, finally yielding multiple grayscale differences in one-to-one correspondence with the multiple candidate instances.
  • the circumscribed rectangle neighborhood of each candidate instance in the first image can be obtained, so as to obtain multiple neighborhoods corresponding to the multiple candidate instances one-to-one.
  • the multiple neighborhoods can be referred to as a contour neighborhood set.
  • the circumscribed rectangle neighborhood of each candidate instance can be understood as the neighborhood formed by pixels other than the transformed instance within the circumscribed rectangle of the candidate instance.
  • for example, in FIG. 9, the first instance in the first image is a cloud, and the hatched portion within the circumscribed rectangular frame of the cloud outline represents the circumscribed rectangle neighborhood of the cloud instance.
  • the first image may also include other content, for example, may include other instances, which are not shown in FIG. 9 .
  • using the circumscribed rectangle neighborhood of each candidate instance as the neighborhood of that candidate instance is only an example; this application does not limit the shape of the neighborhood of the candidate instance, which can also be, for example, the circumscribed circle neighborhood of the candidate instance.
  • for each neighborhood, the candidate instance corresponding to the neighborhood can be regarded as the foreground and the neighborhood as the background, and the pixel value variance between the foreground and the background can be calculated to obtain the variance corresponding to the neighborhood; this variance can be used as the grayscale difference between the neighborhood and the candidate instance. The variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the foreground; w_0 represents the ratio of the number of foreground pixels to the total number of foreground and background pixels; u_1 represents the average gray level of the background; w_1 represents the ratio of the number of background pixels to the total number of pixels; u represents the average gray level of the foreground and background together; and g represents the variance of the foreground and background.
  • the variance of the pixel values of the foreground and the background is regarded as the grayscale difference between the candidate instance and the neighborhood of the candidate instance, which is only an example.
  • the grayscale difference value between the foreground and the background can be obtained in other ways.
  • for example, the 1-norm or the infinity norm of the difference between the pixel values of each foreground and its corresponding background can be calculated, and that norm can be regarded as the grayscale difference between the foreground and the background.
  • after the multiple grayscale differences are obtained, the maximum grayscale difference can be determined from the multiple grayscale differences, the candidate instance corresponding to the maximum grayscale difference can be determined as the best candidate instance, and the neighborhood in the first image corresponding to the maximum grayscale difference can be determined as the best neighborhood, which may also be called the target neighborhood.
  • here, the candidate instance corresponding to the maximum grayscale difference refers to the candidate instance on which the maximum grayscale difference was calculated, and the neighborhood corresponding to the maximum grayscale difference refers to the neighborhood on which the maximum grayscale difference was calculated. A sketch of this selection step is given below.
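  • a minimal sketch of S430, assuming the image has been converted to grayscale and each candidate is represented as a pair of 0/1 masks (instance mask and neighborhood mask); the helper names are assumptions:

```python
import numpy as np

def pixel_value_variance(gray, inst_mask, hood_mask):
    """Variance g between a candidate instance (foreground) and its neighborhood (background)."""
    fg = gray[inst_mask.astype(bool)].astype(np.float64)
    bg = gray[hood_mask.astype(bool)].astype(np.float64)
    total = fg.size + bg.size
    w0, w1 = fg.size / total, bg.size / total
    u0, u1 = fg.mean(), bg.mean()
    u = w0 * u0 + w1 * u1                            # overall average gray level
    return w0 * (u0 - u) ** 2 + w1 * (u1 - u) ** 2   # equals w0*w1*(u0-u1)^2

def select_best_candidate(gray, candidates):
    """candidates: list of (instance_mask, neighborhood_mask) pairs; returns the best pair."""
    scores = [pixel_value_variance(gray, m, n) for m, n in candidates]
    best = int(np.argmax(scores))  # index of the maximum grayscale difference
    return candidates[best]
```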
  • S440: Add the best candidate instance to the first image to obtain a second image that is an enhancement of the first image.
  • specifically, the best candidate instance can be added to the first image at a position such that the neighborhood of the best candidate instance in the first image is exactly the best neighborhood.
  • in addition, image inpainting technology may be used to process the image content at the location of the first instance in the first image, so as to fill in the image content at that location as background of the first image.
  • one or more instances in the first image may each be regarded as the first instance, and the method shown in FIG. 4 used to obtain the best candidate instance corresponding to each of the one or more instances; each best candidate instance is added to the first image, resulting in the second image.
  • annotation information of the second image may also be acquired.
  • the mask matrix can be transformed into the form of a coordinate point set, and the coordinate point set can be used as the contour information of the target instance.
  • the contour information of the best candidate instance may be recorded in the second annotation information.
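  • for illustration only (a sketch assuming opencv's contour extraction), the coordinate point set of the best candidate instance might be recovered from its mask matrix like this:

```python
import cv2
import numpy as np

def mask_to_contour_points(mask):
    """Convert a 0/1 mask matrix back into sets of outer-contour coordinate points."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Flatten each contour from shape (N, 1, 2) to a list of [x, y] points.
    return [c.reshape(-1, 2).tolist() for c in contours]
```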
  • the instance segmentation model can then be trained using the training dataset obtained by the processing. For example, after the data processing device 370 in FIG. 3 performs the foregoing processing on the first image in the database 330, the training device 320 can use the processed second image to train the instance segmentation model to obtain the target model 301.
  • the instance segmentation model can be used to perform instance segmentation.
  • the execution device 310 may execute an instance segmentation service based on the instance segmentation model. Taking the execution device 310 including the architecture shown in FIG. 2 as an example, the execution device 310 can implement the protagonist locking and spatiotemporal transformation services based on the instance segmentation model.
  • in the method of this embodiment, the candidate instance with the largest grayscale difference from its own designated neighborhood in the first image is selected from the multiple candidate instances of the first instance as the best candidate instance, and the best candidate instance is added to the first image.
  • the contour information of the best candidate instance can be considered very reasonable contour information; the second annotation information of the second image is obtained based on this reasonable contour information, and a new training sample is obtained from the second image and the second annotation information.
  • FIG. 5 is an exemplary flowchart of a data enhancement method according to another embodiment of the present application. As shown in FIG. 5, the method may include S450 in addition to S410 to S440. S450: Perform contrast enhancement processing on the first region of interest in the second image to obtain a third image, where the first region of interest includes a second instance in the second image and the neighborhood of the second instance.
  • the second image may include one or more regions of interest (region of interest, ROI), and each region of interest may include an instance and a neighborhood of the instance.
  • the regions of interest of the second image may form a set of regions of interest.
  • the region of interest in the second image is referred to as the first region of interest
  • the instance in the first region of interest is referred to as the second instance
  • the neighborhood of the second instance is referred to as the second neighborhood.
  • the second neighborhood may include the circumscribed rectangle neighborhood of the second instance; in other words, the circumscribed rectangle neighborhood of the second instance is located within the second neighborhood, and the second neighborhood may also include pixels outside the circumscribed rectangle neighborhood of the second instance.
  • a low-frequency part in the first region of interest may be acquired through a Gaussian low-pass filter.
  • a Gaussian low-pass filter can be implemented using the "GaussianBlur()" function in the open-source opencv algorithm.
  • the pixels in the first region of interest can be passed to a Gaussian low-pass filter, so that the high-frequency part of each pixel can be filtered out, and the low-frequency part of each pixel can be obtained.
  • after the low-frequency part of the first region of interest is obtained through the Gaussian low-pass filter, the low-frequency part can be subtracted from the original pixels of the first region of interest to obtain the high-frequency part of the first region of interest.
  • then, the enhanced pixel values of the high-frequency part can be calculated according to the gain value of the high-frequency part, and the enhanced pixel values of the low-frequency part can be calculated according to the gain value of the low-frequency part, so as to obtain the contrast-enhanced image.
  • if a pixel belongs to the high-frequency part, it can be considered that the pixel value of the pixel has a large mean square error with respect to the surrounding pixels, so the pixel value can be reduced with a smaller gain value to alleviate the phenomenon of the pixel being too bright; if a pixel belongs to the low-frequency part, it can be considered that the pixel value of the pixel has a small mean square error with respect to the surrounding pixels, so it can be amplified with a larger gain value to make the detailed features around the pixel more obvious, thereby alleviating the problem of image blurring.
  • the gain values of the high frequency part and the low frequency part can be preset as required.
  • the gain value of the high frequency part can be set to 0.5
  • the gain value of the low frequency part can be set to 2.
  • one implementation manner of calculating the enhanced pixel values of the high-frequency part according to the gain value of the high-frequency part, or the enhanced pixel values of the low-frequency part according to the gain value of the low-frequency part, is to use the product of the gain value and the pixel value of the pixel point as the enhanced pixel value of that pixel point.
  • a third image that has undergone local adaptive contrast enhancement can be obtained.
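  • a minimal sketch of this locally adaptive contrast enhancement, assuming the example gain values above and a Gaussian kernel size chosen only for illustration:

```python
import cv2
import numpy as np

def enhance_roi_contrast(roi, low_gain=2.0, high_gain=0.5, ksize=(11, 11)):
    """Locally adaptive contrast enhancement of one region of interest."""
    roi = roi.astype(np.float32)
    low = cv2.GaussianBlur(roi, ksize, 0)    # low-frequency part (Gaussian low-pass filter)
    high = roi - low                         # high-frequency part = original - low frequency
    out = low_gain * low + high_gain * high  # apply the per-band gain values
    return np.clip(out, 0, 255).astype(np.uint8)

# The third image is obtained by enhancing each region of interest of the second image,
# e.g. second_image[y0:y1, x0:x1] = enhance_roi_contrast(second_image[y0:y1, x0:x1]).
```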
  • a training data set with enhanced image contrast can be obtained.
  • the method may further include: using the training data set to train the instance segmentation model.
  • the training device 320 can use the processed third image to train the instance segmentation model to obtain the target model 301 .
  • the method may further include: using the instance segmentation model obtained by training to perform instance segmentation.
  • the execution device 310 may execute an instance segmentation service based on the instance segmentation model.
  • the execution device 310 can implement the protagonist locking and spatiotemporal transformation services based on the instance segmentation model.
  • the instance segmentation model trained using the contrast-enhanced training dataset of this embodiment has high robustness, strong anti-interference ability, and a high tolerance for noise in images in noisy scenes.
  • in some embodiments, the data enhancement method shown in FIG. 4 or FIG. 5 can be used to process the images in the original training dataset, and after the processed training dataset is obtained, it can be used to train the instance segmentation model.
  • the processing of the original training dataset in this embodiment may be referred to as offline data augmentation.
  • in other embodiments, the method of FIG. 4 or FIG. 5 may be used to perform image processing on the original training dataset while the instance segmentation model is being trained using the original training dataset; after the image processing on the original training dataset is completed, the instance segmentation model is trained using the training dataset obtained by the image processing.
  • the method of this embodiment may be referred to as an online data augmentation training method.
  • in the method of this embodiment, since the duration of one training iteration of the instance segmentation model is usually much longer than the duration of image processing, the method does not add extra time overhead, and the latest enhanced training dataset can be obtained in real time, so the accuracy of the instance segmentation model can be further improved. For example, no matter how the original training dataset of the instance segmentation model changes, the method of this embodiment can obtain the latest enhanced training dataset based on the original training dataset, so that an instance segmentation model with better performance can be obtained.
  • the image processing operations are completed in the constructor of the data_generator object.
  • specifically, the original training dataset is read first: the "imread" function in the opencv library is used to read the first image in the original training dataset, and the "loadAnns" function of the coco dataset API is used to read the first annotation information; the data enhancement method in FIG. 4 or FIG. 5 is implemented in the constructor of the data_generator object to output a data_generator object, and the above operations are performed by an image processing thread. Then, the data_generator object is passed as a parameter to the training thread of the Tensorflow model, where the training thread and the image processing thread execute in parallel: the image processing thread executes independently and outputs the processed second image or third image, together with the second annotation information, to a public storage area, and before each training iteration the training thread reads the processed second image or third image and the second annotation information from the public storage area, thereby realizing online-augmented training of the instance segmentation model. A sketch of this two-thread arrangement is shown below.
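  • the following Python sketch illustrates the two-thread arrangement only; augment() and model.train_step() are assumed placeholders, not APIs from the patent:

```python
import threading
import queue

buffer = queue.Queue(maxsize=8)  # plays the role of the "public storage area"

def image_processing_thread(dataset):
    """Runs independently: enhance each sample (FIG. 4 / FIG. 5) and publish the result."""
    for image, annotation in dataset:
        enhanced_image, new_annotation = augment(image, annotation)  # assumed helper
        buffer.put((enhanced_image, new_annotation))  # blocks while the buffer is full

def training_thread(model, num_iterations):
    """Before each training iteration, read the latest processed sample from the buffer."""
    for _ in range(num_iterations):
        enhanced_image, new_annotation = buffer.get()
        model.train_step(enhanced_image, new_annotation)  # assumed model API

# worker = threading.Thread(target=image_processing_thread, args=(dataset,))
# trainer = threading.Thread(target=training_thread, args=(model, 10000))
# worker.start(); trainer.start()
```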
  • FIG. 6 is a schematic structural diagram of a data enhancement apparatus 600 for an instance segmentation model according to an embodiment of the present application.
  • the apparatus 600 may be an example of the data processing device 370 in the system architecture shown in FIG. 3 .
  • the apparatus 600 may include an acquisition module 610 and a processing module 620, and optionally, may also include a training module.
  • the apparatus 600 may be used to implement the data augmentation method of the instance segmentation model in any of the foregoing embodiments, for example, may be used to implement the method shown in FIG. 4 or FIG. 5 .
  • the acquiring module 610 may be used to perform S410
  • the processing module 620 may be used to perform S420 to S440.
  • the processing module 620 may also be configured to perform S450.
  • the schematic structure of the apparatus for training an instance segmentation model according to an embodiment of the present application is similar to the structure of the apparatus 600 including a training module, and details are not repeated here.
  • the training device can be used to perform the aforementioned online data augmentation training method.
  • FIG. 7 is a schematic structural diagram of an apparatus 700 according to an embodiment of the present application.
  • The apparatus 700 includes a processor 702, a communication interface 703 and a memory 704.
  • The apparatus 700 may be a chip or a computing device.
  • The apparatus 700 may be the data processing device 370 in the system architecture shown in FIG. 3, or may be an example of a chip that can be applied to the data processing device 370.
  • The apparatus 700 may be the training device 320 in the system architecture shown in FIG. 3, or may be an example of a chip that can be applied to the training device 320.
  • The processor 702, the memory 704 and the communication interface 703 can communicate through a bus.
  • Executable code is stored in the memory 704, and the processor 702 reads the executable code in the memory 704 to execute the corresponding method.
  • The memory 704 may also include other software modules required for running processes, such as an operating system.
  • The operating system may be LINUX™, UNIX™, WINDOWS™ and the like.
  • The executable code in the memory 704 is used to implement the method described in any one of the foregoing embodiments (for example, the method shown in FIG. 4 or FIG. 5), and the processor 702 reads the executable code in the memory 704 to execute the method described in any one of the foregoing embodiments.
  • The processor 702 may include a CPU.
  • The memory 704 may include volatile memory, such as random access memory (RAM).
  • The memory 704 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).
  • The disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format, or on other non-transitory media or articles of manufacture.
  • Figure 8 schematically illustrates a conceptual partial view of an example computer program product, arranged in accordance with any of the above-described embodiments, comprising a computer program for executing a computer process on a computing device.
  • The example computer program product 800 is provided using a signal bearing medium 801.
  • The signal bearing medium 801 may include one or more program instructions 802 which, when executed by one or more processors, may provide the functions or part of the functions described in the methods of any of the above embodiments.
  • One or more of the features of S410 to S430 may be undertaken by one or more instructions associated with the signal bearing medium 801.
  • The signal bearing medium 801 may include a computer readable medium 803, such as, but not limited to, a hard drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (ROM) or a random access memory (RAM), etc.
  • The signal bearing medium 801 may include a computer recordable medium 804, such as, but not limited to, memory, read/write (R/W) CD, R/W DVD, and the like.
  • The signal bearing medium 801 may include a communication medium 805, such as, but not limited to, digital and/or analog communication media (e.g., fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
  • The signal bearing medium 801 may be conveyed by a wireless form of communication medium 805 (e.g., a wireless communication medium that conforms to the IEEE 802.11 standard or another transmission protocol).
  • The one or more program instructions 802 may be, for example, computer-executable instructions or logic-implemented instructions.
  • The aforementioned computing devices may be configured to provide various operations, functions, or actions in response to program instructions 802 communicated to the computing device via one or more of the computer-readable medium 803, the computer-recordable medium 804, and/or the communication medium 805. It should be understood that the arrangements described herein are for illustrative purposes only.
  • The disclosed system, apparatus and method may be implemented in other manners.
  • The apparatus embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The shown or discussed mutual coupling, direct coupling or communication connection may be implemented through some interfaces as an indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • Each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • The functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, or other media that can store program codes.


Abstract

Provided in the present application are a data enhancement method, a training method for an instance segmentation model, and a related apparatus in the field of computer vision. In the technical solution provided in the present application, the data enhancement method comprises: performing multiple affine transformations on a first instance in a first image, and selecting the best candidate instance from multiple candidate instances, wherein the best candidate instance has the maximum grayscale difference, and the grayscale difference of each of the multiple candidate instances is the grayscale difference between that candidate instance and its neighborhood in the first image; and adding the best candidate instance to the first image to obtain an enhanced version of the first image. In the technical solution of the present application, the grayscale difference between the candidate instance added to the first image and its neighborhood is the greatest; therefore, the contrast between the candidate instance and its background is clearer. Retraining an instance segmentation model on the basis of the enhanced image can improve the segmentation accuracy of the instance segmentation model.

Description

Data enhancement method, training method and related apparatus for instance segmentation model

Technical Field

The present application relates to the field of computer vision and, more particularly, to a data enhancement method, a training method and a related apparatus for an instance segmentation model.

Background Art
The instance segmentation task is a current research focus of deep learning in the field of computer vision. It mainly studies how to detect the position and category of instances in an image (for example, portraits, animals or specified objects), segment the instances from the image, and output a pixel-level mask to display the segmented instances.

Instance segmentation has been widely applied to various tasks in the field of computer vision, for example, automatic driving, robot control and other tasks. However, in applications of instance segmentation, it has been found that current instance segmentation methods often suffer from poor instance segmentation accuracy.

In order to solve the above problem, those skilled in the art have proposed a solution, namely, constructing new training samples by using the existing data set of the instance segmentation model, so as to expand the existing data set both quantitatively and qualitatively, and using the new data set for the training of the instance segmentation model.

Specifically, in the method proposed by those skilled in the art, the contour information of an instance in an image of the original data set is obtained, and the instance is cut out of the image based on the contour information; the instance is then pasted to the location of another region in the image that has the highest pixel similarity with the instance, and an image inpainting technique is used to fill the background of the image content at the original location of the instance in the image, so as to obtain a new image. The new image can be used as an enhanced image to expand the original data set into a new data set; finally, the new data set is used to train the instance segmentation model, which can improve the segmentation accuracy of the instance segmentation model.

After analysis, it is found that although the above method can improve the segmentation accuracy of the instance segmentation model to a certain extent, the improvement is limited, and in many scenarios it still cannot meet the instance segmentation accuracy required by various tasks in the field of computer vision.
Summary of the Invention

The present application provides a data enhancement method, a training method and a related apparatus for an instance segmentation model, which can improve the segmentation accuracy of the instance segmentation model.
In a first aspect, the present application provides a data enhancement method for an instance segmentation model. The enhancement method includes: acquiring a first image; performing multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; selecting the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and adding the best candidate instance to the first image to obtain a second image, which is an enhanced version of the first image.
In this method, when the first image is enhanced, the best candidate instance, that is, the one with the largest grayscale difference from its designated neighborhood in the first image, is selected from the multiple candidate instances of the first instance and added to the first image. As a result, in the second image obtained by performing data enhancement on the first image, the contrast between the best candidate instance and its neighborhood is relatively large, so that the best candidate instance is relatively clear in the image. In this case, the contour information of the best candidate instance can be considered very reasonable contour information. Obtaining the second annotation information of the second image based on this reasonable contour information, and training the instance segmentation model based on the second image and the second annotation information, can yield an instance segmentation model with more accurate segmentation; in other words, it can significantly improve the segmentation accuracy of the instance segmentation model.
Optionally, the method may further include: acquiring first label information of the first image, the first label information containing first contour information of the first instance; and acquiring, according to the candidate instance corresponding to the maximum grayscale difference, second label information of the second image, the second label information containing second contour information of the candidate instance corresponding to the maximum grayscale difference.
In some possible implementations of the first aspect, the grayscale difference between each candidate instance and its neighborhood is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:

g = w₀ × (u₀ − u)² + w₁ × (u₁ − u)² = w₀ × w₁ × (u₀ − u₁)²

u = w₀ × u₀ + w₁ × u₁

where u₀ represents the average gray level of the candidate instance, w₀ represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood, u₁ represents the average gray level of the neighborhood of the candidate instance, w₁ represents the ratio of the number of pixels of the neighborhood to the total number of pixels, u represents the average gray level of the candidate instance and its neighborhood together, and g represents the pixel value variance.
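As a hedged illustration of this calculation, the sketch below computes the between-class variance g for one candidate placement, assuming the candidate instance is given as a binary mask over a grayscale crop covering the instance and its neighborhood; the function name and array layout are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def grayscale_difference(gray_crop: np.ndarray, instance_mask: np.ndarray) -> float:
    """Between-class variance g of an instance versus its neighborhood.

    gray_crop:     2-D uint8 grayscale patch covering instance + neighborhood.
    instance_mask: 2-D bool array, True on instance pixels, False on neighborhood.
    """
    total = instance_mask.size
    w0 = instance_mask.sum() / total          # fraction of instance pixels
    w1 = 1.0 - w0                             # fraction of neighborhood pixels
    u0 = gray_crop[instance_mask].mean()      # mean gray level of the instance
    u1 = gray_crop[~instance_mask].mean()     # mean gray level of the neighborhood
    return w0 * w1 * (u0 - u1) ** 2           # g = w0 * w1 * (u0 - u1)^2

# The candidate placement with the largest g is selected as the best candidate.
```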
In some possible implementations of the first aspect, the method further includes: performing contrast enhancement processing on a first region of interest in the second image to obtain a third image, the first region of interest containing a second instance in the second image and the neighborhood of the second instance.

In this implementation, a contrast enhancement technique is used to perform local adaptive contrast enhancement on the instance and its neighborhood in the second image, which can increase the inter-class variance of the local image formed by the instance and its neighborhood, achieving a more obvious local contrast. This makes the instance segmentation model trained on the locally contrast-enhanced third image more robust and more resistant to interference.

The first region of interest is a partial region of the second image. The second image may include multiple different regions of interest, different regions of interest may contain different instances, and each region of interest is a partial region of the second image. Contrast enhancement processing is performed separately for each region of interest.

One example of the neighborhood of the second instance is the circumscribed-rectangle neighborhood of the second instance. That the first region of interest contains the neighborhood of the second instance can be understood to mean that the first region of interest is somewhat larger than the neighborhood of the second instance, where the margin can be preset. For example, the first region of interest may be 10 pixels larger in length and width than the neighborhood of the second instance.
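The passage does not name a specific contrast enhancement algorithm; the sketch below uses OpenCV's CLAHE (contrast limited adaptive histogram equalization) as one plausible choice of local adaptive contrast enhancement, together with the 10-pixel margin mentioned above. The function name and parameter values are assumptions, not the patent's own implementation.

```python
import cv2
import numpy as np

def enhance_roi(image: np.ndarray, box: tuple, margin: int = 10) -> np.ndarray:
    """Apply local adaptive contrast enhancement inside an expanded bounding box.

    image: H x W x 3 BGR image (the second image), modified in place.
    box:   (x, y, w, h) bounding rectangle of the second instance's neighborhood.
    """
    x, y, w, h = box
    h_img, w_img = image.shape[:2]
    # Expand the neighborhood by `margin` pixels to form the region of interest.
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(w_img, x + w + margin), min(h_img, y + h + margin)

    roi = image[y0:y1, x0:x1]
    lab = cv2.cvtColor(roi, cv2.COLOR_BGR2LAB)   # enhance the luminance channel only
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    image[y0:y1, x0:x1] = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    return image
```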
In some possible implementations of the first aspect, the method further includes: training an instance segmentation model according to the third image.

In some possible implementations of the first aspect, the first image is one image in a first data set. The method further includes: training an instance segmentation model according to a fourth image in the first data set, this training being performed simultaneously with the aforementioned process of obtaining the second image based on the first image; and training the instance segmentation model using the second image or the third image.

In this implementation, image processing and model training are performed at the same time, so that even when the first data set changes, a processed image can still be obtained based on the latest first data set, and the instance segmentation model can be trained based on the processed image. In other words, this implementation can improve the performance of the instance segmentation model without adding extra delay.
In a second aspect, the present application provides a training method for an instance segmentation model. The method includes: acquiring a first data set containing multiple images; while training an instance segmentation model based on images in the first data set, performing multiple affine transformations on a first instance in a first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; selecting the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; adding the best candidate instance to the first image to obtain a second image, which is an enhanced version of the first image; and training the instance segmentation model according to the second image.

In this training method, image processing and model training are performed at the same time, so that even when the first data set changes, a processed image can still be obtained based on the latest first data set, and the instance segmentation model can be trained based on the processed image. In other words, this implementation can improve the performance of the instance segmentation model without adding extra delay.
In some possible implementations of the second aspect, the grayscale difference between each candidate instance and its neighborhood is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:

g = w₀ × (u₀ − u)² + w₁ × (u₁ − u)² = w₀ × w₁ × (u₀ − u₁)²

u = w₀ × u₀ + w₁ × u₁

where u₀ represents the average gray level of the candidate instance, w₀ represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood, u₁ represents the average gray level of the neighborhood, w₁ represents the ratio of the number of pixels of the neighborhood to the total number of pixels, u represents the average gray level of the candidate instance and its neighborhood together, and g represents the pixel value variance.
In some possible implementations of the second aspect, training the instance segmentation model according to the second image includes: performing contrast enhancement processing on a first region of interest in the second image to obtain a third image, the first region of interest containing a second instance in the second image and the neighborhood of the second instance; and training the instance segmentation model according to the third image.
In a third aspect, the present application provides a data enhancement apparatus for an instance segmentation model, the apparatus including modules for executing the method in the first aspect or any one of its implementations.

For example, the apparatus includes: an acquisition module, configured to acquire a first image; and a processing module, configured to: perform multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; select the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and add the best candidate instance to the first image to obtain a second image, which is an enhanced version of the first image.
Optionally, the grayscale difference between each candidate instance and its neighborhood is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:

g = w₀ × (u₀ − u)² + w₁ × (u₁ − u)² = w₀ × w₁ × (u₀ − u₁)²

u = w₀ × u₀ + w₁ × u₁

where u₀ represents the average gray level of the candidate instance, w₀ represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood, u₁ represents the average gray level of the neighborhood, w₁ represents the ratio of the number of pixels of the neighborhood to the total number of pixels, u represents the average gray level of the candidate instance and its neighborhood together, and g represents the pixel value variance.
Optionally, the processing module is further configured to: perform contrast enhancement processing on a first region of interest in the second image to obtain a third image, the first region of interest containing a second instance in the second image and the neighborhood of the second instance.

Optionally, the apparatus further includes a training module configured to train an instance segmentation model according to the third image.

Optionally, the first image is one image in a first data set. The apparatus further includes a training module configured to: train an instance segmentation model according to a fourth image in the first data set, this training being performed simultaneously with the aforementioned process in which the processing module obtains the second image based on the first image; and train the instance segmentation model using the second image or the third image.
In a fourth aspect, the present application provides a training apparatus for an instance segmentation model, the apparatus including modules for executing the method in the second aspect or any one of its implementations.

For example, the training apparatus includes: an acquisition module, configured to acquire a first data set containing multiple images; a training module, configured to train an instance segmentation model based on images in the first data set; and a processing module, configured to: while the training module trains the instance segmentation model based on images in the first data set, perform multiple affine transformations on a first instance in a first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; select the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and add the best candidate instance to the first image to obtain a second image, which is an enhanced version of the first image.
Optionally, the grayscale difference between each candidate instance and its neighborhood is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:

g = w₀ × (u₀ − u)² + w₁ × (u₁ − u)² = w₀ × w₁ × (u₀ − u₁)²

u = w₀ × u₀ + w₁ × u₁

where u₀ represents the average gray level of the candidate instance, w₀ represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood, u₁ represents the average gray level of the neighborhood, w₁ represents the ratio of the number of pixels of the neighborhood to the total number of pixels, u represents the average gray level of the candidate instance and its neighborhood together, and g represents the pixel value variance.
Optionally, the training module is specifically configured to: perform contrast enhancement processing on a first region of interest in the second image to obtain a third image, the first region of interest containing a second instance in the second image and the neighborhood of the second instance; and train the instance segmentation model according to the third image.
In a fifth aspect, the present application provides a data enhancement apparatus for an instance segmentation model, the apparatus including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of its implementations.

In a sixth aspect, the present application provides a training apparatus for an instance segmentation model, the apparatus including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any one of its implementations.

In a seventh aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code being used to execute the method in the first aspect or any one of its implementations.

In an eighth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code being used to execute the method in the second aspect or any one of its implementations.

In a ninth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the method in the first aspect or any one of its implementations.

In a tenth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the method in the second aspect or any one of its implementations.
In an eleventh aspect, a chip is provided, the chip including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory to execute the method in the first aspect or any one of its implementations.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in the first aspect or any one of its implementations.

In a twelfth aspect, a chip is provided, the chip including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory to execute the method in the second aspect or any one of its implementations.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in the second aspect or any one of its implementations.

In a thirteenth aspect, a computing device is provided, the computing device including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of its implementations.

In a fourteenth aspect, a computing device is provided, the computing device including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any one of its implementations.
In a fifteenth aspect, the present application provides an instance segmentation method, the method including: performing instance segmentation on an image by using the instance segmentation model trained in the first aspect or the second aspect.

In a sixteenth aspect, the present application provides an instance segmentation apparatus, the apparatus including modules for executing the method in the fifteenth aspect or any one of its implementations.

In a seventeenth aspect, the present application provides an instance segmentation apparatus, the apparatus including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the fifteenth aspect or any one of its implementations.

In an eighteenth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code being used to execute the method in the fifteenth aspect or any one of its implementations.

In a nineteenth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the method in the fifteenth aspect or any one of its implementations.

In a twentieth aspect, a chip is provided, the chip including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory to execute the method in the fifteenth aspect or any one of its implementations.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in the fifteenth aspect or any one of its implementations.

In a twenty-first aspect, a computing device is provided, the computing device including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the fifteenth aspect or any one of its implementations.
Description of Drawings

FIG. 1 is a schematic diagram of concepts related to an embodiment of the present application.

FIG. 2 is a schematic scene diagram of an instance segmentation model to which an embodiment of the present application can be applied.

FIG. 3 is a schematic architecture diagram of a system to which the methods of the various embodiments of the present application can be applied.

FIG. 4 is a schematic flowchart of a data enhancement method according to an embodiment of the present application.

FIG. 5 is a schematic flowchart of a data enhancement method according to another embodiment of the present application.

FIG. 6 is a schematic structural diagram of a data enhancement apparatus according to an embodiment of the present application.

FIG. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present application.

FIG. 8 is a schematic structural diagram of a computer program product according to an embodiment of the present application.

FIG. 9 is a schematic diagram of a region of interest according to an embodiment of the present application.
Detailed Description
To facilitate understanding of the embodiments of the present application, several concepts related to the embodiments of the present application are first introduced below with reference to FIG. 1 . In the example of FIG. 1 , the picture contains instances such as one person, two sheep and one dog. It should be understood that FIG. 1 serves only as an example and is not limiting.
As shown in the upper left corner of FIG. 1 , image classification refers to determining, for an image, the categories to which its instances belong. For example, if the data set has four categories, person, sheep, dog and cat, image classification is to obtain (or output) which categories a given picture contains. In the example of FIG. 1 , the output of the image classification task is to label the categories in the picture: person, sheep, dog.

As shown in the upper right corner of FIG. 1 , target detection is, simply put, finding out what targets are in the picture and where they are located (for example, enclosing each target with a rectangular box, which can be called a detection box). In the example of FIG. 1 , the output of the target detection task is the bounding boxes of the one person, two sheep and one dog in the picture (the rectangular boxes in the upper right corner of FIG. 1 ).

As shown in the lower left corner of FIG. 1 , semantic segmentation requires distinguishing every pixel in the picture, rather than merely enclosing the target with a rectangular box, but different instances of the same object do not need to be segmented separately. In the example of FIG. 1 , the output of the semantic segmentation task is to label the person, the sheep and the dog in the picture, without distinguishing sheep 1 from sheep 2. Semantic segmentation is also target segmentation in the usual sense.

As shown in the lower right corner of FIG. 1 , instance segmentation is the combination of target detection and semantic segmentation. Compared with the bounding box of target detection, instance segmentation can be accurate to the edge of the object; compared with semantic segmentation, instance segmentation needs to label different instances of the same object in the picture. In the example of FIG. 1 , there is one instance of person, two instances of sheep and one instance of dog, and the instance segmentation task is to label all of these instances. The prediction result of instance segmentation can be called a segmentation mask, and the quality of the segmentation mask can characterize how good the prediction result of instance segmentation is.
FIG. 2 is a framework diagram of an exemplary application scenario of the instance segmentation model of the present application. The exemplary application scenario is the Changlian call service of a mobile phone. As shown in FIG. 2 , the Changlian call service is implemented cooperatively by a user layer, an application layer and a computing layer. It should be understood that the following embodiments are introduced with a mobile phone as the application scenario; in fact, the solution is not limited to mobile phones and can also be applied to other types of electronic devices such as computers, servers or wearable devices.

The user layer may contain a Changlian call interface, through which the user of the mobile phone can access the Changlian call service. The application layer can provide the user with basic call services and featured services through the Changlian call interface. The basic call services may include services such as logging in to an account, initiating a call, ending a call and/or switching between the front and rear cameras; the featured services may include services such as skin-beautifying effects, dark-light high definition, space-time transformation and protagonist locking. The computing layer contains a variety of low-level chip interfaces and provides functions to the application layer through these interfaces to implement the various services of the application layer.

For example, protagonist locking means that the protagonist portrait specified by the user is retained while other portraits and the background are removed; in this case, only the pixels of the protagonist portrait are retained in the video. Space-time transformation is the operation of retaining the portrait part of the video while replacing the background during a video call, achieving the effect of a space-time transformation.

As an example, the instance segmentation model of the present application can be applied to the space-time transformation application and the protagonist locking application in the Changlian call service of the mobile phone. These two applications rely on the instance segmentation results of a high-precision portrait instance segmentation algorithm; especially when multiple portraits occlude each other, or a portrait is occluded by other objects, the requirement on the instance segmentation accuracy of the algorithm is even higher.
FIG. 3 is an exemplary structural diagram of a system architecture 300 to which the instance segmentation model of the embodiments of the present application can be applied. In FIG. 3 , a data collection device 360 is used to collect training data. Taking the use of this system architecture for instance segmentation as an example, the training data may include training images and contour information of the instances in the training images, where the contour information of the instances in the training images may be the result of manual pre-annotation; this contour information may be called the annotation information of the training images.

After the training data is collected, the data collection device 360 stores the training data in a database 330, and a training device 320 trains a target model 301 based on the training data maintained in the database 330; the target model 301 may be an instance segmentation model. The target model in this application may also be replaced by a target rule.

In some implementations, after the data collection device 360 collects the training data, a data processing device 370 may further process the training data to improve the performance of the target model 301. For example, the data processing device 370 may perform data enhancement on the training images in the database 330 so as to expand them, so that the training device 320 can train an instance segmentation model with higher segmentation accuracy based on the expanded database 330.

The following describes how the training device 320 obtains the target model 301 based on the training data. Taking the use of this system architecture for instance segmentation as an example, the training device 320 performs instance segmentation on an input original image, compares the instance contour result obtained by segmentation with the annotation information of the original image, and adjusts the parameters of the target model 301 according to the comparison result, until the difference between the contour information output by the training device 320 and the annotation information of the original image is less than a certain threshold, thereby completing the training of the target model 301.

It can be understood that, in practical applications, the training data maintained in the database 330 is not necessarily all collected by the data collection device 360 and may also be received from other devices. In addition, the training device 320 does not necessarily train the target model 301 entirely based on the training data maintained by the database 330; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
It can be understood that, in this system architecture, the training device 320 and the data processing device 370 may be the same device.

The target model 301 trained by the training device 320 can be applied to different systems or devices, for example, to the execution device 310 shown in FIG. 3 . The execution device 310 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device or a vehicle-mounted terminal, or may be a resource-constrained server, a resource-constrained cloud device, or the like. For example, one example of the execution device 310 may be a mobile phone containing the exemplary structure shown in FIG. 2 .

In FIG. 3 , the execution device 310 is configured with an input/output (I/O) interface 312 for data interaction with external devices, and a user can input data to the I/O interface 312 through a client device 340. Taking the use of this system architecture for instance segmentation as an example, the input data may include images collected by the camera of the client device 340.

It can be understood that, in the system architecture shown in FIG. 3 , the execution device 310 and the client device 340 may be the same device.

When the execution device 310 preprocesses the input data, or when the calculation module 311 of the execution device 310 performs calculation or other related processing, the execution device 310 can call data, code and the like in a data storage system 350 for the corresponding processing, and can also store the data, instructions and the like obtained by the processing into the data storage system 350.

Finally, the I/O interface 312 returns the processing result, for example, the instance segmentation result of the image to be segmented, to the client device 340 so as to provide it to the user.

In the case shown in FIG. 3 , the user can manually give the input data, and this manual operation can be performed through the interface provided by the I/O interface 312. In another case, the client device 340 can automatically send input data to the I/O interface 312; if requiring the client device 340 to automatically send the input data needs the user's authorization, the user can set the corresponding permission in the client device 340. The user can view the result output by the execution device 310 on the client device 340, and the specific presentation form can be display, sound, action or another specific manner. The client device 340 can also serve as a data collection terminal, collecting the input data of the I/O interface 312 and the output result of the I/O interface 312 as new sample data, as shown in the figure, and storing them in the database 330. Of course, the collection may also be done without the client device 340; instead, the I/O interface 312 directly stores the input data of the I/O interface 312 and the output result of the I/O interface 312 into the database 330 as new sample data.

It can be understood that FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules and the like shown in the figure do not constitute any limitation. For example, in FIG. 3 , the data storage system 350 is an external memory relative to the execution device 310; in other cases, the data storage system 350 may also be placed in the execution device 310.
FIG. 4 is an exemplary flowchart of a data enhancement method for an instance segmentation model according to an embodiment of the present application. As shown in FIG. 4 , the method may include S410 and S420. One example of the execution body of the method is the data processing device 370 in the system architecture shown in FIG. 3 .
S410: acquire a first image. In this embodiment, the first image may contain one or more instances. The first image may be an image in an instance segmentation data set such as the coco data set or the cityscape data set.

In this embodiment, while acquiring the first image, annotation information of the first image may also be acquired. The annotation information of the first image may be called first annotation information. The first annotation information may record the contour information of each instance in the first image. One example of the first annotation information is the set of coordinate points of the outer contour of each instance.

In this embodiment, acquiring the first image may be understood as reading the first image from an image storage device. Taking the execution body of the method being the data processing device 370 in FIG. 3 as an example, the data processing device 370 may read the first image and the first annotation information from the database 330.

In some implementations of this embodiment, after the first image and the first annotation information are acquired, pixel-level contour information of each instance in the first image may be extracted according to the first annotation information. This processing may be called preprocessing of the first image.
For example, the outer contour information of an instance is converted from a set of coordinate points into a mask matrix that a computer can process efficiently. In the mask matrix, the pixels corresponding to the instance may be set to 1, and the pixels corresponding to the background may be set to 0. In this embodiment, an instance may also be called the foreground or foreground part, and the neighborhood of the instance may be called the background or background part of the instance. The neighborhood is a region adjacent to the instance, including but not limited to adjacent regions of various shapes, such as the circumscribed-rectangle neighborhood described later.

Taking the first image being from the coco data set, whose image annotation information is json text, as an example, the data processing device 370 can use opencv's built-in outer-contour extraction function "findContours()" to extract the pixel-level contour information of the instances in the first image, as sketched below.
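As a non-authoritative illustration of this preprocessing step, the sketch below converts a COCO polygon annotation into a binary mask and then extracts its outer contour with OpenCV; it assumes the pycocotools and opencv-python packages, and the file path is a placeholder.

```python
import cv2
import numpy as np
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train.json")        # placeholder path
img_id = coco.getImgIds()[0]
ann = coco.loadAnns(coco.getAnnIds(imgIds=img_id))[0]  # one instance annotation

# Convert the coordinate-point annotation into a mask matrix:
# instance pixels are 1, background pixels are 0.
mask = coco.annToMask(ann).astype(np.uint8)

# Extract the pixel-level outer contour of the instance (OpenCV 4 signature).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"instance has {len(contours)} outer contour(s)")
```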
S420: Perform multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, where the multiple candidate instances are in one-to-one correspondence with the multiple affine transformations. In this embodiment, the first instance may be any instance in the first image. When the first image contains multiple instances, each instance may be treated as the first instance and subjected to the multiple affine transformations, so as to obtain multiple candidate instances for each instance; each of these candidate instances is obtained by processing the first instance with the corresponding one of the multiple affine transformations.
For example, each of the multiple affine transformations may be applied to the mask matrix of the first instance, yielding a mask matrix corresponding to that affine transformation; the instance described by that mask matrix is the candidate instance corresponding to that affine transformation.
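For illustration only, applying one candidate affine transformation to an instance mask could be sketched as follows; transform_mask is a hypothetical helper, and nearest-neighbour interpolation is used so the warped mask stays binary.

    import cv2

    def transform_mask(mask, affine_2x3):
        # affine_2x3 is a 2x3 affine matrix of the form shown below.
        h, w = mask.shape
        return cv2.warpAffine(mask, affine_2x3, (w, h),
                              flags=cv2.INTER_NEAREST)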
In this embodiment, applying an affine transformation to an instance may include performing, on the instance within the first image, one or more of translation, rotation, scaling, reflection, shearing, and any combination of the foregoing transformations. The multiple affine transformations may be preconfigured. As one example, a rule for the affine transformations may be preset, and multiple affine transformation matrices may be generated based on the rule, where each affine transformation matrix corresponds to one affine transformation; as another example, the multiple affine transformation matrices may be preset directly.
An exemplary representation of an affine transformation matrix is the standard 2×3 form combining rotation, scaling, and translation:

    [ s·cos r   -s·sin r   t_x ]
    [ s·sin r    s·cos r   t_y ]

where t_x denotes the offset in the horizontal direction, t_y denotes the offset in the vertical direction, s denotes the scaling scale, and r denotes the rotation angle.
Taking an affine transformation matrix of the above form as an example, an exemplary way of obtaining the multiple affine transformation matrices is described below. In this implementation, the width w of the bounding rectangle of the first instance in the horizontal direction may be obtained, and the value range of t_x may be set to -20%·w to +20%·w with a step size of 2. To avoid pixel artifacts caused by semantic ambiguity of image pixels, t_y may be fixed to 0. The value range of the scaling scale s may be set to 0.8 to 1.2 with a step size of 0.05; that is, the transformed instance is 80% to 120% of the original instance. The value range of the rotation angle r may be set to -10 degrees to +10 degrees with a step size of 1 degree. Multiple affine transformation matrices can be obtained according to the above rule, and these matrices may form a set of candidate affine transformation matrices.
It can be understood that fixing t_y to zero is only an example. Compared with leaving t_y unconstrained, fixing t_y to zero prevents an excessive difference between the augmented image and the original image, and thus prevents training on the augmented image from harming the training effect of the instance segmentation model; in other words, it helps improve the performance of the instance segmentation model.
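Under the rule described above, the candidate matrix set could be generated along the following lines. This is a sketch rather than the patent's code, and the choice of the bounding-box centre as the rotation/scaling pivot is an assumption.

    import numpy as np
    import cv2

    def candidate_affine_matrices(mask):
        x, y, w, h = cv2.boundingRect(mask)   # bounding rectangle of the instance
        cx, cy = x + w / 2.0, y + h / 2.0     # assumed pivot: box centre
        matrices = []
        for tx in np.arange(-0.2 * w, 0.2 * w + 1e-9, 2):   # t_x, step 2
            for s in np.arange(0.8, 1.2 + 1e-9, 0.05):      # scale 80%..120%
                for r in np.arange(-10, 10 + 1e-9, 1):      # rotation, degrees
                    m = cv2.getRotationMatrix2D((cx, cy), float(r), float(s))
                    m[0, 2] += tx                           # t_y stays fixed at 0
                    matrices.append(m)
        return matrices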
S430: Select a best candidate instance from the multiple candidate instances, where the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance in the multiple candidate instances is the grayscale difference between that candidate instance and its neighborhood in the first image.
In this embodiment, after the multiple candidate instances are obtained by performing the multiple affine transformations on the first instance, the best candidate instance of the first instance in the first image should be selected from them. The best candidate instance is the candidate instance whose grayscale difference from its own neighborhood in the first image is the largest; that is, the grayscale difference between the best candidate instance and its neighborhood in the first image is larger than the grayscale difference between any other candidate instance and that candidate instance's neighborhood in the first image.
To determine the best candidate instance among the multiple candidate instances, the grayscale difference between each candidate instance and the specified neighborhood of that candidate instance in the first image may first be obtained, finally yielding multiple grayscale differences in one-to-one correspondence with the multiple candidate instances.
For example, for each of the multiple candidate instances, the circumscribed-rectangle neighborhood of that candidate instance in the first image may be obtained, so as to obtain multiple neighborhoods in one-to-one correspondence with the multiple candidate instances; these neighborhoods may be called a contour neighborhood set.
The circumscribed-rectangle neighborhood of each candidate instance may be understood as the neighborhood formed by the pixels within the circumscribed rectangle of the candidate instance other than the candidate instance itself. As shown in FIG. 9, the first instance in the first image is a cloud, and the hatched part inside the circumscribed rectangle of the cloud contour represents the circumscribed-rectangle neighborhood of the cloud. The first image may also include other content, for example other instances, which are not shown in FIG. 9. It can be understood that using the circumscribed-rectangle neighborhood of each candidate instance as its neighborhood is only an example; the present application does not limit the shape of the neighborhood of a candidate instance. For example, the neighborhood of a candidate instance may also be its circumscribed-circle neighborhood.
In one possible implementation, for each neighborhood in the contour neighborhood set, the candidate instance corresponding to the neighborhood may be taken as the foreground and the neighborhood as the background, and the pixel-value variance of the foreground and the background may be computed; the resulting variance may serve as the grayscale difference between the candidate instance and its neighborhood.
For each neighborhood, an exemplary formula for computing the corresponding variance is as follows:

    g = w₀×(u₀-u)² + w₁×(u₁-u)² = w₀×w₁×(u₀-u₁)²

    u = w₀×u₀ + w₁×u₁

where u₀ denotes the average grayscale of the foreground, w₀ denotes the ratio of the number of foreground pixels to the total number of foreground and background pixels, u₁ denotes the average grayscale of the background, w₁ denotes the ratio of the number of background pixels to the total number of pixels, u denotes the average grayscale of the foreground and background together, and g denotes the variance of the foreground and background.
Treating the pixel-value variance of the foreground and background as the grayscale difference between a candidate instance and its neighborhood is only one example. In this embodiment, the grayscale difference between the foreground and the background may be obtained in other ways; for example, the 1-norm or the infinity norm of the pixel-value difference between each foreground and its corresponding background may be computed and taken as the grayscale difference between the foreground and the background.
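A compact sketch of the variance computation above; the names gray_difference, fg_mask, and bg_mask are hypothetical.

    import numpy as np

    def gray_difference(gray_image, fg_mask, bg_mask):
        # Implements g = w0*w1*(u0 - u1)^2 from the formula above.
        fg = gray_image[fg_mask > 0].astype(np.float64)   # foreground pixels
        bg = gray_image[bg_mask > 0].astype(np.float64)   # neighborhood pixels
        total = fg.size + bg.size
        w0, w1 = fg.size / total, bg.size / total
        u0, u1 = fg.mean(), bg.mean()
        return w0 * w1 * (u0 - u1) ** 2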
In the foregoing steps, after the multiple grayscale differences in one-to-one correspondence with the multiple candidate instances are obtained, the maximum grayscale difference may be determined among them, the candidate instance corresponding to the maximum grayscale difference may be determined as the best candidate instance, and the neighborhood in the first image corresponding to the maximum grayscale difference may be determined as the best neighborhood, which may also be called the target neighborhood. In this embodiment, the candidate instance corresponding to the maximum grayscale difference is the candidate instance from which the maximum grayscale difference was computed, and the neighborhood corresponding to the maximum grayscale difference is the neighborhood on which the computation of that maximum grayscale difference was based.
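Continuing the sketch above, the selection step then reduces to an argmax over the candidate scores; candidate_masks and neighborhoods are assumed parallel lists of masks.

    scores = [gray_difference(gray_image, cand, neigh)
              for cand, neigh in zip(candidate_masks, neighborhoods)]
    best_index = int(np.argmax(scores))
    best_mask = candidate_masks[best_index]          # the best candidate instance
    target_neighborhood = neighborhoods[best_index]  # the best (target) neighborhood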
S440: Add the best candidate instance to the first image to obtain a second image, which is the first image after enhancement. After the best candidate instance and the best neighborhood of the first instance are obtained, the best candidate instance may be added to the first image at a position such that the neighborhood of the best candidate instance in the first image is exactly the best neighborhood.
In this embodiment, the image content at the original location of the first instance in the first image may be processed with reference to the prior art. For example, image inpainting may be used to process the image content at the original location of the first instance, so that the content at that location is filled in as background of the instances in the first image.
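One plausible combination of these two operations, sketched with OpenCV's inpainting API; paste_and_inpaint is a hypothetical helper, the image is assumed to be 8-bit BGR, and the inpainting radius of 3 is an arbitrary example value.

    import cv2

    def paste_and_inpaint(image, src_mask, affine_2x3):
        h, w = image.shape[:2]
        # Fill the original location of the instance as background.
        repaired = cv2.inpaint(image, src_mask, 3, cv2.INPAINT_TELEA)
        # Warp the instance pixels and their mask to the new location.
        warped = cv2.warpAffine(image, affine_2x3, (w, h))
        warped_mask = cv2.warpAffine(src_mask, affine_2x3, (w, h),
                                     flags=cv2.INTER_NEAREST)
        out = repaired.copy()
        out[warped_mask > 0] = warped[warped_mask > 0]
        return out, warped_mask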
In this embodiment, one or more instances in the first image may each be treated as the first instance. Using the method shown in FIG. 4, the best candidate instance corresponding to each of these instances is obtained and added to the first image, finally yielding the second image.
In this embodiment, when the second image is obtained, annotation information of the second image may also be obtained. For example, when the best candidate instance is implemented as a mask matrix, the mask matrix may be converted into the form of a set of coordinate points, and that set of coordinate points may serve as the contour information of the best candidate instance. If the annotation information of the second image is called second annotation information, the contour information of the best candidate instance may be recorded in the second annotation information.
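Mirroring the earlier contour sketch, the conversion back to annotation form might look like this; mask_to_annotation is a hypothetical name, and the flat [x0, y0, x1, y1, ...] layout assumes a COCO-style annotation.

    import cv2

    def mask_to_annotation(best_mask):
        contours, _ = cv2.findContours(best_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)
        # One flat coordinate-point list per contour, COCO style.
        return [c.reshape(-1, 2).flatten().tolist() for c in contours]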
After the training dataset of the instance segmentation model is processed using the method of this embodiment, the resulting training dataset may be used to train the instance segmentation model. For example, after the data processing device 370 in FIG. 3 performs the first image processing on the first image in the database 330, the training device 320 may use the resulting second image to train the instance segmentation model to obtain the target model 301.
Further, after the instance segmentation model is obtained by training with the second image, the instance segmentation model may be used to perform instance segmentation. For example, after the training device 320 in FIG. 3 obtains the instance segmentation model by training with the second image, the execution device 310 may execute an instance segmentation service based on the instance segmentation model. Taking the case where the execution device 310 includes the architecture shown in FIG. 2 as an example, the execution device 310 may implement protagonist-locking and spatiotemporal-transformation services based on the instance segmentation model.
In the method of this embodiment, when the first image is enhanced, the best candidate instance, namely the one whose grayscale difference from its own specified neighborhood in the first image is the largest, is selected from the multiple candidate instances of the first instance and added to the first image. As a result, in the second image obtained by enhancing the first image, the contrast between the best candidate instance and its neighborhood is relatively large, so that the best candidate instance is relatively clear. In this case, the contour information of the best candidate instance can be regarded as highly reasonable contour information. Obtaining the second annotation information of the second image based on this reasonable contour information, and training the instance segmentation model based on the second image and the second annotation information, yields an instance segmentation model with more accurate segmentation; in other words, it can significantly improve the segmentation accuracy of the instance segmentation model.
FIG. 5 is an exemplary flowchart of a data enhancement method according to another embodiment of the present application. As shown in FIG. 5, in addition to S410 to S440, the method may include S450. S450: Perform contrast enhancement processing on a first region of interest in the second image to obtain a third image, where the first region of interest contains a second instance in the second image and the neighborhood of the second instance.
In this embodiment, the second image may contain one or more regions of interest (ROI), and each region of interest may contain one instance and the neighborhood of that instance. The regions of interest of the second image may form a region-of-interest set.
For convenience of description, a region of interest in the second image is called the first region of interest, the instance in the first region of interest is called the second instance, and the neighborhood of the second instance is called the second neighborhood.
In this embodiment, the second neighborhood may contain the circumscribed-rectangle neighborhood of the second instance; in other words, the circumscribed rectangle of the second instance lies within the second neighborhood. Put differently, the second neighborhood contains not only the pixels within the circumscribed-rectangle neighborhood of the second instance but also pixels outside that circumscribed-rectangle neighborhood.
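As an illustration, such a second neighborhood could be delimited by expanding the instance's bounding rectangle by a margin; roi_around_instance and the margin of 10 pixels are assumptions, not values from the patent.

    import cv2

    def roi_around_instance(mask, margin=10):
        img_h, img_w = mask.shape
        x, y, w, h = cv2.boundingRect(mask)
        x0, y0 = max(x - margin, 0), max(y - margin, 0)
        x1, y1 = min(x + w + margin, img_w), min(y + h + margin, img_h)
        return x0, y0, x1, y1   # ROI containing the instance and its surroundings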
For each first region of interest in the region-of-interest set of the second image, as one example, the low-frequency part of the first region of interest may be obtained through a Gaussian low-pass filter. For example, a Gaussian low-pass filter can be implemented using the GaussianBlur() function in the open-source OpenCV library. Specifically, the pixels within the first region of interest may be passed through the Gaussian low-pass filter, which filters out the high-frequency part of each pixel and yields the low-frequency part of each pixel.
After the low-frequency part of the first region of interest is obtained through the Gaussian low-pass filter, the low-frequency part may be subtracted from the original pixels of the first region of interest to obtain the high-frequency part of the first region of interest.
After the low-frequency part and the high-frequency part of the first region of interest are obtained, for each pixel in the first region of interest, the enhanced pixel value of a high-frequency-part pixel may be computed according to the gain value of the high-frequency part, and the enhanced pixel value of a low-frequency-part pixel may be computed according to the gain value of the low-frequency part, so as to obtain a contrast-enhanced image.
For example, each pixel in the first region of interest is traversed. If the pixel belongs to the high-frequency part, the mean squared deviation between its pixel value and those of the surrounding pixels can be considered large; in this case, a smaller gain value can be used to reduce the pixel value and alleviate over-brightness at that pixel. If the pixel belongs to the low-frequency part, the mean squared deviation between its pixel value and those of the surrounding pixels can be considered small; in this case, a larger gain value can be used to amplify the high-frequency detail around the pixel, making the local detail features more obvious and thereby alleviating image blur.
In one example, the gain values of the high-frequency part and the low-frequency part may be preset as needed; for example, the gain value of the high-frequency part may be set to 0.5, and the gain value of the low-frequency part may be set to 2. One implementation of computing the enhanced pixel value of a high-frequency-part pixel according to the high-frequency gain, or of a low-frequency-part pixel according to the low-frequency gain, is to take the product of the gain value and the pixel value of that pixel as the enhanced pixel value.
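Reading the gain step as a per-pixel recombination of the two frequency parts, the whole S450 operation on one ROI might be sketched as follows; the Gaussian sigma of 3 is an assumed value, and the default gains follow the example above.

    import numpy as np
    import cv2

    def enhance_roi_contrast(roi, high_gain=0.5, low_gain=2.0):
        roi = roi.astype(np.float32)
        low = cv2.GaussianBlur(roi, (0, 0), sigmaX=3)   # low-frequency part
        high = roi - low                                # high-frequency part
        out = low_gain * low + high_gain * high         # apply per-part gains
        return np.clip(out, 0, 255).astype(np.uint8)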
In this embodiment, after the above adaptive contrast enhancement processing is performed on each region of interest in the second image, a third image with locally adaptive contrast enhancement can be obtained. After each second image is processed using the method shown in FIG. 5, a training dataset with enhanced image contrast can be obtained.
Further, the method of this embodiment may also include: training the instance segmentation model with this training dataset. For example, after the data processing device 370 in FIG. 3 performs the second image processing on the second image in the database 330, the training device 320 may use the resulting third image to train the instance segmentation model to obtain the target model 301.
Still further, the method of this embodiment may also include: performing instance segmentation with the trained instance segmentation model. For example, after the training device 320 in FIG. 3 obtains the instance segmentation model by training with the third image, the execution device 310 may execute an instance segmentation service based on the instance segmentation model. Taking the case where the execution device 310 includes the architecture shown in FIG. 2 as an example, the execution device 310 may implement protagonist-locking and spatiotemporal-transformation services based on the instance segmentation model.
The instance segmentation model trained with the contrast-enhanced training dataset of this embodiment has high robustness: it has strong anti-interference capability against, and high tolerance of, noise in images from noisy scenes.
In one embodiment of the present application, the data enhancement method shown in FIG. 4 or FIG. 5 may be used to process the images in an original training dataset, and after the processed training dataset is obtained, the processed training dataset is used to train the instance segmentation model. The processing of the original training dataset in this embodiment may be called offline data enhancement.
In another embodiment of the present application, while the instance segmentation model is being trained with the original training dataset, the method of FIG. 4 or FIG. 5 may be used to perform image processing on the original training dataset. After the training of the instance segmentation model on the original training dataset is finished and the image processing of the original training dataset with the method of FIG. 4 or FIG. 5 is finished, the training dataset obtained by the image processing is then used to train the instance segmentation model. The method of this embodiment may be called an online data enhancement training method.
Since the duration of one training iteration of the instance segmentation model is usually much longer than the duration of the image processing, the method of this embodiment does not add extra time, and the latest enhanced training dataset can be obtained in real time, so the accuracy of the instance segmentation model can be further improved. For example, no matter how the original training dataset of the instance segmentation model changes, the method of this embodiment can obtain the latest enhanced training dataset based on that original training dataset, thereby yielding an instance segmentation model with better performance.
Taking an implementation of the data enhancement method of FIG. 4 or FIG. 5 based on the TensorFlow open-source framework as an example, an exemplary implementation of the online data enhancement training method is described below.
In the existing open-source training framework, the image processing operations are all completed in the constructor of a data_generator object. When the method of this embodiment is used, the original training dataset is first read: the imread function in the OpenCV library is used to read the first image in the original training dataset, and the loadAnns function of the COCO dataset is used to read the first annotation information. The data enhancement method of FIG. 4 or FIG. 5 is implemented in the constructor of the data_generator object, which then outputs a data_generator object; these operations are executed by an image processing thread. The data_generator object is then passed as a parameter to the training thread of the TensorFlow model. The training thread and the image processing thread execute in parallel, and the image processing thread executes independently, outputting the processed second image or third image and the second annotation information to a public storage area. Before each training iteration, the training process reads the processed second image or third image and the second annotation information from the public storage area, so as to train the instance segmentation model.
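A framework-agnostic sketch of this producer/consumer arrangement; dataset, augment, train_step, model, and num_iterations are all assumed names, and the bounded queue stands in for the public storage area.

    import threading
    import queue

    aug_queue = queue.Queue(maxsize=8)   # the "public storage area"

    def augmentation_worker(dataset):
        while True:
            image, annotation = dataset.sample()
            image2, annotation2 = augment(image, annotation)  # FIG. 4 / FIG. 5 method
            aug_queue.put((image2, annotation2))

    threading.Thread(target=augmentation_worker, args=(dataset,),
                     daemon=True).start()

    for _ in range(num_iterations):
        batch = aug_queue.get()       # read before each training iteration
        train_step(model, batch)      # runs in parallel with the worker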
FIG. 6 is a schematic structural diagram of a data enhancement apparatus 600 for an instance segmentation model according to an embodiment of the present application. The apparatus 600 may be an example of the data processing device 370 in the system architecture shown in FIG. 3. The apparatus 600 may include an acquisition module 610 and a processing module 620, and may optionally further include a training module. The apparatus 600 may be used to implement the data enhancement method for the instance segmentation model in any of the foregoing embodiments, for example the method shown in FIG. 4 or FIG. 5. For example, the acquisition module 610 may be used to perform S410, and the processing module 620 may be used to perform S420 to S440. Optionally, the processing module 620 may also be used to perform S450.
The schematic structure of an apparatus for training an instance segmentation model according to an embodiment of the present application is similar to the structure of the apparatus 600 including the training module, and is not described again here. The training apparatus may be used to perform the foregoing online data enhancement training method.
FIG. 7 is a schematic structural diagram of an apparatus 700 according to an embodiment of the present application. The apparatus 700 includes a processor 702, a communication interface 703, and a memory 704.
The apparatus 700 may be a chip or a computing device. For example, the apparatus 700 may be the data processing device 370 in the system architecture shown in FIG. 3, or may be an example of a chip applicable to the data processing device 370. As another example, the apparatus 700 may be the training device 320 in the system architecture shown in FIG. 3, or may be an example of a chip applicable to the training device 320.
The processor 702, the memory 704, and the communication interface 703 may communicate through a bus. Executable code is stored in the memory 704, and the processor 702 reads the executable code in the memory 704 to execute the corresponding method. The memory 704 may also include an operating system and other software modules required by running processes. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.
For example, the executable code in the memory 704 is used to implement the method described in any of the foregoing embodiments (such as the method shown in FIG. 4 or FIG. 5), and the processor 702 reads the executable code in the memory 704 to execute the method described in any of the foregoing embodiments (such as the method shown in FIG. 4 or FIG. 5).
The processor 702 may include a CPU. The memory 704 may include volatile memory, for example random access memory (RAM). The memory 704 may also include non-volatile memory (NVM), for example read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In some embodiments of the present application, the disclosed methods may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or on other non-transitory media or articles of manufacture. FIG. 8 schematically illustrates a conceptual partial view of an example computer program product arranged according to any of the foregoing embodiments, the example computer program product including a computer program for executing a computer process on a computing device. In one embodiment, the example computer program product 800 is provided using a signal bearing medium 801. The signal bearing medium 801 may include one or more program instructions 802 which, when run by one or more processors, may provide the functions, or part of the functions, described in the method of any of the foregoing embodiments. Thus, for example, in the embodiment shown in FIG. 5, one or more features of S410 to S430 may be undertaken by one or more instructions associated with the signal bearing medium 801.
In some examples, the signal bearing medium 801 may include a computer-readable medium 803, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (ROM), or a random access memory (RAM). In some implementations, the signal bearing medium 801 may include a computer-recordable medium 804, such as, but not limited to, a memory, a read/write (R/W) CD, or an R/W DVD. In some implementations, the signal bearing medium 801 may include a communication medium 805, such as, but not limited to, a digital and/or analog communication medium (for example, an optical fiber cable, a waveguide, a wired communication link, or a wireless communication link). Thus, for example, the signal bearing medium 801 may be conveyed by a wireless form of the communication medium 805 (for example, a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol). The one or more program instructions 802 may be, for example, computer-executable instructions or logic-implementing instructions. In some examples, the aforementioned computing device may be configured to provide various operations, functions, or actions in response to the program instructions 802 conveyed to the computing device through one or more of the computer-readable medium 803, the computer-recordable medium 804, and/or the communication medium 805. It should be understood that the arrangements described here are for illustrative purposes only. Accordingly, those skilled in the art will understand that other arrangements and other elements (for example, machines, interfaces, functions, orders, and groups of functions) can be used instead, and that some elements may be omitted altogether depending on the desired result. In addition, many of the described elements are functional entities that can be implemented as discrete or distributed components, or in combination with other components, in any suitable combination and position.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered as going beyond the scope of this application.
It may be clearly understood by a person skilled in the art that, for convenient and brief description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (16)

  1. A data enhancement method for an instance segmentation model, comprising:
    acquiring a first image;
    performing multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, wherein the multiple candidate instances are in one-to-one correspondence with the multiple affine transformations;
    selecting a best candidate instance from the multiple candidate instances, wherein the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance in the multiple candidate instances is the grayscale difference between that candidate instance and the neighborhood of that candidate instance in the first image; and
    adding the best candidate instance to the first image to obtain a second image that is the first image after enhancement.
  2. The method according to claim 1, wherein the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined according to the pixel-value variance of that candidate instance and its neighborhood, and the pixel-value variance of each candidate instance and its neighborhood is computed as follows:
    g = w₀×(u₀-u)² + w₁×(u₁-u)² = w₀×w₁×(u₀-u₁)²
    u = w₀×u₀ + w₁×u₁
    where u₀ denotes the average grayscale of that candidate instance, w₀ denotes the ratio of the number of pixels of that candidate instance to the total number of pixels of that candidate instance and its neighborhood, u₁ denotes the average grayscale of the neighborhood of that candidate instance, w₁ denotes the ratio of the number of pixels of the neighborhood of that candidate instance to the total number of pixels, u denotes the average grayscale of that candidate instance and its neighborhood together, and g denotes the pixel-value variance.
  3. The method according to claim 1 or 2, further comprising:
    performing contrast enhancement processing on a first region of interest in the second image to obtain a third image, wherein the first region of interest contains a second instance in the second image and the neighborhood of the second instance.
  4. A training method for an instance segmentation model, comprising the method according to any one of claims 1 to 3, wherein, while the method is being performed, the training method further comprises: training an instance segmentation model with multiple images, the multiple images comprising the first image.
  5. The training method according to claim 4, further comprising:
    training the instance segmentation model with the second image.
  6. A data enhancement apparatus for an instance segmentation model, comprising:
    an acquisition module, configured to acquire a first image; and
    a processing module, configured to: perform multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; select a best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, wherein the grayscale difference of each candidate instance in the multiple candidate instances is the grayscale difference between that candidate instance and the neighborhood of that candidate instance in the first image; and add the best candidate instance to the first image to obtain a second image that is the first image after enhancement.
  7. The apparatus according to claim 6, wherein the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined according to the pixel-value variance of that candidate instance and its neighborhood, and the pixel-value variance of each candidate instance and its neighborhood is computed as follows:
    g = w₀×(u₀-u)² + w₁×(u₁-u)² = w₀×w₁×(u₀-u₁)²
    u = w₀×u₀ + w₁×u₁
    where u₀ denotes the average grayscale of that candidate instance, w₀ denotes the ratio of the number of pixels of that candidate instance to the total number of pixels of that candidate instance and its neighborhood, u₁ denotes the average grayscale of the neighborhood of that candidate instance, w₁ denotes the ratio of the number of pixels of the neighborhood of that candidate instance to the total number of pixels, u denotes the average grayscale of that candidate instance and its neighborhood together, and g denotes the pixel-value variance.
  8. The apparatus according to claim 6 or 7, wherein the processing module is further configured to:
    perform contrast enhancement processing on a first region of interest in the second image to obtain a third image, wherein the first region of interest contains a second instance in the second image and the neighborhood of the second instance.
  9. A training apparatus for an instance segmentation model, comprising the apparatus according to any one of claims 6 to 8 and a training module, wherein, while the apparatus implements its functions, the training module is configured to train an instance segmentation model with multiple images, the multiple images comprising the first image.
  10. The training apparatus according to claim 9, wherein the training module is further configured to train the instance segmentation model according to the second image.
  11. A data enhancement apparatus for an instance segmentation model, comprising: a processor, the processor being coupled to a memory;
    wherein the memory is configured to store instructions; and
    the processor is configured to execute the instructions stored in the memory, so that the apparatus implements the method according to any one of claims 1 to 3.
  12. A training apparatus for an instance segmentation model, comprising: a processor, the processor being coupled to a memory;
    wherein the memory is configured to store instructions; and
    the processor is configured to execute the instructions stored in the memory, so that the apparatus implements the method according to claim 4 or 5.
  13. A computer-readable medium, comprising instructions which, when run on a processor, cause the processor to implement the method according to any one of claims 1 to 3.
  14. A computer-readable medium, comprising instructions which, when run on a processor, cause the processor to implement the method according to claim 4 or 5.
  15. A computer program product, comprising instructions which, when the computer program product is run on a computer, cause the computer to implement the method according to any one of claims 1 to 3.
  16. A computer program product, comprising instructions which, when the computer program product is run on a computer, cause the computer to implement the method according to claim 4 or 5.
PCT/CN2020/106112 2020-07-31 2020-07-31 Data enhancement method and training method for instance segmentation model, and related apparatus WO2022021287A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080006082.4A CN114375460A (en) 2020-07-31 2020-07-31 Data enhancement method and training method of instance segmentation model and related device
PCT/CN2020/106112 WO2022021287A1 (en) 2020-07-31 2020-07-31 Data enhancement method and training method for instance segmentation model, and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/106112 WO2022021287A1 (en) 2020-07-31 2020-07-31 Data enhancement method and training method for instance segmentation model, and related apparatus

Publications (1)

Publication Number Publication Date
WO2022021287A1 true WO2022021287A1 (en) 2022-02-03

Family

ID=80037267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/106112 WO2022021287A1 (en) 2020-07-31 2020-07-31 Data enhancement method and training method for instance segmentation model, and related apparatus

Country Status (2)

Country Link
CN (1) CN114375460A (en)
WO (1) WO2022021287A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190171903A1 (en) * 2017-12-03 2019-06-06 Facebook, Inc. Optimizations for Dynamic Object Instance Detection, Segmentation, and Structure Mapping
CN110910334A (en) * 2018-09-15 2020-03-24 北京市商汤科技开发有限公司 Instance segmentation method, image processing device and computer readable storage medium
CN109583509A (en) * 2018-12-12 2019-04-05 南京旷云科技有限公司 Data creation method, device and electronic equipment
CN111091167A (en) * 2020-03-25 2020-05-01 同盾控股有限公司 Mark recognition training data synthesis method and device, electronic equipment and storage medium
CN111415364A (en) * 2020-03-29 2020-07-14 中国科学院空天信息创新研究院 Method, system and storage medium for converting image segmentation samples in computer vision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Master Thesis", 28 May 2019, SHANGHAI JIAO TONG UNIVERSITY, CN, article XU WENQIANG: "Individual-level Instance Segmentation", pages: 1 - 89, XP055891001, DOI: 10.27307/d.cnki.gsjtu.2019.001681 *
JO HYUNJUN, KIM DAWIT, SONG JAE-BOK: "Automatic Dataset Generation of Object Detection and Instance Segmentation using Mask R-CNN", JOURNAL OF KOREA ROBOTICS SOCIETY, vol. 14, no. 1, 30 March 2019 (2019-03-30), pages 31 - 39, XP055890991, ISSN: 1975-6291, DOI: 10.7746/jkros.2019.14.1.031 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091874A (en) * 2023-04-10 2023-05-09 成都数之联科技股份有限公司 Image verification method, training method, device, medium, equipment and program product
CN116091874B (en) * 2023-04-10 2023-07-18 成都数之联科技股份有限公司 Image verification method, training method, device, medium, equipment and program product
CN116596928A (en) * 2023-07-18 2023-08-15 山东金胜粮油食品有限公司 Quick peanut oil impurity detection method based on image characteristics
CN116596928B (en) * 2023-07-18 2023-10-03 山东金胜粮油食品有限公司 Quick peanut oil impurity detection method based on image characteristics

Also Published As

Publication number Publication date
CN114375460A (en) 2022-04-19

Similar Documents

Publication Publication Date Title
US10373380B2 (en) 3-dimensional scene analysis for augmented reality operations
CN108701376B (en) Recognition-based object segmentation of three-dimensional images
Xiao et al. Fast image dehazing using guided joint bilateral filter
JP7446457B2 (en) Image optimization method and device, computer storage medium, computer program, and electronic equipment
CN115699114B (en) Method and apparatus for image augmentation for analysis
CN112308095A (en) Picture preprocessing and model training method and device, server and storage medium
US9639943B1 (en) Scanning of a handheld object for 3-dimensional reconstruction
US20190206117A1 (en) Image processing method, intelligent terminal, and storage device
CN111402170A (en) Image enhancement method, device, terminal and computer readable storage medium
WO2022021287A1 (en) Data enhancement method and training method for instance segmentation model, and related apparatus
CN114511041B (en) Model training method, image processing method, device, equipment and storage medium
CN111382647B (en) Picture processing method, device, equipment and storage medium
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN112562056A (en) Control method, device, medium and equipment for virtual light in virtual studio
US20140198177A1 (en) Realtime photo retouching of live video
WO2019200785A1 (en) Fast hand tracking method, device, terminal, and storage medium
Ahn et al. Implement of an automated unmanned recording system for tracking objects on mobile phones by image processing method
CN117011137A (en) Image stitching method, device and equipment based on RGB similarity feature matching
US20230131418A1 (en) Two-dimensional (2d) feature database generation
Liu et al. Fog effect for photography using stereo vision
US11182634B2 (en) Systems and methods for modifying labeled content
KR20230162010A (en) Real-time machine learning-based privacy filter to remove reflective features from images and videos
CN114612976A (en) Key point detection method and device, computer readable medium and electronic equipment
Kim et al. Real-time human segmentation from RGB-D video sequence based on adaptive geodesic distance computation
CN113095176A (en) Method and device for background reduction of video data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20947163

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20947163

Country of ref document: EP

Kind code of ref document: A1