WO2022021287A1 - Data enhancement method and training method for instance segmentation model, and related apparatus - Google Patents

Data enhancement method and training method for instance segmentation model, and related apparatus

Info

Publication number
WO2022021287A1
Authority
WO
WIPO (PCT)
Prior art keywords
instance
candidate
image
neighborhood
training
Prior art date
Application number
PCT/CN2020/106112
Other languages
French (fr)
Chinese (zh)
Inventor
张昕 (Zhang Xin)
胡杰 (Hu Jie)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202080006082.4A (published as CN114375460A)
Priority to PCT/CN2020/106112
Publication of WO2022021287A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Definitions

  • the present application relates to the field of computer vision, and more particularly, to data augmentation methods, training methods, and related apparatuses for instance segmentation models.
  • instance segmentation is a current research direction of deep learning in the field of computer vision; its task is to segment each instance in an image and use a mask to display the segmented instances.
  • Instance segmentation has been widely used in various tasks in the field of computer vision, such as automatic driving, robot control and other tasks.
  • in research on instance segmentation, it is found that current instance segmentation methods often suffer from poor segmentation accuracy.
  • one solution is to construct new training samples from the existing dataset of the instance segmentation model, so as to expand the existing dataset both quantitatively and qualitatively, and then use the new dataset for instance segmentation model training.
  • in this solution, the contour information of an instance in an image of the original dataset is obtained, and the instance is cut out of the image based on the contour information; the instance is then pasted into the region of the image whose pixels are most similar to the instance, and image inpainting is used to fill in the image content at the instance's original location as background, thereby obtaining a new image. The new image can be used as an enhanced image to expand the original dataset into a new dataset; finally, the new dataset is used to train the instance segmentation model, which can improve the segmentation accuracy of the instance segmentation model.
  • the present application provides a data enhancement method, a training method and a related device for an instance segmentation model, which can improve the segmentation accuracy of the instance segmentation model.
  • the present application provides a data enhancement method for an instance segmentation model.
  • the enhancement method includes: acquiring a first image; performing multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; selecting the best candidate instance from the multiple candidate instances, where the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and adding the best candidate instance to the first image to obtain a second image that is an enhancement of the first image.
  • in this way, the candidate instance with the largest grayscale difference from its designated neighborhood in the first image is selected from the multiple candidate instances of the first instance as the best candidate instance, and the best candidate instance is added to the first image.
  • because the best candidate instance is obtained by an affine transformation of a real instance, its contour information can be considered very reasonable contour information; the second annotation information of the second image is obtained based on this reasonable contour information, and a new training sample is obtained from the second image and the second annotation information.
  • the method may further include: acquiring first label information of the first image, where the first label information includes the first contour information of the first instance; and acquiring, according to the candidate instance corresponding to the maximum grayscale difference, second label information of the second image, where the second label information includes the second contour information of the candidate instance corresponding to the maximum grayscale difference.
  • in some implementations, the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined based on the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the candidate instance; w_0 represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood; u_1 represents the average gray level of the neighborhood; w_1 represents the ratio of the number of pixels of the neighborhood to the total number of pixels; u represents the average gray level of the candidate instance and its neighborhood together; and g represents the pixel value variance.
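  • note: since w_0 + w_1 = 1 and u = w_0·u_0 + w_1·u_1, this expression simplifies algebraically to g = w_0·w_1·(u_0 - u_1)^2, i.e., the classical between-class (Otsu) variance of the foreground and background gray levels; a larger g means a stronger contrast between a candidate instance and its neighborhood.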
  • the method further includes: performing contrast enhancement processing on a first region of interest in the second image to obtain a third image, where the first region of interest includes a second instance within the second image and a neighborhood of the second instance.
  • in this way, the inter-class variance of the local image formed by the instance and its neighborhood becomes larger, and the local image achieves a more pronounced contrast effect. This makes the instance segmentation model trained on the locally adaptive contrast-enhanced third image more robust and more resistant to interference.
  • the first region of interest is a partial region of the second image.
  • the second image may include multiple different regions of interest; different regions of interest may contain different instances, and each region of interest is a local area of the second image. Contrast enhancement processing is performed separately for each region of interest.
  • An example of the neighborhood of the second instance is the bounding box neighborhood of the second instance.
  • the fact that the first region of interest includes the neighborhood of the second instance can be understood to mean that the first region of interest is larger than the neighborhood of the second instance, where the margin can be preset.
  • for example, the first region of interest may be 10 pixels larger in both length and width than the neighborhood of the second instance.
  • the method further includes: training an instance segmentation model according to the third image.
  • the first image is an image in the first dataset.
  • the method further includes: training an instance segmentation model according to a fourth image in the first dataset, where this training is performed simultaneously with the aforementioned process of obtaining the second image based on the first image; and training the instance segmentation model using the second image or the third image.
  • in this implementation, image processing and model training are performed at the same time, so that when the first dataset changes, a processed image can still be obtained based on the latest first dataset and used to train the instance segmentation model. That is to say, this implementation can improve the performance of the instance segmentation model without adding additional delay.
  • the present application provides a training method for an instance segmentation model.
  • the method includes: acquiring a first dataset, the first dataset including multiple images; while training an instance segmentation model based on images in the multiple images, simultaneously performing multiple affine transformations on the first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; selecting the best candidate instance from the multiple candidate instances, where the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; adding the best candidate instance to the first image to obtain a second image that is an enhancement of the first image; and training the instance segmentation model according to the second image.
  • in this training method, image processing and model training are performed at the same time, so that when the first dataset changes, a processed image can still be obtained based on the latest first dataset and used to train the instance segmentation model. That is to say, this implementation can improve the performance of the instance segmentation model without adding additional delay.
  • in some implementations, the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined based on the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the candidate instance; w_0 represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood; u_1 represents the average gray level of the neighborhood; w_1 represents the ratio of the number of pixels of the neighborhood to the total number of pixels; u represents the average gray level of the candidate instance and its neighborhood together; and g represents the pixel value variance.
  • the training of the instance segmentation model according to the second image includes: performing contrast enhancement processing on the first region of interest in the second image to obtain a third image, where the first region of interest includes a second instance in the second image and a neighborhood of the second instance; and training the instance segmentation model according to the third image.
  • the present application provides an apparatus for data enhancement of an instance segmentation model.
  • the apparatus includes a module for executing the method in the first aspect or any one of the implementation manners.
  • the apparatus includes: an acquisition module for acquiring the first image.
  • a processing module configured to: perform multiple affine transformations on the first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; select the best candidate instance from the multiple candidate instances, where the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and add the best candidate instance to the first image to obtain a second image that is an enhancement of the first image.
  • the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the candidate instance; w_0 represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood; u_1 represents the average gray level of the neighborhood; w_1 represents the ratio of the number of pixels of the neighborhood to the total number of pixels; u represents the average gray level of the candidate instance and its neighborhood together; and g represents the pixel value variance.
  • the processing module is further configured to: perform contrast enhancement processing on the first region of interest in the second image to obtain a third image, where the first region of interest includes a second instance in the second image and a neighborhood of the second instance.
  • the apparatus further includes a training module configured to train an instance segmentation model according to the third image.
  • the first image is an image in the first dataset.
  • the device further includes a training module configured to: train the instance segmentation model according to the fourth image in the first dataset, where this training proceeds concurrently with the aforementioned processing module's obtaining of the second image based on the first image; and train the instance segmentation model using the second image or the third image.
  • the present application provides an apparatus for training an instance segmentation model, where the apparatus includes a module for executing the method in the second aspect or any one of the implementation manners.
  • the training device includes: an acquisition module configured to acquire a first dataset, where the first dataset includes multiple images; a training module configured to train an instance segmentation model based on images in the multiple images; and a processing module configured to, while the training module trains the instance segmentation model based on the images in the multiple images, perform multiple affine transformations on the first instance in the first image to obtain multiple candidate instances in one-to-one correspondence with the multiple affine transformations, select the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image, and add the best candidate instance to the first image to obtain a second image that is an enhancement of the first image.
  • the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the candidate instance; w_0 represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood; u_1 represents the average gray level of the neighborhood; w_1 represents the ratio of the number of pixels of the neighborhood to the total number of pixels; u represents the average gray level of the candidate instance and its neighborhood together; and g represents the pixel value variance.
  • the training module is specifically configured to: perform contrast enhancement processing on the first region of interest in the second image to obtain a third image, where the first region of interest includes the second instance in the second image and the neighborhood of the second instance; and train the instance segmentation model according to the third image.
  • the present application provides a data enhancement device for an instance segmentation model, the device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of the implementation manners.
  • the present application provides an apparatus for training an instance segmentation model, the apparatus comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any one of the implementation manners.
  • a computer-readable medium stores program code for device execution, where the program code is used to execute the method in the first aspect or any one of the implementation manners thereof.
  • a computer-readable medium stores program codes for device execution, where the program codes are used to execute the method in the second aspect or any one of the implementation manners thereof.
  • a ninth aspect provides a computer program product comprising instructions, which, when the computer program product is run on a computer, causes the computer to execute the method in the first aspect or any one of the implementation manners.
  • a tenth aspect provides a computer program product comprising instructions, which when the computer program product is run on a computer, causes the computer to execute the method of the second aspect or any one of the implementation manners.
  • an eleventh aspect provides a chip, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the method in the first aspect or any one of the implementation manners described above.
  • the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the first aspect or any one of the implementation manners thereof.
  • a twelfth aspect provides a chip, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the method in the second aspect or any one of the implementation manners.
  • the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the second aspect or any one of the implementations.
  • a thirteenth aspect provides a computing device, the computing device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of the implementation manners.
  • a fourteenth aspect provides a computing device, the computing device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any one of the implementation manners.
  • the present application provides an instance segmentation method, the method comprising: performing instance segmentation on an image by using the instance segmentation model trained in the first aspect or the second aspect.
  • the present application provides an instance segmentation apparatus, the apparatus including a module for performing the method in the fifteenth aspect or any one of the implementation manners.
  • the present application provides an instance segmentation device, the device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed,
  • the processor is configured to execute the method in the fifteenth aspect or any one of the implementation manners.
  • a computer-readable medium stores program code for execution by a device, where the program code is used for executing the method in the fifteenth aspect or any one of the implementation manners thereof.
  • a nineteenth aspect provides a computer program product containing instructions, which when the computer program product is run on a computer, causes the computer to execute the method in the fifteenth aspect or any one of the implementation manners.
  • a twentieth aspect provides a chip, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the method in the fifteenth aspect or any one of its implementations.
  • the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the fifteenth aspect or any one of the implementation manners thereof.
  • a twenty-first aspect provides a computing device, the computing device comprising: a memory for storing a program; and a processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to execute the method in the fifteenth aspect or any one of the implementation manners.
  • FIG. 1 is a schematic diagram of a related concept of an embodiment of the present application.
  • FIG. 2 is a schematic scene diagram of an instance segmentation model to which an embodiment of the present application can be applied.
  • FIG. 3 is a schematic architecture diagram of a system to which the methods of various embodiments of the present application can be applied.
  • FIG. 4 is a schematic flowchart of a data enhancement method according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a data enhancement method according to another embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a data enhancement apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a computer program product according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a region of interest in an embodiment of the present application.
  • as shown in FIG. 1, the picture contains instances such as 1 person, 2 sheep, and 1 dog. It should be understood that FIG. 1 is only an example and not a limitation.
  • image classification refers to determining the categories to which the instances in an image belong.
  • image classification is to get (or output) which categories are contained in a given picture.
  • the output of the image classification task is to label the classification in the picture: person, sheep, dog.
  • target detection is simply to find out what targets are in the picture and the positions of these targets (for example, frame the target with a rectangular frame, which can be called a detection frame).
  • the output of the target detection task is to label the bounding boxes of 1 person, 2 sheep, and 1 dog in the picture (the rectangular box in the upper right corner of Figure 1).
  • semantic segmentation refers to classifying every pixel in the image, rather than just framing the target with a rectangular frame; however, different instances of the same class do not need to be segmented separately.
  • the output of the semantic segmentation task is to label people, sheep, and dogs in the picture, but it is not necessary to distinguish sheep 1 from sheep 2.
  • Semantic segmentation is also object segmentation in the usual sense.
  • instance segmentation is the combination of target detection and semantic segmentation. Compared with the bounding box of target detection, instance segmentation can be accurate to the edges of the object; compared with semantic segmentation, instance segmentation needs to label different instances of the same class in the image. For example, in FIG. 1 there is 1 instance of person, 2 instances of sheep, and 1 instance of dog, and the instance segmentation task is to label all of these instances.
  • the predicted result of instance segmentation can be called segmentation mask.
  • the segmentation mask quality can characterize the quality of the prediction results of instance segmentation.
  • FIG. 2 is a frame diagram of an exemplary application scenario of the instance segmentation model of the present application.
  • the exemplary application scenario is the Changlian call service of a mobile phone.
  • the Changlian call service is implemented by the user layer, the application layer, and the computing layer. It should be understood that the following embodiments are introduced using a mobile phone as the application scenario; in fact, the solution is not limited to mobile phones and can also be applied to other types of electronic devices such as computers, servers, or wearable devices.
  • the user layer may include a Changlian call interface, through which the user of the mobile phone can access the Changlian call service;
  • the application layer can provide users with basic call services and featured services through the Changlian call interface; the basic call services can include services such as logging in to an account, initiating a call, ending a call, and/or switching between front and rear cameras.
  • Featured services can include services such as skin beautifying effects, dark light HD, time-space transformation, and protagonist locking;
  • protagonist locking means that the protagonist portrait specified by the user is retained, and other portraits and backgrounds are removed. At this time, only the pixels of the protagonist portrait can be retained in the video.
  • Space-time transformation is the operation of retaining the portrait part in the video and replacing the background during the video call, so as to achieve the effect of space-time transformation.
  • the instance segmentation model of the present application can be applied to the time-space transformation and protagonist locking applications in the mobile phone's Changlian call service. These two applications rely on the results of a high-precision portrait instance segmentation algorithm; especially when multiple portraits occlude one another or a portrait is occluded by other objects, the accuracy requirements on the instance segmentation algorithm are even higher.
  • FIG. 3 is an exemplary structural diagram of a system architecture 300 to which the instance segmentation model of the embodiment of the present application can be applied.
  • a data collection device 360 is used to collect training data.
  • the training data may include the training image and the contour information of the instance in the training image, wherein the contour information of the instance in the training image may be the result of manual pre-marking, and the contour information may be called Annotation information for training images.
  • after collecting the training data, the data collection device 360 stores the training data in the database 330, and the training device 320 trains the target model 301 based on the training data maintained in the database 330; the target model 301 may be an instance segmentation model.
  • the target model in this application can also be replaced with target rules.
  • the data processing device 370 may further process the training data to improve the performance of the target model 301 .
  • the data processing device 370 can perform data enhancement on the training images in the database 330 to expand the training images in the database 330, so that the training device 320 can train an instance segmentation model with higher segmentation accuracy based on the expanded database 330.
  • the training device 320 obtains the target model 301 based on the training data.
  • specifically, the training device 320 performs instance segmentation on an input original image, compares the instance contour result obtained by segmentation with the label information of the original image, and adjusts the parameters of the target model 301 according to the comparison result, until the difference between the contour information output by the training device 320 and the label information of the original image is less than a certain threshold, at which point the training of the target model 301 is completed.
  • the training data maintained in the database 330 are not necessarily all collected by the data collection device 360, and may also be received from other devices.
  • the training device 320 does not necessarily have to completely train the target model 301 based on the training data maintained by the database 330, and can also obtain training data from the cloud or other places for model training.
  • the above description should not be taken as a limitation to the embodiments of the present application.
  • the training device 320 and the data processing device 370 may be the same device.
  • the target model 301 trained according to the training device 320 can be applied to different systems or devices, such as being applied to the execution device 310 shown in FIG. 3 .
  • the execution device 310 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server, a resource-constrained cloud device, or the like.
  • an example of the execution device 310 may be a cell phone including the exemplary structure shown in FIG. 2 .
  • the execution device 310 configures an input/output (I/O) interface 312 for data interaction with external devices, and a user can input data to the I/O interface 312 through the client device 340 .
  • the input data may include: images collected by the camera of the client device 340 .
  • the execution device 310 and the client device 340 may be the same device.
  • when the execution device 310 preprocesses the input data, or when the calculation module 311 of the execution device 310 performs calculation and other related processing, the execution device 310 can call the data, code, and the like in the data storage system 350 for the corresponding processing, and the data and instructions obtained by the corresponding processing may also be stored in the data storage system 350.
  • finally, the I/O interface 312 returns the processing result; taking instance segmentation as an example, it returns the instance segmentation result of the image to be segmented to the client device 340 so as to provide it to the user.
  • the user can manually give input data, and the manual setting can be operated through the interface provided by the I/O interface 312 .
  • the client device 340 can automatically send the input data to the I/O interface 312 . If the user's authorization is required to request the client device 340 to automatically send the input data, the user can set the corresponding permission in the client device 340 .
  • the user can view the result output by the execution device 310 on the client device 340, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 340 can also be used as a data collection terminal to collect the input data of the input I/O interface 312 and the output result of the output I/O interface 312 as new sample data as shown in the figure, and store them in the database 330 .
  • FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • in FIG. 3, the data storage system 350 is an external memory relative to the execution device 310; in other cases, the data storage system 350 may also be placed in the execution device 310.
  • FIG. 4 is an exemplary flowchart of a data augmentation method for an instance segmentation model according to an embodiment of the present application. As shown in FIG. 4, the method may include S410 to S440. An example of an execution body of the method is the data processing device 370 in the system architecture shown in FIG. 3.
  • S410: Acquire a first image. The first image may be an image in an instance segmentation dataset such as the COCO dataset or the Cityscapes dataset.
  • annotation information of the first image may also be acquired.
  • the annotation information of the first image may be referred to as first annotation information.
  • Outline information of each instance in the first image may be recorded in the first annotation information.
  • An example of the first annotation information is a set of coordinate points of the outer contour of each instance.
  • acquiring the first image may be understood as reading the first image from the image storage device.
  • the data processing device 370 can read the first image and the first annotation information from the database 330 .
  • contour information at the pixel level of each instance in the first image may be extracted according to the first annotation information. This process may be referred to as preprocessing of the first image.
  • the outer contour information of an instance is converted from a set of coordinate points into a mask matrix that can be efficiently processed by a computer.
  • in the mask matrix, the pixels corresponding to the instance can be set to 1, and the pixels corresponding to the background can be set to 0.
  • an instance may also be referred to as a foreground or a foreground portion
  • a neighborhood of the instance may be referred to as a background or a background portion of the instance.
  • the neighborhood is an area adjacent to the instance, including but not limited to adjacent areas of various shapes, such as the circumscribed rectangle neighborhood described later.
  • for example, the data processing device 370 can use the "findContour()" function for extracting outer contours that comes with opencv to extract the pixel-level contour information of each instance in the first image; here, the first annotation information being JSON text is taken as an example.
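  • for illustration only, a minimal Python sketch of this preprocessing step is given below; the JSON layout (an "instances" list with "segmentation" polygons) and the file names are assumptions, not the patent's reference implementation:

```python
import json
import cv2
import numpy as np

def polygons_to_mask(polygons, height, width):
    """Convert an instance's outer-contour coordinate points into a 0/1 mask matrix."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for poly in polygons:  # each poly is a flat list [x0, y0, x1, y1, ...]
        pts = np.asarray(poly, dtype=np.int32).reshape(-1, 2)
        cv2.fillPoly(mask, [pts], 1)  # instance pixels -> 1, background stays 0
    return mask

# Read the first image and the first annotation information (paths are assumptions).
image = cv2.imread("first_image.jpg")
with open("first_annotation.json") as f:
    annotation = json.load(f)

masks = [polygons_to_mask(inst["segmentation"], image.shape[0], image.shape[1])
         for inst in annotation["instances"]]
```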
  • S420: Perform multiple affine transformations on the first instance in the first image to obtain multiple candidate instances, where the multiple candidate instances are in one-to-one correspondence with the multiple affine transformations.
  • the first instance may be any instance in the first image.
  • each instance can be regarded as the first instance, and then multiple affine transformations are performed on each instance to obtain multiple candidate instances of each instance.
  • each of the multiple candidate instances is an instance obtained by processing the first instance with the corresponding one of the multiple affine transformations.
  • specifically, each of the multiple affine transformations can be used to operate on the mask matrix of the first instance, so that a mask matrix corresponding to that affine transformation is obtained; the instance described by this mask matrix is the candidate instance corresponding to that affine transformation.
  • performing affine transformation on the instance may include performing one or more transformations on the instance in the first image by translation, rotation, scaling, reflection, shearing, and any combination of the foregoing transformations.
  • the above-mentioned various affine transformations may be pre-configured.
  • for example, a rule for affine transformation can be preset, and multiple affine transformation matrices can be generated based on the rule, with each of the multiple affine transformation matrices corresponding to one affine transformation; as another example, the multiple affine transformation matrices can be preset directly.
  • an exemplary representation of an affine transformation matrix, combining scaling, rotation, and translation, is as follows:
  • M = \begin{bmatrix} s\cos r & -s\sin r & t_x \\ s\sin r & s\cos r & t_y \end{bmatrix}
  • where t_x represents the offset in the horizontal direction; t_y represents the offset in the vertical direction; s represents the scaling scale; and r represents the rotation angle.
  • for example, the width w of the circumscribed rectangle frame of the first instance in the horizontal direction can be obtained, the value range of t_x set to -20%·w to +20%·w with a step size of 2, and t_y fixed to 0; the value range of the scaling scale s can be set to 0.8 to 1.2 with a step size of 0.05, that is, the transformed instance is 80% to 120% of the size of the original instance; and the value range of the rotation angle r can be set to -10 degrees to +10 degrees with a step size of 1 degree. By traversing these value ranges, multiple affine transformation matrices can be obtained, and these matrices form a set of affine transformation matrices.
  • fixing t_y to zero is only an example; compared with not fixing it, fixing t_y to zero prevents the enhanced image from differing too much from the pre-enhancement image, which avoids a situation in which training the instance segmentation model with the enhanced image harms the training effect. That is, it helps to improve the performance of the instance segmentation model. A sketch of generating such a set of transformation matrices is shown below.
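  • as a hedged illustration of generating this set of affine transformation matrices (the 2x3 matrix layout is an assumption consistent with the exemplary matrix above):

```python
import numpy as np

def build_affine_matrices(w):
    """Enumerate 2x3 affine matrices over the parameter grid described above:
    t_x in [-20%*w, +20%*w] step 2, t_y fixed to 0,
    s in [0.8, 1.2] step 0.05, r in [-10, +10] degrees step 1."""
    matrices = []
    for tx in np.arange(-0.2 * w, 0.2 * w + 1e-9, 2.0):
        for s in np.arange(0.8, 1.2 + 1e-9, 0.05):
            for r_deg in np.arange(-10.0, 10.0 + 1e-9, 1.0):
                r = np.deg2rad(r_deg)
                matrices.append(np.array([
                    [s * np.cos(r), -s * np.sin(r), tx],
                    [s * np.sin(r),  s * np.cos(r), 0.0],  # t_y fixed to 0
                ]))
    return matrices

# Each matrix can be applied to the first instance's mask, e.g. with cv2.warpAffine.
```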
  • S430: Select the best candidate instance of the first instance in the first image from the multiple candidate instances.
  • the best candidate instance refers to the candidate instance among the multiple candidate instances with the largest grayscale difference from its neighborhood in the first image; that is, the grayscale difference between the best candidate instance and its neighborhood in the first image is larger than the grayscale difference between any other candidate instance and that candidate instance's neighborhood in the first image.
  • in an implementation manner, the grayscale difference between each of the multiple candidate instances and the designated neighborhood of that candidate instance in the first image can be obtained first, finally yielding multiple grayscale differences in one-to-one correspondence with the multiple candidate instances.
  • the circumscribed rectangle neighborhood of each candidate instance in the first image can be obtained, so as to obtain multiple neighborhoods corresponding to the multiple candidate instances one-to-one.
  • the multiple neighborhoods can be referred to as a contour neighborhood set.
  • the circumscribed rectangle neighborhood of each candidate instance can be understood as the neighborhood formed by pixels other than the transformed instance within the circumscribed rectangle of the candidate instance.
  • for example, in FIG. 9, the first instance in the first image is a cloud, and the hatched portion within the circumscribed rectangular frame of the cloud outline represents the circumscribed rectangle neighborhood of the cloud instance.
  • the first image may also include other content, for example, may include other instances, which are not shown in FIG. 9 .
  • using the circumscribed rectangle neighborhood of each candidate instance as the neighborhood of that candidate instance is only an example; this application does not limit the shape of the neighborhood of the candidate instance, which can also be, for example, the circumscribed circle neighborhood of the candidate instance.
  • for each neighborhood, the candidate instance corresponding to the neighborhood can be regarded as the foreground and the neighborhood as the background, and the pixel value variance between the foreground and the background can be calculated to obtain the variance corresponding to the neighborhood; this variance can be used as the grayscale difference between the neighborhood and the candidate instance. The variance is calculated as follows:
  • g = w_0(u_0 - u)^2 + w_1(u_1 - u)^2, where u = w_0·u_0 + w_1·u_1
  • where u_0 represents the average gray level of the foreground; w_0 represents the ratio of the number of foreground pixels to the total number of foreground and background pixels; u_1 represents the average gray level of the background; w_1 represents the ratio of the number of background pixels to the total number of pixels; u represents the average gray level of the foreground and background together; and g represents the variance of the foreground and background.
  • the variance of the pixel values of the foreground and the background is regarded as the grayscale difference between the candidate instance and the neighborhood of the candidate instance, which is only an example.
  • the grayscale difference value between the foreground and the background can be obtained in other ways.
  • for example, the 1-norm or the infinity norm of the difference between the pixel values of each foreground and its corresponding background can be calculated, and that norm can be regarded as the grayscale difference between the foreground and the background.
  • after the multiple grayscale differences are obtained, the maximum grayscale difference can be determined from the multiple grayscale differences, the candidate instance corresponding to the maximum grayscale difference can be determined as the best candidate instance, and the neighborhood in the first image corresponding to the maximum grayscale difference can be determined as the best neighborhood, which may also be called the target neighborhood.
  • here, the candidate instance corresponding to the maximum grayscale difference refers to the candidate instance on which the maximum grayscale difference was calculated, and the neighborhood corresponding to the maximum grayscale difference refers to the neighborhood on which the maximum grayscale difference was calculated. A sketch of this selection step is given below.
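  • a minimal sketch of S430, assuming the image has been converted to grayscale and each candidate is represented as a pair of 0/1 masks (instance mask and neighborhood mask); the helper names are assumptions:

```python
import numpy as np

def pixel_value_variance(gray, inst_mask, hood_mask):
    """Variance g between a candidate instance (foreground) and its neighborhood (background)."""
    fg = gray[inst_mask.astype(bool)].astype(np.float64)
    bg = gray[hood_mask.astype(bool)].astype(np.float64)
    total = fg.size + bg.size
    w0, w1 = fg.size / total, bg.size / total
    u0, u1 = fg.mean(), bg.mean()
    u = w0 * u0 + w1 * u1                            # overall average gray level
    return w0 * (u0 - u) ** 2 + w1 * (u1 - u) ** 2   # equals w0*w1*(u0-u1)^2

def select_best_candidate(gray, candidates):
    """candidates: list of (instance_mask, neighborhood_mask) pairs; returns the best pair."""
    scores = [pixel_value_variance(gray, m, n) for m, n in candidates]
    best = int(np.argmax(scores))  # index of the maximum grayscale difference
    return candidates[best]
```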
  • S440: Add the best candidate instance to the first image to obtain a second image that is an enhancement of the first image.
  • specifically, the best candidate instance can be added to the first image at a position such that the neighborhood of the best candidate instance in the first image is exactly the best neighborhood.
  • in addition, image inpainting technology may be used to process the image content at the location of the first instance in the first image, so as to fill in the image content at that location as background of the first image.
  • one or more instances in the first image may each be regarded as the first instance, and the method shown in FIG. 4 used to obtain the best candidate instance corresponding to each of the one or more instances; each best candidate instance is added to the first image, resulting in the second image.
  • annotation information of the second image may also be acquired.
  • the mask matrix can be transformed into the form of a coordinate point set, and the coordinate point set can be used as the contour information of the target instance.
  • the contour information of the best candidate instance may be recorded in the second annotation information.
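  • for illustration only (a sketch assuming opencv's contour extraction), the coordinate point set of the best candidate instance might be recovered from its mask matrix like this:

```python
import cv2
import numpy as np

def mask_to_contour_points(mask):
    """Convert a 0/1 mask matrix back into sets of outer-contour coordinate points."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Flatten each contour from shape (N, 1, 2) to a list of [x, y] points.
    return [c.reshape(-1, 2).tolist() for c in contours]
```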
  • the instance segmentation model can then be trained using the training dataset obtained by the processing. For example, after the data processing device 370 in FIG. 3 performs the foregoing processing on the first image in the database 330, the training device 320 can use the processed second image to train the instance segmentation model to obtain the target model 301.
  • the instance segmentation model can be used to perform instance segmentation.
  • the execution device 310 may execute an instance segmentation service based on the instance segmentation model. Taking the execution device 310 including the architecture shown in FIG. 2 as an example, the execution device 310 can implement the protagonist locking and spatiotemporal transformation services based on the instance segmentation model.
  • in the method of this embodiment, the candidate instance with the largest grayscale difference from its own designated neighborhood in the first image is selected from the multiple candidate instances of the first instance as the best candidate instance, and the best candidate instance is added to the first image.
  • the contour information of the best candidate instance can be considered very reasonable contour information; the second annotation information of the second image is obtained based on this reasonable contour information, and a new training sample is obtained from the second image and the second annotation information.
  • FIG. 5 is an exemplary flowchart of a data enhancement method according to another embodiment of the present application. As shown in FIG. 5, the method may include S450 in addition to S410 to S440. S450: Perform contrast enhancement processing on the first region of interest in the second image to obtain a third image, where the first region of interest includes a second instance in the second image and the neighborhood of the second instance.
  • the second image may include one or more regions of interest (region of interest, ROI), and each region of interest may include an instance and a neighborhood of the instance.
  • the regions of interest of the second image may form a set of regions of interest.
  • the region of interest in the second image is referred to as the first region of interest
  • the instance in the first region of interest is referred to as the second instance
  • the neighborhood of the second instance is referred to as the second neighborhood.
  • the second neighborhood may include the circumscribed rectangle neighborhood of the second instance; in other words, the circumscribed rectangle neighborhood of the second instance is located within the second neighborhood, and the second neighborhood may also include pixels outside the circumscribed rectangle neighborhood of the second instance.
  • a low-frequency part in the first region of interest may be acquired through a Gaussian low-pass filter.
  • a Gaussian low-pass filter can be implemented using the "GaussianBlur()" function in the open-source opencv algorithm.
  • the pixels in the first region of interest can be passed to a Gaussian low-pass filter, so that the high-frequency part of each pixel can be filtered out, and the low-frequency part of each pixel can be obtained.
  • after the low-frequency part of the first region of interest is obtained through the Gaussian low-pass filter, the low-frequency part can be subtracted from the original pixels of the first region of interest to obtain the high-frequency part of the first region of interest.
  • then, the enhanced pixel values of the high-frequency part can be calculated according to the gain value of the high-frequency part, and the enhanced pixel values of the low-frequency part can be calculated according to the gain value of the low-frequency part, so as to obtain the contrast-enhanced image.
  • if a pixel belongs to the high-frequency part, it can be considered that the pixel value of the pixel has a large mean square error with respect to the surrounding pixels, so the pixel value can be reduced with a smaller gain value to alleviate the phenomenon of the pixel being too bright; if a pixel belongs to the low-frequency part, it can be considered that the pixel value of the pixel has a small mean square error with respect to the surrounding pixels, so it can be amplified with a larger gain value to make the detailed features around the pixel more obvious, thereby alleviating the problem of image blurring.
  • the gain values of the high frequency part and the low frequency part can be preset as required.
  • the gain value of the high frequency part can be set to 0.5
  • the gain value of the low frequency part can be set to 2.
  • one implementation manner of calculating the enhanced pixel values of the high-frequency part according to the gain value of the high-frequency part, or the enhanced pixel values of the low-frequency part according to the gain value of the low-frequency part, is to use the product of the gain value and the pixel value of the pixel point as the enhanced pixel value of that pixel point.
  • a third image that has undergone local adaptive contrast enhancement can be obtained.
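  • a minimal sketch of this locally adaptive contrast enhancement, assuming the example gain values above and a Gaussian kernel size chosen only for illustration:

```python
import cv2
import numpy as np

def enhance_roi_contrast(roi, low_gain=2.0, high_gain=0.5, ksize=(11, 11)):
    """Locally adaptive contrast enhancement of one region of interest."""
    roi = roi.astype(np.float32)
    low = cv2.GaussianBlur(roi, ksize, 0)    # low-frequency part (Gaussian low-pass filter)
    high = roi - low                         # high-frequency part = original - low frequency
    out = low_gain * low + high_gain * high  # apply the per-band gain values
    return np.clip(out, 0, 255).astype(np.uint8)

# The third image is obtained by enhancing each region of interest of the second image,
# e.g. second_image[y0:y1, x0:x1] = enhance_roi_contrast(second_image[y0:y1, x0:x1]).
```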
  • a training data set with enhanced image contrast can be obtained.
  • the method may further include: using the training data set to train the instance segmentation model.
  • the training device 320 can use the processed third image to train the instance segmentation model to obtain the target model 301 .
  • the method may further include: using the instance segmentation model obtained by training to perform instance segmentation.
  • the execution device 310 may execute an instance segmentation service based on the instance segmentation model.
  • the execution device 310 can implement the protagonist locking and spatiotemporal transformation services based on the instance segmentation model.
  • the instance segmentation model trained using the contrast-enhanced training dataset of this embodiment has high robustness, strong anti-interference ability, and a high tolerance for noise in images in noisy scenes.
  • in some embodiments, the data enhancement method shown in FIG. 4 or FIG. 5 can be used to process the images in the original training dataset, and after the processed training dataset is obtained, it can be used to train the instance segmentation model.
  • the processing of the original training dataset in this embodiment may be referred to as offline data augmentation.
  • in other embodiments, the method of FIG. 4 or FIG. 5 may be used to perform image processing on the original training dataset while the instance segmentation model is being trained using the original training dataset; after the image processing on the original training dataset is completed, the instance segmentation model is trained using the training dataset obtained by the image processing.
  • the method of this embodiment may be referred to as an online data augmentation training method.
  • in the method of this embodiment, since the duration of one training iteration of the instance segmentation model is usually much longer than the duration of image processing, the method does not add extra time overhead, and the latest enhanced training dataset can be obtained in real time, so the accuracy of the instance segmentation model can be further improved. For example, no matter how the original training dataset of the instance segmentation model changes, the method of this embodiment can obtain the latest enhanced training dataset based on the original training dataset, so that an instance segmentation model with better performance can be obtained.
  • the image processing operations are completed in the constructor of the data_generator object.
  • specifically, the original training dataset is read first: the "imread" function in the opencv library is used to read the first image in the original training dataset, and the "loadAnns" function of the coco dataset API is used to read the first annotation information; the data enhancement method in FIG. 4 or FIG. 5 is implemented in the constructor of the data_generator object to output a data_generator object, and the above operations are performed by an image processing thread. Then, the data_generator object is passed as a parameter to the training thread of the Tensorflow model, where the training thread and the image processing thread execute in parallel: the image processing thread executes independently and outputs the processed second image or third image, together with the second annotation information, to a public storage area, and before each training iteration the training thread reads the processed second image or third image and the second annotation information from the public storage area, thereby realizing online-augmented training of the instance segmentation model. A sketch of this two-thread arrangement is shown below.
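  • the following Python sketch illustrates the two-thread arrangement only; augment() and model.train_step() are assumed placeholders, not APIs from the patent:

```python
import threading
import queue

buffer = queue.Queue(maxsize=8)  # plays the role of the "public storage area"

def image_processing_thread(dataset):
    """Runs independently: enhance each sample (FIG. 4 / FIG. 5) and publish the result."""
    for image, annotation in dataset:
        enhanced_image, new_annotation = augment(image, annotation)  # assumed helper
        buffer.put((enhanced_image, new_annotation))  # blocks while the buffer is full

def training_thread(model, num_iterations):
    """Before each training iteration, read the latest processed sample from the buffer."""
    for _ in range(num_iterations):
        enhanced_image, new_annotation = buffer.get()
        model.train_step(enhanced_image, new_annotation)  # assumed model API

# worker = threading.Thread(target=image_processing_thread, args=(dataset,))
# trainer = threading.Thread(target=training_thread, args=(model, 10000))
# worker.start(); trainer.start()
```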
  • FIG. 6 is a schematic structural diagram of a data enhancement apparatus 600 for an instance segmentation model according to an embodiment of the present application.
  • the apparatus 600 may be an example of the data processing device 370 in the system architecture shown in FIG. 3 .
  • the apparatus 600 may include an acquisition module 610 and a processing module 620, and optionally, may also include a training module.
  • the apparatus 600 may be used to implement the data augmentation method of the instance segmentation model in any of the foregoing embodiments, for example, may be used to implement the method shown in FIG. 4 or FIG. 5 .
  • the acquiring module 610 may be used to perform S410
  • the processing module 620 may be used to perform S420 to S440.
  • the processing module 620 may also be configured to perform S450.
  • the schematic structure of the apparatus for training an instance segmentation model according to an embodiment of the present application is similar to the structure of the apparatus 600 including a training module, and details are not repeated here.
  • the training device can be used to perform the aforementioned online data augmentation training method.
  • FIG. 7 is a schematic structural diagram of an apparatus 700 according to an embodiment of the present application.
  • The apparatus 700 includes a processor 702, a communication interface 703 and a memory 704.
  • The apparatus 700 may be a chip or a computing device.
  • The apparatus 700 may be the data processing device 370 in the system architecture shown in FIG. 3, or may be an example of a chip that can be applied to the data processing device 370.
  • The apparatus 700 may be the training device 320 in the system architecture shown in FIG. 3, or may be an example of a chip that can be applied to the training device 320.
  • The processor 702, the memory 704 and the communication interface 703 can communicate through a bus.
  • Executable code is stored in the memory 704, and the processor 702 reads the executable code in the memory 704 to execute the corresponding method.
  • The memory 704 may also include other software modules required for running processes, such as an operating system.
  • The operating system may be LINUX™, UNIX™, WINDOWS™ and the like.
  • The executable code in the memory 704 is used to implement the method described in any one of the foregoing embodiments (for example, the method shown in FIG. 4 or FIG. 5), and the processor 702 reads the executable code in the memory 704 to execute the method described in any one of the foregoing embodiments.
  • The processor 702 may include a CPU.
  • The memory 704 may include volatile memory, such as random access memory (RAM).
  • The memory 704 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).
  • The disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format, or on other non-transitory media or articles of manufacture.
  • Figure 8 schematically illustrates a conceptual partial view of an example computer program product, arranged in accordance with any of the above-described embodiments, comprising a computer program for executing a computer process on a computing device.
  • The example computer program product 800 is provided using a signal bearing medium 801.
  • The signal bearing medium 801 may include one or more program instructions 802 which, when executed by one or more processors, may provide the functions or part of the functions described in the methods of any of the above embodiments.
  • One or more of the features of S410 to S430 may be undertaken by one or more instructions associated with the signal bearing medium 801.
  • The signal bearing medium 801 may include a computer readable medium 803, such as, but not limited to, a hard drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (ROM) or a random access memory (RAM), etc.
  • The signal bearing medium 801 may include a computer recordable medium 804, such as, but not limited to, memory, read/write (R/W) CD, R/W DVD, and the like.
  • The signal bearing medium 801 may include a communication medium 805, such as, but not limited to, digital and/or analog communication media (e.g., fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
  • The signal bearing medium 801 may be conveyed by a wireless form of communication medium 805 (e.g., a wireless communication medium that conforms to the IEEE 802.11 standard or another transmission protocol).
  • The one or more program instructions 802 may be, for example, computer-executable instructions or logic-implemented instructions.
  • The aforementioned computing devices may be configured to provide various operations, functions, or actions in response to program instructions 802 communicated to the computing device via one or more of the computer-readable medium 803, the computer-recordable medium 804, and/or the communication medium 805. It should be understood that the arrangements described herein are for illustrative purposes only.
  • The disclosed system, apparatus and method may be implemented in other manners.
  • The apparatus embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The shown or discussed mutual coupling, direct coupling or communication connection may be implemented through some interfaces as an indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • Each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • The functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, or other media that can store program codes.


Abstract

Provided in the present application are a data enhancement method, a training method for an instance segmentation model, and a related apparatus in the field of computer vision. In the technical solution provided in the present application, the data enhancement method comprises: performing multiple affine transformations on a first instance in a first image, and selecting the best candidate instance from multiple candidate instances, wherein the best candidate instance has the maximum grayscale difference, and the grayscale difference of each of the multiple candidate instances is the grayscale difference between that candidate instance and its neighborhood in the first image; and adding the best candidate instance to the first image to obtain an enhanced version of the first image. In the technical solution of the present application, the grayscale difference between the candidate instance added to the first image and its neighborhood is the greatest; therefore, the contrast between the candidate instance and its background is clearer. Retraining an instance segmentation model on the basis of the enhanced image can improve the segmentation accuracy of the instance segmentation model.

Description

Data enhancement method, training method and related apparatus for instance segmentation model

Technical Field

The present application relates to the field of computer vision and, more particularly, to a data enhancement method, a training method and a related apparatus for an instance segmentation model.

Background Art
The instance segmentation task is a current research focus of deep learning in the field of computer vision. It mainly studies how to detect the position and category of instances in an image (for example, portraits, animals or specified objects), segment the instances from the image, and output a pixel-level mask to display the segmented instances.

Instance segmentation has been widely applied to various tasks in the field of computer vision, for example, automatic driving, robot control and other tasks. However, in applications of instance segmentation, it has been found that current instance segmentation methods often suffer from poor instance segmentation accuracy.

In order to solve the above problem, those skilled in the art have proposed a solution, namely, constructing new training samples by using the existing data set of the instance segmentation model, so as to expand the existing data set both quantitatively and qualitatively, and using the new data set for the training of the instance segmentation model.

Specifically, in the method proposed by those skilled in the art, the contour information of an instance in an image of the original data set is obtained, and the instance is cut out of the image based on the contour information; the instance is then pasted to the location of another region in the image that has the highest pixel similarity with the instance, and an image inpainting technique is used to fill the background of the image content at the original location of the instance in the image, so as to obtain a new image. The new image can be used as an enhanced image to expand the original data set into a new data set; finally, the new data set is used to train the instance segmentation model, which can improve the segmentation accuracy of the instance segmentation model.

After analysis, it is found that although the above method can improve the segmentation accuracy of the instance segmentation model to a certain extent, the improvement is limited, and in many scenarios it still cannot meet the instance segmentation accuracy required by various tasks in the field of computer vision.
Summary of the Invention

The present application provides a data enhancement method, a training method and a related apparatus for an instance segmentation model, which can improve the segmentation accuracy of the instance segmentation model.
In a first aspect, the present application provides a data enhancement method for an instance segmentation model. The enhancement method includes: acquiring a first image; performing multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; selecting the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and adding the best candidate instance to the first image to obtain a second image, which is an enhanced version of the first image.
In this method, when the first image is enhanced, the best candidate instance, that is, the one with the largest grayscale difference from its designated neighborhood in the first image, is selected from the multiple candidate instances of the first instance and added to the first image. As a result, in the second image obtained by performing data enhancement on the first image, the contrast between the best candidate instance and its neighborhood is relatively large, so that the best candidate instance is relatively clear in the image. In this case, the contour information of the best candidate instance can be considered very reasonable contour information. Obtaining the second annotation information of the second image based on this reasonable contour information, and training the instance segmentation model based on the second image and the second annotation information, can yield an instance segmentation model with more accurate segmentation; in other words, it can significantly improve the segmentation accuracy of the instance segmentation model.
Optionally, the method may further include: acquiring first label information of the first image, the first label information containing first contour information of the first instance; and acquiring, according to the candidate instance corresponding to the maximum grayscale difference, second label information of the second image, the second label information containing second contour information of the candidate instance corresponding to the maximum grayscale difference.
In some possible implementations of the first aspect, the grayscale difference between each candidate instance and its neighborhood is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:

g = w₀ × (u₀ − u)² + w₁ × (u₁ − u)² = w₀ × w₁ × (u₀ − u₁)²

u = w₀ × u₀ + w₁ × u₁

where u₀ represents the average gray level of the candidate instance, w₀ represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood, u₁ represents the average gray level of the neighborhood of the candidate instance, w₁ represents the ratio of the number of pixels of the neighborhood to the total number of pixels, u represents the average gray level of the candidate instance and its neighborhood together, and g represents the pixel value variance.
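As a hedged illustration of this calculation, the sketch below computes the between-class variance g for one candidate placement, assuming the candidate instance is given as a binary mask over a grayscale crop covering the instance and its neighborhood; the function name and array layout are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def grayscale_difference(gray_crop: np.ndarray, instance_mask: np.ndarray) -> float:
    """Between-class variance g of an instance versus its neighborhood.

    gray_crop:     2-D uint8 grayscale patch covering instance + neighborhood.
    instance_mask: 2-D bool array, True on instance pixels, False on neighborhood.
    """
    total = instance_mask.size
    w0 = instance_mask.sum() / total          # fraction of instance pixels
    w1 = 1.0 - w0                             # fraction of neighborhood pixels
    u0 = gray_crop[instance_mask].mean()      # mean gray level of the instance
    u1 = gray_crop[~instance_mask].mean()     # mean gray level of the neighborhood
    return w0 * w1 * (u0 - u1) ** 2           # g = w0 * w1 * (u0 - u1)^2

# The candidate placement with the largest g is selected as the best candidate.
```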
In some possible implementations of the first aspect, the method further includes: performing contrast enhancement processing on a first region of interest in the second image to obtain a third image, the first region of interest containing a second instance in the second image and the neighborhood of the second instance.

In this implementation, a contrast enhancement technique is used to perform local adaptive contrast enhancement on the instance and its neighborhood in the second image, which can increase the inter-class variance of the local image formed by the instance and its neighborhood, achieving a more obvious local contrast. This makes the instance segmentation model trained on the locally contrast-enhanced third image more robust and more resistant to interference.

The first region of interest is a partial region of the second image. The second image may include multiple different regions of interest, different regions of interest may contain different instances, and each region of interest is a partial region of the second image. Contrast enhancement processing is performed separately for each region of interest.

One example of the neighborhood of the second instance is the circumscribed-rectangle neighborhood of the second instance. That the first region of interest contains the neighborhood of the second instance can be understood to mean that the first region of interest is somewhat larger than the neighborhood of the second instance, where the margin can be preset. For example, the first region of interest may be 10 pixels larger in length and width than the neighborhood of the second instance.
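The passage does not name a specific contrast enhancement algorithm; the sketch below uses OpenCV's CLAHE (contrast limited adaptive histogram equalization) as one plausible choice of local adaptive contrast enhancement, together with the 10-pixel margin mentioned above. The function name and parameter values are assumptions, not the patent's own implementation.

```python
import cv2
import numpy as np

def enhance_roi(image: np.ndarray, box: tuple, margin: int = 10) -> np.ndarray:
    """Apply local adaptive contrast enhancement inside an expanded bounding box.

    image: H x W x 3 BGR image (the second image), modified in place.
    box:   (x, y, w, h) bounding rectangle of the second instance's neighborhood.
    """
    x, y, w, h = box
    h_img, w_img = image.shape[:2]
    # Expand the neighborhood by `margin` pixels to form the region of interest.
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(w_img, x + w + margin), min(h_img, y + h + margin)

    roi = image[y0:y1, x0:x1]
    lab = cv2.cvtColor(roi, cv2.COLOR_BGR2LAB)   # enhance the luminance channel only
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    image[y0:y1, x0:x1] = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    return image
```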
In some possible implementations of the first aspect, the method further includes: training an instance segmentation model according to the third image.

In some possible implementations of the first aspect, the first image is one image in a first data set. The method further includes: training an instance segmentation model according to a fourth image in the first data set, this training being performed simultaneously with the aforementioned process of obtaining the second image based on the first image; and training the instance segmentation model using the second image or the third image.

In this implementation, image processing and model training are performed at the same time, so that even when the first data set changes, a processed image can still be obtained based on the latest first data set, and the instance segmentation model can be trained based on the processed image. In other words, this implementation can improve the performance of the instance segmentation model without adding extra delay.
In a second aspect, the present application provides a training method for an instance segmentation model. The method includes: acquiring a first data set containing multiple images; while training an instance segmentation model based on images in the first data set, performing multiple affine transformations on a first instance in a first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; selecting the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; adding the best candidate instance to the first image to obtain a second image, which is an enhanced version of the first image; and training the instance segmentation model according to the second image.

In this training method, image processing and model training are performed at the same time, so that even when the first data set changes, a processed image can still be obtained based on the latest first data set, and the instance segmentation model can be trained based on the processed image. In other words, this implementation can improve the performance of the instance segmentation model without adding extra delay.
In some possible implementations of the second aspect, the grayscale difference between each candidate instance and its neighborhood is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:

g = w₀ × (u₀ − u)² + w₁ × (u₁ − u)² = w₀ × w₁ × (u₀ − u₁)²

u = w₀ × u₀ + w₁ × u₁

where u₀ represents the average gray level of the candidate instance, w₀ represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood, u₁ represents the average gray level of the neighborhood, w₁ represents the ratio of the number of pixels of the neighborhood to the total number of pixels, u represents the average gray level of the candidate instance and its neighborhood together, and g represents the pixel value variance.
In some possible implementations of the second aspect, training the instance segmentation model according to the second image includes: performing contrast enhancement processing on a first region of interest in the second image to obtain a third image, the first region of interest containing a second instance in the second image and the neighborhood of the second instance; and training the instance segmentation model according to the third image.
In a third aspect, the present application provides a data enhancement apparatus for an instance segmentation model, the apparatus including modules for executing the method in the first aspect or any one of its implementations.

For example, the apparatus includes: an acquisition module, configured to acquire a first image; and a processing module, configured to: perform multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; select the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and add the best candidate instance to the first image to obtain a second image, which is an enhanced version of the first image.
Optionally, the grayscale difference between each candidate instance and its neighborhood is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:

g = w₀ × (u₀ − u)² + w₁ × (u₁ − u)² = w₀ × w₁ × (u₀ − u₁)²

u = w₀ × u₀ + w₁ × u₁

where u₀ represents the average gray level of the candidate instance, w₀ represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood, u₁ represents the average gray level of the neighborhood, w₁ represents the ratio of the number of pixels of the neighborhood to the total number of pixels, u represents the average gray level of the candidate instance and its neighborhood together, and g represents the pixel value variance.
Optionally, the processing module is further configured to: perform contrast enhancement processing on a first region of interest in the second image to obtain a third image, the first region of interest containing a second instance in the second image and the neighborhood of the second instance.

Optionally, the apparatus further includes a training module configured to train an instance segmentation model according to the third image.

Optionally, the first image is one image in a first data set. The apparatus further includes a training module configured to: train an instance segmentation model according to a fourth image in the first data set, this training being performed simultaneously with the aforementioned process in which the processing module obtains the second image based on the first image; and train the instance segmentation model using the second image or the third image.
In a fourth aspect, the present application provides a training apparatus for an instance segmentation model, the apparatus including modules for executing the method in the second aspect or any one of its implementations.

For example, the training apparatus includes: an acquisition module, configured to acquire a first data set containing multiple images; a training module, configured to train an instance segmentation model based on images in the first data set; and a processing module, configured to: while the training module trains the instance segmentation model based on images in the first data set, perform multiple affine transformations on a first instance in a first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; select the best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, where the grayscale difference of each candidate instance is the grayscale difference between that candidate instance and its neighborhood in the first image; and add the best candidate instance to the first image to obtain a second image, which is an enhanced version of the first image.
Optionally, the grayscale difference between each candidate instance and its neighborhood is determined according to the pixel value variance of the candidate instance and its neighborhood, where the pixel value variance is calculated as follows:

g = w₀ × (u₀ − u)² + w₁ × (u₁ − u)² = w₀ × w₁ × (u₀ − u₁)²

u = w₀ × u₀ + w₁ × u₁

where u₀ represents the average gray level of the candidate instance, w₀ represents the ratio of the number of pixels of the candidate instance to the total number of pixels of the candidate instance and its neighborhood, u₁ represents the average gray level of the neighborhood, w₁ represents the ratio of the number of pixels of the neighborhood to the total number of pixels, u represents the average gray level of the candidate instance and its neighborhood together, and g represents the pixel value variance.
Optionally, the training module is specifically configured to: perform contrast enhancement processing on a first region of interest in the second image to obtain a third image, the first region of interest containing a second instance in the second image and the neighborhood of the second instance; and train the instance segmentation model according to the third image.
In a fifth aspect, the present application provides a data enhancement apparatus for an instance segmentation model, the apparatus including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of its implementations.

In a sixth aspect, the present application provides a training apparatus for an instance segmentation model, the apparatus including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any one of its implementations.

In a seventh aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code being used to execute the method in the first aspect or any one of its implementations.

In an eighth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code being used to execute the method in the second aspect or any one of its implementations.

In a ninth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the method in the first aspect or any one of its implementations.

In a tenth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the method in the second aspect or any one of its implementations.
In an eleventh aspect, a chip is provided, the chip including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory to execute the method in the first aspect or any one of its implementations.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in the first aspect or any one of its implementations.

In a twelfth aspect, a chip is provided, the chip including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory to execute the method in the second aspect or any one of its implementations.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in the second aspect or any one of its implementations.

In a thirteenth aspect, a computing device is provided, the computing device including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of its implementations.

In a fourteenth aspect, a computing device is provided, the computing device including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any one of its implementations.
In a fifteenth aspect, the present application provides an instance segmentation method, the method including: performing instance segmentation on an image by using the instance segmentation model trained in the first aspect or the second aspect.

In a sixteenth aspect, the present application provides an instance segmentation apparatus, the apparatus including modules for executing the method in the fifteenth aspect or any one of its implementations.

In a seventeenth aspect, the present application provides an instance segmentation apparatus, the apparatus including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the fifteenth aspect or any one of its implementations.

In an eighteenth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code being used to execute the method in the fifteenth aspect or any one of its implementations.

In a nineteenth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the method in the fifteenth aspect or any one of its implementations.

In a twentieth aspect, a chip is provided, the chip including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory to execute the method in the fifteenth aspect or any one of its implementations.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to execute the method in the fifteenth aspect or any one of its implementations.

In a twenty-first aspect, a computing device is provided, the computing device including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the fifteenth aspect or any one of its implementations.
Description of Drawings

FIG. 1 is a schematic diagram of concepts related to an embodiment of the present application.

FIG. 2 is a schematic scene diagram of an instance segmentation model to which an embodiment of the present application can be applied.

FIG. 3 is a schematic architecture diagram of a system to which the methods of the various embodiments of the present application can be applied.

FIG. 4 is a schematic flowchart of a data enhancement method according to an embodiment of the present application.

FIG. 5 is a schematic flowchart of a data enhancement method according to another embodiment of the present application.

FIG. 6 is a schematic structural diagram of a data enhancement apparatus according to an embodiment of the present application.

FIG. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present application.

FIG. 8 is a schematic structural diagram of a computer program product according to an embodiment of the present application.

FIG. 9 is a schematic diagram of a region of interest according to an embodiment of the present application.
Detailed Description
To facilitate understanding of the embodiments of the present application, several concepts related to the embodiments of the present application are first introduced below with reference to FIG. 1 . In the example of FIG. 1 , the picture contains instances such as one person, two sheep and one dog. It should be understood that FIG. 1 serves only as an example and is not limiting.
As shown in the upper left corner of FIG. 1 , image classification refers to determining, for an image, the categories to which its instances belong. For example, if the data set has four categories, person, sheep, dog and cat, image classification is to obtain (or output) which categories a given picture contains. In the example of FIG. 1 , the output of the image classification task is to label the categories in the picture: person, sheep, dog.

As shown in the upper right corner of FIG. 1 , target detection is, simply put, finding out what targets are in the picture and where they are located (for example, enclosing each target with a rectangular box, which can be called a detection box). In the example of FIG. 1 , the output of the target detection task is the bounding boxes of the one person, two sheep and one dog in the picture (the rectangular boxes in the upper right corner of FIG. 1 ).

As shown in the lower left corner of FIG. 1 , semantic segmentation requires distinguishing every pixel in the picture, rather than merely enclosing the target with a rectangular box, but different instances of the same object do not need to be segmented separately. In the example of FIG. 1 , the output of the semantic segmentation task is to label the person, the sheep and the dog in the picture, without distinguishing sheep 1 from sheep 2. Semantic segmentation is also target segmentation in the usual sense.

As shown in the lower right corner of FIG. 1 , instance segmentation is the combination of target detection and semantic segmentation. Compared with the bounding box of target detection, instance segmentation can be accurate to the edge of the object; compared with semantic segmentation, instance segmentation needs to label different instances of the same object in the picture. In the example of FIG. 1 , there is one instance of person, two instances of sheep and one instance of dog, and the instance segmentation task is to label all of these instances. The prediction result of instance segmentation can be called a segmentation mask, and the quality of the segmentation mask can characterize how good the prediction result of instance segmentation is.
FIG. 2 is a framework diagram of an exemplary application scenario of the instance segmentation model of the present application. The exemplary application scenario is the Changlian call service of a mobile phone. As shown in FIG. 2 , the Changlian call service is implemented cooperatively by a user layer, an application layer and a computing layer. It should be understood that the following embodiments are introduced with a mobile phone as the application scenario; in fact, the solution is not limited to mobile phones and can also be applied to other types of electronic devices such as computers, servers or wearable devices.

The user layer may contain a Changlian call interface, through which the user of the mobile phone can access the Changlian call service. The application layer can provide the user with basic call services and featured services through the Changlian call interface. The basic call services may include services such as logging in to an account, initiating a call, ending a call and/or switching between the front and rear cameras; the featured services may include services such as skin-beautifying effects, dark-light high definition, space-time transformation and protagonist locking. The computing layer contains a variety of low-level chip interfaces and provides functions to the application layer through these interfaces to implement the various services of the application layer.

For example, protagonist locking means that the protagonist portrait specified by the user is retained while other portraits and the background are removed; in this case, only the pixels of the protagonist portrait are retained in the video. Space-time transformation is the operation of retaining the portrait part of the video while replacing the background during a video call, achieving the effect of a space-time transformation.

As an example, the instance segmentation model of the present application can be applied to the space-time transformation application and the protagonist locking application in the Changlian call service of the mobile phone. These two applications rely on the instance segmentation results of a high-precision portrait instance segmentation algorithm; especially when multiple portraits occlude each other, or a portrait is occluded by other objects, the requirement on the instance segmentation accuracy of the algorithm is even higher.
FIG. 3 is an exemplary structural diagram of a system architecture 300 to which the instance segmentation model of the embodiments of the present application can be applied. In FIG. 3 , a data collection device 360 is used to collect training data. Taking the use of this system architecture for instance segmentation as an example, the training data may include training images and contour information of the instances in the training images, where the contour information of the instances in the training images may be the result of manual pre-annotation; this contour information may be called the annotation information of the training images.

After the training data is collected, the data collection device 360 stores the training data in a database 330, and a training device 320 trains a target model 301 based on the training data maintained in the database 330; the target model 301 may be an instance segmentation model. The target model in this application may also be replaced by a target rule.

In some implementations, after the data collection device 360 collects the training data, a data processing device 370 may further process the training data to improve the performance of the target model 301. For example, the data processing device 370 may perform data enhancement on the training images in the database 330 so as to expand them, so that the training device 320 can train an instance segmentation model with higher segmentation accuracy based on the expanded database 330.

The following describes how the training device 320 obtains the target model 301 based on the training data. Taking the use of this system architecture for instance segmentation as an example, the training device 320 performs instance segmentation on an input original image, compares the instance contour result obtained by segmentation with the annotation information of the original image, and adjusts the parameters of the target model 301 according to the comparison result, until the difference between the contour information output by the training device 320 and the annotation information of the original image is less than a certain threshold, thereby completing the training of the target model 301.

It can be understood that, in practical applications, the training data maintained in the database 330 is not necessarily all collected by the data collection device 360 and may also be received from other devices. In addition, the training device 320 does not necessarily train the target model 301 entirely based on the training data maintained by the database 330; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
It can be understood that, in this system architecture, the training device 320 and the data processing device 370 may be the same device.

The target model 301 trained by the training device 320 can be applied to different systems or devices, for example, to the execution device 310 shown in FIG. 3 . The execution device 310 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device or a vehicle-mounted terminal, or may be a resource-constrained server, a resource-constrained cloud device, or the like. For example, one example of the execution device 310 may be a mobile phone containing the exemplary structure shown in FIG. 2 .

In FIG. 3 , the execution device 310 is configured with an input/output (I/O) interface 312 for data interaction with external devices, and a user can input data to the I/O interface 312 through a client device 340. Taking the use of this system architecture for instance segmentation as an example, the input data may include images collected by the camera of the client device 340.

It can be understood that, in the system architecture shown in FIG. 3 , the execution device 310 and the client device 340 may be the same device.

When the execution device 310 preprocesses the input data, or when the calculation module 311 of the execution device 310 performs calculation or other related processing, the execution device 310 can call data, code and the like in a data storage system 350 for the corresponding processing, and can also store the data, instructions and the like obtained by the processing into the data storage system 350.

Finally, the I/O interface 312 returns the processing result, for example, the instance segmentation result of the image to be segmented, to the client device 340 so as to provide it to the user.

In the case shown in FIG. 3 , the user can manually give the input data, and this manual operation can be performed through the interface provided by the I/O interface 312. In another case, the client device 340 can automatically send input data to the I/O interface 312; if requiring the client device 340 to automatically send the input data needs the user's authorization, the user can set the corresponding permission in the client device 340. The user can view the result output by the execution device 310 on the client device 340, and the specific presentation form can be display, sound, action or another specific manner. The client device 340 can also serve as a data collection terminal, collecting the input data of the I/O interface 312 and the output result of the I/O interface 312 as new sample data, as shown in the figure, and storing them in the database 330. Of course, the collection may also be done without the client device 340; instead, the I/O interface 312 directly stores the input data of the I/O interface 312 and the output result of the I/O interface 312 into the database 330 as new sample data.

It can be understood that FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules and the like shown in the figure do not constitute any limitation. For example, in FIG. 3 , the data storage system 350 is an external memory relative to the execution device 310; in other cases, the data storage system 350 may also be placed in the execution device 310.
FIG. 4 is an exemplary flowchart of a data enhancement method for an instance segmentation model according to an embodiment of the present application. As shown in FIG. 4 , the method may include S410 and S420. One example of the execution body of the method is the data processing device 370 in the system architecture shown in FIG. 3 .
S410: acquire a first image. In this embodiment, the first image may contain one or more instances. The first image may be an image in an instance segmentation data set such as the coco data set or the cityscape data set.

In this embodiment, while acquiring the first image, annotation information of the first image may also be acquired. The annotation information of the first image may be called first annotation information. The first annotation information may record the contour information of each instance in the first image. One example of the first annotation information is the set of coordinate points of the outer contour of each instance.

In this embodiment, acquiring the first image may be understood as reading the first image from an image storage device. Taking the execution body of the method being the data processing device 370 in FIG. 3 as an example, the data processing device 370 may read the first image and the first annotation information from the database 330.

In some implementations of this embodiment, after the first image and the first annotation information are acquired, pixel-level contour information of each instance in the first image may be extracted according to the first annotation information. This processing may be called preprocessing of the first image.
For example, the outer contour information of an instance is converted from a set of coordinate points into a mask matrix that a computer can process efficiently. In the mask matrix, the pixels corresponding to the instance may be set to 1, and the pixels corresponding to the background may be set to 0. In this embodiment, an instance may also be called the foreground or foreground part, and the neighborhood of the instance may be called the background or background part of the instance. The neighborhood is a region adjacent to the instance, including but not limited to adjacent regions of various shapes, such as the circumscribed-rectangle neighborhood described later.

Taking the first image being from the coco data set, whose image annotation information is json text, as an example, the data processing device 370 can use opencv's built-in outer-contour extraction function "findContours()" to extract the pixel-level contour information of the instances in the first image, as sketched below.
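As a non-authoritative illustration of this preprocessing step, the sketch below converts a COCO polygon annotation into a binary mask and then extracts its outer contour with OpenCV; it assumes the pycocotools and opencv-python packages, and the file path is a placeholder.

```python
import cv2
import numpy as np
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train.json")        # placeholder path
img_id = coco.getImgIds()[0]
ann = coco.loadAnns(coco.getAnnIds(imgIds=img_id))[0]  # one instance annotation

# Convert the coordinate-point annotation into a mask matrix:
# instance pixels are 1, background pixels are 0.
mask = coco.annToMask(ann).astype(np.uint8)

# Extract the pixel-level outer contour of the instance (OpenCV 4 signature).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"instance has {len(contours)} outer contour(s)")
```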
S420: Perform multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, where the multiple candidate instances are in one-to-one correspondence with the multiple affine transformations. In this embodiment, the first instance may be any instance in the first image. When the first image contains multiple instances, each instance may be treated as the first instance and subjected to the multiple affine transformations, so as to obtain multiple candidate instances for each instance; each of these candidate instances is obtained by processing the first instance with the corresponding one of the multiple affine transformations.
For example, each of the multiple affine transformations may be applied to the mask matrix of the first instance, yielding a mask matrix corresponding to that affine transformation; the instance described by that mask matrix is the candidate instance corresponding to that affine transformation.
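For illustration only, applying one candidate affine transformation to an instance mask could be sketched as follows; transform_mask is a hypothetical helper, and nearest-neighbour interpolation is used so the warped mask stays binary.

    import cv2

    def transform_mask(mask, affine_2x3):
        # affine_2x3 is a 2x3 affine matrix of the form shown below.
        h, w = mask.shape
        return cv2.warpAffine(mask, affine_2x3, (w, h),
                              flags=cv2.INTER_NEAREST)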
In this embodiment, applying an affine transformation to an instance may include performing, on the instance within the first image, one or more of translation, rotation, scaling, reflection, shearing, and any combination of the foregoing transformations. The multiple affine transformations may be preconfigured. As one example, a rule for the affine transformations may be preset, and multiple affine transformation matrices may be generated based on the rule, where each affine transformation matrix corresponds to one affine transformation; as another example, the multiple affine transformation matrices may be preset directly.
An exemplary representation of an affine transformation matrix is the standard 2×3 form combining rotation, scaling, and translation:

    [ s·cos r   -s·sin r   t_x ]
    [ s·sin r    s·cos r   t_y ]

where t_x denotes the offset in the horizontal direction, t_y denotes the offset in the vertical direction, s denotes the scaling scale, and r denotes the rotation angle.
Taking an affine transformation matrix of the above form as an example, an exemplary way of obtaining the multiple affine transformation matrices is described below. In this implementation, the width w of the bounding rectangle of the first instance in the horizontal direction may be obtained, and the value range of t_x may be set to -20%·w to +20%·w with a step size of 2. To avoid pixel artifacts caused by semantic ambiguity of image pixels, t_y may be fixed to 0. The value range of the scaling scale s may be set to 0.8 to 1.2 with a step size of 0.05; that is, the transformed instance is 80% to 120% of the original instance. The value range of the rotation angle r may be set to -10 degrees to +10 degrees with a step size of 1 degree. Multiple affine transformation matrices can be obtained according to the above rule, and these matrices may form a set of candidate affine transformation matrices.
It can be understood that fixing t_y to zero is only an example. Compared with leaving t_y unconstrained, fixing t_y to zero prevents an excessive difference between the augmented image and the original image, and thus prevents training on the augmented image from harming the training effect of the instance segmentation model; in other words, it helps improve the performance of the instance segmentation model.
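Under the rule described above, the candidate matrix set could be generated along the following lines. This is a sketch rather than the patent's code, and the choice of the bounding-box centre as the rotation/scaling pivot is an assumption.

    import numpy as np
    import cv2

    def candidate_affine_matrices(mask):
        x, y, w, h = cv2.boundingRect(mask)   # bounding rectangle of the instance
        cx, cy = x + w / 2.0, y + h / 2.0     # assumed pivot: box centre
        matrices = []
        for tx in np.arange(-0.2 * w, 0.2 * w + 1e-9, 2):   # t_x, step 2
            for s in np.arange(0.8, 1.2 + 1e-9, 0.05):      # scale 80%..120%
                for r in np.arange(-10, 10 + 1e-9, 1):      # rotation, degrees
                    m = cv2.getRotationMatrix2D((cx, cy), float(r), float(s))
                    m[0, 2] += tx                           # t_y stays fixed at 0
                    matrices.append(m)
        return matrices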
S430: Select a best candidate instance from the multiple candidate instances, where the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance in the multiple candidate instances is the grayscale difference between that candidate instance and its neighborhood in the first image.
In this embodiment, after the multiple candidate instances are obtained by performing the multiple affine transformations on the first instance, the best candidate instance of the first instance in the first image should be selected from them. The best candidate instance is the candidate instance whose grayscale difference from its own neighborhood in the first image is the largest; that is, the grayscale difference between the best candidate instance and its neighborhood in the first image is larger than the grayscale difference between any other candidate instance and that candidate instance's neighborhood in the first image.
To determine the best candidate instance among the multiple candidate instances, the grayscale difference between each candidate instance and the specified neighborhood of that candidate instance in the first image may first be obtained, finally yielding multiple grayscale differences in one-to-one correspondence with the multiple candidate instances.
For example, for each of the multiple candidate instances, the circumscribed-rectangle neighborhood of that candidate instance in the first image may be obtained, so as to obtain multiple neighborhoods in one-to-one correspondence with the multiple candidate instances; these neighborhoods may be called a contour neighborhood set.
The circumscribed-rectangle neighborhood of each candidate instance may be understood as the neighborhood formed by the pixels within the circumscribed rectangle of the candidate instance other than the candidate instance itself. As shown in FIG. 9, the first instance in the first image is a cloud, and the hatched part inside the circumscribed rectangle of the cloud contour represents the circumscribed-rectangle neighborhood of the cloud. The first image may also include other content, for example other instances, which are not shown in FIG. 9. It can be understood that using the circumscribed-rectangle neighborhood of each candidate instance as its neighborhood is only an example; the present application does not limit the shape of the neighborhood of a candidate instance. For example, the neighborhood of a candidate instance may also be its circumscribed-circle neighborhood.
In one possible implementation, for each neighborhood in the contour neighborhood set, the candidate instance corresponding to the neighborhood may be taken as the foreground and the neighborhood as the background, and the pixel-value variance of the foreground and the background may be computed; the resulting variance may serve as the grayscale difference between the candidate instance and its neighborhood.
For each neighborhood, an exemplary formula for computing the corresponding variance is as follows:

    g = w₀×(u₀-u)² + w₁×(u₁-u)² = w₀×w₁×(u₀-u₁)²

    u = w₀×u₀ + w₁×u₁

where u₀ denotes the average grayscale of the foreground, w₀ denotes the ratio of the number of foreground pixels to the total number of foreground and background pixels, u₁ denotes the average grayscale of the background, w₁ denotes the ratio of the number of background pixels to the total number of pixels, u denotes the average grayscale of the foreground and background together, and g denotes the variance of the foreground and background.
Treating the pixel-value variance of the foreground and background as the grayscale difference between a candidate instance and its neighborhood is only one example. In this embodiment, the grayscale difference between the foreground and the background may be obtained in other ways; for example, the 1-norm or the infinity norm of the pixel-value difference between each foreground and its corresponding background may be computed and taken as the grayscale difference between the foreground and the background.
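A compact sketch of the variance computation above; the names gray_difference, fg_mask, and bg_mask are hypothetical.

    import numpy as np

    def gray_difference(gray_image, fg_mask, bg_mask):
        # Implements g = w0*w1*(u0 - u1)^2 from the formula above.
        fg = gray_image[fg_mask > 0].astype(np.float64)   # foreground pixels
        bg = gray_image[bg_mask > 0].astype(np.float64)   # neighborhood pixels
        total = fg.size + bg.size
        w0, w1 = fg.size / total, bg.size / total
        u0, u1 = fg.mean(), bg.mean()
        return w0 * w1 * (u0 - u1) ** 2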
In the foregoing steps, after the multiple grayscale differences in one-to-one correspondence with the multiple candidate instances are obtained, the maximum grayscale difference may be determined among them, the candidate instance corresponding to the maximum grayscale difference may be determined as the best candidate instance, and the neighborhood in the first image corresponding to the maximum grayscale difference may be determined as the best neighborhood, which may also be called the target neighborhood. In this embodiment, the candidate instance corresponding to the maximum grayscale difference is the candidate instance from which the maximum grayscale difference was computed, and the neighborhood corresponding to the maximum grayscale difference is the neighborhood on which the computation of that maximum grayscale difference was based.
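Continuing the sketch above, the selection step then reduces to an argmax over the candidate scores; candidate_masks and neighborhoods are assumed parallel lists of masks.

    scores = [gray_difference(gray_image, cand, neigh)
              for cand, neigh in zip(candidate_masks, neighborhoods)]
    best_index = int(np.argmax(scores))
    best_mask = candidate_masks[best_index]          # the best candidate instance
    target_neighborhood = neighborhoods[best_index]  # the best (target) neighborhood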
S440: Add the best candidate instance to the first image to obtain a second image, which is the first image after enhancement. After the best candidate instance and the best neighborhood of the first instance are obtained, the best candidate instance may be added to the first image at a position such that the neighborhood of the best candidate instance in the first image is exactly the best neighborhood.
In this embodiment, the image content at the original location of the first instance in the first image may be processed with reference to the prior art. For example, image inpainting may be used to process the image content at the original location of the first instance, so that the content at that location is filled in as background of the instances in the first image.
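One plausible combination of these two operations, sketched with OpenCV's inpainting API; paste_and_inpaint is a hypothetical helper, the image is assumed to be 8-bit BGR, and the inpainting radius of 3 is an arbitrary example value.

    import cv2

    def paste_and_inpaint(image, src_mask, affine_2x3):
        h, w = image.shape[:2]
        # Fill the original location of the instance as background.
        repaired = cv2.inpaint(image, src_mask, 3, cv2.INPAINT_TELEA)
        # Warp the instance pixels and their mask to the new location.
        warped = cv2.warpAffine(image, affine_2x3, (w, h))
        warped_mask = cv2.warpAffine(src_mask, affine_2x3, (w, h),
                                     flags=cv2.INTER_NEAREST)
        out = repaired.copy()
        out[warped_mask > 0] = warped[warped_mask > 0]
        return out, warped_mask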
In this embodiment, one or more instances in the first image may each be treated as the first instance. Using the method shown in FIG. 4, the best candidate instance corresponding to each of these instances is obtained and added to the first image, finally yielding the second image.
In this embodiment, when the second image is obtained, annotation information of the second image may also be obtained. For example, when the best candidate instance is implemented as a mask matrix, the mask matrix may be converted into the form of a set of coordinate points, and that set of coordinate points may serve as the contour information of the best candidate instance. If the annotation information of the second image is called second annotation information, the contour information of the best candidate instance may be recorded in the second annotation information.
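Mirroring the earlier contour sketch, the conversion back to annotation form might look like this; mask_to_annotation is a hypothetical name, and the flat [x0, y0, x1, y1, ...] layout assumes a COCO-style annotation.

    import cv2

    def mask_to_annotation(best_mask):
        contours, _ = cv2.findContours(best_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)
        # One flat coordinate-point list per contour, COCO style.
        return [c.reshape(-1, 2).flatten().tolist() for c in contours]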
After the training dataset of the instance segmentation model is processed using the method of this embodiment, the resulting training dataset may be used to train the instance segmentation model. For example, after the data processing device 370 in FIG. 3 performs the first image processing on the first image in the database 330, the training device 320 may use the resulting second image to train the instance segmentation model to obtain the target model 301.
Further, after the instance segmentation model is obtained by training with the second image, the instance segmentation model may be used to perform instance segmentation. For example, after the training device 320 in FIG. 3 obtains the instance segmentation model by training with the second image, the execution device 310 may execute an instance segmentation service based on the instance segmentation model. Taking the case where the execution device 310 includes the architecture shown in FIG. 2 as an example, the execution device 310 may implement protagonist-locking and spatiotemporal-transformation services based on the instance segmentation model.
In the method of this embodiment, when the first image is enhanced, the best candidate instance, namely the one whose grayscale difference from its own specified neighborhood in the first image is the largest, is selected from the multiple candidate instances of the first instance and added to the first image. As a result, in the second image obtained by enhancing the first image, the contrast between the best candidate instance and its neighborhood is relatively large, so that the best candidate instance is relatively clear. In this case, the contour information of the best candidate instance can be regarded as highly reasonable contour information. Obtaining the second annotation information of the second image based on this reasonable contour information, and training the instance segmentation model based on the second image and the second annotation information, yields an instance segmentation model with more accurate segmentation; in other words, it can significantly improve the segmentation accuracy of the instance segmentation model.
FIG. 5 is an exemplary flowchart of a data enhancement method according to another embodiment of the present application. As shown in FIG. 5, in addition to S410 to S440, the method may include S450. S450: Perform contrast enhancement processing on a first region of interest in the second image to obtain a third image, where the first region of interest contains a second instance in the second image and the neighborhood of the second instance.
In this embodiment, the second image may contain one or more regions of interest (ROI), and each region of interest may contain one instance and the neighborhood of that instance. The regions of interest of the second image may form a region-of-interest set.
For convenience of description, a region of interest in the second image is called the first region of interest, the instance in the first region of interest is called the second instance, and the neighborhood of the second instance is called the second neighborhood.
In this embodiment, the second neighborhood may contain the circumscribed-rectangle neighborhood of the second instance; in other words, the circumscribed rectangle of the second instance lies within the second neighborhood. Put differently, the second neighborhood contains not only the pixels within the circumscribed-rectangle neighborhood of the second instance but also pixels outside that circumscribed-rectangle neighborhood.
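As an illustration, such a second neighborhood could be delimited by expanding the instance's bounding rectangle by a margin; roi_around_instance and the margin of 10 pixels are assumptions, not values from the patent.

    import cv2

    def roi_around_instance(mask, margin=10):
        img_h, img_w = mask.shape
        x, y, w, h = cv2.boundingRect(mask)
        x0, y0 = max(x - margin, 0), max(y - margin, 0)
        x1, y1 = min(x + w + margin, img_w), min(y + h + margin, img_h)
        return x0, y0, x1, y1   # ROI containing the instance and its surroundings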
For each first region of interest in the region-of-interest set of the second image, as one example, the low-frequency part of the first region of interest may be obtained through a Gaussian low-pass filter. For example, a Gaussian low-pass filter can be implemented using the GaussianBlur() function in the open-source OpenCV library. Specifically, the pixels within the first region of interest may be passed through the Gaussian low-pass filter, which filters out the high-frequency part of each pixel and yields the low-frequency part of each pixel.
After the low-frequency part of the first region of interest is obtained through the Gaussian low-pass filter, the low-frequency part may be subtracted from the original pixels of the first region of interest to obtain the high-frequency part of the first region of interest.
After the low-frequency part and the high-frequency part of the first region of interest are obtained, for each pixel in the first region of interest, the enhanced pixel value of a high-frequency-part pixel may be computed according to the gain value of the high-frequency part, and the enhanced pixel value of a low-frequency-part pixel may be computed according to the gain value of the low-frequency part, so as to obtain a contrast-enhanced image.
For example, each pixel in the first region of interest is traversed. If the pixel belongs to the high-frequency part, the mean squared deviation between its pixel value and those of the surrounding pixels can be considered large; in this case, a smaller gain value can be used to reduce the pixel value and alleviate over-brightness at that pixel. If the pixel belongs to the low-frequency part, the mean squared deviation between its pixel value and those of the surrounding pixels can be considered small; in this case, a larger gain value can be used to amplify the high-frequency detail around the pixel, making the local detail features more obvious and thereby alleviating image blur.
In one example, the gain values of the high-frequency part and the low-frequency part may be preset as needed; for example, the gain value of the high-frequency part may be set to 0.5, and the gain value of the low-frequency part may be set to 2. One implementation of computing the enhanced pixel value of a high-frequency-part pixel according to the high-frequency gain, or of a low-frequency-part pixel according to the low-frequency gain, is to take the product of the gain value and the pixel value of that pixel as the enhanced pixel value.
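Reading the gain step as a per-pixel recombination of the two frequency parts, the whole S450 operation on one ROI might be sketched as follows; the Gaussian sigma of 3 is an assumed value, and the default gains follow the example above.

    import numpy as np
    import cv2

    def enhance_roi_contrast(roi, high_gain=0.5, low_gain=2.0):
        roi = roi.astype(np.float32)
        low = cv2.GaussianBlur(roi, (0, 0), sigmaX=3)   # low-frequency part
        high = roi - low                                # high-frequency part
        out = low_gain * low + high_gain * high         # apply per-part gains
        return np.clip(out, 0, 255).astype(np.uint8)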
In this embodiment, after the above adaptive contrast enhancement processing is performed on each region of interest in the second image, a third image with locally adaptive contrast enhancement can be obtained. After each second image is processed using the method shown in FIG. 5, a training dataset with enhanced image contrast can be obtained.
Further, the method of this embodiment may also include: training the instance segmentation model with this training dataset. For example, after the data processing device 370 in FIG. 3 performs the second image processing on the second image in the database 330, the training device 320 may use the resulting third image to train the instance segmentation model to obtain the target model 301.
Still further, the method of this embodiment may also include: performing instance segmentation with the trained instance segmentation model. For example, after the training device 320 in FIG. 3 obtains the instance segmentation model by training with the third image, the execution device 310 may execute an instance segmentation service based on the instance segmentation model. Taking the case where the execution device 310 includes the architecture shown in FIG. 2 as an example, the execution device 310 may implement protagonist-locking and spatiotemporal-transformation services based on the instance segmentation model.
The instance segmentation model trained with the contrast-enhanced training dataset of this embodiment has high robustness: it has strong anti-interference capability against, and high tolerance of, noise in images from noisy scenes.
In one embodiment of the present application, the data enhancement method shown in FIG. 4 or FIG. 5 may be used to process the images in an original training dataset, and after the processed training dataset is obtained, the processed training dataset is used to train the instance segmentation model. The processing of the original training dataset in this embodiment may be called offline data enhancement.
In another embodiment of the present application, while the instance segmentation model is being trained with the original training dataset, the method of FIG. 4 or FIG. 5 may be used to perform image processing on the original training dataset. After the training of the instance segmentation model on the original training dataset is finished and the image processing of the original training dataset with the method of FIG. 4 or FIG. 5 is finished, the training dataset obtained by the image processing is then used to train the instance segmentation model. The method of this embodiment may be called an online data enhancement training method.
Since the duration of one training iteration of the instance segmentation model is usually much longer than the duration of the image processing, the method of this embodiment does not add extra time, and the latest enhanced training dataset can be obtained in real time, so the accuracy of the instance segmentation model can be further improved. For example, no matter how the original training dataset of the instance segmentation model changes, the method of this embodiment can obtain the latest enhanced training dataset based on that original training dataset, thereby yielding an instance segmentation model with better performance.
Taking an implementation of the data enhancement method of FIG. 4 or FIG. 5 based on the TensorFlow open-source framework as an example, an exemplary implementation of the online data enhancement training method is described below.
In the existing open-source training framework, the image processing operations are all completed in the constructor of a data_generator object. When the method of this embodiment is used, the original training dataset is first read: the imread function in the OpenCV library is used to read the first image in the original training dataset, and the loadAnns function of the COCO dataset is used to read the first annotation information. The data enhancement method of FIG. 4 or FIG. 5 is implemented in the constructor of the data_generator object, which then outputs a data_generator object; these operations are executed by an image processing thread. The data_generator object is then passed as a parameter to the training thread of the TensorFlow model. The training thread and the image processing thread execute in parallel, and the image processing thread executes independently, outputting the processed second image or third image and the second annotation information to a public storage area. Before each training iteration, the training process reads the processed second image or third image and the second annotation information from the public storage area, so as to train the instance segmentation model.
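A framework-agnostic sketch of this producer/consumer arrangement; dataset, augment, train_step, model, and num_iterations are all assumed names, and the bounded queue stands in for the public storage area.

    import threading
    import queue

    aug_queue = queue.Queue(maxsize=8)   # the "public storage area"

    def augmentation_worker(dataset):
        while True:
            image, annotation = dataset.sample()
            image2, annotation2 = augment(image, annotation)  # FIG. 4 / FIG. 5 method
            aug_queue.put((image2, annotation2))

    threading.Thread(target=augmentation_worker, args=(dataset,),
                     daemon=True).start()

    for _ in range(num_iterations):
        batch = aug_queue.get()       # read before each training iteration
        train_step(model, batch)      # runs in parallel with the worker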
FIG. 6 is a schematic structural diagram of a data enhancement apparatus 600 for an instance segmentation model according to an embodiment of the present application. The apparatus 600 may be an example of the data processing device 370 in the system architecture shown in FIG. 3. The apparatus 600 may include an acquisition module 610 and a processing module 620, and may optionally further include a training module. The apparatus 600 may be used to implement the data enhancement method for the instance segmentation model in any of the foregoing embodiments, for example the method shown in FIG. 4 or FIG. 5. For example, the acquisition module 610 may be used to perform S410, and the processing module 620 may be used to perform S420 to S440. Optionally, the processing module 620 may also be used to perform S450.
The schematic structure of an apparatus for training an instance segmentation model according to an embodiment of the present application is similar to the structure of the apparatus 600 including the training module, and is not described again here. The training apparatus may be used to perform the foregoing online data enhancement training method.
FIG. 7 is a schematic structural diagram of an apparatus 700 according to an embodiment of the present application. The apparatus 700 includes a processor 702, a communication interface 703, and a memory 704.
The apparatus 700 may be a chip or a computing device. For example, the apparatus 700 may be the data processing device 370 in the system architecture shown in FIG. 3, or may be an example of a chip applicable to the data processing device 370. As another example, the apparatus 700 may be the training device 320 in the system architecture shown in FIG. 3, or may be an example of a chip applicable to the training device 320.
The processor 702, the memory 704, and the communication interface 703 may communicate through a bus. Executable code is stored in the memory 704, and the processor 702 reads the executable code in the memory 704 to execute the corresponding method. The memory 704 may also include an operating system and other software modules required by running processes. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.
For example, the executable code in the memory 704 is used to implement the method described in any of the foregoing embodiments (such as the method shown in FIG. 4 or FIG. 5), and the processor 702 reads the executable code in the memory 704 to execute the method described in any of the foregoing embodiments (such as the method shown in FIG. 4 or FIG. 5).
The processor 702 may include a CPU. The memory 704 may include volatile memory, for example random access memory (RAM). The memory 704 may also include non-volatile memory (NVM), for example read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In some embodiments of the present application, the disclosed methods may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or on other non-transitory media or articles of manufacture. FIG. 8 schematically illustrates a conceptual partial view of an example computer program product arranged according to any of the foregoing embodiments, the example computer program product including a computer program for executing a computer process on a computing device. In one embodiment, the example computer program product 800 is provided using a signal bearing medium 801. The signal bearing medium 801 may include one or more program instructions 802 which, when run by one or more processors, may provide the functions, or part of the functions, described in the method of any of the foregoing embodiments. Thus, for example, in the embodiment shown in FIG. 5, one or more features of S410 to S430 may be undertaken by one or more instructions associated with the signal bearing medium 801.
In some examples, the signal bearing medium 801 may include a computer-readable medium 803, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (ROM), or a random access memory (RAM). In some implementations, the signal bearing medium 801 may include a computer-recordable medium 804, such as, but not limited to, a memory, a read/write (R/W) CD, or an R/W DVD. In some implementations, the signal bearing medium 801 may include a communication medium 805, such as, but not limited to, a digital and/or analog communication medium (for example, an optical fiber cable, a waveguide, a wired communication link, or a wireless communication link). Thus, for example, the signal bearing medium 801 may be conveyed by a wireless form of the communication medium 805 (for example, a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol). The one or more program instructions 802 may be, for example, computer-executable instructions or logic-implementing instructions. In some examples, the aforementioned computing device may be configured to provide various operations, functions, or actions in response to the program instructions 802 conveyed to the computing device through one or more of the computer-readable medium 803, the computer-recordable medium 804, and/or the communication medium 805. It should be understood that the arrangements described here are for illustrative purposes only. Accordingly, those skilled in the art will understand that other arrangements and other elements (for example, machines, interfaces, functions, orders, and groups of functions) can be used instead, and that some elements may be omitted altogether depending on the desired result. In addition, many of the described elements are functional entities that can be implemented as discrete or distributed components, or in combination with other components, in any suitable combination and position.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered as going beyond the scope of this application.
It may be clearly understood by a person skilled in the art that, for convenient and brief description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (16)

  1. A data enhancement method for an instance segmentation model, comprising:
    acquiring a first image;
    performing multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, wherein the multiple candidate instances are in one-to-one correspondence with the multiple affine transformations;
    selecting a best candidate instance from the multiple candidate instances, wherein the best candidate instance has the largest grayscale difference among the multiple candidate instances, and the grayscale difference of each candidate instance in the multiple candidate instances is the grayscale difference between that candidate instance and the neighborhood of that candidate instance in the first image; and
    adding the best candidate instance to the first image to obtain a second image that is the first image after enhancement.
  2. The method according to claim 1, wherein the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined according to the pixel-value variance of that candidate instance and its neighborhood, and the pixel-value variance of each candidate instance and its neighborhood is computed as follows:
    g = w₀×(u₀-u)² + w₁×(u₁-u)² = w₀×w₁×(u₀-u₁)²
    u = w₀×u₀ + w₁×u₁
    where u₀ denotes the average grayscale of that candidate instance, w₀ denotes the ratio of the number of pixels of that candidate instance to the total number of pixels of that candidate instance and its neighborhood, u₁ denotes the average grayscale of the neighborhood of that candidate instance, w₁ denotes the ratio of the number of pixels of the neighborhood of that candidate instance to the total number of pixels, u denotes the average grayscale of that candidate instance and its neighborhood together, and g denotes the pixel-value variance.
  3. The method according to claim 1 or 2, further comprising:
    performing contrast enhancement processing on a first region of interest in the second image to obtain a third image, wherein the first region of interest contains a second instance in the second image and the neighborhood of the second instance.
  4. A training method for an instance segmentation model, comprising the method according to any one of claims 1 to 3, wherein, while the method is being performed, the training method further comprises: training an instance segmentation model with multiple images, the multiple images comprising the first image.
  5. The training method according to claim 4, further comprising:
    training the instance segmentation model with the second image.
  6. A data enhancement apparatus for an instance segmentation model, comprising:
    an acquisition module, configured to acquire a first image; and
    a processing module, configured to: perform multiple affine transformations on a first instance in the first image to obtain multiple candidate instances, the multiple candidate instances being in one-to-one correspondence with the multiple affine transformations; select a best candidate instance from the multiple candidate instances, the best candidate instance having the largest grayscale difference among the multiple candidate instances, wherein the grayscale difference of each candidate instance in the multiple candidate instances is the grayscale difference between that candidate instance and the neighborhood of that candidate instance in the first image; and add the best candidate instance to the first image to obtain a second image that is the first image after enhancement.
  7. The apparatus according to claim 6, wherein the grayscale difference between each candidate instance and the neighborhood of that candidate instance is determined according to the pixel-value variance of that candidate instance and its neighborhood, and the pixel-value variance of each candidate instance and its neighborhood is computed as follows:
    g = w₀×(u₀-u)² + w₁×(u₁-u)² = w₀×w₁×(u₀-u₁)²
    u = w₀×u₀ + w₁×u₁
    where u₀ denotes the average grayscale of that candidate instance, w₀ denotes the ratio of the number of pixels of that candidate instance to the total number of pixels of that candidate instance and its neighborhood, u₁ denotes the average grayscale of the neighborhood of that candidate instance, w₁ denotes the ratio of the number of pixels of the neighborhood of that candidate instance to the total number of pixels, u denotes the average grayscale of that candidate instance and its neighborhood together, and g denotes the pixel-value variance.
  8. The apparatus according to claim 6 or 7, wherein the processing module is further configured to:
    perform contrast enhancement processing on a first region of interest in the second image to obtain a third image, wherein the first region of interest contains a second instance in the second image and the neighborhood of the second instance.
  9. A training apparatus for an instance segmentation model, comprising the apparatus according to any one of claims 6 to 8 and a training module, wherein, while the apparatus implements its functions, the training module is configured to train an instance segmentation model with multiple images, the multiple images comprising the first image.
  10. The training apparatus according to claim 9, wherein the training module is further configured to train the instance segmentation model according to the second image.
  11. A data enhancement apparatus for an instance segmentation model, comprising: a processor, the processor being coupled to a memory;
    wherein the memory is configured to store instructions; and
    the processor is configured to execute the instructions stored in the memory, so that the apparatus implements the method according to any one of claims 1 to 3.
  12. A training apparatus for an instance segmentation model, comprising: a processor, the processor being coupled to a memory;
    wherein the memory is configured to store instructions; and
    the processor is configured to execute the instructions stored in the memory, so that the apparatus implements the method according to claim 4 or 5.
  13. A computer-readable medium, comprising instructions which, when run on a processor, cause the processor to implement the method according to any one of claims 1 to 3.
  14. A computer-readable medium, comprising instructions which, when run on a processor, cause the processor to implement the method according to claim 4 or 5.
  15. A computer program product, comprising instructions which, when the computer program product is run on a computer, cause the computer to implement the method according to any one of claims 1 to 3.
  16. A computer program product, comprising instructions which, when the computer program product is run on a computer, cause the computer to implement the method according to claim 4 or 5.
PCT/CN2020/106112 2020-07-31 2020-07-31 Data enhancement method and training method for instance segmentation model, and related apparatus WO2022021287A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080006082.4A CN114375460A (en) 2020-07-31 2020-07-31 Data enhancement method and training method of instance segmentation model and related device
PCT/CN2020/106112 WO2022021287A1 (en) 2020-07-31 2020-07-31 Data enhancement method and training method for instance segmentation model, and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/106112 WO2022021287A1 (en) 2020-07-31 2020-07-31 Data enhancement method and training method for instance segmentation model, and related apparatus

Publications (1)

Publication Number Publication Date
WO2022021287A1 true WO2022021287A1 (en) 2022-02-03

Family

ID=80037267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/106112 WO2022021287A1 (en) 2020-07-31 2020-07-31 Data enhancement method and training method for instance segmentation model, and related apparatus

Country Status (2)

Country Link
CN (1) CN114375460A (en)
WO (1) WO2022021287A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190171903A1 (en) * 2017-12-03 2019-06-06 Facebook, Inc. Optimizations for Dynamic Object Instance Detection, Segmentation, and Structure Mapping
CN110910334A (en) * 2018-09-15 2020-03-24 北京市商汤科技开发有限公司 Instance segmentation method, image processing device and computer readable storage medium
CN109583509A (en) * 2018-12-12 2019-04-05 南京旷云科技有限公司 Data creation method, device and electronic equipment
CN111091167A (en) * 2020-03-25 2020-05-01 同盾控股有限公司 Mark recognition training data synthesis method and device, electronic equipment and storage medium
CN111415364A (en) * 2020-03-29 2020-07-14 中国科学院空天信息创新研究院 Method, system and storage medium for converting image segmentation samples in computer vision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Master Thesis", 28 May 2019, SHANGHAI JIAO TONG UNIVERSITY, CN, article XU WENQIANG: "Individual-level Instance Segmentation", pages: 1 - 89, XP055891001, DOI: 10.27307/d.cnki.gsjtu.2019.001681 *
JO HYUNJUN, KIM DAWIT, SONG JAE-BOK: "Automatic Dataset Generation of Object Detection and Instance Segmentation using Mask R-CNN", JOURNAL OF KOREA ROBOTICS SOCIETY, vol. 14, no. 1, 30 March 2019 (2019-03-30), pages 31 - 39, XP055890991, ISSN: 1975-6291, DOI: 10.7746/jkros.2019.14.1.031 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091874A (en) * 2023-04-10 2023-05-09 成都数之联科技股份有限公司 Image verification method, training method, device, medium, equipment and program product
CN116091874B (en) * 2023-04-10 2023-07-18 成都数之联科技股份有限公司 Image verification method, training method, device, medium, equipment and program product
CN116596928A (en) * 2023-07-18 2023-08-15 山东金胜粮油食品有限公司 Quick peanut oil impurity detection method based on image characteristics
CN116596928B (en) * 2023-07-18 2023-10-03 山东金胜粮油食品有限公司 Quick peanut oil impurity detection method based on image characteristics

Also Published As

Publication number Publication date
CN114375460A (en) 2022-04-19

Similar Documents

Publication Publication Date Title
US10373380B2 (en) 3-dimensional scene analysis for augmented reality operations
CN108701376B (en) Recognition-based object segmentation of three-dimensional images
Xiao et al. Fast image dehazing using guided joint bilateral filter
JP7446457B2 (en) Image optimization method and device, computer storage medium, computer program, and electronic equipment
CN115699114B (en) Method and apparatus for image augmentation for analysis
CN112308095A (en) Picture preprocessing and model training method and device, server and storage medium
US9639943B1 (en) Scanning of a handheld object for 3-dimensional reconstruction
US20190206117A1 (en) Image processing method, intelligent terminal, and storage device
CN111402170A (en) Image enhancement method, device, terminal and computer readable storage medium
WO2022021287A1 (en) Data enhancement method and training method for instance segmentation model, and related apparatus
CN114511041B (en) Model training method, image processing method, device, equipment and storage medium
CN111382647B (en) Picture processing method, device, equipment and storage medium
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN112562056A (en) Control method, device, medium and equipment for virtual light in virtual studio
US20140198177A1 (en) Realtime photo retouching of live video
WO2019200785A1 (en) Fast hand tracking method, device, terminal, and storage medium
Ahn et al. Implement of an automated unmanned recording system for tracking objects on mobile phones by image processing method
CN117011137A (en) Image stitching method, device and equipment based on RGB similarity feature matching
US20230131418A1 (en) Two-dimensional (2d) feature database generation
Liu et al. Fog effect for photography using stereo vision
US11182634B2 (en) Systems and methods for modifying labeled content
KR20230162010A (en) Real-time machine learning-based privacy filter to remove reflective features from images and videos
CN114612976A (en) Key point detection method and device, computer readable medium and electronic equipment
Kim et al. Real-time human segmentation from RGB-D video sequence based on adaptive geodesic distance computation
CN113095176A (en) Method and device for background reduction of video data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20947163

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20947163

Country of ref document: EP

Kind code of ref document: A1