CN114170470A - Sample generation method, device, equipment and storage medium - Google Patents
Sample generation method, device, equipment and storage medium
- Publication number: CN114170470A
- Application number: CN202111290380.3A
- Authority
- CN
- China
- Prior art keywords
- image
- target
- task
- target task
- candidate
- Prior art date
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2113: Pattern recognition; selection of the most significant subset of features by ranking or filtering, e.g. using a measure of variance or of feature cross-correlation
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/08: Neural networks; learning methods
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/0012: Image analysis; biomedical image inspection
- G16H30/40: ICT specially adapted for processing medical images, e.g. editing
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/20221: Image fusion; image merging
- G06T2207/30061: Lung
- G06T2207/30096: Tumor; lesion
Abstract
The application provides a sample generation method, apparatus, device, and storage medium. The method comprises the following steps: acquiring a target task and positive sample data corresponding to the target task; determining, from the positive sample data, a candidate region corresponding to the target task, and determining modification content corresponding to the target task; and fusing the modification content with the candidate region to obtain negative sample data corresponding to the target task. With this scheme, negative sample data adapted to the semantics of the target task can be generated adaptively on the basis of a large amount of collected positive sample data, together with the corresponding sample labeling information.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a sample generation method, apparatus, device, and storage medium.
Background
In many practical classification or detection scenarios, it is often impossible to obtain sufficient labeled training samples. For example, in scenarios involving privacy or security, only positive labeled samples may be available, while abnormal fine-grained category samples or negative samples are difficult to collect and difficult to label precisely. Manually producing negative or abnormal samples consumes great manpower and material resources at enormous cost.
For example, in a medical image detection scenario, because user privacy data is involved, the number of validly labeled samples available for model training is very small. For the detection of a particular disease, such as a specific tumor, often only a data set of a few dozen images with labeled detection regions is available. Such a small number of labeled samples makes it difficult to train a detection model with good performance, and generating samples manually is inefficient.
The above problem can be attributed to few-shot class learning, where the number of samples in a certain class is extremely scarce. How to learn efficiently from small samples is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention provides a sample generation method, a sample generation device, sample generation equipment and a storage medium, which are used for efficiently and accurately generating negative sample data.
In a first aspect, an embodiment of the present invention provides a sample generation method, where the method includes:
acquiring a target task and positive sample data corresponding to the target task;
determining, from the positive sample data, a candidate region corresponding to the target task, and determining modification content corresponding to the target task;
and fusing the modification content with the candidate region to obtain negative sample data corresponding to the target task.
In a second aspect, an embodiment of the present invention provides a sample generation apparatus, including:
an acquisition module, configured to acquire a target task and positive sample data corresponding to the target task;
a determining module, configured to determine, from the positive sample data, a candidate region corresponding to the target task, and to determine modification content corresponding to the target task;
and a fusion module, configured to fuse the modification content with the candidate region to obtain negative sample data corresponding to the target task.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to implement at least the sample generation method of the first aspect.
In a fourth aspect, embodiments of the present invention provide a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the sample generation method of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a sample generation method, where the method includes:
acquiring a first medical image corresponding to a medical image detection task, wherein the medical image detection task is used to detect whether a target body part has a set lesion, and the first medical image is a medical image indicating that the target body part is in a healthy state;
determining a candidate image area corresponding to the target body part from the first medical image based on the medical image detection task;
acquiring a lesion image corresponding to the set lesion;
and superimposing the lesion image into the candidate image area of the first medical image to obtain a second medical image used as a negative example and the labeling information of the second medical image.
The embodiment of the invention provides an adaptive sample generation scheme based on target-task semantics. Different application scenarios involve different tasks; to complete a given task, a neural network or machine learning model for that task must be trained in advance, which in turn requires collecting sufficient positive and negative sample data with labeling information (i.e., supervision information). In practice, positive sample data is relatively easy to collect, while negative sample data is not. Therefore, for a target task, given a large amount of collected positive sample data, a candidate region to be modified can be determined from the positive sample data in combination with the semantics of the target task (i.e., its description information), and modification content corresponding to the target task can be acquired; the modification content corresponds to the feature information that negative sample data of the target task should contain. By fusing the modification content with the candidate region, negative sample data adapted to the semantics of the target task can be generated efficiently and accurately, together with the corresponding sample labeling information.
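The acquire-determine-fuse flow just described can be sketched in a few lines. The 2-D intensity-list representation, the replacement-style fusion, and all function and field names below are illustrative assumptions for the sketch, not the embodiment's prescribed implementation.

```python
def generate_negative_sample(positive, candidate_box, modification):
    """Fuse `modification` content into the candidate region of a positive sample.

    positive:      2-D list of pixel intensities (a collected positive sample)
    candidate_box: (y, x, h, w) region determined from the target-task semantics
    modification:  h x w patch carrying the 'negative example' feature content
    Returns the synthesized negative sample and its labeling information.
    """
    y, x, h, w = candidate_box
    negative = [row[:] for row in positive]   # leave the original sample intact
    for i in range(h):
        for j in range(w):
            negative[y + i][x + j] = modification[i][j]
    labeling = {"bbox": (y, x, h, w)}         # the label follows from the fusion itself
    return negative, labeling

# Toy example: an 8x8 "positive" image with a 2x2 anomaly patch fused in.
pos = [[0] * 8 for _ in range(8)]
patch = [[255, 255], [255, 255]]
neg, label = generate_negative_sample(pos, (3, 3, 2, 2), patch)
```

Because the fused position is chosen by the generator, the bounding-box label comes for free, which is the point of the scheme: no manual annotation of the synthesized negative sample is needed.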
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention; those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a flowchart of a sample generation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a sample generation method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a sample generation method according to an embodiment of the present invention;
fig. 4 is a schematic application diagram of a sample generation method according to an embodiment of the present invention;
FIG. 5 is a flow chart of a sample generation method according to an embodiment of the present invention;
fig. 6 is a schematic application diagram of a sample generation method according to an embodiment of the present invention;
FIG. 7a is a flowchart of a sample generation method according to an embodiment of the present invention;
FIG. 7b is a flowchart of a sample generation method according to an embodiment of the present invention;
fig. 8 is a schematic application diagram of a sample generation method according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a sample generation device according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device corresponding to the sample generation apparatus provided in the embodiment shown in fig. 9.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention; all other embodiments obtained by a person skilled in the art without creative effort shall fall within the protection scope of the present invention.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other without conflict between the embodiments. In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
The sample generation method provided by the embodiment of the invention can be executed by an electronic device, the electronic device can be a server or a user terminal, and the server can be a physical server or a virtual server (virtual machine) of a cloud.
Fig. 1 is a flowchart of a sample generation method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
101. Acquire the target task and positive sample data corresponding to the target task.
102. Determine a candidate region corresponding to the target task from the positive sample data, and determine modification content corresponding to the target task.
103. Fuse the modification content with the candidate region to obtain negative sample data corresponding to the target task.
Target tasks differ across application scenarios. For example, in a medical image detection scenario, the target task may be to detect a specific tumor; in a forged-video detection scenario, the target task is to detect whether the video contains forged images and/or voices. Corresponding target tasks can therefore be set for different application scenarios.
The sample generation method provided by the embodiment of the invention can be summarized as automatically generating the negative sample data corresponding to the target task, because in practice positive sample data is relatively easy to collect while labeled negative sample data is hard to obtain. The positive and negative sample data of the target task are acquired for supervised training of the neural network model corresponding to the target task.
As described above, the target tasks in different application scenarios differ, and so do the models to be trained. To automatically generate negative sample data for the target task of a given application scenario, the positive sample data corresponding to that target task must first be acquired, so that it can be modified based on the semantic information of the target task to generate the corresponding negative sample data.
The semantic information of the target task is description information describing what the target task specifically does. For example, the target task may be to detect whether a video contains forged images and/or voices, or to detect whether an image contains a certain kind of tumor.
In practical application, a user (the person responsible for the sample generation task) may input the target task and a plurality of collected positive sample data corresponding to it. A candidate region corresponding to the target task is then determined from the positive sample data, the modification content corresponding to the target task is determined, and the modification content is fused with the candidate region to obtain negative sample data corresponding to the target task together with its label information.
The candidate region, which may also be called the candidate modification region, is the feature region in the positive sample data that matches the target task, i.e., the region of greatest concern when executing the target task. Different target tasks select different candidate regions.
The candidate region in the positive sample data can be modified in a way that matches the target task, so that negative sample data is generated from the positive sample data. To perform this modification, the modification content corresponding to the target task is acquired. The modification content reflects the "negative example" features: it corresponds to the feature information that negative sample data of the target task should contain, which is inconsistent with the feature information contained in the positive sample data.
In practice, candidate-region acquisition prompt information can be preset for different tasks, so that the candidate region corresponding to the current target task is adaptively determined from the corresponding positive sample data based on this prompt information. Similarly, the way of acquiring the modification content corresponding to each task can be preset, so that the modification content for the current target task can be acquired.
Finally, the modification content is fused with the candidate region to obtain negative sample data containing the fusion result, and the labeling information corresponding to the target task, i.e., the sample label, can be obtained automatically from the fusion result. In practice, the fusion may take different forms such as superposition, replacement, or scaling; in short, the modification content is fused into the candidate region.
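For the superposition effect mentioned above, a weighted per-pixel blend is one common concrete choice. The formula and function name below are assumptions for illustration only, since the embodiment describes the fusion operation abstractly.

```python
def superimpose(region, patch, alpha=0.7):
    """Weighted superposition of modification content onto a candidate region.

    `region` and `patch` are same-sized 2-D lists of pixel intensities;
    `alpha` controls how strongly the modification dominates.  This blending
    formula is only one illustrative realization of "superposition".
    """
    return [[round(alpha * p + (1 - alpha) * r) for r, p in zip(rrow, prow)]
            for rrow, prow in zip(region, patch)]

# alpha=0.5 averages region and patch; alpha=1 degenerates to pure replacement.
blended = superimpose([[100, 100]], [[200, 200]], alpha=0.5)
```

A lower `alpha` makes the fused content subtler, which is useful when the negative feature (e.g. a faint lesion or impurity) should not look artificially pasted in.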
In conclusion, through this scheme, negative sample data adapted to the semantics of the target task can be generated efficiently and accurately, together with the corresponding sample labeling information.
Based on the above process, many new negative sample data can be generated, and training of the neural network model corresponding to the target task can finally be completed using the positive sample data and the generated negative sample data.
In addition, in an optional embodiment, in order to further improve the robustness of the model, the following steps may be further included:
acquiring an attack behavior of a user under a target task, wherein the attack behavior refers to a behavior which can influence the performance of a model;
editing the positive sample data and/or the negative sample data according to the attack behavior;
and training a model corresponding to the target task by adopting the positive sample data, the negative sample data, the edited positive sample data and the edited negative sample data.
The attack behaviors under the target task are operations that the model's input data may undergo in practical use, and these operations may interfere with the model's processing result. For example, if the target task is to detect whether an image contains a certain target object, the attack behaviors may include scaling, filtering, resolution adjustment, forwarding, or beautifying of the input image, all of which can affect the model's output. Therefore, to improve robustness, these attack behaviors are applied to the training samples during the training stage, and the model is trained jointly on the samples before and after the attacks are applied.
For example, if the attack behavior includes forwarding, its effect on a training sample is mainly compression; the collected positive sample data and the generated negative sample data can therefore be compressed accordingly. Other attack behaviors are handled in the same way.
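The augmentation just described can be sketched as follows. The intensity-quantization "compression" is a crude stand-in for real lossy re-encoding (e.g. JPEG at reduced quality), used here only to keep the example self-contained; the function names are assumptions.

```python
def simulate_compression(image, step=16):
    """Quantize pixel intensities to multiples of `step` -- a crude stand-in
    for the information loss caused by re-compression during forwarding.
    A production pipeline would re-encode with a real codec instead."""
    return [[(v // step) * step for v in row] for row in image]

def augment_with_attacks(samples, attacks):
    """Pool the original training samples with one attacked copy per attack
    behavior, so the model is trained on both pre- and post-attack versions."""
    augmented = list(samples)
    for attack in attacks:
        augmented.extend(attack(img) for img in samples)
    return augmented

# One 1x3 sample, one attack behavior -> a training set of two samples.
train_set = augment_with_attacks([[[0, 17, 255]]], [simulate_compression])
```

The same pooling pattern extends to any list of attack callables (scaling, filtering, resolution changes), each producing one perturbed copy of every sample.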
For ease of understanding, the implementation of the above sample generation method will be exemplarily described below with reference to the target task in different application scenarios.
Fig. 2 is a flowchart of a sample generation method according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps:
201. Acquire a target task and positive sample data corresponding to the target task, where the target task is to detect whether an image contains a target object, and the positive sample data are positive example sample images that do not contain the target object.
202. Determine, from the positive example sample image, a candidate image region used for detecting the target object.
203. Acquire a first negative example sample image corresponding to the target task, and extract a target image area containing the target object according to the position mark information in the first negative example sample image, where the position mark information marks the position of the target object in the first negative example sample image.
204. Adjust the form of the target image area, and superimpose the adjusted target image area into the candidate image region of the positive example sample image to obtain a second negative example sample image and its mark information.
One application scenario assumed in this embodiment is image detection, in which the target task is to detect whether an input image contains a specific target object. In practice, the target task needs both to detect whether the target object is present in the input image and to locate its position region for output as the detection result.
Under this target task, the positive example sample images may be images collected in the application scene that do not contain the target object. For example, if the target task is to detect whether a certain wine contains impurities, positive example sample images can be obtained by photographing several bottles of impurity-free wine, i.e., the impurity target object is absent from these images. In practice, a small number of negative example sample images containing the target object may also be obtained, with the position area of the target object (the impurity) marked in them.
The purpose of collecting positive example sample images without the target object and labeled negative example sample images with the target object is to train a detection model for the target task. To guarantee the performance of the finally trained detection model, a sufficient number of training samples is needed, especially enough negative example sample images. Sufficient labeled training samples are thus an important precondition for model performance.
Therefore, to generate more negative example sample images with corresponding mark information: first, a candidate image region used for detecting the target object is extracted from any positive example sample image; then a collected first negative example sample image corresponding to the target task is acquired, and the target image area containing the target object is extracted based on its position mark information, which marks the position of the target object in the first negative example sample image.
The determination of candidate image regions relates to the target task of locating the target object in the input image: the candidate image region corresponds to where the target object may appear. For example, for the task of detecting impurities in wine, if impurities are likely to appear near the bottle mouth when the wine is at rest, the candidate image region may be set as the bottle-mouth area of the wine bottle. The bottle-mouth area then needs to be determined as the candidate image region from a positive example sample image, which may also contain the wine bottle and some background objects, i.e., the position of the candidate image region is identified and recorded.
In practical application, the foreground and background of the positive example sample image can be separated, the candidate image region for detecting the target object identified in the foreground image, and its position recorded. Specifically, the recognition can be completed according to preset candidate-image-region acquisition prompt information corresponding to the target task. In the wine-impurity example above, the prompt information might be: take the rectangular area of the bottle body within 5 cm (or N pixels) below the bottle mouth as the candidate image region.
Since a small number of negative example sample images with position mark information of the target object have been collected, one of them is randomly selected as the first negative example sample image, and the image area corresponding to the target object, called the target image area, is extracted from it based on the position mark information.
Then, the morphology of the target image area may be adjusted, and the adjusted target image area is superimposed on the candidate image area in the positive example image, so as to obtain a second negative example image and label information of the second negative example image. The form adjustment may include adjustment of size, shape, and the like. A position may be randomly selected in the candidate image region, and the adjusted target image region may be fused to the selected position of the candidate image region, so that the finally obtained second negative example sample image exhibits a display effect of the target object being fused. The mark information of the second negative example image is the position mark of the target image area in the second negative example image.
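The extract / adjust / superimpose sequence of steps 203-204 can be sketched as below. Nearest-neighbour resizing and a uniformly random paste position are illustrative choices; the embodiment leaves the morphology adjustment and position selection open.

```python
import random

def crop(image, box):
    """Extract the target image area given (y, x, h, w) position mark information."""
    y, x, h, w = box
    return [row[x:x + w] for row in image[y:y + h]]

def resize_nearest(patch, new_h, new_w):
    """Morphology (size) adjustment of the target image area by nearest-neighbour sampling."""
    h, w = len(patch), len(patch[0])
    return [[patch[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]

def paste_into_candidate(positive, candidate_box, patch, rng=random):
    """Superimpose `patch` at a random position inside the candidate region;
    return the second negative example image and its position mark information."""
    cy, cx, ch, cw = candidate_box
    ph, pw = len(patch), len(patch[0])
    y = cy + rng.randrange(ch - ph + 1)   # random but always inside the candidate box
    x = cx + rng.randrange(cw - pw + 1)
    out = [row[:] for row in positive]
    for i in range(ph):
        for j in range(pw):
            out[y + i][x + j] = patch[i][j]
    return out, (y, x, ph, pw)
```

Traversing many (positive image, collected negative image) pairs through these three functions yields the large set of labeled negative examples described in the following paragraph.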
It will be appreciated that, by selecting one positive example sample image and one already collected negative example sample image through the above process, a new negative example sample image can be generated. A large number of new negative example sample images can then be generated by traversing the plurality of positive example sample images and the collected negative example sample images. Finally, training of the corresponding detection model can be completed using all positive example sample images and all negative example sample images.
The image detection scenario in the embodiment shown in fig. 2 may be, for example, a medical image detection scenario, and the following describes, in conjunction with fig. 3, an implementation procedure of the sample generation method in the medical image detection scenario.
Fig. 3 is a flowchart of a sample generation method according to an embodiment of the present invention, and as shown in fig. 3, the method includes the following steps:
301. the method comprises the steps of obtaining a first medical image corresponding to a medical image detection task, wherein the medical detection task is used for detecting whether a set lesion occurs in a target body part, and the first medical image is a medical image indicating that the target body part is in a healthy state.
302. And determining a candidate image region corresponding to the target body part from the first medical image based on the medical image detection task.
303. And acquiring a lesion image corresponding to the set lesion.
304. The lesion image is superimposed on the candidate image region in the first medical image to obtain a second medical image as a negative example and labeling information of the second medical image.
In a medical image detection scenario, because user private data is involved, the number of validly labeled samples available for model training is very small; for the detection of a certain disease, such as a specific tumor, often only a data set containing tens of labeled detection regions is available. The tumor target varies greatly in size, shape, and location, and the difference between tumor and non-tumor regions is small, so a small amount of labeled data can hardly characterize such a complex detection problem. It is therefore necessary to expand the labeled sample data so that a model with good detection performance can be trained.
In this medical image detection scenario, the target task is set to detect whether a set lesion occurs in a target body part, and the positive example sample data is a medical image indicating that the target body part is in a healthy state, referred to as a first medical image. For example, the target body part is the lung, and the set lesion is lung cancer, i.e., a lung tumor. A large number of medical images of healthy lungs can be collected as positive example samples.
In other words, in a medical image detection scene, the sample generation method provided by the embodiment of the present invention can generate, from normal, healthy image samples, image samples in which a designated marked region belongs to a lesion (abnormal) region, thereby supporting a medical image classification or detection task.
First, an arbitrary first medical image selected from a plurality of collected medical images is taken as a positive example sample, and a candidate image region corresponding to the target body part is determined from it. In practical applications, the first medical image may be any medical image captured by an imaging technique, and it may include pixel regions other than the image region corresponding to the target body part; the image region corresponding to the target body part is identified from it as the candidate image region. This region may be identified based on the contour features of the target body part. Alternatively, because the brightness or color of the pixels corresponding to the target body part in the captured first medical image is clearly distinguished from the other pixels (regarded as background), the image region corresponding to the target body part, that is, the candidate image region, can be identified based on differences in pixel color and brightness.
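The brightness-difference heuristic above can be sketched as follows, under the illustrative assumption of a grayscale numpy image in which the body part is brighter than the background; the threshold value is arbitrary:

```python
import numpy as np

def candidate_region_by_brightness(img, thresh=128):
    # Identify the candidate image region (target body part) as the
    # bounding box of pixels whose brightness clearly exceeds the
    # background, returning (y0, x0, y1, x1) or None if nothing is found.
    mask = img > thresh
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1
```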
And then acquiring a lesion image corresponding to the set lesion, and overlaying the acquired lesion image to a candidate image area in the first medical image to acquire a second medical image serving as a negative example sample and mark information of the second medical image.
In contrast to the above-mentioned first medical image in which the target body part is in a healthy state, in practical applications, a small number of medical images indicating that the target body part has a set lesion may be collected in advance, and the medical images are referred to as third medical images, and the third medical images are negative examples. In order to generate more negative examples based on a large number of positive examples, a lesion region corresponding to a set lesion is extracted as the lesion image based on marking information for a lesion position in the third medical image that has been acquired and indicates that the set lesion occurs in the target body part. Then, the morphology of the lesion region may be adjusted, and the adjusted lesion region may be superimposed on the candidate image region identified in the first medical image, so as to obtain the second medical image as a negative example and the label information of the second medical image. Wherein the marking information of the second medical image is the position marking of the lesion region therein in the second medical image.
It is understood that the lesion area after each morphological adjustment can be respectively fused to the candidate image area in the first medical image, so that a plurality of negative example sample images can be obtained based on the deformation results of one first medical image and a plurality of lesion areas.
Each of the generated second medical images as negative example sample images and the first medical image as positive example sample image may be used for training of a model that accomplishes the above-described medical examination task.
In an optional embodiment, to improve the performance of the model, the method may further include:
acquiring an attack behavior of a user under a medical image detection task, wherein the attack behavior is a behavior which can influence the performance of a model;
editing the first medical image and/or the second medical image according to the attack behavior;
and training a model corresponding to the medical image detection task by adopting the first medical image, the second medical image, the edited first medical image and the edited second medical image.
In practical applications, the above attack behavior may be a copying (re-photographing) behavior, a dump behavior, a compressed-transmission behavior, and the like. For example, when a user photographs a medical image to obtain a training sample image, the resolution of the camera affects the sharpness of the image, so the resolution of a medical image may be adjusted to simulate this behavior. For another example, a large number of medical images may be stored on a terminal device of a medical institution; when a server performs the model training task, the terminal device uploads the locally stored medical images to the server, and the operator of the terminal device may compress the local medical images before uploading them. The server may therefore simulate the effect of compressed transmission on the medical images (in fact, the medical images collected by the server need not have been obtained in this way; the simulation may be applied to all medical images regardless of how they were collected).
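The two editing behaviors described above, resolution loss and compressed transmission, can be simulated roughly as follows; the down-sampling factor and quantization step are illustrative stand-ins, not values from the embodiment:

```python
import numpy as np

def simulate_attacks(img, factor=2, step=32):
    # Resolution loss: naive down-sampling followed by nearest-neighbour
    # up-sampling, imitating a low-resolution re-photographed copy.
    low = img[::factor, ::factor]
    restored = np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)
    restored = restored[:img.shape[0], :img.shape[1]]
    # Compressed transmission: coarse quantization as a crude stand-in
    # for lossy compression artifacts.
    quantized = (img // step) * step
    return restored, quantized
```

Both edited copies would then join the original positive and negative samples in the training set, as step 306-style training described above suggests.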
The above-described procedure is exemplified in fig. 4 by taking the target body part as the lung region and setting the lesion to be a tumor.
Fig. 5 is a flowchart of a sample generation method according to an embodiment of the present invention, and as shown in fig. 5, the method includes the following steps:
501. the method comprises the steps of obtaining a target task and normal sample data corresponding to the target task, wherein the target task is to detect whether video data contain forged voice and images, and the normal sample data is a normal sample video which does not contain the forged voice and images.
502. Candidate speech segments and candidate image frame sequences are randomly extracted from the positive sample video.
503. Synthesizing a forged voice segment corresponding to the candidate voice segment and synthesizing a sequence of forged image frames corresponding to the sequence of candidate image frames.
504. And replacing the candidate voice segment by the fake voice segment and/or replacing the candidate image frame sequence by the fake image frame sequence to obtain the negative example video containing the fake voice segment and/or the fake image frame sequence and the mark information of the negative example video.
The sample generation scheme provided by the embodiment of the invention can also be suitable for the problem of insufficient training samples with labels in a multi-modal hybrid detection scene.
Consider a multimodal classification scenario, such as detecting forged videos of public figures. Lawbreakers can use deep forgery (deepfake) technology to synthesize a fake lecture video of a target public figure, causing a public opinion crisis and bringing harm to individuals and enterprises. In an actual forged-video detection scene, a forged video may mix two forged modalities (a forged voice modality and a forged video-frame image modality), with forgery applied only to certain key segments of the video. Most deep forgery detection methods focus on only one of the modalities, and most data sets used to train forged-video detection models likewise cover only one modality. To obtain a multimodal forged-video detection model with good performance, a data set containing mixed multimodal forged content needs to be constructed, and the forged-video detection model trained in a multimodal fusion manner.
In order to generate more video samples containing multimodal forgery information, first, a large amount of original video not containing forgery information is prepared as positive example sample videos. Then, a speech segment position and an image frame segment position to be synthesized are randomly selected in a positive example sample video; the speech segment at that position is called a candidate speech segment, and the multi-frame image at the image frame segment position is called a candidate image frame sequence. In practice, the candidate speech segment does not need to be temporally aligned with the candidate image frame sequence.
Then, forged voice content corresponding to the candidate voice segment is synthesized using deep forgery technology, referred to as a forged voice segment, and forged image frame content corresponding to the candidate image frame sequence is synthesized, referred to as a forged image frame sequence. The synthesized forged voice segment and forged image frame sequence can then replace the candidate voice segment and candidate image frame sequence in the positive example sample video, yielding a forged video fused with the forged voice segment and forged image frame sequence as the negative example sample video. The mark information of the negative example video may include labels indicating whether forged voice and forged images are contained, together with the corresponding positions of the forged voice and forged images.
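The segment-replacement step can be sketched as follows, assuming frames and audio are held as Python lists and the forged content has already been synthesized elsewhere; the deepfake synthesis itself is outside this sketch, and all names are illustrative:

```python
def splice_forgery(frames, audio, frame_span, audio_span, fake_frames, fake_audio):
    # Replace the candidate image-frame sequence and/or candidate speech
    # segment of a positive sample video with synthesized forged content,
    # and emit per-modality mark information with the forged positions.
    f0, f1 = frame_span
    a0, a1 = audio_span
    new_frames = frames[:f0] + fake_frames + frames[f1:]
    new_audio = audio[:a0] + fake_audio + audio[a1:]
    labels = {
        "forged_video": {"present": bool(fake_frames), "span": frame_span},
        "forged_audio": {"present": bool(fake_audio), "span": audio_span},
    }
    return new_frames, new_audio, labels
```

Passing an empty `fake_frames` or `fake_audio` list (with a zero-length span) yields the single-modality variants shown in fig. 6.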
The above implementation is illustrated in fig. 6, which shows three multimodal fusion methods: first, a forged video containing the original voice and a synthesized forged image frame sequence; second, a forged video containing a synthesized forged voice segment and a synthesized forged image frame sequence; and third, a forged video containing a synthesized forged voice segment and the original image frame sequence.
The sample generation scheme provided by the embodiment of the invention can also be applied to some security-related audit scenarios, such as image-theft detection scenarios.
In some practical applications, a large number of merchants upload images for promoting and introducing themselves to a service platform, which then presents the images to a large number of users. For example, a merchant may upload store images and related introduction information to a map application platform, so that a user searching for nearby restaurants, supermarkets, and the like in the map application sees the introduction information of the relevant merchant and its uploaded images.
When a merchant uploads an image of a store or a commodity, the service platform (such as the map application platform) needs to audit the uploaded image, and only images passing the audit can be published in the corresponding service program. One audit task is to determine whether an uploaded image is a stolen image. Because the sample labels that are relatively easy to obtain in practice indicate only "stolen" or "not stolen", a deep learning model can be trained on collected sample images marked as stolen or not. However, the service platform may also want more detailed reasons for audit failure, for example: the image to be detected contains another platform's watermark; the image is a screen-captured image or a computer-synthesized image; or the image is a copy of another user's image. These are the fine category labels for audit failure that the service platform wants to obtain, yet samples bearing such fine category labels are rare and difficult to label manually with accuracy.
When a new detection category is required to be added on the service platform side, the sample generation scheme provided by the embodiment of the invention can automatically generate a large number of label samples corresponding to the requirements according to the new requirements on the service platform side to perform model retraining or enhanced training.
That is, after a deep learning model is deployed in production, it is likely to face the requirements of new fine-grained classification detection tasks. For example, in a platform image-theft detection scenario, after the model is deployed, more detailed detection of fine failure-cause categories may be needed, where the fine categories include: the image to be detected contains another platform's watermark; the image is a screen-captured image or a stolen-and-stitched composite image; the image is a re-photographed stolen image; and the like. Samples for such newly added fine categories are often scarce, and a very small number of labeled samples cannot meet the needs of model fine-tuning. In this case, the scheme provided by the embodiment of the present invention can be applied to generate sample data for the newly added fine classification task, thereby meeting practical deployment requirements and improving the deployed performance of the deep model. Examples are given below.
Fig. 7a is a flowchart of a sample generation method according to an embodiment of the present invention, and as shown in fig. 7a, the method includes the following steps:
701a, acquiring a target task and normal sample data corresponding to the target task, wherein the target task is a sub-classification task added under a set image classification task, the normal sample data is a normal sample image corresponding to the set image classification task, and the sub-classification task comprises identifying whether the set image contains a set watermark.
And 702a, determining a background image area of the positive sample image.
703a, selecting a target watermark from the set watermark set, wherein the watermark stored in the watermark set is different from the set watermark.
704a, fusing the target watermark into the background image area of the positive example image to obtain the negative example image corresponding to the sub-classification task and the marking information of the negative example image.
The set image classification task in this embodiment is, for example, the above-described binary classification task of whether an image is stolen. When the specific theft category of a stolen image needs to be known, corresponding sub-classification tasks can be added. In this embodiment, it is assumed that the added sub-classification tasks include a watermark-theft classification task, that is, determining whether the watermark contained in an input image is the watermark of a certain set service platform.
When the model executing the above binary classification task (i.e., the set image classification task) was initially trained, a large number of training sample images were collected; in this embodiment, the positive example sample images among them are reused as positive example sample images shared by the newly added sub-classification tasks.
For the watermark sub-classification task, assuming the current service platform is service platform A, if an image uploaded by a merchant carries the watermark of another service platform such as service platform B, the image is considered stolen, i.e., it cannot pass the audit, and the reason for failure is: it carries the watermark of another service platform.
To generate a negative example sample image corresponding to this subtask, first, the background image area of the positive example sample image, that is, the image area not containing a foreground object, is identified. For example, in fig. 8 the foreground object is the human body in the image, and the pixel area other than the pixel area corresponding to the human body is the background image area. In addition, watermark images of other service platforms can be collected in advance to form a watermark set; a watermark is randomly selected from the set as the current target watermark, and the target watermark is then fused into the background image area of the positive example sample image, which is equivalent to randomly inserting the target watermark at some position in the background image area, thereby obtaining a negative example sample image. Meanwhile, the mark information of the negative example sample image is generated based on the semantics of the subtask. For example, if the semantics indicate that the subtask needs to identify whether the input image contains the watermark of another service platform, the mark information may be: contains the watermark of another service platform. If the semantics indicate that the subtask needs to detect both the watermark of another service platform and its position, the mark information may be: contains the watermark of another service platform, together with the corresponding position information of the watermark in the negative example sample image. The above processing is illustrated with reference to fig. 8.
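The watermark-fusion step can be sketched as an alpha blend at a random position inside the background region; the blending weight, box convention, and function name are assumptions of this sketch, not details fixed by the embodiment:

```python
import numpy as np

def add_foreign_watermark(pos_img, bg_bbox, watermark, alpha=0.6, rng=None):
    # Fuse a watermark (drawn from another platform's watermark set) into
    # the background region of a positive sample, producing a negative
    # sample for the watermark sub-classification task.
    if rng is None:
        rng = np.random.default_rng(0)
    wy, wx = watermark.shape[:2]
    y0, x0, y1, x1 = bg_bbox
    py = int(rng.integers(y0, max(y0 + 1, y1 - wy)))
    px = int(rng.integers(x0, max(x0 + 1, x1 - wx)))
    out = pos_img.astype(float).copy()
    region = out[py:py + wy, px:px + wx]
    out[py:py + wy, px:px + wx] = (1 - alpha) * region + alpha * watermark
    label = {"reason": "contains watermark of another platform",
             "bbox": (py, px, py + wy, px + wx)}
    return out.astype(pos_img.dtype), label
```

Dropping the `bbox` field from `label` gives the presence-only variant of the mark information described above.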
Fig. 7b is a flowchart of a sample generation method according to an embodiment of the present invention, and as shown in fig. 7b, the method includes the following steps:
701b, acquiring a target task and normal sample data corresponding to the target task, wherein the target task is a sub-classification task added under the set image classification task, the normal sample data is a normal sample image corresponding to the set image classification task, and the sub-classification task comprises the step of identifying whether screenshot embezzlement exists in the image.
And 702b, extracting foreground image areas with different sizes from the normal sample image.
703b, acquiring an additional frame.
704b, adaptively fusing the additional frame to the edges of the foreground image areas with different sizes to obtain negative example images corresponding to the sub-classification task and the marking information of the negative example images.
In this embodiment, it is assumed that the added sub-classification task includes identifying whether a screenshot stealing situation exists in the image, that is, determining whether the input image is a screenshot image.
For the subtask of identifying screenshot theft, the influence of the "screenshot" operation on the image display effect needs to be embodied in the generated negative example sample image. To generate negative example sample images for this subtask, foreground image regions of different sizes are first extracted from the positive example sample image, because screenshots are usually taken around a foreground object. As shown in fig. 8, the foreground object is the human body in the figure, and different rectangular regions surrounding the human body can be cropped as different foreground image regions. In addition, users are often imprecise when taking screenshots in practice, and many screenshots contain borders such as black or white edges, so some common border thicknesses can be obtained by statistics over a large number of screenshots. Based on this, a border thickness can be randomly selected from these to generate an additional border matching the size of each foreground image region. Finally, each additional border is placed at the edge of the corresponding foreground image region, that is, the foreground image region is placed inside the additional border, and the synthesized image serves as a negative example sample image for this subtask. The mark information of the negative example sample image may then be: a screenshot-theft situation exists in the negative example sample image. The above processing is illustrated with reference to fig. 8.
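The border-wrapping step can be sketched as follows for a grayscale image; in practice the border thickness would be drawn from the screenshot statistics mentioned above, and the fixed values here are illustrative only:

```python
import numpy as np

def make_screenshot_negative(pos_img, fg_bbox, border=4, value=0):
    # Crop a foreground region and wrap it in a solid border of a common
    # thickness, imitating the black/white edges left by an imprecise
    # screenshot; fg_bbox is (y0, x0, y1, x1).
    y0, x0, y1, x1 = fg_bbox
    crop = pos_img[y0:y1, x0:x1]
    out = np.pad(crop, ((border, border), (border, border)),
                 mode="constant", constant_values=value)
    label = "screenshot theft present"
    return out, label
```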
In summary, to solve the problem of insufficient labeled training samples in many types of scenes, such as scenes involving user privacy, security-related audit scenes, and multi-modal hybrid detection scenes, an embodiment of the present invention provides a few-shot learning framework that generates negative example sample data adapted to the semantics of the target task. It can conveniently apply adaptive transformations to a large amount of available positive example sample data to generate sample data of the target class, so that sufficient sample data of the target class can be obtained.
As described above, the sample generation method provided by the present invention can be executed in the cloud, where a plurality of computing nodes may be deployed, each with processing resources such as computation and storage. In the cloud, multiple computing nodes may be organized to provide a service, and a single computing node may also provide one or more services. The cloud may provide a service by exposing a service interface, which a user calls to use the corresponding service. Forms of the service interface include a Software Development Kit (SDK), an Application Programming Interface (API), and the like.
According to the scheme provided by the embodiment of the invention, the cloud end can be provided with a service interface of the sample generation service, and the user calls the service interface through the user equipment so as to trigger a calling request to the cloud end. The cloud determines the compute node that responds to the request, and performs the steps provided in the foregoing embodiments using the processing resources in the compute node. For example, fig. 4 illustrates that the user terminal triggers a request including a positive example sample image and a target task to the cloud server cluster, and the cloud server responds to the request to execute a process of generating a negative example sample image corresponding to the target task.
The sample generation apparatus of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these means can each be constructed using commercially available hardware components and by performing the steps taught in this disclosure.
Fig. 9 is a schematic structural diagram of a sample generation apparatus according to an embodiment of the present invention, and as shown in fig. 9, the apparatus includes: the device comprises an acquisition module 11, a determination module 12 and a fusion module 13.
The obtaining module 11 is configured to obtain a target task and normal sample data corresponding to the target task.
A determining module 12, configured to determine, from the normal sample data, a candidate region corresponding to the target task, and determine alteration content corresponding to the target task.
And the fusion module 13 is configured to fuse the modified content and the candidate region to obtain negative sample data corresponding to the target task.
Optionally, the apparatus further comprises: the training module is used for acquiring the attack behavior of the user under the target task, wherein the attack behavior is the behavior which can influence the performance of the model; editing the positive sample data and/or the negative sample data according to the attack behavior; and training a model corresponding to the target task by adopting the positive sample data, the negative sample data, the edited positive sample data and the edited negative sample data.
Optionally, the target task is to detect whether a target object is included in an image, and the positive example data is a positive example image not including the target object. Based on this, the determining module 12 is specifically configured to: determining a candidate image area containing the target object for detection from the normal sample image; acquiring a first negative example sample image corresponding to the target task, wherein the first negative example sample image is collected; and extracting a target image area containing the target object according to position marking information in the first negative example sample image, wherein the position marking information is used for marking the position of the target object in the first negative example sample image. The fusion module 13 is specifically configured to: and adjusting the shape of the target image area, and superposing the adjusted target image area to the candidate image area in the positive example sample image to obtain a second negative example sample image and the mark information of the second negative example sample image.
Optionally, the target task is to detect whether a set lesion occurs in a target body part, and the positive sample data is a first medical image indicating that the target body part is in a healthy state. Based on this, the determining module 12 is specifically configured to: determining a candidate image area corresponding to the target body part from the first medical image; acquiring a second medical image which is collected and indicates that the target body part has the set lesion; and extracting a lesion area corresponding to the set lesion according to the position mark information in the second medical image. The fusion module 13 is specifically configured to: and adjusting the morphology of the lesion area, and overlaying the adjusted lesion area to the candidate image area in the first medical image to obtain a third medical image serving as a negative example sample and mark information of the third medical image.
Optionally, the target task is to detect whether the video data contains forged voice and images; the positive example data is a positive example video that does not include spurious speech and images. Based on this, the determining module 12 is specifically configured to: randomly extracting candidate voice segments and candidate image frame sequences from the regular sample video; and synthesizing a fake voice segment corresponding to the candidate voice segment and synthesizing a fake image frame sequence corresponding to the candidate image frame sequence. The fusion module 13 is specifically configured to: and replacing the candidate voice segment by the fake voice segment, and/or replacing the candidate image frame sequence by the fake image frame sequence to obtain a negative example video containing the fake voice segment and/or the fake image frame sequence and mark information of the negative example video.
Optionally, the target task is a sub-classification task added under a set image classification task, and the normal sample data is a normal sample image corresponding to the set image classification task.
Optionally, the sub-classification task includes identifying whether the image includes a set watermark. Based on this, the determining module 12 is specifically configured to: determining a background image area of the positive example image; and selecting a target watermark from a set watermark set, wherein the watermark stored in the watermark set is different from the set watermark. The fusion module 13 is specifically configured to: and fusing the target watermark into a background image area of the positive sample image to obtain a negative sample image corresponding to the sub-classification task and the marking information of the negative sample image.
Optionally, the sub-classification task includes identifying whether a screenshot stealing situation exists in the image. Based on this, the determining module 12 is specifically configured to: foreground image areas with different sizes are extracted from the normal sample image; and acquiring an additional frame. The fusion module 13 is specifically configured to: and adaptively fusing the additional frame to the edges of the foreground image areas with different sizes to obtain a negative example image corresponding to the sub-classification task and the marking information of the negative example image.
The apparatus shown in fig. 9 can perform the steps provided in the foregoing embodiments, and the detailed performing process and technical effects refer to the description in the foregoing embodiments, which are not described herein again.
In one possible design, the structure of the sample generation apparatus described above and shown in FIG. 9 may be implemented as an electronic device. As shown in fig. 10, the electronic device may include: a processor 21, a memory 22, and a communication interface 23. Wherein the memory 22 has stored thereon executable code which, when executed by the processor 21, causes the processor 21 to at least implement the sample generation method as provided in the previous embodiments.
Additionally, embodiments of the present invention provide a non-transitory machine-readable storage medium having stored thereon executable code that, when executed by a processor of an electronic device, causes the processor to implement at least the sample generation method as provided in the preceding embodiments.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, or by a combination of hardware and software. Based on this understanding, the parts of the above technical solutions that in essence contribute to the prior art may be embodied in the form of a computer program product, which may be carried on one or more computer-usable storage media (including, without limitation, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (14)
1. A method of generating a sample, comprising:
acquiring a target task and positive sample data corresponding to the target task;
determining a candidate area corresponding to the target task from the positive sample data, and determining modified content corresponding to the target task;
and fusing the modified content and the candidate area to obtain negative sample data corresponding to the target task.
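The three steps of claim 1 can be sketched as follows (illustrative only; the NumPy array representation, the direct-replacement fusion, and the label dictionary format are assumptions of this sketch, not limitations of the claim):

```python
import numpy as np

def generate_negative_sample(positive_image, candidate_origin, modified_patch):
    """Fuse modified content into the candidate area of a positive sample,
    producing negative sample data plus its labeling information."""
    x, y = candidate_origin                # top-left corner of candidate area
    h, w = modified_patch.shape[:2]
    negative = positive_image.copy()
    negative[y:y + h, x:x + w] = modified_patch  # simplest possible fusion
    label = {"box": (x, y, w, h), "class": "negative"}
    return negative, label

# Hypothetical usage: a blank positive image and a white patch as modified content.
positive = np.zeros((64, 64, 3), dtype=np.uint8)
patch = np.full((16, 16, 3), 255, dtype=np.uint8)
neg, label = generate_negative_sample(positive, (8, 8), patch)
```

Because the candidate area is chosen by the generator, the labeling information comes for free, which is the central advantage over manual annotation.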
2. The method of claim 1, further comprising:
acquiring an attack behavior of a user under the target task, wherein the attack behavior refers to a behavior which can influence the performance of a model;
editing the positive sample data and/or the negative sample data according to the attack behavior;
and training a model corresponding to the target task by adopting the positive sample data, the negative sample data, the edited positive sample data and the edited negative sample data.
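The editing of samples according to attack behaviors in claim 2 might, for illustration, look like the following (the specific attack names and edit operations are hypothetical examples; the claim only requires that samples be edited according to behaviors that can influence model performance):

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_attack_edits(image, attacks):
    """Edit a sample according to simulated user attack behaviors, so the
    model trained on the edited samples becomes robust to those attacks."""
    edited = image.astype(np.float32)
    for attack in attacks:
        if attack == "gaussian_noise":       # hypothetical attack name
            edited += rng.normal(0.0, 8.0, size=edited.shape)
        elif attack == "brightness_shift":   # hypothetical attack name
            edited += 20.0
        elif attack == "horizontal_flip":    # hypothetical attack name
            edited = edited[:, ::-1]
    return np.clip(edited, 0, 255).astype(np.uint8)

sample = np.full((32, 32, 3), 128, dtype=np.uint8)
augmented = apply_attack_edits(sample, ["brightness_shift", "horizontal_flip"])
```

Both the original and the edited samples would then be fed into training, as the claim describes.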
3. The method according to claim 1, wherein the target task is detecting whether an image contains a target object, and the positive sample data is a positive sample image not containing the target object;
the determining a candidate area corresponding to the target task from the positive sample data and determining modified content corresponding to the target task include:
determining, from the positive sample image, a candidate image area for containing the target object;
acquiring a collected first negative sample image corresponding to the target task;
extracting a target image area containing the target object according to position labeling information in the first negative sample image, wherein the position labeling information marks the position of the target object in the first negative sample image;
the fusing the modified content and the candidate area to obtain negative sample data corresponding to the target task includes:
and adjusting the shape of the target image area, and superposing the adjusted target image area onto the candidate image area in the positive sample image to obtain a second negative sample image and labeling information of the second negative sample image.
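A minimal sketch of the shape adjustment and superposition in claim 3, assuming a nearest-neighbour resize and an axis-aligned candidate box (both assumptions of this illustration, since the claim does not fix the adjustment method):

```python
import numpy as np

def nearest_resize(patch, out_h, out_w):
    """Nearest-neighbour resize without external dependencies."""
    in_h, in_w = patch.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return patch[rows][:, cols]

def paste_target_object(positive_image, candidate_box, target_patch):
    """Adjust the extracted target-object area to fit the candidate box and
    superpose it, yielding a second negative image plus position labels."""
    x, y, w, h = candidate_box
    negative = positive_image.copy()
    negative[y:y + h, x:x + w] = nearest_resize(target_patch, h, w)
    # The candidate box itself becomes the position labeling information.
    return negative, {"bbox": candidate_box, "label": "target_object"}
```

The position labeling information of the second negative sample image falls out of the pasting step itself, so no manual annotation is needed.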
4. The method according to claim 3, wherein the target task is detecting whether a target body part has a set lesion, and the positive sample data is a first medical image indicating that the target body part is in a healthy state;
the determining a candidate area corresponding to the target task from the positive sample data and determining modified content corresponding to the target task include:
determining a candidate image area corresponding to the target body part from the first medical image;
acquiring a collected second medical image indicating that the target body part has the set lesion;
extracting a lesion area corresponding to the set lesion according to position labeling information in the second medical image;
the fusing the modified content and the candidate area to obtain negative sample data corresponding to the target task includes:
and adjusting the morphology of the lesion area, and superposing the adjusted lesion area onto the candidate image area in the first medical image to obtain a third medical image serving as a negative sample and labeling information of the third medical image.
5. The method according to claim 1, wherein the target task is detecting whether video data contains forged voice and images, and the positive sample data is a positive sample video containing no forged voice or images;
the determining a candidate area corresponding to the target task from the positive sample data and determining modified content corresponding to the target task include:
randomly extracting a candidate voice segment and a candidate image frame sequence from the positive sample video;
synthesizing a forged voice segment corresponding to the candidate voice segment, and synthesizing a forged image frame sequence corresponding to the candidate image frame sequence;
the fusing the modified content and the candidate area to obtain negative sample data corresponding to the target task includes:
and replacing the candidate voice segment with the forged voice segment, and/or replacing the candidate image frame sequence with the forged image frame sequence, to obtain a negative sample video containing the forged voice segment and/or the forged image frame sequence and labeling information of the negative sample video.
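The replacement step of claim 5 can be illustrated as a simple splice over an audio array and a frame array (the array layout and span-based labels are assumptions of this sketch; synthesizing the forgeries themselves is outside its scope):

```python
import numpy as np

def splice_forgery(audio, frames, voice_span, frame_span, fake_voice, fake_frames):
    """Replace a candidate voice segment and image-frame sequence in a
    positive video with synthesized forgeries, recording the spans as
    labeling information for the resulting negative video."""
    neg_audio = audio.copy()
    neg_frames = frames.copy()
    s, e = voice_span
    neg_audio[s:e] = fake_voice          # splice in the forged voice segment
    fs, fe = frame_span
    neg_frames[fs:fe] = fake_frames      # splice in the forged frame sequence
    labels = {"forged_voice_span": voice_span, "forged_frame_span": frame_span}
    return neg_audio, neg_frames, labels
```

Because the generator chose where to splice, the forged spans themselves serve as the negative video's labels.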
6. The method according to claim 1, wherein the target task is a sub-classification task added under a set image classification task, and the positive sample data is a positive sample image corresponding to the set image classification task.
7. The method of claim 6, wherein the sub-classification task includes identifying whether the image includes a set watermark;
the determining a candidate area corresponding to the target task from the positive sample data and determining modified content corresponding to the target task include:
determining a background image area of the positive sample image;
selecting a target watermark from a watermark set, wherein the watermarks stored in the watermark set are different from the set watermark;
the fusing the modified content and the candidate area to obtain negative sample data corresponding to the target task includes:
and fusing the target watermark into the background image area of the positive sample image to obtain a negative sample image corresponding to the sub-classification task and labeling information of the negative sample image.
8. The method of claim 6, wherein the sub-classification task includes identifying whether the image is a stolen screenshot;
the determining a candidate area corresponding to the target task from the positive sample data and determining modified content corresponding to the target task include:
extracting foreground image areas of different sizes from the positive sample image;
acquiring an additional frame;
the fusing the modified content and the candidate area to obtain negative sample data corresponding to the target task includes:
and adaptively fusing the additional frame onto the edges of the foreground image areas of different sizes, respectively, to obtain negative sample images corresponding to the sub-classification task and labeling information of the negative sample images.
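The adaptive frame fusion of claim 8 can be approximated, for illustration, by padding a fixed-thickness border around each extracted foreground region (the thickness and color are assumptions, and the claim's "adaptive" fitting is simplified to a uniform pad here):

```python
import numpy as np

def add_border(foreground, thickness=2, color=(255, 255, 255)):
    """Fuse an additional frame (border) onto the edges of a cropped
    foreground region, simulating a stolen-screenshot negative sample."""
    h, w = foreground.shape[:2]
    framed = np.empty((h + 2 * thickness, w + 2 * thickness, 3), dtype=np.uint8)
    framed[:] = color                                      # fill the border
    framed[thickness:thickness + h, thickness:thickness + w] = foreground
    return framed
```

Applying this to foreground crops of several sizes produces the set of negative sample images the claim describes, each labeled as a stolen screenshot.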
9. A sample generation device, comprising:
the acquisition module is used for acquiring a target task and positive sample data corresponding to the target task;
the determining module is used for determining a candidate area corresponding to the target task from the positive sample data and determining modified content corresponding to the target task;
and the fusion module is used for fusing the modified content and the candidate area to obtain negative sample data corresponding to the target task.
10. An electronic device, comprising: a memory, a processor, a communication interface; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the sample generation method of any of claims 1 to 8.
11. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the sample generation method of any of claims 1 to 8.
12. A method of generating a sample, comprising:
acquiring a first medical image corresponding to a medical image detection task, wherein the medical image detection task is used for detecting whether a target body part has a set lesion, and the first medical image is a medical image indicating that the target body part is in a healthy state;
determining a candidate image area corresponding to the target body part from the first medical image based on the medical image detection task;
acquiring a lesion image corresponding to the set lesion;
superimposing the lesion image into the candidate image region in the first medical image to obtain a second medical image as a negative example and labeling information of the second medical image.
13. The method of claim 12, wherein said acquiring a lesion image corresponding to said set lesion comprises:
acquiring a third medical image which is collected and indicates that the target body part has the set lesion;
extracting a lesion area corresponding to the set lesion from the third medical image according to position labeling information in the third medical image;
the superimposing the lesion image into the candidate image region in the first medical image comprises:
and adjusting the morphology of the lesion area, and overlaying the adjusted lesion area to the candidate image area in the first medical image.
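The superposition of claim 13 might be sketched with a binary lesion mask, so that only lesion pixels overwrite the healthy image (the mask is an assumption of this sketch; the claim only requires superposing the adjusted lesion area onto the candidate image area):

```python
import numpy as np

def overlay_lesion(medical_image, lesion_patch, lesion_mask, top_left):
    """Superpose an adjusted lesion area onto the candidate image area of a
    healthy (grayscale) medical image, copying only mask-covered pixels."""
    y, x = top_left
    h, w = lesion_patch.shape[:2]
    negative = medical_image.copy()
    region = negative[y:y + h, x:x + w]   # view into the negative sample
    region[lesion_mask] = lesion_patch[lesion_mask]
    return negative
```

The top-left coordinate and mask together give the labeling information of the resulting negative medical image.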
14. The method of claim 12, further comprising:
acquiring an attack behavior of a user under the medical image detection task, wherein the attack behavior is a behavior which can influence the performance of a model;
editing the first medical image and/or the second medical image according to the attack behavior;
and training a model corresponding to the medical image detection task by adopting the first medical image, the second medical image and the edited first medical image and second medical image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111290380.3A CN114170470A (en) | 2021-11-02 | 2021-11-02 | Sample generation method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114170470A true CN114170470A (en) | 2022-03-11 |
Family
ID=80477798
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111290380.3A Pending CN114170470A (en) | 2021-11-02 | 2021-11-02 | Sample generation method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114170470A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114998086A (en) * | 2022-07-28 | 2022-09-02 | 合肥高维数据技术有限公司 | Method for manufacturing test sample of screen invisible watermark embedding program and test method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108898185A (en) * | 2018-07-03 | 2018-11-27 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating image recognition model |
CN111444873A (en) * | 2020-04-02 | 2020-07-24 | 北京迈格威科技有限公司 | Method and device for detecting authenticity of person in video, electronic device and storage medium |
CN113205473A (en) * | 2021-07-05 | 2021-08-03 | 深圳科亚医疗科技有限公司 | Method, apparatus and storage medium for data enhancement for medical image detection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Verdoliva | Media forensics and deepfakes: an overview | |
Huh et al. | Fighting fake news: Image splice detection via learned self-consistency | |
Zheng et al. | A survey on image tampering and its detection in real-world photos | |
CN107862315A (en) | Subtitle extraction method, video searching method, captions sharing method and device | |
US8805123B2 (en) | System and method for video recognition based on visual image matching | |
CN111612104B (en) | Vehicle loss assessment image acquisition method, device, medium and electronic equipment | |
CN105185121B (en) | A kind of method of virtual bayonet socket parallelism recognition car plate | |
CN108491821A (en) | Vehicle insurance accident discrimination method, system and storage medium based on image procossing and deep learning | |
CN111836118B (en) | Video processing method, device, server and storage medium | |
Balchandani et al. | A deep learning framework for smart street cleaning | |
CN109377494A (en) | A kind of semantic segmentation method and apparatus for image | |
WO2023123981A1 (en) | Video processing method and apparatus, computer device and storage medium | |
CN114155464A (en) | Video data storage method and device, storage medium and terminal | |
Sharma et al. | Video interframe forgery detection: Classification, technique & new dataset | |
CN114170470A (en) | Sample generation method, device, equipment and storage medium | |
Beuve et al. | Waterlo: Protect images from deepfakes using localized semi-fragile watermark | |
CN106997366B (en) | Database construction method, augmented reality fusion tracking method and terminal equipment | |
Al-Sanjary et al. | Semi-automatic methods in video forgery detection based on multi-view dimension | |
CN114519689A (en) | Image tampering detection method, device, equipment and computer readable storage medium | |
Bijhold et al. | Forensic audio and visual evidence 2004-2007: A review | |
Manu et al. | Visual artifacts based image splicing detection in uncompressed images | |
Rahman et al. | SMIFD: novel social media image forgery detection database | |
CN114140674B (en) | Electronic evidence availability identification method combined with image processing and data mining technology | |
VidalMata et al. | On the effectiveness of image manipulation detection in the age of social media | |
Gopakumar | A survey on image splice forgery detection and localization techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||