CN114529756A - Image annotation method and device - Google Patents

Image annotation method and device

Info

Publication number
CN114529756A
CN114529756A
Authority
CN
China
Prior art keywords
image
prediction
deep learning
learning model
target deep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210082788.XA
Other languages
Chinese (zh)
Inventor
钟成
周颖婕
邓星
张泽熙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhugao Intelligent Technology Shenzhen Co ltd
Original Assignee
Zhugao Intelligent Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhugao Intelligent Technology Shenzhen Co ltd
Priority to CN202210082788.XA
Publication of CN114529756A
Legal status: Pending (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses an image annotation method and device, wherein the method comprises the following steps: receiving prediction results of a target deep learning model on unlabeled sample images, and screening out prediction labels of a plurality of sample images based on the prediction results to serve as training annotation images; obtaining a multi-scale image pyramid of the training annotation image and duplicating it into two copies; performing first data processing on one copy of the multi-scale image pyramid to obtain a first image, and performing second data processing different from the first data processing, or no processing, on the other copy to obtain a second image; and inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and second prediction label, calculating a corresponding loss function, and iteratively updating the target deep learning model. The invention makes full use of the annotation data output by the original target deep learning model, improving the accuracy of the annotation data while reducing annotation cost.

Description

Image annotation method and device
Technical Field
The invention relates to the technical field of machine learning, in particular to an image annotation method and device.
Background
The first step in solving a practical problem with a deep model is obtaining annotation data for the corresponding application scenario. Generally speaking, training a well-performing model requires thousands of labeled samples, so the annotation workload is huge; and when the annotation task involves expertise in a vertical domain, the annotators must also receive on-the-job training, which drives labor and time costs up rapidly.
Annotation precision is also a crucial link: manual annotation carries strong uncertainty and randomness, different quality-inspection schemes must be designed for different scenarios, and more professional quality inspectors must be trained, so the overall cost is very high. Therefore, there is a need for an automatic image annotation method that reduces annotation cost while producing high-precision annotation data.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides an image annotation method which can reduce the annotation cost and improve the accuracy of the annotation data.
The invention also provides an image annotation device applying the image annotation method.
The invention also provides a computer-readable storage medium implementing the image annotation method.
The image annotation method according to the embodiment of the first aspect of the invention comprises the following steps: receiving a prediction result of a target deep learning model on unlabeled sample images, and screening out prediction labels of a plurality of sample images based on the prediction result to serve as training annotation images; obtaining a multi-scale image pyramid of the training annotation image and duplicating it into two copies; performing first data processing on one copy of the multi-scale image pyramid to obtain a first image, and performing, or not performing, second data processing different from the first data processing on the other copy to obtain a second image; and inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and second prediction label, calculating a corresponding loss function according to the first prediction label and the second prediction label, and iteratively updating the target deep learning model.
The image annotation method of the embodiment of the invention has at least the following beneficial effects: the annotation data output by the original target deep learning model is fully utilized; a number of training annotation images are screened out, each training annotation image undergoes two different processings to form a group of samples, the samples are input into the target deep learning model, a loss function is calculated from the two resulting prediction labels, and the target deep learning model is iterated. Annotation cost is thereby reduced while the accuracy of the annotation data is effectively improved.
According to some embodiments of the invention, screening out prediction labels of a plurality of sample images as training annotation images based on the prediction result comprises: receiving the prediction results of the unlabeled sample images; selecting, from the prediction results, prediction boxes whose confidence is above a preset threshold as the labels of the sample image; and taking the labeled sample images as the training annotation images.
According to some embodiments of the invention, the first data processing is strong data enhancement and the second data processing is weak data enhancement.
According to some embodiments of the invention, said calculating a corresponding loss function from the first prediction label and the second prediction label comprises: taking the second prediction label as the ground-truth label of the multi-scale image pyramid on which the first data processing was performed, comparing it with the first prediction label, and calculating the corresponding loss function.
According to some embodiments of the invention, the method for iteratively updating the target deep learning model comprises: inputting the first image and the second image into a second multi-scale refinement branch of the target deep learning model, calculating the loss function of the second multi-scale refinement branch, merging it into the corresponding loss function of the main branch of the target deep learning model, and iteratively updating the target deep learning model.
According to some embodiments of the invention, the second multi-scale refinement branch of the target deep learning model shares the weights of its feature extraction network with the main branch of the target deep learning model.
According to some embodiments of the invention, the method for iteratively updating the target deep learning model comprises: inputting the first image and the second image into the target deep learning model, calculating the corresponding loss function, and iteratively updating the target deep learning model.
An image annotation apparatus according to an embodiment of the second aspect of the present invention includes: a selection module, for receiving the prediction results of a target deep learning model on unlabeled sample images, and screening out prediction labels of a plurality of sample images based on the prediction results to serve as training annotation images; an image processing module, for acquiring a multi-scale image pyramid of the training annotation image and duplicating it into two copies, performing first data processing on one copy to obtain a first image, and performing, or not performing, second data processing different from the first data processing on the other copy to obtain a second image; and a training module, for inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and second prediction label, calculating a corresponding loss function according to the first prediction label and the second prediction label, and iteratively updating the target deep learning model.
The image annotation device of the embodiment of the invention has at least the following beneficial effects: the annotation data output by the original target deep learning model is fully utilized; a number of training annotation images are screened out, each training annotation image undergoes two different processings to form a group of samples, the samples are input into the target deep learning model, a loss function is calculated from the two resulting prediction labels, and the target deep learning model is iterated. Annotation cost is thereby reduced while the accuracy of the annotation data is effectively improved.
A computer-readable storage medium according to an embodiment of the third aspect of the invention has stored thereon a computer program which, when executed by a processor, implements a method according to an embodiment of the first aspect of the invention.
The computer-readable storage medium according to an embodiment of the present invention has at least the same advantageous effects as the method according to an embodiment of the first aspect of the present invention.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of data interaction in a method according to an embodiment of the invention;
FIG. 3 shows two examples of a target deep learning model;
FIG. 4 is a schematic block diagram of an exemplary training process for a target deep learning model using methods of embodiments of the present invention;
FIG. 5 is a block diagram of the modules of the system of an embodiment of the present invention.
Reference numerals:
a selection module 100, an image processing module 200 and a training module 300.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding", etc. are understood as excluding the stated number, while "above", "below", "within", etc. are understood as including the stated number. If "first" and "second" are used, they are only for distinguishing technical features and are not to be understood as indicating or implying relative importance, implicitly indicating the number of technical features indicated, or implicitly indicating the precedence of the technical features indicated. In the description of the present invention, step numbers are used merely for convenience of description or reference; the sequence numbers do not denote execution order, which should be determined by the functions and inherent logic of the steps and should not limit the implementation of the embodiments of the present invention.
Referring to fig. 1, a method of an embodiment of the present invention includes:
(1) receiving prediction results of the target deep learning model on unlabeled sample images, and screening out prediction labels of a plurality of sample images based on the prediction results to serve as training annotation images;
(2) obtaining a multi-scale image pyramid of a training annotation image and duplicating it into two copies; performing first data processing on one copy to obtain a first image, and performing second data processing different from the first data processing, or no processing, on the other copy to obtain a second image;
(3) inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and second prediction label; calculating a corresponding loss function according to the first prediction label and the second prediction label, and iteratively updating the target deep learning model.
The flow of data through the whole system will be described below, taking FIG. 2 as an example in which the method is applied to a deep learning model for small-sample training. The method of the embodiment of the invention corresponds to the semi-supervised loop training in FIG. 2. It should be understood that FIG. 2 is only an example; the method of the embodiment of the invention is not limited thereto and may also be applied to other target deep learning models to improve the accuracy of image annotation while reducing annotation cost.
First, a user uploads image data through a dataset-upload interface and data cleaning is performed. The cleaning process includes: removing damaged images, removing duplicate images, removing images in unsupported formats, assessing data quality (quantity, resolution), and so on. Then, a portion of the cleaned image data is selected by a clustering method as startup image data to be annotated, and the user annotates this data through interface interaction, for example with an annotation tool. The annotated startup image data (equivalent to sample annotation data) are input and trained through a small-sample deep learning model (the few-shot model in FIG. 2) to obtain a rough annotation model. Part of the data can be sampled and returned through the rough annotation model, and the user confirms the annotation quality through the interactive interface; if the required precision is reached, training stops and the annotation results are output. If the model precision obtained by small-sample training does not meet the preset requirement, semi-supervised loop training is added to improve it.
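For illustration only, the data-cleaning step described above might look like the following Python sketch; the supported-format list, the minimum-resolution check, and MD5-based de-duplication are assumptions of this sketch, not requirements of the embodiment.

```python
# Hypothetical sketch of the cleaning step: remove damaged images, duplicate
# images and unsupported formats, with a basic resolution-based quality check.
import hashlib
from pathlib import Path

from PIL import Image

SUPPORTED_FORMATS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed format list

def clean_image_dir(image_dir, min_resolution=(32, 32)):
    kept, seen_hashes = [], set()
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in SUPPORTED_FORMATS:
            continue  # remove unsupported-format images
        try:
            with Image.open(path) as img:
                img.verify()  # raises on damaged images
            with Image.open(path) as img:
                width, height = img.size
        except Exception:
            continue  # remove damaged images
        if width < min_resolution[0] or height < min_resolution[1]:
            continue  # basic quality assessment on resolution
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        if digest in seen_hashes:
            continue  # remove byte-identical duplicate images
        seen_hashes.add(digest)
        kept.append(path)
    return kept
```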
In this embodiment, the small-sample deep learning training method can be implemented by applying a multi-scale refinement branch, and this training method can be applied to the single-stage and two-stage detection models shown in FIG. 3. As shown in FIG. 3, in the single-stage detection model, training data is input to a feature extractor to obtain a feature map, and the position and category of the target are found directly from the feature map. In the two-stage detection model, the first stage inputs training data to a feature extractor to obtain a feature map; the second stage determines candidate regions from the feature map, determines target regions containing targets from the candidate regions, and then classifies and localizes the targets according to the target regions. Generally, target localization of the two-stage detection model is more accurate than that of the single-stage model.
The specific small-sample deep learning training process of the embodiment shown in FIG. 2 will be described below, taking the two-stage detection model with a multi-scale refinement branch as an example. The training process includes the following steps S110 to S140.
Step S110: a sample annotation image is input; positive-sample targets are cropped from it and scaled to multiple sizes, generating a multi-scale image pyramid as the input of the first multi-scale refinement branch.
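A minimal sketch of step S110, assuming integer box coordinates and OpenCV-style image arrays; the scale set (0.5, 1.0, 2.0) is an illustrative assumption.

```python
# Hypothetical sketch: crop each positive-sample target and rescale it to
# several sizes, forming one multi-scale image pyramid per target.
import cv2

def build_pyramid(image, boxes, scales=(0.5, 1.0, 2.0)):
    """image: HxWxC array; boxes: iterable of (x1, y1, x2, y2) integer
    coordinates of positive-sample targets. Returns a list of pyramids,
    one per box, each pyramid being a list of rescaled crops."""
    pyramids = []
    for x1, y1, x2, y2 in boxes:
        crop = image[y1:y2, x1:x2]
        levels = [
            cv2.resize(crop, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
            for s in scales
        ]
        pyramids.append(levels)
    return pyramids
```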
Step S120: the original sample annotation image is input into the trunk (also called the main branch), the corresponding multi-scale image pyramid is input into the first multi-scale refinement branch, and the corresponding image features are obtained after the second feature extraction network. The second feature extraction network shares weights with the first feature extraction network.
Step S130: the sample annotation image is input into the backbone network and, after passing through the first feature extraction network, the corresponding loss functions are calculated; the handling of the original image features in the main branch is the normal training process.
Step S140: the multi-scale image pyramid is input into the first multi-scale refinement branch, the loss functions corresponding to this branch are calculated and merged into the loss functions of the main branch, and the detection network is iteratively updated.
An example small-sample training process for a two-stage detection model is shown in FIG. 4. In step S130, the sample annotation image is input into the backbone network and, after the first feature extraction network, enters a region-of-interest (ROI) classification and regression network to obtain the final prediction result. The corresponding loss functions of the backbone network are calculated, such as the background classification loss, the RPN bounding-box regression loss, the class classification loss and the ROI bounding-box regression loss shown in FIG. 4, and the backbone network is iteratively updated. In this embodiment, the first feature extraction network may be an FPN network or another network.
In step S140, since the image features obtained in the first multi-scale refinement branch are positive-sample image features, only the class classification loss and the background classification loss corresponding to this branch need to be calculated. Referring to FIG. 4, the class classification loss of the first multi-scale refinement branch is merged into the class classification loss of the trunk, and its background classification loss is merged into the background classification loss of the trunk. The backbone network is then iteratively updated using the updated background classification loss and class classification loss, together with the backbone's RPN bounding-box regression loss and ROI bounding-box regression loss. During the iterative update, the weights of the first feature extraction network are synchronized to the second feature extraction network through weight sharing.
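The loss merging above might be organized as in the following PyTorch-flavored sketch. The module interfaces (a `trunk` and a `refine_branch` that return dictionaries of loss tensors) and the equal weighting of the merged losses are assumptions of this sketch.

```python
# Hypothetical sketch of one training step with a multi-scale refinement
# branch whose classification losses are merged into the trunk losses.
def training_step(trunk, refine_branch, image, pyramid, targets, optimizer):
    # Trunk: normal detection losses on the original labeled image;
    # keys assumed: "cls", "bg", "rpn_box", "roi_box" (two-stage model).
    trunk_losses = trunk(image, targets)
    # The refinement branch sees positive-sample crops only, so only its
    # class and background classification losses are computed.
    branch_losses = refine_branch(pyramid, targets)  # keys: "cls", "bg"
    total = (trunk_losses["cls"] + branch_losses["cls"]
             + trunk_losses["bg"] + branch_losses["bg"]
             + trunk_losses["rpn_box"] + trunk_losses["roi_box"])
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    # If refine_branch reuses the very same feature-extractor module object
    # as the trunk, weight sharing is automatic and needs no explicit copy.
    return total
```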
If the detection model is a single-stage detection model, the loss functions calculated when performing steps S110 to S140 differ from those of the two-stage model. For the single-stage detection model, the first multi-scale refinement branch only needs to calculate, for example, the classification loss, which is merged into the backbone network.
The detection model may also be any other type of neural network structure, including various well-known structures, and the loss functions in the detection model may likewise be of other types; in that case, it is only necessary to calculate the corresponding loss functions in steps S130 and S140 above.
Obviously, the small-sample deep learning training method can also be implemented without the multi-scale refinement branch: the original sample annotation image and the corresponding multi-scale image pyramid are input into the backbone network, and the corresponding loss functions are calculated to iteratively update the detection network. In this case, the detection network consists only of the backbone network (refer to the middle box in FIG. 4), with no multi-scale refinement branch.
Through the reinforcing effect of the first multi-scale refinement branch, the model's ability to recognize sample features is effectively enhanced and its detection precision is improved. With only dozens of labeled samples, the model can generally reach 80% of the detection precision obtained with the full dataset.
When the annotation quality of the small-sample learning model cannot reach the preset precision, the semi-supervised loop training of the embodiment shown in FIG. 2 is started, and the method of the embodiment of the invention is used to further improve model precision. Referring to the second multi-scale refinement branch in FIG. 4, the training process includes the following steps:
Step 1: all unlabeled data are predicted using the detection network model trained in the previous round (equivalent to the target deep learning model), and prediction boxes with confidence above a certain threshold are selected as the labels of each image, which is then used as a sample annotation image input to the current training round.
That is, the sample annotation images input in this training round carry only prediction boxes whose confidence is above the threshold.
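As a non-authoritative sketch of step 1, assuming each prediction result is a dictionary holding the image together with its boxes, scores and labels (this layout and the 0.9 threshold are illustrative):

```python
# Hypothetical sketch: keep only confident prediction boxes as pseudo labels.
def screen_pseudo_labels(predictions, conf_threshold=0.9):
    """Per image, keep prediction boxes whose confidence is above the
    threshold; images with no surviving box are skipped this round."""
    samples = []
    for pred in predictions:
        keep = [i for i, score in enumerate(pred["scores"])
                if score >= conf_threshold]
        if not keep:
            continue
        samples.append({
            "image": pred["image"],
            "boxes": [pred["boxes"][i] for i in keep],
            "labels": [pred["labels"][i] for i in keep],
        })
    return samples
```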
Step 2: the multi-scale image pyramid is duplicated into two copies; first data processing is performed on one copy to obtain a first image, and the other copy is left unprocessed or undergoes second data processing to obtain a second image. The two resulting images, namely the first image and the second image, are used as a group of input samples and input into the second multi-scale refinement branch for prediction through a third feature extraction network. The third feature extraction network shares weights with the first and second feature extraction networks.
In this embodiment, the first data processing is strong data enhancement, and the second data processing is weak data enhancement. FIG. 4 shows strong enhancement and weak enhancement being performed respectively on the two copied multi-scale image pyramids, yielding a corresponding strongly-enhanced image and weakly-enhanced image.
In this embodiment, leaving the other copy unprocessed means that the multi-scale image pyramid is input directly.
Strong data enhancement may combine multiple enhancement methods, mixing methods that change the structure and characteristics of the image data with methods that do not, or using only methods that change them; that is, strong enhancement processes the input image with at least one method that changes the structure and characteristics of the image data, such as Gaussian blur or added noise. Weak data enhancement uses only methods that do not change the structure and characteristics of the image data, such as flipping and translation. In other words, strong enhancement can be regarded as weak enhancement combined with, or replaced by, methods that change the data structure and characteristics.
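The strong/weak distinction above might be realized with torchvision transforms as in the following sketch; the specific transforms are illustrative assumptions consistent with the examples given (Gaussian blur for strong enhancement; flip and translation for weak enhancement).

```python
# Hypothetical augmentation pipelines matching the strong/weak distinction.
from torchvision import transforms

# Weak enhancement: does not change image structure or characteristics.
weak_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation
])

# Strong enhancement: may change structure/characteristics (e.g. blur),
# optionally combined with weak methods.
strong_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
])
```

Since the pyramids here hold positive-sample crops used for classification, the geometric flips need no bounding-box adjustment.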
Step 3: for a group of input samples, the prediction label output for the second image (the weakly-enhanced image in FIG. 4) is taken as a pseudo label, i.e., as the ground-truth label of the first image (the strongly-enhanced image in FIG. 4); the corresponding loss function of the second multi-scale refinement branch is calculated and merged into the loss function of the backbone network, which is iteratively updated to optimize the network.
Taking the two-stage detection model in FIG. 4 as an example, the class classification loss and the background classification loss are calculated in the second multi-scale refinement branch and merged into the backbone network, and the backbone network is iteratively updated to optimize the network.
If the target deep learning model is a single-stage detection model, the classification loss, for example, is calculated in the second multi-scale refinement branch and merged into the classification loss of the backbone network, which is iteratively updated to optimize the network.
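A minimal sketch of the pseudo-label loss of step 3, assuming the branch outputs per-crop class logits; cross-entropy is one plausible choice for the classification loss named above.

```python
# Hypothetical sketch: the prediction on the weakly-enhanced image serves as
# the pseudo (ground-truth) label for the strongly-enhanced image.
import torch
import torch.nn.functional as F

def consistency_loss(model, strong_img, weak_img):
    with torch.no_grad():
        weak_logits = model(weak_img)          # second prediction label
        pseudo_label = weak_logits.argmax(dim=-1)
    strong_logits = model(strong_img)          # first prediction label
    # Classification loss between the first prediction and the pseudo label;
    # it is merged into the backbone losses before the iterative update.
    return F.cross_entropy(strong_logits, pseudo_label)
```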
The target deep learning model in this embodiment is not limited to the single-stage or two-stage detection models described above; it may be any other type of neural network structure, including various known structures, and its loss functions may likewise be of other types. In that case, it is only necessary to calculate the corresponding loss functions in the steps above.
Step 4: steps 1 to 3 are repeated for loop training until the model meets the precision requirement or reaches the set maximum number of cycles.
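Putting the steps together, the loop of step 4 might be sketched as follows; it reuses the helper sketches above, and `model.predict`, `update_model` and `evaluate_precision` are hypothetical names.

```python
# Hypothetical sketch of the semi-supervised loop (steps 1-3 repeated).
def semi_supervised_loop(model, unlabeled_images, target_precision,
                         max_cycles=10, conf_threshold=0.9):
    for _ in range(max_cycles):
        # Step 1: pseudo-label all unlabeled data with the current model.
        predictions = [model.predict(img) for img in unlabeled_images]
        samples = screen_pseudo_labels(predictions, conf_threshold)
        # Steps 2-3: strong/weak copies and consistency-based update.
        for sample in samples:
            for levels in build_pyramid(sample["image"], sample["boxes"]):
                strong = [strong_augment(lvl) for lvl in levels]  # first image
                weak = [weak_augment(lvl) for lvl in levels]      # second image
                loss = sum(consistency_loss(model, s, w)
                           for s, w in zip(strong, weak))
                update_model(model, loss)  # merge into backbone losses, step
        # Step 4: stop when the precision target or max cycle count is hit.
        if evaluate_precision(model) >= target_precision:
            break
    return model
```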
This training scheme weakens the influence of noisy labels on network precision; through the different data enhancements, the network learns more target patterns, is more robust to complex environments, and better learns the representative features of the targets, improving model precision.
In other embodiments of the invention, the processed first image and second image may be input directly into the target deep learning model, and the corresponding loss functions calculated to iteratively update it. That is, no multi-scale refinement branch of the target deep learning model is constructed; the images are input directly into its main branch, the corresponding losses are calculated, and the model is iteratively updated. Taking the two-stage detection model of FIG. 4 as an example, the loss functions to be calculated include, for example, the class classification loss, the background classification loss, the RPN bounding-box regression loss and the ROI bounding-box regression loss; for the single-stage detection model, they include, for example, the classification loss. It should be understood that the two-stage and single-stage detection models are only examples of the target deep learning model and do not limit it; the above loss functions are likewise only illustrative examples, and the loss functions of the embodiments of the invention are not limited thereto.
Referring to fig. 5, an internal module of an apparatus according to an embodiment of the present invention includes: a selection module 100, an image processing module 200 and a training module 300.
The selection module 100 is configured to receive the prediction results of the target deep learning model on unlabeled sample images, where, for each sample image, the prediction result includes a plurality of prediction labels; a number of sample images and their corresponding prediction labels are then screened out from the prediction results as training annotation images. Specifically, the prediction labels may be selected according to their confidence: only prediction boxes whose confidence is above a certain threshold are taken as annotation prediction boxes, and if no prediction box in a given image exceeds the threshold, that image is not listed as a training annotation image.
The image processing module 200 is configured to receive the training annotation image selected by the selection module 100, obtain its multi-scale image pyramid, and duplicate the pyramid into two copies; the two copies are processed differently to obtain a first image and a second image. Specifically, first data processing is performed on one copy to obtain the first image, and second data processing different from the first data processing is performed on the other copy to obtain the second image; alternatively, the first data processing is performed on one copy to obtain the first image, and the other copy is used directly as the second image without any processing.
The first data processing is specifically strong data enhancement; the second data processing is specifically weak data enhancement.
The training module 300 is configured to receive the first image and the second image input by the image processing module 200, input the first image and the second image into the target deep learning model, obtain corresponding first prediction labels and second prediction labels, calculate corresponding loss functions according to the first prediction labels and the second prediction labels, and perform iterative update on the target deep learning model.
In the iterative process, the selection module 100 can reselect new training annotation images according to the prediction results of the previous round to train the target deep learning model.
The device of the embodiment of the invention can weaken the influence of noisy labels on network precision; through different data enhancements, the network learns more target patterns, is more robust to complex environments, and better learns the representative features of the targets, improving model precision.
Although specific embodiments have been described herein, those of ordinary skill in the art will recognize that many other modifications or alternative embodiments are equally within the scope of this disclosure. For example, any of the functions and/or processing capabilities described in connection with a particular device or component may be performed by any other device or component. In addition, while various exemplary implementations and architectures have been described in accordance with embodiments of the present disclosure, those of ordinary skill in the art will recognize that many other modifications to the exemplary implementations and architectures described herein are also within the scope of the present disclosure.
Certain aspects of the present disclosure are described above with reference to block diagrams and flowchart illustrations of systems, methods, apparatuses, and/or computer program products according to example embodiments. It will be understood that one or more blocks of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by executing computer-executable program instructions. Also, according to some embodiments, some blocks of the block diagrams and flow diagrams may not necessarily be performed in the order shown, or may not necessarily be performed in their entirety. In addition, additional components and/or operations beyond those shown in the block diagrams and flow diagrams may be present in certain embodiments.
Accordingly, blocks of the block diagrams and flowchart illustrations support combinations of means for performing the specified functions, combinations of elements or steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based computer systems that perform the specified functions, elements or steps, or combinations of special purpose hardware and computer instructions.
Program modules, applications, etc. described herein may include one or more software components, including, for example, software objects, methods, data structures, etc. Each such software component may include computer-executable instructions that, in response to execution, cause at least a portion of the functionality described herein (e.g., one or more operations of the illustrative methods described herein) to be performed.
The software components may be encoded in any of a variety of programming languages. An illustrative programming language may be a low-level programming language, such as assembly language associated with a particular hardware architecture and/or operating system platform. Software components that include assembly language instructions may need to be converted by an assembler program into executable machine code prior to execution by a hardware architecture and/or platform. Another exemplary programming language may be a higher level programming language, which may be portable across a variety of architectures. Software components that include higher level programming languages may need to be converted to an intermediate representation by an interpreter or compiler before execution. Other examples of programming languages include, but are not limited to, a macro language, a shell or command language, a job control language, a scripting language, a database query or search language, or a report writing language. In one or more exemplary embodiments, a software component containing instructions of one of the above programming language examples may be executed directly by an operating system or other software component without first being converted to another form.
The software components may be stored as files or other data storage constructs. Software components of similar types or related functionality may be stored together, such as in a particular directory, folder, or library. Software components may be static (e.g., preset or fixed) or dynamic (e.g., created or modified at execution time).
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (9)

1. An image annotation method is characterized by comprising the following steps:
receiving a prediction result of a target deep learning model on unlabeled sample images, and screening out prediction labels of a plurality of sample images based on the prediction result to serve as training annotation images;
obtaining a multi-scale image pyramid of the training annotation image and duplicating it into two copies; performing first data processing on one copy of the multi-scale image pyramid to obtain a first image, and performing, or not performing, second data processing different from the first data processing on the other copy to obtain a second image;
inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and a corresponding second prediction label, calculating a corresponding loss function according to the first prediction label and the second prediction label, and performing iterative updating on the target deep learning model.
2. The image annotation method of claim 1, wherein screening out prediction labels of a plurality of sample images as training annotation images based on the prediction result comprises:
receiving the prediction results of the unlabeled sample images;
selecting, from the prediction results, prediction boxes whose confidence is above a preset threshold as the labels of the sample image;
and taking the labeled sample images as the training annotation images.
3. The image annotation method of claim 1, wherein the first data processing is strong data enhancement and the second data processing is weak data enhancement.
4. The method for image annotation according to claim 1, wherein said calculating a corresponding loss function according to the first prediction label and the second prediction label comprises:
and taking the second prediction label as a real label of the first multi-scale image pyramid after the first data processing is executed, comparing the real label with the first prediction label, and calculating a corresponding loss function.
5. The image annotation method of claim 1, wherein the iterative update method for the target deep learning model comprises:
and inputting the first image and the second image into a second multi-scale thinning branch of the target deep learning model, calculating a loss function of the second multi-scale thinning branch, merging the loss function into a loss function corresponding to a main branch of the target deep learning model, and performing iterative updating on the target deep learning model.
6. The image annotation method of claim 5, wherein the second multi-scale refinement branch of the target deep learning model shares the weights of its feature extraction network with the main branch of the target deep learning model.
7. The image annotation method of claim 1, wherein the iterative update method for the target deep learning model comprises:
and inputting the first image and the second image into the target deep learning model, calculating a corresponding loss function, and performing iterative updating on the target deep learning model.
8. An image annotation apparatus using the method of any one of claims 1 to 7, comprising:
the selection module is used for receiving the prediction result of the target deep learning model on unlabeled sample images, and screening out prediction labels of a plurality of sample images based on the prediction result to serve as training annotation images;
the image processing module is used for acquiring a multi-scale image pyramid of the training annotation image and duplicating it into two copies; performing first data processing on one copy of the multi-scale image pyramid to obtain a first image, and performing, or not performing, second data processing different from the first data processing on the other copy to obtain a second image;
and the training module is used for inputting the first image and the second image into the target deep learning model to obtain a corresponding first prediction label and a corresponding second prediction label, calculating a corresponding loss function according to the first prediction label and the second prediction label, and performing iterative updating on the target deep learning model.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202210082788.XA 2022-01-24 2022-01-24 Image annotation method and device Pending CN114529756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210082788.XA CN114529756A (en) 2022-01-24 2022-01-24 Image annotation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210082788.XA CN114529756A (en) 2022-01-24 2022-01-24 Image annotation method and device

Publications (1)

Publication Number Publication Date
CN114529756A 2022-05-24

Family

ID=81620133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210082788.XA Pending CN114529756A (en) 2022-01-24 2022-01-24 Image annotation method and device

Country Status (1)

Country Link
CN (1) CN114529756A (en)

Similar Documents

Publication Publication Date Title
CN109741332B (en) Man-machine cooperative image segmentation and annotation method
US10901740B2 (en) Synthetic depth image generation from cad data using generative adversarial neural networks for enhancement
CN108228703B (en) Image question-answering method, device, system and storage medium
KR102220174B1 (en) Learning-data enhancement device for machine learning model and method for learning-data enhancement
CN110569700B (en) Method and device for optimizing damage identification result
CN111461212B (en) Compression method for point cloud target detection model
CN109377494B (en) Semantic segmentation method and device for image
CN116049397B (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
CN114612702A (en) Image data annotation system and method based on deep learning
CN116052193B (en) RPA interface dynamic form picking and matching method and system
CN114596566A (en) Text recognition method and related device
CN111931859A (en) Multi-label image identification method and device
CN114594963A (en) Model deployment method and device, electronic equipment and storage medium
CN116958512A (en) Target detection method, target detection device, computer readable medium and electronic equipment
CN114529756A (en) Image annotation method and device
CN115204318A (en) Event automatic hierarchical classification method and electronic equipment
JP2023069083A (en) Learning apparatus, learning method, learning program, object detection apparatus, object detection method, object detection method, learning support system, learning support method, and learning support program
CN112818811A (en) Vehicle damage assessment method and device
CN112905794B (en) Internet spam detection method and system based on transfer learning
CN116737964B (en) Artificial intelligence brain system
EP4131178A1 (en) Image classification method and apparatus, and method and apparatus for improving training of an image classifier
Kawano et al. TAG: Guidance-free Open-Vocabulary Semantic Segmentation
CN114764444A (en) Image generation and sample image expansion method, device and computer storage medium
CN116883820A (en) Training method of fusion model, image recognition method and system
CN112949730A (en) Method, device, storage medium and equipment for detecting target with few samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 1301, Building F, Tongfang Information Port, No. 11, Langshan Road, Songpingshan Community, Xili Street, Nanshan District, Shenzhen, Guangdong 518000

Applicant after: Zhugao Intelligent Technology (Shenzhen) Co.,Ltd.

Address before: 518133 504, Wanjun economic and trade building, No. 21, Baoxing Road, zone n26, Haibin community, Xin'an street, Bao'an District, Shenzhen, Guangdong Province

Applicant before: Zhugao Intelligent Technology (Shenzhen) Co.,Ltd.
