CN110516707B - Image labeling method and device and storage medium thereof - Google Patents

Image labeling method and device and storage medium thereof

Info

Publication number
CN110516707B
CN110516707B CN201910655710.0A
Authority
CN
China
Prior art keywords
data set
style
image
images
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910655710.0A
Other languages
Chinese (zh)
Other versions
CN110516707A (en)
Inventor
张浩
邵新庆
宋咏君
刘强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZNV Technology Co Ltd, Nanjing ZNV Software Co Ltd filed Critical Shenzhen ZNV Technology Co Ltd
Priority to CN201910655710.0A priority Critical patent/CN110516707B/en
Publication of CN110516707A publication Critical patent/CN110516707A/en
Application granted granted Critical
Publication of CN110516707B publication Critical patent/CN110516707B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image labeling method, an image labeling device and a storage medium are provided. The image labeling method includes: acquiring an image of a target object in a field environment; extracting feature information from the image of the target object according to a pre-established machine vision model, wherein the machine vision model is obtained by training, through machine learning, with a second data set formed by performing style conversion on a preset first data set; and labeling the target object in the image of the target object by using the extracted feature information, and outputting labeling information of the target object. When the machine vision model is built, the style of the labeled data set is migrated toward the field data set through a GAN model, so that the labeled data set acquires the style information of the field data set while keeping its label information, the field environment is simulated to the maximum extent, and the migration effect of the machine vision model is enhanced.

Description

Image labeling method and device and storage medium thereof
Technical Field
The invention relates to the technical field of image processing, in particular to an image labeling method, an image labeling device and a storage medium.
Background
Pedestrian Re-identification (ReID for short) has been a research focus of computer vision in recent years: given a monitored pedestrian image, the goal is to retrieve images of the same pedestrian across devices. Because of the differences between camera devices, and because the appearance of a pedestrian is easily affected by clothing, scale, occlusion, pose, viewpoint and the like, pedestrian re-identification is a problem of real research value that is also highly challenging.
The goal of ReID is to match and return images of a detected pedestrian from a large gallery collected by the camera network. It has attracted extensive academic and industrial interest due to its important uses in security and surveillance, and its performance has improved significantly thanks to the development of deep learning and the availability of many datasets.
Although current ReID algorithms perform satisfactorily on existing datasets, there are still unresolved issues that hamper the application of person ReID. First, existing public datasets differ from data collected in real scenes in terms of lighting, resolution, race, sharpness, background and so on. For example, current datasets contain a limited number of identities or were collected in constrained environments; the limited number of people and the simple lighting conditions simplify the person ReID task and help achieve high recognition accuracy. In real scenes, however, ReID is typically performed on camera networks deployed across indoor and outdoor scenes and must process long video streams, so real applications have to deal with challenges such as massive numbers of identities and complex lighting and scene changes, which current algorithms may not be able to handle.
In addition, when a computer-vision ReID model is trained with a deep neural network, the performance of a model trained on one dataset (usually consisting of object pictures obtained by manual labeling or an image labeling algorithm) drops sharply on another dataset; that is, the model transfers poorly. Therefore, when current computer vision technology is applied to a new site, a large amount of labeling of the field data is needed and the model must be retrained with the labeled field data, which consumes a great deal of time and cost.
Disclosure of Invention
The invention mainly solves the technical problem of how to enhance the migration effect of the machine vision model so as to improve the accuracy of image annotation.
According to a first aspect, in one embodiment, there is provided an image labeling method, including: acquiring an image of a target object in a field environment; extracting characteristic information of the image of the target object according to a pre-established machine vision model; the machine vision model is a model which is obtained by training through machine learning by utilizing a second data set formed by performing style conversion on a preset first data set; and marking the target object in the image of the target object by using the extracted characteristic information, and outputting marking information of the target object.
The step of labeling the target object in the image of the target object by using the extracted feature information includes: matching a plurality of pieces of feature information extracted for the target object with preset features of the target object respectively, and labeling the successfully matched feature information; and forming the labeling information of the target object from the labeled feature information.
The machine vision model is obtained by training, through machine learning, with a second data set formed by performing style conversion on a preset first data set, and the establishment process of the machine vision model is as follows:
the acquisition step: collecting a group of images of at least one moving object in the field environment to form a field data set, and obtaining style information of the field data set, wherein the style information comprises one or more of brightness, color, chromatic aberration, definition, contrast and resolution; the conversion step: performing style conversion on a preset first data set according to style information of the field data set to obtain a second data set; the first data set comprises a group of images of at least one mobile object marked in any environment, and the group of images corresponding to each mobile object has uniform label information; training: and training to obtain the machine vision model through machine learning by utilizing the second data set.
In the conversion step, performing style conversion on the preset first data set according to the style information of the field data set to obtain the second data set includes: migrating the style of the first data set to the field data set through a GAN model, so as to perform style conversion on each group of images in the first data set according to the style information of the field data set and obtain a corresponding group of new images; and integrating the groups of new images corresponding to the groups of images in the first data set to form the second data set.
The step of migrating the style of the first data set to the field data set through a GAN model, so as to perform style conversion on each group of images in the first data set according to the style information of the field data set and obtain a corresponding group of new images, includes: establishing a total loss function, expressed as

Loss = L_Style + λ1 · L_ID

wherein L_Style represents the style loss function corresponding to the style information of the field data set, L_ID represents the label loss function corresponding to the label information of each group of images in the first data set, and λ1 is a weight coefficient;

adjusting parameters of the GAN model by using the style loss function and the label loss function so as to minimize the value of the total loss function Loss; and inputting each group of images in the first data set into the GAN model obtained when the Loss value is minimized, so as to perform style conversion on each group of images in the first data set and output a group of new images corresponding to that group of images.
In the total loss function, the style loss function is expressed as

L_Style = L_GAN(G, D_B, A, B) + L_GAN(F, D_A, B, A) + λ2 · L_cyc(G, F)

wherein A and B are the field data set and the first data set respectively, L_GAN is a standard adversarial loss function, L_cyc is a cycle-consistency loss function, G represents the style mapping function from A to B, F represents the style mapping function from B to A, D_A and D_B are the style discriminators of A and B respectively, and λ2 is a weight coefficient; the label loss function is expressed as

L_ID = E_{a~p_data(a)}[ Var((G(a) − a) ⊙ M(a)) ] + E_{b~p_data(b)}[ Var((F(b) − b) ⊙ M(b)) ]

wherein the data distribution of A is a~p_data(a), the data distribution of B is b~p_data(b), Var is the variance function of the data, ⊙ denotes element-wise multiplication, G(a) is the migrated target image obtained from image a in A, M(a) is the foreground mask of image a, F(b) is the migrated target image obtained from image b in B, and M(b) is the foreground mask of image b.
The training step is followed by a testing step, which includes: testing the machine vision model by using the field data set, and adjusting the hyper-parameters in the GAN model through an iterative algorithm or a gradient descent algorithm; after each adjustment of the hyper-parameters in the GAN model, the second data set is formed again through the conversion step, the machine vision model is retrained through the training step, and the retrained machine vision model is again tested with the field data set, until the adjustment of the hyper-parameters in the GAN model is completed.
According to a second aspect, in one embodiment there is provided an image annotation device comprising:
the acquisition unit is used for acquiring an image of a target object in a field environment;
the extraction unit is used for extracting characteristic information of the image of the target object according to a pre-established machine vision model; the machine vision model is a model which is obtained by training through machine learning by utilizing a second data set formed by performing style conversion on a preset first data set;
and the labeling unit is used for labeling the target object in the image of the target object by using the extracted characteristic information and outputting the labeling information of the target object.
The image labeling device also comprises a model building unit for building the machine vision model, and the model building unit is connected with the extraction unit and comprises: the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a group of images of at least one mobile object in the field environment to form a field data set and obtaining style information of the field data set, and the style information comprises one or more of brightness, color, chromatic aberration, definition, contrast and resolution; the conversion module is used for carrying out style conversion on a preset first data set according to the style information of the field data set to obtain a second data set; the first data set comprises a group of images of at least one mobile object marked in any environment, and the group of images corresponding to each mobile object has uniform label information; and the training module is used for obtaining the machine vision model through machine learning by utilizing the second data set.
According to a third aspect, an embodiment provides a computer readable storage medium comprising a program executable by a processor to implement the image annotation method as described in the first aspect above.
The beneficial effects of this application are:
according to the embodiment, the image labeling method, the device and the storage medium thereof comprise the following steps: acquiring an image of a target object in a field environment; extracting characteristic information of an image of a target object according to a pre-established machine vision model, wherein the machine vision model is a model which is obtained by training through machine learning by utilizing a second data set formed by carrying out style conversion on a preset first data set; and marking the target object in the image of the target object by using the extracted characteristic information, and outputting marking information of the target object. In the first aspect, when the machine vision model is built, the marked data set is subjected to style migration to the field data set through the GAN model, so that the marked data set can obtain style information of the field data set while label information is kept, the field environment is simulated to the maximum extent, and the migration effect of the machine vision model is enhanced; in the second aspect, the problem of poor migration effect of the machine vision model is well solved by using the established machine vision model, and when the machine vision model is applied to a field environment, characteristic information can be well extracted from images, so that a target object can be quickly identified in the field environment during image annotation, manual annotation work required by new scene modeling is reduced, and time and cost of the new scene modeling are effectively saved.
Drawings
FIG. 1 is a flow chart of an image labeling method in the present application;
FIG. 2 is a flow chart of labeling a target object;
FIG. 3 is a flow chart of the machine vision model creation in the present application;
FIG. 4 is a flow chart of test steps in building a machine vision model;
FIG. 5 is a schematic diagram of a machine vision model;
FIG. 6 is a schematic structural diagram of an image labeling device in the present application;
FIG. 7 is a schematic diagram of a model building unit in the image labeling apparatus;
fig. 8 is a schematic diagram of GAN model style migration.
Detailed Description
The invention will be described in further detail below with reference to the drawings by means of specific embodiments, wherein like elements in different embodiments are given associated like numerals. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods in different situations. In some instances, some operations associated with the present application are not shown or described in the specification in order to avoid obscuring the core of the present application; a detailed description of these operations is not necessary, since a person skilled in the art can understand them based on the description herein and general knowledge in the art.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The terms "coupled" and "connected," as used herein, are intended to encompass both direct and indirect coupling (coupling), unless otherwise indicated.
Embodiment 1,
Referring to fig. 1, the present application discloses an image labeling method, which includes steps S110-S130, and is described below.
Step S110, an image of a target object in a field environment is acquired.
In this embodiment, the field environment may be a public place such as a street, a square, a highway, a station, a market, a hotel, etc., and the target object may be a movable object such as a pedestrian, a vehicle, a pet, etc., which is not particularly limited herein. In addition, images of the target object within the relevant field environment may be acquired by one or more video acquisition devices (e.g., cameras) installed in public places and transmitted to a control center for acquisition.
Step S120, extracting characteristic information of an image of a target object according to a pre-established machine vision model; the machine vision model is a model which is obtained by training through machine learning by utilizing a second data set formed by performing style conversion by utilizing a preset first data set.
In a specific embodiment, when extracting a plurality of feature information in an image of a target object through a pre-established machine vision model, some feature vectors in the image are mainly extracted. For example, when a plurality of pedestrians move on a square, the image has not only the characteristic information of the pedestrians but also the characteristic information of other objects on the square, and at this time, the characteristic information about the pedestrians and other objects in the image is extracted.
It should be noted that the feature information is usually a feature vector, which is the representation of the picture for the target task and can be regarded as a general representation in the field of computer vision; that is, a vector is used to represent the target object so as to support tasks such as face recognition and pedestrian recognition in practical applications. In face recognition, for example, the target vector is used to search a face vector library for the feature with the highest similarity, and when the similarity is higher than a certain threshold the two are considered to belong to the same person.
Step S130, annotating the target object in the image of the target object by using the extracted characteristic information, and outputting the annotation information of the target object. Specifically, the labeling information of the target object can be classified, stored and displayed, so that a manager can conveniently find the labeling information to the target object.
In one embodiment, step S130 of FIG. 2 may include steps S131-S132, as described below.
Step S131, a plurality of feature information extracted from the image of the target object are respectively matched with the preset features of the target object, and feature vectors successfully matched are labeled. For example, if the target object is a pedestrian, the preset features (such as height, body shape outline, facial outline, clothing, etc.) of the pedestrian can be determined through the previously acquired image, then the machine vision model can conveniently identify the feature vector matched with the preset features of the pedestrian from other images, and determine that the matched feature information is associated with the pedestrian, so that the feature information is marked in the other images through the form of a rectangular frame, namely the pedestrian is identified.
Step S132, labeling information of the target object is formed according to the labeled characteristic information. Specifically, if some already-marked feature information is associated with a pedestrian, not only the pedestrian may be marked by using a rectangular frame, but also the pedestrian may be uniquely numbered in the form of tag information, so as to form marked information of the pedestrian.
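The matching in steps S131-S132 can be pictured with a short sketch. The code below is a minimal, hypothetical example (the function and variable names are assumptions, not part of the patent): it compares feature vectors extracted by the machine vision model against a preset feature of the target object using cosine similarity, and forms labeling information (bounding box plus unique ID) for every successful match.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def label_target(extracted, preset_feature, threshold=0.7, target_id="pedestrian_001"):
    """extracted: list of dicts {"feature": np.ndarray, "box": (x, y, w, h)}.
    preset_feature: np.ndarray describing the target object (e.g. a known pedestrian).
    Returns labeling information for every detection matching the preset feature."""
    annotations = []
    for det in extracted:
        sim = cosine_similarity(det["feature"], preset_feature)
        if sim >= threshold:                      # step S131: match succeeded
            annotations.append({                  # step S132: form labeling information
                "id": target_id,
                "box": det["box"],
                "similarity": round(sim, 3),
            })
    return annotations
```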
In this embodiment, for the accuracy of image recognition, the image of the target object is processed according to a machine vision model established in advance, so as to extract feature information in the image; the machine vision model is a model which is obtained by training through machine learning by utilizing a second data set formed by performing style conversion on a preset first data set; then, the process of creating the machine vision model may be described by step S200, please refer to fig. 3, and step S200 may include steps S210-S230, which are described below.
Step S210, as an acquisition step, acquires a set of images of at least one moving object in the field environment to form a field data set, and obtains style information of the field data set, where the style information includes one or more of brightness, color difference, sharpness, contrast, and resolution.
For example, to image-label pedestrians in a square, a machine vision model related to the field environment of the square needs to be established first, and in order to obtain such a machine learning model, the field environment of the square needs to be simulated and a corresponding field data set needs to be formed. Therefore, a group of images formed by one or more pedestrians in the square can be acquired through a camera or other acquisition device, and the group of images can comprise a plurality of frames of digital pictures and have continuity in time; the field data set formed by the acquired set of images often includes information about specific styles, such as brightness, color, sharpness, etc. that are typical of current square environments.
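As a rough illustration of how such style information might be summarized from the collected frames, the following sketch (the statistics and helper names are chosen for illustration only; the patent does not prescribe a specific implementation) computes simple brightness, contrast and sharpness statistics over a field data set with OpenCV.

```python
import cv2
import numpy as np

def style_statistics(image_paths):
    """Summarize coarse style information (brightness, contrast, sharpness)
    over a set of field images. Values are per-dataset means."""
    brightness, contrast, sharpness = [], [], []
    for path in image_paths:
        img = cv2.imread(path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        brightness.append(gray.mean())                           # average intensity
        contrast.append(gray.std())                              # intensity spread
        sharpness.append(cv2.Laplacian(gray, cv2.CV_64F).var())  # edge energy
    return {
        "brightness": float(np.mean(brightness)),
        "contrast": float(np.mean(contrast)),
        "sharpness": float(np.mean(sharpness)),
    }
```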
Step S220, which is regarded as a conversion step, is performed on the preset first data set according to the style information of the field data set to obtain the second data set. In this embodiment, the first data set includes a set of images of at least one mobile object that has been marked in an arbitrary environment, and a set of images corresponding to each mobile object has uniform tag information.
For example, in the ReID scenario, the first dataset may be the open-source DukeMTMC-reID dataset, which contains 1404 pedestrian identities and 36411 detected pedestrian rectangular boxes captured by the cameras, and includes, for each pedestrian, multiple moving images captured at different points in time or from different angles during travel along the street.
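For instance, the uniform label information of each pedestrian can be read directly from the file names of such a public data set. The sketch below assumes the commonly used `<personID>_c<camID>_f<frame>.jpg` naming of the DukeMTMC-reID release; the exact layout is an assumption and is not defined by the patent.

```python
import os
from collections import defaultdict

def group_by_identity(image_dir):
    """Group annotated images by pedestrian ID so that each group
    shares uniform label information (the identity)."""
    groups = defaultdict(list)
    for name in os.listdir(image_dir):
        if not name.endswith(".jpg"):
            continue
        person_id = name.split("_")[0]      # e.g. "0001" in 0001_c2_f0046182.jpg
        groups[person_id].append(os.path.join(image_dir, name))
    return groups
```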
It should be noted that, between the field data set in step S210 and the first data set in step S220, there may be overall differences in brightness, color, sharpness, contrast, etc. of the pictures acquired by the different data sets due to differences in illumination, angle, camera, and background, and such differences may cause poor model migration effect.
In one embodiment, see FIG. 4, step S220 may include steps S221-S222, each of which is described below.
Step S221, migrating the style of the first data set to the field data set through the GAN model, so as to perform style conversion on each group of images in the first data set according to the style information of the field data set, and obtaining a corresponding new group of images.
For example, referring to FIG. 8, a first dataset is formed using the open-source DukeMTMC-reID dataset, including annotated images of a pedestrian at various points in time during travel along the street; a field dataset is formed using a set of images acquired within the field environment, including images (whether annotated or not) of a pedestrian at various points in time during travel across the plaza. The style of the first dataset is migrated toward the field dataset through the GAN model, so that the style information of the field dataset is obtained and the conversion is performed according to it, thereby obtaining a second dataset corresponding to the first dataset. In the second dataset the style of each image is changed and becomes closer to that of the field dataset, while the pedestrians still retain a certain degree of recognizability and their IDs remain unchanged.
In step S222, a new set of images corresponding to each set of images in the first dataset is integrated to form a second dataset.
Note that the GAN model in this embodiment is a Generative Adversarial Network (GAN), a deep learning model. The GAN framework is built from two modules: a generation module (Generative Model) and a discrimination module (Discriminative Model), which learn by playing a game against each other and thereby produce a fairly good output. In practical applications, deep neural networks are often used as G and D, and a good training method should be chosen for the GAN model, otherwise the output may be unsatisfactory because of the freedom of the neural network model. The generation module mainly learns the real image distribution so that the images it generates become more realistic and deceive the discrimination module; the discrimination module judges whether a received image is real or fake. Over the whole process the images produced by the generation module become more and more realistic and the discrimination module becomes more and more accurate at judging them, and over time the two modules reach an equilibrium. Since GAN models are routinely used for style migration between two image domains, which is prior art, they are not described in detail here.
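To make the two-module game concrete, a generic (illustrative, not patent-specified) PyTorch training step could look as follows: the discrimination module learns to separate real images from generated ones, and the generation module learns to fool it. All names are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, noise, opt_g, opt_d):
    """One adversarial update of the two modules."""
    fake = G(noise)

    # Discrimination module: push real images toward "real" and generated ones toward "fake".
    real_logits, fake_logits = D(real), D(fake.detach())
    d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) \
           + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generation module: learn to make the discriminator judge its output as real.
    gen_logits = D(fake)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```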
In this embodiment, the first dataset is migrated to the field dataset through the GAN model, so that each group of images in the first dataset is subjected to style conversion according to style information of the field dataset, and in the process of obtaining a corresponding group of new images, in order to ensure the implementation effect of the GAN model style migration, the style migration is controlled through the following 3 steps:
(1) Establishing a total loss function, expressed as

Loss = L_Style + λ1 · L_ID

wherein L_Style represents the style loss function corresponding to the style information of the field data set, L_ID represents the label loss function corresponding to the label information of each group of images in the first data set, and λ1 is a weight coefficient.

In the total loss function Loss, the style loss function is expressed as

L_Style = L_GAN(G, D_B, A, B) + L_GAN(F, D_A, B, A) + λ2 · L_cyc(G, F)

wherein A and B are the field data set and the first data set respectively, L_GAN is a standard adversarial loss function, L_cyc is a cycle-consistency loss function, G represents the style mapping function from A to B, F represents the style mapping function from B to A, D_A and D_B are the style discriminators of A and B respectively, and λ2 is a weight coefficient.

In the total loss function Loss, the label loss function is expressed as

L_ID = E_{a~p_data(a)}[ Var((G(a) − a) ⊙ M(a)) ] + E_{b~p_data(b)}[ Var((F(b) − b) ⊙ M(b)) ]

wherein the data distribution of A is a~p_data(a), the data distribution of B is b~p_data(b), Var is the variance function of the data, ⊙ denotes element-wise multiplication, G(a) is the migrated target image obtained from image a in A, M(a) is the foreground mask of image a, F(b) is the migrated target image obtained from image b in B, and M(b) is the foreground mask of image b.

(2) Adjusting the parameters of the GAN model by using the style loss function L_Style and the label loss function L_ID, so as to minimize the value of the total loss function Loss.

(3) Inputting each group of images in the first data set into the GAN model obtained when the Loss value is minimized, so as to perform style conversion on each group of images in the first data set and output a group of new images corresponding to that group of images.
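A minimal PyTorch-style sketch of this objective is given below. It assumes CycleGAN-style generators (A to B and B to A), discriminators D_A and D_B, and precomputed foreground masks; all names are illustrative, and the least-squares adversarial term is one possible choice for the "standard adversarial loss", not a requirement of the patent.

```python
import torch
import torch.nn.functional as F

def gan_total_loss(G_ab, G_ba, D_A, D_B, a, b, mask_a, mask_b,
                   lambda_1=0.5, lambda_2=10.0):
    """a, b: image batches from the field data set (A) and the first, labeled data set (B).
    G_ab / G_ba: style mapping networks A->B / B->A; D_A / D_B: style discriminators.
    mask_a, mask_b: foreground masks M(a), M(b) of the labeled objects."""
    fake_b, fake_a = G_ab(a), G_ba(b)

    # Style loss: adversarial terms (least-squares form here) plus cycle consistency.
    pred_b, pred_a = D_B(fake_b), D_A(fake_a)
    adv = F.mse_loss(pred_b, torch.ones_like(pred_b)) + \
          F.mse_loss(pred_a, torch.ones_like(pred_a))
    cyc = F.l1_loss(G_ba(fake_b), a) + F.l1_loss(G_ab(fake_a), b)
    l_style = adv + lambda_2 * cyc

    # Label (ID) loss: keep the labeled foreground stable under migration,
    # measured here as the variance of the masked difference.
    l_id = ((fake_b - a) * mask_a).var() + ((fake_a - b) * mask_b).var()

    return l_style + lambda_1 * l_id
```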
Step S230, which is regarded as a training step, is performed by machine learning using the second data set to obtain a machine vision model.
For example, the machine vision model may be obtained by training a ReID model with the second dataset. The ReID model is a Person Re-identification model (Re-ID for short, also referred to as pedestrian re-identification), a technology that uses computer vision to judge whether a specific pedestrian is present in an image or video sequence. The ReID model involves two key technologies: one is feature extraction, learning features that can cope with pedestrian appearance variations under different cameras; the other is metric learning, mapping the learned features to a new space in which the same person is brought closer and different people are pushed farther apart. Since the ReID model belongs to the prior art, a detailed description is not provided here.
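For illustration, the metric-learning part of such a ReID model is often trained with a triplet loss, which pulls features of the same identity together and pushes different identities apart. The sketch below is one common realization under assumed names (backbone, dimensions, margin) and is not mandated by the patent.

```python
import torch
import torch.nn as nn

class ReIDHead(nn.Module):
    """Backbone features -> normalized embedding used for person matching."""
    def __init__(self, backbone, feat_dim=2048, embed_dim=256):
        super().__init__()
        self.backbone = backbone              # e.g. a ResNet without its classifier
        self.embed = nn.Linear(feat_dim, embed_dim)

    def forward(self, x):
        f = self.backbone(x).flatten(1)
        return nn.functional.normalize(self.embed(f), dim=1)

triplet = nn.TripletMarginLoss(margin=0.3)
# anchor/positive share an identity, negative is a different identity:
# loss = triplet(model(anchor_imgs), model(pos_imgs), model(neg_imgs))
```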
In another embodiment, referring to FIG. 4, the training step S230 is followed by a testing step S240, which may be summarized as follows: (a) testing the machine vision model with the field data set, and adjusting the hyper-parameters in the GAN model (such as the parameters λ1 and λ2) through an iterative algorithm or a gradient descent algorithm; (b) after each adjustment of the hyper-parameters in the GAN model, re-forming the second data set through the conversion step S220 (i.e., S221-S222), retraining the machine vision model through the training step S230, and continuing to test the retrained machine vision model with the field data set, until the adjustment of the hyper-parameters in the GAN model is completed.
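The testing step can be pictured as a simple outer search loop over the GAN hyper-parameters, as in the hedged sketch below; the helper functions style_transfer, train_reid and evaluate are placeholders for the conversion, training and testing steps described above and are not part of the patent.

```python
def tune_gan_hyperparameters(first_dataset, field_dataset, candidates):
    """candidates: iterable of (lambda_1, lambda_2) pairs to try."""
    best_score, best_params = -1.0, None
    for lambda_1, lambda_2 in candidates:                      # iterative adjustment
        second_dataset = style_transfer(first_dataset, field_dataset,
                                        lambda_1, lambda_2)    # conversion step S220
        model = train_reid(second_dataset)                     # training step S230
        score = evaluate(model, field_dataset)                 # test on the field data set
        if score > best_score:
            best_score, best_params = score, (lambda_1, lambda_2)
    return best_params, best_score
```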
To clearly illustrate the principle of building the machine vision model, reference is made to FIG. 5. The first dataset includes a group of images of at least one mobile object annotated in an arbitrary environment, and the field dataset includes a group of images of at least one mobile object in the field environment. The style of the first dataset is migrated to the field dataset through the GAN model, each group of images in the first dataset is style-converted according to the style information of the field dataset to obtain a corresponding group of new images, and these new images are integrated to form the second dataset. The ReID model is then trained with the second dataset to obtain the machine vision model claimed herein. Finally, the machine vision model is tested with the field dataset and the hyper-parameters in the GAN model are adjusted by an iterative algorithm or a gradient descent algorithm; the hyper-parameter adjustment is considered complete when the set number of iterations or the gradient-descent criterion is reached, at which point the machine vision model has been optimized and image labeling of the target object can be carried out in the field environment.
Embodiment II,
Referring to fig. 6, on the basis of the image labeling method disclosed in the first embodiment, the present application further discloses an image labeling device 3, where the image labeling device 3 mainly includes an obtaining unit 31, an extracting unit 32, and a labeling unit 33, and the following descriptions will be given respectively.
The acquisition unit 31 is used for acquiring an image of a target object in a field environment.
In this embodiment, the field environment may be a public place such as a street, a square, a highway, a station, a market, a hotel, etc., and the target object may be a movable object such as a pedestrian, a vehicle, a pet, etc., which is not particularly limited herein. In addition, images of the target object within the relevant field environment may be acquired by one or more video acquisition devices (e.g., cameras) installed in public places and transmitted to a control center for acquisition.
The extraction unit 32 is connected to the acquisition unit 31 for extracting feature information of an image of the target object according to a machine vision model established in advance. The machine vision model in the application is a model which is obtained by training through machine learning by utilizing a second data set formed after style conversion of a preset first data set. For the specific function of the extracting unit 32, reference may be made to step S120 in the first embodiment, and detailed description thereof will be omitted.
The labeling unit 33 is connected to the extracting unit 32, and is configured to label a target object in an image of the target object by using the extracted feature information, and output labeling information of the target object. Specifically, if some already-marked feature information (feature vector) is associated with a certain pedestrian, not only the pedestrian may be marked with a rectangular frame, but also the pedestrian may be uniquely numbered in the form of tag information, thereby forming the marked information of the pedestrian. In addition, the labeling unit 33 may store and display labeling information of the target object in a classified manner, so that a manager can find the target object through the labeling information conveniently.
Further, referring to fig. 6 and 7, the image labeling apparatus 3 further includes a model building unit 34 for building a machine vision model, connected to the extraction unit 32, the model building unit 34 includes an acquisition module 341, a conversion module 342, and a training module 343.
The acquisition module 341 is configured to acquire a set of images of at least one moving object in the field environment, form a field data set, and obtain style information of the field data set, where the style information may include one or more of brightness, color, chromatic aberration, sharpness, contrast, and resolution. For specific functions of the acquisition module 341, reference may be made to step S210 in the first embodiment, and detailed description is omitted here.
The conversion module 342 is configured to perform style conversion on a preset first data set according to style information of a field data set, so as to obtain a second data set; the first data set comprises a group of images of at least one mobile object marked in any environment, and the group of images corresponding to each mobile object has unified label information. For the specific function of the conversion module 342, reference may be made to step S220 in the first embodiment, and detailed description thereof will be omitted.
The training module 343 is configured to train to obtain a machine vision model through machine learning (e.g. ReID model) by using the second data set. For specific functions of the training module 343, reference can be made to step S230 in the first embodiment, and detailed description thereof will be omitted.
To clearly illustrate the beneficial effects of the technical solution of the present application, comparative experiments were performed. In the first test, a machine vision model was trained directly with the open-source DukeMTMC-reID dataset and tested in the field environment, giving a first group of test indexes mAP and Rank1. In the second test, the style of the open-source DukeMTMC-reID dataset was migrated to the field dataset to form a second dataset, DukeMTMC-reID M, after the style migration; this second dataset was used to train another machine vision model, which was tested in the field environment to give a second group of test indexes mAP and Rank1.
Table 1 test index results of comparative experiments
(Table 1 is reproduced as an image in the original publication; it lists the mAP and Rank1 values obtained in the two tests.)
As can be seen from Table 1, the test indexes obtained in the second test are greatly improved compared with those in the first test, so that the migration effect of the machine vision model is better, the required manual labeling work can be reduced, and the accuracy of image labeling is improved.
It should be noted that, mAP (mean average precision) and rank1 are both indexes for measuring the searching capability of the algorithm, and serve as a reference to measure the accuracy of the algorithm, which belongs to the prior art, and will not be described in detail here.
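As a reference, the two indexes can be computed roughly as in the following sketch, a simplified single-query illustration with assumed inputs (real ReID evaluation additionally handles camera IDs and junk images).

```python
import numpy as np

def rank1_and_ap(query_feat, gallery_feats, gallery_ids, query_id):
    """Rank1: is the nearest gallery image the same identity as the query?
    AP: average precision over the ranked gallery; mAP is the mean over all queries.
    gallery_feats, gallery_ids: numpy arrays for the gallery set."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    order = np.argsort(dists)                       # nearest first
    matches = (gallery_ids[order] == query_id)

    rank1 = float(matches[0])
    hits = np.cumsum(matches)
    precision_at_hit = hits[matches] / (np.flatnonzero(matches) + 1)
    ap = precision_at_hit.mean() if matches.any() else 0.0
    return rank1, ap
```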
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions in the above embodiments are implemented by means of a computer program, the program may be stored in a computer-readable storage medium, which may include a read-only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, and the like, and the program is executed by a computer to realize the above functions. For example, the program is stored in the memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above can be realized. In addition, when all or part of the functions in the above embodiments are implemented by means of a computer program, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and be downloaded or copied into the memory of a local device, or used to update the version of the local device's system, so that all or part of the functions described above are realized when the program in the memory is executed by a processor.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims (4)

1. An image labeling method, comprising:
acquiring an image of a target object in a field environment;
extracting characteristic information of the image of the target object according to a pre-established machine vision model; the machine vision model is a model which is obtained by training through machine learning by utilizing a second data set formed by performing style conversion on a preset first data set;
marking a target object in the image of the target object by using the extracted characteristic information, and outputting marking information of the target object;
the establishment process of the machine vision model comprises the following steps:
the acquisition step: collecting a group of images of at least one moving object in the field environment to form a field data set, and obtaining style information of the field data set, wherein the style information comprises one or more of brightness, color, chromatic aberration, definition, contrast and resolution;
the conversion step: performing style conversion on a preset first data set according to style information of the field data set to obtain a second data set, wherein the style conversion comprises the following steps: migrating the first data set style to the field data set through a GAN model so as to perform style conversion on each group of images in the first data set according to style information of the field data set to obtain a corresponding group of new images, and integrating the group of new images corresponding to each group of images in the first data set to form the second data set; the first data set comprises a group of images of at least one mobile object marked in any environment, and the group of images corresponding to each mobile object has uniform label information;
training: training to obtain the machine vision model by machine learning by utilizing the second data set;
the testing steps are as follows: testing the machine vision model by using the field data set, adjusting the super-parameters in the GAN model by using an iterative algorithm or a gradient descent algorithm, re-forming the second data set by the conversion step after adjusting the super-parameters in the GAN model each time, re-training the machine vision model by using the training step, and continuously testing the machine vision model obtained by re-training by using the field data set until the super-parameters in the GAN model are adjusted;
the step of performing style conversion on each group of images in the first data set according to style information of the field data set by migrating the first data set to the field data set through a GAN model to obtain a corresponding new group of images, includes:
establishing a total loss function expressed as
Loss=L Style1 L ID
Wherein L is Style A style loss function L corresponding to the style information representing the field data set ID A label loss function lambda corresponding to label information representing each group of images in the first data set 1 Is a specific gravity coefficient;
the style loss function is expressed as
Figure FDA0004057471320000021
Wherein A, B is the field data set, the first data set, L respectively GAN As a standard resistance loss function, L cyc For a periodic consistency loss function, G represents a pattern mapping function from a to B,
Figure FDA0004057471320000023
representing a style mapping function from B to A, D A And D B Pattern discriminator, lambda, of a and B respectively 2 Is a specific gravity coefficient;
the tag loss function is expressed as
Figure FDA0004057471320000022
Wherein, the data distribution of A is a-p data (a) B has a data distribution of B-p data (b) Var is the variance calculation function of the data, G (a) is the migrated target image from image a in A, M (a) is the foreground mask for image a, G (B) is the migrated target image from image B in B, M (B) is the foreground mask for image a;
adjusting parameters of the GAN model by using the style Loss function and the tag Loss function so as to minimize a Loss value of the total Loss function;
and inputting each group of images in the first data set into the GAN model obtained by regulating the Loss value at the minimum time so as to perform style conversion on each group of images in the first data set and output a group of new images corresponding to the group of images.
2. The image labeling method according to claim 1, wherein labeling the target object in the image of the target object using the extracted feature information comprises:
matching a plurality of feature information extracted from the image of the target object with preset features of the target object respectively, and labeling the feature information successfully matched;
and forming labeling information of the target object according to the labeled characteristic information.
3. An image labeling device, comprising:
the acquisition unit is used for acquiring an image of a target object in a field environment;
the extraction unit is used for extracting characteristic information of the image of the target object according to a pre-established machine vision model; the machine vision model is a model which is obtained by training through machine learning by utilizing a second data set formed by performing style conversion on a preset first data set;
the labeling unit is used for labeling the target object in the image of the target object by utilizing the extracted characteristic information and outputting labeling information of the target object;
the machine vision model extraction unit is used for extracting machine vision models from the machine vision models, and the machine vision models comprise:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a group of images of at least one mobile object in the field environment to form a field data set and obtaining style information of the field data set, and the style information comprises one or more of brightness, color, chromatic aberration, definition, contrast and resolution;
the conversion module is used for carrying out style conversion on a preset first data set according to style information of the field data set to obtain a second data set, and comprises the following steps: migrating the first data set style to the field data set through a GAN model so as to perform style conversion on each group of images in the first data set according to style information of the field data set to obtain a corresponding group of new images, and integrating the group of new images corresponding to each group of images in the first data set to form the second data set; the first data set comprises a group of images of at least one mobile object marked in any environment, and the group of images corresponding to each mobile object has uniform label information;
the training module is used for obtaining the machine vision model through machine learning by utilizing the second data set;
the conversion module migrates the style of the first data set to the field data set through a GAN model, so as to perform style conversion on each group of images in the first data set according to style information of the field data set, and obtain a corresponding new group of images, including:
establishing a total loss function, expressed as

Loss = L_Style + λ1 · L_ID

wherein L_Style represents the style loss function corresponding to the style information of the field data set, L_ID represents the label loss function corresponding to the label information of each group of images in the first data set, and λ1 is a weight coefficient;
the style loss function is expressed as

L_Style = L_GAN(G, D_B, A, B) + L_GAN(F, D_A, B, A) + λ2 · L_cyc(G, F)

wherein A and B are the field data set and the first data set respectively, L_GAN is a standard adversarial loss function, L_cyc is a cycle-consistency loss function, G represents the style mapping function from A to B, F represents the style mapping function from B to A, D_A and D_B are the style discriminators of A and B respectively, and λ2 is a weight coefficient;
the label loss function is expressed as

L_ID = E_{a~p_data(a)}[ Var((G(a) − a) ⊙ M(a)) ] + E_{b~p_data(b)}[ Var((F(b) − b) ⊙ M(b)) ]

wherein the data distribution of A is a~p_data(a), the data distribution of B is b~p_data(b), Var is the variance function of the data, ⊙ denotes element-wise multiplication, G(a) is the migrated target image obtained from image a in A, M(a) is the foreground mask of image a, F(b) is the migrated target image obtained from image b in B, and M(b) is the foreground mask of image b;
adjusting the parameters of the GAN model by using the style loss function and the label loss function, so as to minimize the value of the total loss function Loss;
and inputting each group of images in the first data set into the GAN model obtained when the Loss value is minimized, so as to perform style conversion on each group of images in the first data set and output a group of new images corresponding to that group of images.
4. A computer-readable storage medium, comprising a program executable by a processor to implement the image labeling method of any one of claims 1-2.
CN201910655710.0A 2019-07-19 2019-07-19 Image labeling method and device and storage medium thereof Active CN110516707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910655710.0A CN110516707B (en) 2019-07-19 2019-07-19 Image labeling method and device and storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910655710.0A CN110516707B (en) 2019-07-19 2019-07-19 Image labeling method and device and storage medium thereof

Publications (2)

Publication Number Publication Date
CN110516707A CN110516707A (en) 2019-11-29
CN110516707B true CN110516707B (en) 2023-06-02

Family

ID=68622921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910655710.0A Active CN110516707B (en) 2019-07-19 2019-07-19 Image labeling method and device and storage medium thereof

Country Status (1)

Country Link
CN (1) CN110516707B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598152B (en) * 2020-05-12 2023-06-13 北京阿丘机器人科技有限公司 Visual system reproduction method, apparatus, and computer-readable storage medium
CN111882038A (en) * 2020-07-24 2020-11-03 深圳力维智联技术有限公司 Model conversion method and device
CN112396923B (en) * 2020-11-25 2023-09-19 贵州轻工职业技术学院 Marketing teaching simulation system
CN114511510A (en) * 2022-01-13 2022-05-17 中山大学孙逸仙纪念医院 Method and device for automatically extracting ascending aorta image

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5896204B2 (en) * 2011-11-04 2016-03-30 カシオ計算機株式会社 Image processing apparatus and program
US10565757B2 (en) * 2017-06-09 2020-02-18 Adobe Inc. Multimodal style-transfer network for applying style features from multi-resolution style exemplars to input images
CN109697389B (en) * 2017-10-23 2021-10-01 北京京东尚科信息技术有限公司 Identity recognition method and device
CN107808149A (en) * 2017-11-17 2018-03-16 腾讯数码(天津)有限公司 A kind of face information mask method, device and storage medium
CN108256439A (en) * 2017-12-26 2018-07-06 北京大学 A kind of pedestrian image generation method and system based on cycle production confrontation network
CN108564127B (en) * 2018-04-19 2022-02-18 腾讯科技(深圳)有限公司 Image conversion method, image conversion device, computer equipment and storage medium
CN109671018A (en) * 2018-12-12 2019-04-23 华东交通大学 A kind of image conversion method and system based on production confrontation network and ResNets technology
CN109829849B (en) * 2019-01-29 2023-01-31 达闼机器人股份有限公司 Training data generation method and device and terminal
CN109919251A (en) * 2019-03-21 2019-06-21 腾讯科技(深圳)有限公司 A kind of method and device of object detection method based on image, model training

Also Published As

Publication number Publication date
CN110516707A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
JP7058669B2 (en) Vehicle appearance feature identification and vehicle search methods, devices, storage media, electronic devices
CN110516707B (en) Image labeling method and device and storage medium thereof
Leng et al. A survey of open-world person re-identification
Arietta et al. City forensics: Using visual elements to predict non-visual city attributes
CN111325115B (en) Cross-modal countervailing pedestrian re-identification method and system with triple constraint loss
CN107506703B (en) Pedestrian re-identification method based on unsupervised local metric learning and reordering
Choi et al. Depth analogy: Data-driven approach for single image depth estimation using gradient samples
CN107145826B (en) Pedestrian re-identification method based on double-constraint metric learning and sample reordering
KR20200040665A (en) Systems and methods for detecting a point of interest change using a convolutional neural network
CN110222686B (en) Object detection method, object detection device, computer equipment and storage medium
CN112967341B (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
US20150332117A1 (en) Composition modeling for photo retrieval through geometric image segmentation
CN106663196A (en) Computerized prominent person recognition in videos
Cheng et al. Smoke detection and trend prediction method based on Deeplabv3+ and generative adversarial network
WO2023142551A1 (en) Model training and image recognition methods and apparatuses, device, storage medium and computer program product
CN103886013A (en) Intelligent image retrieval system based on network video monitoring
WO2023082687A1 (en) Feature detection method and apparatus, and computer device, storage medium and computer program product
CN114943937A (en) Pedestrian re-identification method and device, storage medium and electronic equipment
Gündüz et al. A new YOLO-based method for social distancing from real-time videos
Gao et al. Occluded person re-identification based on feature fusion and sparse reconstruction
CN110135363B (en) Method, system, equipment and medium for searching pedestrian image based on recognition dictionary embedding
CN113139540B (en) Backboard detection method and equipment
CN113627380B (en) Cross-vision pedestrian re-identification method and system for intelligent security and early warning
CN112488985A (en) Image quality determination method, device and equipment
CN116258937A (en) Small sample segmentation method, device, terminal and medium based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant