CN113065013B - Image annotation model training and image annotation method, system, equipment and medium - Google Patents

Image annotation model training and image annotation method, system, equipment and medium

Info

Publication number
CN113065013B
CN113065013B
Authority
CN
China
Prior art keywords
image annotation
model
image
convolution layer
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110321391.7A
Other languages
Chinese (zh)
Other versions
CN113065013A (en)
Inventor
杨凯
罗超
胡泓
李巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd filed Critical Ctrip Computer Technology Shanghai Co Ltd
Priority to CN202110321391.7A priority Critical patent/CN113065013B/en
Publication of CN113065013A publication Critical patent/CN113065013A/en
Application granted granted Critical
Publication of CN113065013B publication Critical patent/CN113065013B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval of still image data
    • G06F 16/55 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The invention discloses an image annotation model training and image annotation method, system, equipment and medium. The image annotation model training method comprises the following steps: acquiring image data and constructing a training data set, wherein the training data set comprises image data labeled with preset classification labels, and the classification labels comprise a plurality of different target labels and a non-target label; adding an attention mechanism module after a convolution layer included in a residual network structure to construct an image annotation model, wherein the attention mechanism module is used for adjusting different channels and regions of the feature map output by the convolution layer, and the residual network structure comprises at least one convolution layer and one fully connected layer which are sequentially connected; and inputting the training data set into the image annotation model for training to obtain a target image annotation model. By adding a non-target label to the image classification label system to construct the training data set, and by building the image annotation model from a residual network and an attention mechanism, the method improves the accuracy of image annotation.

Description

Image annotation model training and image annotation method, system, equipment and medium
Technical Field
The invention relates to the technical field of deep learning, and in particular to an image annotation model training and image annotation method, system, equipment and medium.
Background
With the development of information technology, the volume of image information has grown explosively. For example, galleries for scenic-spot sharing and recommendation receive large numbers of new pictures every day, and a backlog of unorganized pictures accumulates in these galleries, making them difficult to use further. Such massive amounts of image data cannot be labeled by manual processing alone, and image classification algorithms based on deep learning models are currently the main method for annotating massive image collections. However, existing open-source image classification models target images from specific narrow domains, and cannot accurately identify and annotate image data that includes massive numbers of irrelevant pictures in open scenarios such as a travel-guide gallery.
Disclosure of Invention
The invention aims to overcome the defect in the prior art that an image classification model targeting a specific narrow domain cannot accurately identify and label massive image data that includes massive numbers of irrelevant pictures, and provides an image annotation model training and image annotation method, system, equipment and medium.
The invention solves the technical problems by the following technical scheme:
the invention provides an image annotation model training method, which comprises the following steps:
acquiring image data and constructing a training data set, wherein the training data set comprises the image data marked by a preset classification label; the classification labels comprise a plurality of different target labels and a non-target label; the non-target label is of a different category than the target labels;
adding an attention mechanism module after a convolution layer included in a residual network structure to construct an image annotation model, wherein the attention mechanism module is used for adjusting different channels and regions of a feature map output by the convolution layer, and the residual network structure comprises at least one convolution layer and one fully connected layer which are sequentially connected;
and inputting the training data set into the image annotation model for training to obtain a target image annotation model.
Preferably, the step of adding the attention mechanism module after the convolutional layer included in the residual network structure includes:
inputting the first feature map output by the convolution layer to the attention mechanism module to obtain an attention weight feature map;
and determining a second feature map output by the attention mechanism module according to the first feature map and the attention weight feature map.
Preferably, the residual network structure comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer;
the step of adding the attention mechanism module after the convolution layer included in the residual network structure comprises the following steps:
And adding an attention mechanism module after the second convolution layer, the third convolution layer, the fourth convolution layer and the fifth convolution layer respectively.
Preferably, the step of inputting the training data set into the image annotation model for training to obtain the target image annotation model includes:
Inputting the training data set into the image annotation model to obtain a model output result;
calculating error loss of the image annotation model by using a first loss function according to the model output result and the balance factor;
the balance factor is the ratio of the number of samples marked by each classification label in the training data set to the total number of samples in the training data set.
Preferably, the step of inputting the training data set into the image annotation model for training to obtain the target image annotation model includes:
calculating constraint loss of the image annotation model by using a second loss function according to the model output result;
Determining a total loss of the image annotation model according to the error loss and the constraint loss;
and adjusting parameters of the image annotation model according to the total loss until convergence conditions are reached.
The invention also provides an image labeling method, which comprises the following steps:
Acquiring image data to be annotated;
And inputting the image data to be annotated into a target image annotation model obtained by using the image annotation model training method, so as to obtain an annotation result of the image data to be annotated.
The invention also provides an image annotation model training system, which comprises:
The data set construction module is used for acquiring image data and constructing a training data set, wherein the training data set comprises the image data marked by a preset classification label; the classification labels comprise a plurality of different target labels and a non-target label; the non-target label is of a different category than the target labels;
the model construction module is used for adding an attention mechanism module after a convolution layer included in a residual network structure to construct an image annotation model, wherein the attention mechanism module is used for adjusting different channels and regions of a feature map output by the convolution layer, and the residual network structure comprises at least one convolution layer and one fully connected layer which are sequentially connected;
and the model training module is used for inputting the training data set into the image annotation model for training to obtain a target image annotation model.
The invention also provides an image annotation system, which comprises:
the image acquisition module is used for acquiring image data to be annotated;
And the image labeling module is used for inputting the image data to be labeled into a target image labeling model obtained by using the image labeling model training method to obtain a labeling result of the image data to be labeled.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the image annotation model training method as described above or the image annotation method as described above when executing the computer program.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image annotation model training method as described above or an image annotation method as described above.
The invention has the positive progress effects that:
According to the invention, the training data set is constructed by adding a non-target label to the image classification label system, and the image annotation model is built from a residual network and an attention mechanism. The model is trained on massive image data that includes both images belonging to the target categories, labeled with target labels, and images not belonging to the target categories, labeled with the non-target label. This realizes automatic identification and labeling of images that do not belong to the target categories within massive image collections, greatly saves labor cost, and greatly improves the accuracy of image identification and labeling. It also facilitates subsequent secondary development based on the labeling results, so that more high-quality images can be selected for display during product development, improving user experience.
Drawings
Fig. 1 is a flowchart of an image annotation model training method according to embodiment 1 of the present invention.
Fig. 2 is another flowchart of the image labeling model training method of embodiment 1 of the present invention.
Fig. 3 is a flowchart of an image labeling method according to embodiment 2 of the present invention.
Fig. 4 is a block diagram of an image labeling model training system according to embodiment 3 of the present invention.
Fig. 5 is a block diagram of an image labeling system according to embodiment 4 of the present invention.
Fig. 6 is a schematic hardware structure of an electronic device according to embodiment 5 of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, the present embodiment provides an image labeling model training method, which includes:
S101, acquiring image data and constructing a training data set, wherein the training data set comprises image data marked by preset classification labels; the classification labels comprise a plurality of different target labels and a non-target label; the non-target label is of a different category than the target labels.
Specifically, the plurality of different target labels are used for labeling and identifying images from specific domains; they can be scene labels such as river, lake, waterfall, ocean and seabed, or animal labels such as cat, dog, horse and sheep. The non-target label is different from the target labels and is used for labeling the massive pictures that do not belong to any category corresponding to a target label. Image data tagged with the target labels and the non-target label can be acquired in a variety of ways, including collecting images with crawler techniques, accumulating related image data over time, and supplementing the data through manual labeling.
S102, adding an attention mechanism module after a convolution layer included in a residual network structure to construct an image annotation model, wherein the attention mechanism module is used for adjusting different channels and regions of a feature map output by the convolution layer, and the residual network structure comprises at least one convolution layer and one fully connected layer which are sequentially connected.
S103, inputting the training data set into the image annotation model for training to obtain the target image annotation model.
As shown in fig. 2, step S102 includes:
S1021, using a residual network as a basic network, wherein the residual network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer;
S1022, adding an attention mechanism module after the second convolution layer, the third convolution layer, the fourth convolution layer and the fifth convolution layer, respectively.
In the training stage, the image annotation model loads pre-training weights based on transfer learning. The closer a structural change is to the front of the model, the more it disturbs the original model structure and the less effectively the pre-training weights can be reused; therefore, no attention mechanism module is added after the first convolution layer, and the modules are added only after the second, third, fourth and fifth convolution layers.
Specifically, Wide ResNet is selected as the residual network, the input picture size is fixed at 224×224, and the number of output nodes of the fully connected layer is set to N+1, corresponding to the N target labels and 1 non-target label.
Inputting the first feature map output by the convolution layer to the attention mechanism module yields an attention weight feature map; the second feature map output by the attention mechanism module is then determined from the first feature map and the attention weight feature map, and the second feature map is input to the next convolution layer or to the fully connected layer. Specifically, the first feature map input to the attention mechanism module by the convolution layer is denoted F_in, with dimensions [C, H, W], where C is the number of channels of the feature map, H is its height and W is its width. In the channel branch, F_in is average-pooled and passed through an MLP (Multi-Layer Perceptron) with one hidden layer and a BN (Batch Normalization) layer to obtain a descriptor of size [C, 1, 1], which is then expanded into a [C, H, W] feature map denoted M_c; the value at any position of each H×W channel map in M_c equals the value of the corresponding channel of the original descriptor. In the spatial branch, F_in is first reduced in dimension by a 1×1 convolution with reduction ratio r, context information is exploited by two 3×3 dilated convolutions, the feature map is then reduced to [1, H, W] by another 1×1 convolution and regularized by a BN layer, and finally this single-channel feature map is copied and expanded into a [C, H, W] feature map denoted M_s. Adding M_c and M_s and applying a sigmoid gives the final attention weight feature map M_total:

M_c(F_in) = BN(MLP(AvgPool(F_in)))

M_s(F_in) = BN(f_1×1(f_3×3(f_3×3(f_1×1(F_in)))))

M_total(F_in) = σ(M_c(F_in) + M_s(F_in))

where f denotes a convolution operation with the indicated kernel size and σ denotes the sigmoid function.

The attention weight feature map M_total and the first feature map F_in are multiplied element-wise, and F_in is added to give the finally adjusted second feature map F_out:

F_out = F_in + F_in ⊗ M_total(F_in)

where ⊗ denotes element-wise multiplication.
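The structure described above matches a BAM-style (bottleneck attention) block: a channel branch and a spatial branch whose outputs are summed and passed through a sigmoid. The following PyTorch sketch is one possible reading of that description; the class name, the reduction ratio r = 16, the dilation rate 4, the ReLU placements, and the use of torchvision's wide_resnet50_2 as the Wide ResNet backbone are illustrative assumptions rather than details fixed by the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import wide_resnet50_2

class AttentionModule(nn.Module):
    """Channel + spatial attention as described: F_out = F_in + F_in * M_total."""

    def __init__(self, channels: int, r: int = 16, dilation: int = 4):
        super().__init__()
        mid = channels // r
        # Channel branch: AvgPool -> MLP (one hidden layer, realised as 1x1 convs) -> BN.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
            nn.BatchNorm2d(channels),
        )
        # Spatial branch: 1x1 reduction (ratio r), two 3x3 dilated convolutions,
        # 1x1 projection to a single channel, then BN.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 1),
            nn.BatchNorm2d(1),
        )

    def forward(self, f_in: torch.Tensor) -> torch.Tensor:
        m_c = self.channel(f_in)            # [B, C, 1, 1], broadcast over H, W
        m_s = self.spatial(f_in)            # [B, 1, H, W], broadcast over C
        m_total = torch.sigmoid(m_c + m_s)  # attention weight feature map
        return f_in + f_in * m_total        # element-wise product plus residual

def build_model(num_target_labels: int) -> nn.Module:
    """Wide ResNet with attention after the 2nd-5th convolution stages and an
    (N + 1)-way fully connected layer (N target labels + 1 non-target label)."""
    net = wide_resnet50_2()  # Places365 pre-trained weights would be loaded separately
    for name in ("layer1", "layer2", "layer3", "layer4"):  # conv2_x .. conv5_x
        stage = getattr(net, name)
        channels = stage[-1].conv3.out_channels
        setattr(net, name, nn.Sequential(stage, AttentionModule(channels)))
    net.fc = nn.Linear(net.fc.in_features, num_target_labels + 1)
    return net
```

With this wrapping, the original stage weights remain loadable from a pre-trained state dict, while the attention parameters start from scratch, which is what the fine-tuning scheme below relies on.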
As shown in fig. 2, step S103 includes:
S1031, pre-training the image annotation model by adopting a transfer learning method to obtain a pre-training weight, and loading the pre-training weight to adjust parameters of the image annotation model.
In this embodiment, transfer learning is performed based on a pre-training model trained on the public scene-classification dataset Places365, and the pre-training weights of all layers except the fully connected layer are loaded. The weights in the second, third, fourth and fifth convolution layers are fine-tuned, with the initial learning rate set to 0.001 for the second and third convolution layers and to 0.002 for the fourth and fifth convolution layers; the weights of the 4 attention mechanism modules and of the fully connected layer of the residual network are trained from scratch with an initial learning rate of 0.01; the weights in the other layers are frozen and not updated. During training, the learning rate is halved every 5 epochs.
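A hedged sketch of this learning-rate scheme is given below, assuming the wrapped network from the previous sketch (each stage an nn.Sequential of the original stage and an attention module); reading the fully connected layer as the part trained from scratch at 0.01 follows from the fact that its pre-training weights are not loaded. The momentum value 0.9 and the 5-epoch halving come from the surrounding text.

```python
import torch

def build_optimizer(net):
    groups = []
    # Fine-tuned stages: conv2/conv3 (layer1, layer2) at 0.001,
    # conv4/conv5 (layer3, layer4) at 0.002; their attention modules at 0.01.
    for name, lr in (("layer1", 1e-3), ("layer2", 1e-3),
                     ("layer3", 2e-3), ("layer4", 2e-3)):
        stage = getattr(net, name)
        groups.append({"params": list(stage[0].parameters()), "lr": lr})
        groups.append({"params": list(stage[1].parameters()), "lr": 1e-2})
    # Newly initialised fully connected layer, trained from scratch.
    groups.append({"params": list(net.fc.parameters()), "lr": 1e-2})
    # Freeze everything else (first convolution layer, stem BN, ...).
    trainable = {id(p) for g in groups for p in g["params"]}
    for p in net.parameters():
        if id(p) not in trainable:
            p.requires_grad_(False)
    # Momentum-based SGD (momentum = 0.9); learning rates halved every 5 epochs.
    opt = torch.optim.SGD(groups, lr=1e-2, momentum=0.9)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.5)
    return opt, sched
```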
S1032, inputting the training data set into the image annotation model to obtain a model output result;
S1033, calculating error loss of the image annotation model by using a first loss function according to the model output result and the balance factor;
The balance factor is the ratio of the number of samples labeled by each classification label in the training data set to the total number of samples in the training data set.
The model output is Y = {y_1, y_2, ..., y_{N+1}} and the balance factor is A = {α_1, α_2, ..., α_{N+1}}. Using focal loss as the first loss function, the error loss over the N target labels and the non-target label is expressed as loss_fl:

loss_fl = -α_label · (1 - y_label)^γ · log(y_label)

where the focusing parameter γ = 2, and label denotes the true label index of the picture, an integer in the range [1, N+1].
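As a sketch, the balance factor and focal loss could be computed as follows; the standard focal-loss form is assumed, the model output is taken to be normalized probabilities (with raw logits a softmax would be applied first), labels are 0-based in code, and all names are illustrative.

```python
import torch

def balance_factors(label_counts: torch.Tensor) -> torch.Tensor:
    # alpha_k = (samples labelled k) / (total samples), per the definition above
    return label_counts.float() / label_counts.sum()

def focal_loss(probs: torch.Tensor, labels: torch.Tensor,
               alpha: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    # probs: [B, N+1] class probabilities; labels: [B] true label indices (0-based)
    p_t = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # y_label per sample
    a_t = alpha[labels]                                    # alpha_label per sample
    loss = -a_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp_min(1e-8))
    return loss.mean()
```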
S1034, calculating constraint loss of the image annotation model by using a second loss function according to the model output result; the total loss of the image annotation model is determined from the error loss and the constraint loss.
Using ring loss as the second loss function, with target modulus length R initialized to the mean feature-vector norm after the first training iteration, the constraint loss is expressed as loss_rl:

loss_rl = (1 / (2m)) · Σ_i (||F(x_i)||_2 - R)^2

where F(x_i) is the feature vector of sample x_i entering the fully connected layer and m is the batch size.

The total loss loss_total of the image annotation model is a weighted sum of the two loss functions:

loss_total = loss_fl + λ · loss_rl

where λ is a weight factor, taking the value 0.01.
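A corresponding sketch of the ring-loss constraint and the total loss follows; treating R as a learnable parameter, as in the original ring-loss formulation, and the batch-mean form of the penalty are assumptions.

```python
import torch
import torch.nn as nn

class RingLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Target modulus length R; re-initialised to the mean feature norm
        # after the first training iteration, then learned jointly.
        self.R = nn.Parameter(torch.ones(1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # loss_rl = (1 / 2m) * sum_i (||F(x_i)||_2 - R)^2
        norms = features.norm(p=2, dim=1)
        return 0.5 * ((norms - self.R) ** 2).mean()

def total_loss(loss_fl: torch.Tensor, features: torch.Tensor,
               ring: RingLoss, lam: float = 0.01) -> torch.Tensor:
    # loss_total = loss_fl + lambda * loss_rl, with lambda = 0.01
    return loss_fl + lam * ring(features)
```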
S1035, adjusting parameters of the image annotation model according to the total loss until convergence conditions are reached.
In this embodiment, back propagation of the loss uses a momentum-based stochastic gradient descent method, with momentum factor momentum = 0.9.
As shown in fig. 2, the image annotation model training method further includes:
s104, testing the target image annotation model, updating the balance factor according to the test result, and retraining the target image annotation model until the accuracy of the target image annotation model is greater than a preset threshold.
In this embodiment, the model is tested with online data and the test results are analyzed: for mislabeled cases, corresponding positive and negative samples are supplemented into the training set, atypical samples detrimental to model training are removed, the balance factor of the error loss is updated, and the model is retrained. This data iteration is repeated several times until the accuracy of the model meets production requirements, at which point training stops. The target image annotation model is packaged and deployed on the TorchServe model-serving framework, and the service interface is developed with the Gunicorn and Flask frameworks.
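By way of illustration, a service interface of the kind described could look like the minimal Flask endpoint below; the route name, preprocessing, label list, and the TorchScript packaging of the model are assumptions, and in the described deployment the model itself would be served by TorchServe with Gunicorn in front.

```python
import io

import torch
from flask import Flask, jsonify, request
from PIL import Image
from torchvision import transforms

app = Flask(__name__)
model = torch.jit.load("image_annotation_model.pt").eval()  # hypothetical packaged model
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # input size fixed at 224x224 as described
    transforms.ToTensor(),
])
LABELS = ["river", "lake", "waterfall", "ocean", "non-target"]  # illustrative N+1 labels

@app.route("/annotate", methods=["POST"])
def annotate():
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    with torch.no_grad():
        probs = model(preprocess(img).unsqueeze(0)).softmax(dim=1)[0]
    idx = int(probs.argmax())
    return jsonify({"label": LABELS[idx], "confidence": float(probs[idx])})

# Run behind Gunicorn, e.g.: gunicorn -w 4 app:app
```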
According to this embodiment, a non-target label is added to the image classification label system to construct the training data set, an image annotation model is built from a residual network and an attention mechanism, and pre-training model weights are loaded based on transfer learning. The model is trained on massive image data that includes both images belonging to the target categories, labeled with target labels, and images not belonging to the target categories, labeled with the non-target label. During training, the error loss and constraint loss of the model are calculated and the model weights are optimized with a momentum-based stochastic gradient descent method. This realizes automatic identification and labeling of images that do not belong to the target categories within massive image collections, greatly saves labor cost, and greatly improves the accuracy of image identification and labeling. It also facilitates subsequent secondary development based on the labeling results, so that more and better-quality images can be selected for display during product development, further improving user experience.
Example 2
As shown in fig. 3, the present embodiment provides an image labeling method, which includes:
S201, obtaining image data to be marked;
S202, inputting the image data to be annotated into a target image annotation model obtained by using the image annotation model training method of the embodiment 1, and obtaining an annotation result of the image data to be annotated.
In this embodiment, the target image annotation model is utilized so that image data not belonging to the target categories can be automatically identified and annotated within large numbers of pictures.
Example 3
As shown in fig. 4, the present embodiment provides an image annotation model training system, which includes:
the data set construction module 1 is used for acquiring image data and constructing a training data set, wherein the training data set comprises image data marked by preset classification labels; the classification labels comprise a plurality of different target labels and a non-target label; the non-target label is of a different category than the target labels.
Specifically, the plurality of different target labels are used for labeling and identifying images from specific domains; they can be scene labels such as river, lake, waterfall, ocean and seabed, or animal labels such as cat, dog, horse and sheep. The non-target label is different from the target labels and is used for labeling the massive pictures that do not belong to any category corresponding to a target label. Image data tagged with the target labels and the non-target label can be acquired in a variety of ways, including collecting images with crawler techniques, accumulating related image data over time, and supplementing the data through manual labeling.
The model construction module 2 is used for adding an attention mechanism module after a convolution layer included in a residual network structure to construct an image annotation model, wherein the attention mechanism module is used for adjusting different channels and regions of a feature map output by the convolution layer, and the residual network structure comprises at least one convolution layer and one fully connected layer which are sequentially connected;
The model training module 3 is used for inputting the training data set into the image annotation model for training to obtain the target image annotation model.
Specifically, the model construction module 2 is further configured to use a residual network as the base network, where the residual network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer; the model construction module 2 is further configured to add an attention mechanism module after the second convolution layer, the third convolution layer, the fourth convolution layer and the fifth convolution layer, respectively.
In the training stage, the image annotation model loads pre-training weights based on transfer learning. The closer a structural change is to the front of the model, the more it disturbs the original model structure and the less effectively the pre-training weights can be reused; therefore, no attention mechanism module is added after the first convolution layer, and the modules are added only after the second, third, fourth and fifth convolution layers.
Specifically, Wide ResNet is selected as the residual network, the input picture size is fixed at 224×224, and the number of output nodes of the fully connected layer is set to N+1, corresponding to the N target labels and 1 non-target label.
The model construction module 2 is further configured to input the first feature map output by the convolution layer to the attention mechanism module to obtain an attention weight feature map, to determine the second feature map output by the attention mechanism module from the first feature map and the attention weight feature map, and to input the second feature map to the next convolution layer or to the fully connected layer. Specifically, the first feature map input to the attention mechanism module by the convolution layer is denoted F_in, with dimensions [C, H, W], where C is the number of channels of the feature map, H is its height and W is its width. In the channel branch, F_in is average-pooled and passed through an MLP (Multi-Layer Perceptron) with one hidden layer and a BN (Batch Normalization) layer to obtain a descriptor of size [C, 1, 1], which is then expanded into a [C, H, W] feature map denoted M_c; the value at any position of each H×W channel map in M_c equals the value of the corresponding channel of the original descriptor. In the spatial branch, F_in is first reduced in dimension by a 1×1 convolution with reduction ratio r, context information is exploited by two 3×3 dilated convolutions, the feature map is then reduced to [1, H, W] by another 1×1 convolution and regularized by a BN layer, and finally this single-channel feature map is copied and expanded into a [C, H, W] feature map denoted M_s. Adding M_c and M_s and applying a sigmoid gives the final attention weight feature map M_total:

M_c(F_in) = BN(MLP(AvgPool(F_in)))

M_s(F_in) = BN(f_1×1(f_3×3(f_3×3(f_1×1(F_in)))))

M_total(F_in) = σ(M_c(F_in) + M_s(F_in))

where f denotes a convolution operation with the indicated kernel size and σ denotes the sigmoid function.

The attention weight feature map M_total and the first feature map F_in are multiplied element-wise, and F_in is added to give the finally adjusted second feature map F_out:

F_out = F_in + F_in ⊗ M_total(F_in)

where ⊗ denotes element-wise multiplication.
The model training module 3 is further configured to pre-train the image annotation model using transfer learning to obtain pre-training weights, and to load the pre-training weights to adjust the parameters of the image annotation model.
In this embodiment, transfer learning is performed based on a pre-training model trained on the public scene-classification dataset Places365, and the pre-training weights of all layers except the fully connected layer are loaded. The weights in the second, third, fourth and fifth convolution layers are fine-tuned, with the initial learning rate set to 0.001 for the second and third convolution layers and to 0.002 for the fourth and fifth convolution layers; the weights of the 4 attention mechanism modules and of the fully connected layer of the residual network are trained from scratch with an initial learning rate of 0.01; the weights in the other layers are frozen and not updated. During training, the learning rate is halved every 5 epochs.
The model training module 3 is also used for inputting a training data set into the image annotation model to obtain a model output result; the model training module 3 is further used for calculating error loss of the image annotation model by using the first loss function according to the model output result and the balance factor; the balance factor is the ratio of the number of samples labeled by each classification label in the training data set to the total number of samples in the training data set.
The model output is Y = {y_1, y_2, ..., y_{N+1}} and the balance factor is A = {α_1, α_2, ..., α_{N+1}}. Using focal loss as the first loss function, the error loss over the N target labels and the non-target label is expressed as loss_fl:

loss_fl = -α_label · (1 - y_label)^γ · log(y_label)

where the focusing parameter γ = 2, and label denotes the true label index of the picture, an integer in the range [1, N+1].
The model training module 3 is further used for calculating constraint loss of the image annotation model by using a second loss function according to the model output result;
Model training module 3 is also used to determine the total loss of the image annotation model from the error loss and constraint loss.
Using ring loss as the second loss function, with target modulus length R initialized to the mean feature-vector norm after the first training iteration, the constraint loss is expressed as loss_rl:

loss_rl = (1 / (2m)) · Σ_i (||F(x_i)||_2 - R)^2

where F(x_i) is the feature vector of sample x_i entering the fully connected layer and m is the batch size.

The total loss loss_total of the image annotation model is a weighted sum of the two loss functions:

loss_total = loss_fl + λ · loss_rl

where λ is a weight factor, taking the value 0.01.
The model training module 3 is further configured to adjust parameters of the image annotation model according to the total loss until convergence conditions are reached.
In this embodiment, back propagation of the loss uses a momentum-based stochastic gradient descent method, with momentum factor momentum = 0.9.
The image annotation model training system further comprises:
And the test module 4 is used for testing the target image annotation model, updating the balance factor according to the test result, and retraining the target image annotation model until the accuracy of the target image annotation model is greater than a preset threshold.
In this embodiment, the model is tested with online data and the test results are analyzed: for mislabeled cases, corresponding positive and negative samples are supplemented into the training set, atypical samples detrimental to model training are removed, the balance factor of the error loss is updated, and the model is retrained. This data iteration is repeated several times until the accuracy of the model meets production requirements, at which point training stops. The target image annotation model is packaged and deployed on the TorchServe model-serving framework, and the service interface is developed with the Gunicorn and Flask frameworks.
According to this embodiment, a non-target label is added to the image classification label system to construct the training data set, an image annotation model is built from a residual network and an attention mechanism, and pre-training model weights are loaded based on transfer learning. The model is trained on massive image data that includes both images belonging to the target categories, labeled with target labels, and images not belonging to the target categories, labeled with the non-target label. During training, the error loss and constraint loss of the model are calculated and the model weights are optimized with a momentum-based stochastic gradient descent method. This realizes automatic identification and labeling of images that do not belong to the target categories within massive image collections, greatly saves labor cost, and greatly improves the accuracy of image identification and labeling. It also facilitates subsequent secondary development based on the labeling results, so that more and better-quality images can be selected for display during product development, further improving user experience.
Example 4
As shown in fig. 5, the present invention further provides an image labeling system, where the image labeling system includes:
the image acquisition module 5 is used for acquiring image data to be annotated;
The image labeling module 6 is configured to input the image data to be labeled into a target image labeling model obtained by using the image labeling model training system of embodiment 3, so as to obtain a labeling result of the image data to be labeled.
In this embodiment, the target image annotation model is utilized so that image data not belonging to the target categories can be automatically identified and annotated within large numbers of pictures.
Example 5
Fig. 6 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention. The electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, which when executed implements the image annotation model training method of embodiment 1 or the image annotation method of embodiment 2. The electronic device 30 shown in fig. 6 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 6, the electronic device 30 may be embodied in the form of a general purpose computing device, which may be a server device, for example. Components of electronic device 30 may include, but are not limited to: the at least one processor 31, the at least one memory 32, a bus 33 connecting the different system components, including the memory 32 and the processor 31.
The bus 33 includes a data bus, an address bus, and a control bus.
Memory 32 may include volatile memory such as Random Access Memory (RAM) 321 and/or cache memory 322, and may further include Read Only Memory (ROM) 323.
Memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 31 executes a computer program stored in the memory 32 to thereby perform various functional applications and data processing, such as the image annotation model training method of embodiment 1 or the image annotation method of embodiment 2 of the present invention.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface 35. The electronic device 30 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the internet, via the network adapter 36. As shown, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with the electronic device 30, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the image annotation model training method of embodiment 1 or the image annotation method of embodiment 2.
More specifically, among others, readable storage media may be employed including, but not limited to: portable disk, hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps of implementing the image annotation model training method of example 1 or the image annotation method of example 2, when said program product is run on the terminal device.
Wherein the program code for carrying out the invention may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on the remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.

Claims (8)

1. An image annotation model training method, characterized by comprising the following steps:
acquiring image data and constructing a training data set, wherein the training data set comprises the image data marked by a preset classification label; the classification labels comprise a plurality of different target labels and a non-target label; the non-target label is of a different category than the target labels;
adding an attention mechanism module after a convolution layer included in a residual network structure to construct an image annotation model, wherein the attention mechanism module is used for adjusting different channels and regions of a feature map output by the convolution layer, and the residual network structure comprises at least one convolution layer and one fully connected layer which are sequentially connected;
inputting the training data set into the image annotation model for training to obtain a target image annotation model;
the residual network structure comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer;
the step of adding the attention mechanism module after the convolution layer included in the residual network structure comprises the following steps:
Respectively adding an attention mechanism module after the second convolution layer, the third convolution layer, the fourth convolution layer and the fifth convolution layer;
The step of inputting the training data set into the image annotation model for training to obtain a target image annotation model comprises the following steps:
Testing the target image annotation model, updating a balance factor according to a test result, and retraining the target image annotation model until the accuracy of the target image annotation model is greater than a preset threshold;
Inputting the training data set into the image annotation model to obtain a model output result;
calculating error loss of the image annotation model by using a first loss function according to the model output result and the balance factor;
the balance factor is the ratio of the number of samples marked by each classification label in the training data set to the total number of samples in the training data set.
2. The image annotation model training method according to claim 1, wherein the step of adding an attention mechanism module after the convolution layer included in the residual network structure comprises:
inputting the first feature map output by the convolution layer to the attention mechanism module to obtain an attention weight feature map;
and determining a second feature map output by the attention mechanism module according to the first feature map and the attention weight feature map.
3. The method for training an image annotation model of claim 2, wherein the step of inputting the training dataset into the image annotation model for training to obtain a target image annotation model comprises:
calculating constraint loss of the image annotation model by using a second loss function according to the model output result;
Determining a total loss of the image annotation model according to the error loss and the constraint loss;
and adjusting parameters of the image annotation model according to the total loss until convergence conditions are reached.
4. An image labeling method, characterized in that the image labeling method comprises the following steps:
Acquiring image data to be annotated;
inputting the image data to be annotated into a target image annotation model obtained by the image annotation model training method according to any one of claims 1-3, and obtaining an annotation result of the image data to be annotated.
5. An image annotation model training system, comprising:
The data set construction module is used for acquiring image data and constructing a training data set, wherein the training data set comprises the image data marked by a preset classification label; the classification labels comprise a plurality of different target labels and a non-target label; the non-target label is of a different category than the target labels;
The model construction module is used for adding an attention mechanism module after a convolution layer included in a residual network structure to construct an image annotation model, wherein the attention mechanism module is used for adjusting different channels and regions of a feature map output by the convolution layer, and the residual network structure comprises at least one convolution layer and one fully connected layer which are sequentially connected; the residual network structure comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer; the model construction module is further configured to add an attention mechanism module after the second convolution layer, the third convolution layer, the fourth convolution layer and the fifth convolution layer, respectively;
the model training module is used for inputting the training data set into the image annotation model for training to obtain a target image annotation model;
The test module is used for testing the target image annotation model, updating a balance factor according to a test result, and retraining the target image annotation model until the accuracy of the target image annotation model is greater than a preset threshold;
the model training module is also used for inputting the training data set into the image annotation model to obtain a model output result;
the model training module is also used for calculating error loss of the image annotation model by using a first loss function according to the model output result and the balance factor;
the balance factor is the ratio of the number of samples marked by each classification label in the training data set to the total number of samples in the training data set.
6. An image annotation system, the image annotation system comprising:
the image acquisition module is used for acquiring image data to be annotated;
The image labeling module is used for inputting the image data to be labeled into a target image labeling model obtained by using the image labeling model training method according to any one of claims 1 to 3, and obtaining a labeling result of the image data to be labeled.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the image annotation model training method of any of claims 1 to 3 or the image annotation method of claim 4 when the computer program is executed by the processor.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the image annotation model training method of any one of claims 1 to 3 or the image annotation method of claim 4.
CN202110321391.7A 2021-03-25 2021-03-25 Image annotation model training and image annotation method, system, equipment and medium Active CN113065013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110321391.7A CN113065013B (en) 2021-03-25 2021-03-25 Image annotation model training and image annotation method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110321391.7A CN113065013B (en) 2021-03-25 2021-03-25 Image annotation model training and image annotation method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN113065013A CN113065013A (en) 2021-07-02
CN113065013B true CN113065013B (en) 2024-05-03

Family

ID=76563512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110321391.7A Active CN113065013B (en) 2021-03-25 2021-03-25 Image annotation model training and image annotation method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN113065013B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519404B (en) * 2022-04-20 2022-07-12 四川万网鑫成信息科技有限公司 Image sample classification labeling method, device, equipment and storage medium
CN114821207B (en) * 2022-06-30 2022-11-04 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal
CN117671678A (en) * 2022-08-29 2024-03-08 华为技术有限公司 Image labeling method and device
CN116432770B (en) * 2023-02-28 2023-10-20 阿里巴巴(中国)有限公司 Model training, reasoning and construction method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334715A (en) * 2019-07-04 2019-10-15 电子科技大学 A kind of SAR target identification method paying attention to network based on residual error
CN110503154A (en) * 2019-08-27 2019-11-26 携程计算机技术(上海)有限公司 Method, system, electronic equipment and the storage medium of image classification
CN110991511A (en) * 2019-11-26 2020-04-10 中原工学院 Sunflower crop seed sorting method based on deep convolutional neural network
CN111259982A (en) * 2020-02-13 2020-06-09 苏州大学 Premature infant retina image classification method and device based on attention mechanism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334715A (en) * 2019-07-04 2019-10-15 电子科技大学 A kind of SAR target identification method paying attention to network based on residual error
CN110503154A (en) * 2019-08-27 2019-11-26 携程计算机技术(上海)有限公司 Method, system, electronic equipment and the storage medium of image classification
CN110991511A (en) * 2019-11-26 2020-04-10 中原工学院 Sunflower crop seed sorting method based on deep convolutional neural network
CN111259982A (en) * 2020-02-13 2020-06-09 苏州大学 Premature infant retina image classification method and device based on attention mechanism

Also Published As

Publication number Publication date
CN113065013A (en) 2021-07-02

Similar Documents

Publication Publication Date Title
CN113065013B (en) Image annotation model training and image annotation method, system, equipment and medium
US11586880B2 (en) System and method for multi-horizon time series forecasting with dynamic temporal context learning
US9990558B2 (en) Generating image features based on robust feature-learning
US11928600B2 (en) Sequence-to-sequence prediction using a neural network model
CN114241282B (en) Knowledge distillation-based edge equipment scene recognition method and device
CN111797893B (en) Neural network training method, image classification system and related equipment
US20190370659A1 (en) Optimizing neural network architectures
CN110728317A (en) Training method and system of decision tree model, storage medium and prediction method
CN108288067A (en) Training method, bidirectional research method and the relevant apparatus of image text Matching Model
CN116702843A (en) Projection neural network
WO2021139191A1 (en) Method for data labeling and apparatus for data labeling
US20220245424A1 (en) Microgenre-based hyper-personalization with multi-modal machine learning
WO2022105108A1 (en) Network data classification method, apparatus, and device, and readable storage medium
CN113705603A (en) Incomplete multi-view data clustering method and electronic equipment
WO2023207411A1 (en) Traffic determination method and apparatus based on spatio-temporal data, and device and medium
CN114298122A (en) Data classification method, device, equipment, storage medium and computer program product
CN113065443A (en) Training method, recognition method, system, device and medium of image recognition model
CN112380427B (en) User interest prediction method based on iterative graph attention network and electronic device
CN112784157A (en) Training method of behavior prediction model, behavior prediction method, device and equipment
CN113609337A (en) Pre-training method, device, equipment and medium of graph neural network
CN111832435A (en) Beauty prediction method and device based on migration and weak supervision and storage medium
CN110704650A (en) OTA picture tag identification method, electronic device and medium
CN114821248B (en) Point cloud understanding-oriented data active screening and labeling method and device
US20230196067A1 (en) Optimal knowledge distillation scheme
CN112633246A (en) Multi-scene recognition method, system, device and storage medium in open scene

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant