CN113822348A - Model training method, training device, electronic device and readable storage medium - Google Patents
Model training method, training device, electronic device and readable storage medium
- Publication number
- Publication number: CN113822348A (application number CN202111069317.7A)
- Authority
- CN
- China
- Prior art keywords
- image data
- feature
- data set
- features
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a model training method, a training device, an electronic device and a readable storage medium. The model training method includes: acquiring N image data and generating a first data set from the N image data, where N is an integer greater than 1; updating a first feature in the N image data to a second feature and generating a second data set from the updated N image data; and obtaining a target data set from the first data set and the second data set, and training a detection model on the target data set. The method balances the proportions of the different types of features in the data set used to train the model. Because only the first feature is updated and converted into the second feature, the other image features in the image data are unaffected, which improves the accuracy with which the finally trained detection model identifies the object to be detected.
Description
Technical Field
The invention belongs to the technical field of detection models, and particularly relates to a model training method, a training device, an electronic device and a readable storage medium.
Background
A construction site with a complex, messy environment negatively affects urban construction, and monitoring in real time for uncovered objects that easily raise dust costs considerable manpower and time. With the continuous development of deep learning, automatic video monitoring has replaced traditional human-eye monitoring, saving labor and time costs; however, detection models in the related art cannot accurately identify the object to be detected.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art or the related art.
To this end, a first aspect of the invention proposes a model training method.
A second aspect of the invention proposes a model training apparatus.
A third aspect of the invention proposes an electronic device.
A fourth aspect of the invention proposes a readable storage medium.
In view of this, a model training method according to a first aspect of the present invention is provided, including: acquiring N image data, and generating a first data set according to the N image data, wherein N is an integer greater than 1; updating the first feature in the N image data into a second feature, and generating a second data set according to the updated N image data; and obtaining a target data set according to the first data set and the second data set, and training the detection model through the target data set.
The invention provides a model training method for an electronic device. The electronic device acquires N image data, where N is an integer greater than 1; specifically, the electronic device either captures the image data directly through a shooting device or receives image data sent by other electronic devices. The image data are labeled to generate a first data set; generating the first data set from many image data ensures that the data set contains a large number of features, which improves the effect of subsequently training the model. When generating the data set, the image features in the image data must be identified (these include a first feature, a second feature, a third feature, and so on) and each different image feature must be marked separately. After the first data set is generated, the first feature in the image data is updated: the first features in the N image data are all converted into second features, and a second data set is generated from the updated image data. The model is trained with the first data set and the second data set as the target data set, thereby obtaining a detection model. When the first data set contains few first features, training the model on the first data set alone increases the model's false-detection rate on the target features and degrades its detection of key category features.
By updating the first features in the image data to second features, generating a second data set from the updated image data, and combining the first and second data sets into a target data set, the method balances the proportions of the different types of features in the data set used to train the model. Because only the first feature is updated and converted into the second feature, the other image features in the image data are unaffected, which improves the accuracy with which the finally trained detection model identifies the object to be detected.
Specifically, the image data are images of a construction site, and the first data set is obtained by labeling felt, bare soil, green grass and yellow grass in the image data. The green-grass features in the image data are extracted, the green grass is updated to yellow grass through color conversion, the green-grass labels are uniformly changed to yellow-grass labels, and a second data set is obtained from the updated image data. The first data set and the second data set are combined as the target data set to train the detection model, so that the detection model can effectively identify felt, bare soil, green grass and yellow grass, and the false-detection rate of the detection model on uncovered bare soil is reduced.
The green-grass feature in the construction-site image is replaced with a yellow-grass feature by re-quantizing the HSV (Hue, Saturation, Value) color values. Specifically, the HSV range of yellow is obtained: H ranges from 26 to 34, S from 43 to 255, and V from 46 to 255. All pixels at the position of the green-grass feature in the image are replaced with values within the yellow HSV range, and the label at that position is uniformly changed to the label corresponding to yellow grass.
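The HSV-based replacement described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: it uses the standard-library colorsys module rescaled to OpenCV-style ranges (H on 0 to 179, S and V on 0 to 255, matching the yellow range quoted above); the green hue range and the representative yellow value chosen are assumptions.

```python
import colorsys

# Yellow range from the description (OpenCV-style scale): H 26-34, S 43-255, V 46-255.
# The green hue range below is an assumption for the sketch.
GREEN_H = (35, 77)
YELLOW_HSV = (30, 150, 200)  # a representative value inside the yellow range

def rgb_to_cv_hsv(r, g, b):
    """Convert 0-255 RGB to OpenCV-scaled HSV (H: 0-179, S/V: 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(h * 179), int(s * 255), int(v * 255)

def cv_hsv_to_rgb(h, s, v):
    r, g, b = colorsys.hsv_to_rgb(h / 179.0, s / 255.0, v / 255.0)
    return int(r * 255), int(g * 255), int(b * 255)

def recolor_green_to_yellow(image):
    """image: list of rows of (r, g, b) tuples; returns a recolored copy."""
    out = []
    for row in image:
        new_row = []
        for (r, g, b) in row:
            h, s, v = rgb_to_cv_hsv(r, g, b)
            if GREEN_H[0] <= h <= GREEN_H[1] and s >= 43 and v >= 46:
                new_row.append(cv_hsv_to_rgb(*YELLOW_HSV))  # replace with yellow
            else:
                new_row.append((r, g, b))                   # leave untouched
        out.append(new_row)
    return out
```

On a saturated green pixel this produces a pixel whose hue falls inside the quoted yellow range of 26 to 34, while non-green pixels pass through unchanged.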
In addition, according to the model training method in the above technical solution provided by the present invention, the following additional technical features may also be provided:
in one possible design, generating the first data set from the N image data includes: labeling the first feature and the second feature in the N image data with the first mark and the second mark, respectively, to obtain the first data set.
In this design, a first dataset including a plurality of first features and a plurality of second features in a plurality of image data is obtained by labeling the first features and the second features in the plurality of image data.
Specifically, the image data include image features such as green grass, yellow grass, felt, and bare soil. Green grass is labeled "g_grass", yellow grass "y_grass", felt "fugai", and bare soil "weikugai"; here green grass is the first feature, yellow grass is the second feature, "g_grass" is the first mark, and "y_grass" is the second mark. Marking the image features to be recognized with distinct marks across the images ensures that the first data set includes all the image features required for training.
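As an illustration of this labeling step, the first data set's annotations might be organized as below. The dictionary layout, helper function and bounding boxes are assumptions made for the sketch; only the label strings follow the description above.

```python
# Illustrative annotation structure for the first data set.
FIRST_MARK = "g_grass"   # green grass, the first feature
SECOND_MARK = "y_grass"  # yellow grass, the second feature

def label_image(image_id, regions):
    """regions: list of (mark, bbox) pairs annotating one image."""
    return {"image": image_id,
            "annotations": [{"label": mark, "bbox": box} for mark, box in regions]}

first_data_set = [
    label_image("site_001.jpg", [(FIRST_MARK, (10, 10, 50, 50)),
                                 ("fugai", (60, 0, 120, 40))]),
    label_image("site_002.jpg", [(SECOND_MARK, (0, 0, 30, 30)),
                                 ("weikugai", (40, 40, 90, 90))]),
]
```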
In one possible design, updating the first feature to the second feature in the N image data includes: identifying M image data to be processed that contain the first feature among the N image data, where M is less than or equal to N; replacing the first feature in the M image data to be processed with the second feature, and replacing the first mark with the second mark, to obtain M target image data; and updating the N image data with the M target image data.
In this design, when updating the first feature of the N image data to the second feature, the image data to be processed among the N image data must first be identified: these are the image data that contain the first feature, and their number M is less than or equal to N. The first feature in each image data to be processed is extracted and replaced with the second feature, and the corresponding first mark is replaced accordingly; this completes the update of the M image data to be processed, yielding M updated target image data, from which the second data set is obtained. Training the detection model with the first data set and the second data set balances the proportions of the different types of features in the training data, reduces the detection model's false-detection rate, and improves its detection accuracy on the target features.
In some embodiments, the second data set is generated directly from the updated M target image data.
In some other embodiments, the M target image data replace the M image data to be processed among the N image data, and the second data set is obtained from the resulting N image data.
By identifying the M image data to be processed among the N image data, only the image data that include the first feature are updated, which reduces the amount of data processed during the update. Because the first mark corresponding to the first feature is replaced with the second mark while the first feature is being updated, the updated N image data remain in a fully labeled state and need not be labeled again, reducing the data-set labeling steps.
Specifically, the N image data are traversed; whenever an image data is detected to include the first feature, the first feature in that image data is updated to the second feature, and the corresponding first mark is replaced with the second mark. Finally, the image data is stored as updated image data, that is, it is renamed to distinguish it from the image data before the update.
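The traversal just described can be sketched as follows, operating only on annotation labels for brevity. The function name, the renaming scheme and the dictionary layout are assumptions, and the pixel-level recoloring itself is omitted.

```python
FIRST_MARK, SECOND_MARK = "g_grass", "y_grass"

def update_annotations(dataset):
    """dataset: {filename: [labels]}. Returns the updated (second data set) entries."""
    updated = {}
    for name, labels in dataset.items():
        if FIRST_MARK in labels:  # this is an image data to be processed
            new_labels = [SECOND_MARK if l == FIRST_MARK else l for l in labels]
            # rename so the updated copy is distinguishable from the original
            updated["updated_" + name] = new_labels
    return updated
```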
In one possible design, replacing the first feature in the M image data to be processed with the second feature includes: acquiring the position information and the pixel matrix of the first feature in the M image data to be processed; generating the second feature from the pixel matrix and a set color model; and replacing the first feature in the M image data with the second feature according to the position information.
In this design, the position of the first mark in the image data is determined, thereby determining the position of the first feature, and the pixel matrix of the first feature is acquired; from this matrix the area and shape of the first feature in the image data can be determined. An updated pixel matrix corresponding to the original one is randomly generated through the color channels, with a color different from that of the original. The original pixel matrix is replaced with the updated one according to the position information, completing the replacement of the first feature with the second feature. In this way the color of the first feature can be adjusted to form the second feature while parameters such as its shape in the image data remain unchanged. By identifying the first feature and replacing and updating only it, the other image features in the image data are unaffected.
In one possible design, training the detection model with the target data set includes: extracting a feature vector of each image data in the target data set; and training the detection model with the feature vectors.
In this design, the feature vector of each image data in the target data set is extracted. Because the target data set comprises the first data set and the second data set, the extracted feature vectors cover both the first feature and the second feature. Training the detection model on these feature vectors makes the categories in the training data more balanced, improves the detection model's accuracy on the target features, and reduces the false-detection rate.
Specifically, during feature extraction, the channel information obtained by the convolution kernels is normally superimposed directly with equal weights, so the features carrying important information cannot be emphasized. To address this, an attention mechanism that applies average pooling and maximum pooling along the channel dimension can better screen the global information of the target class while better highlighting the salient features of the target object.
In one possible design, obtaining the feature vector of each image data includes: spatially compressing the image data through global average pooling and global maximum pooling to obtain a first average pooling vector and a first maximum pooling vector; performing a weighted calculation on the first average pooling vector and the first maximum pooling vector to obtain channel features; and determining the feature vector from the channel features.
In this design, an image data in the target data set is denoted X ∈ R^(H×W×C), where H is the image height, W is the image width, and C is the number of color channels, and X = [x_1, x_2, …, x_C], where x_c denotes the feature map produced by the c-th convolution kernel.
X is spatially compressed using global average pooling and global maximum pooling. Let the average-pooling and maximum-pooling operations be F_ave and F_max, with outputs A_ave and A_max respectively, where A_ave is the first average pooling vector and A_max is the first maximum pooling vector. The specific calculation formulas are:

A_ave = F_ave(X), whose c-th component is (1/(H×W)) · Σ_{i=1..H} Σ_{j=1..W} x_c(i, j);

A_max = F_max(X), whose c-th component is max_{i=1..H, j=1..W} x_c(i, j);

where x_c denotes the feature map of the c-th convolution kernel, H is the image height, and W is the image width.
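A minimal pure-Python sketch of these two spatial poolings, made for illustration only: for each channel c, the average pooling takes the mean over all H×W positions and the maximum pooling takes their maximum.

```python
def channel_pool(x):
    """x: [C][H][W] nested lists. Returns (A_ave, A_max), each of length C."""
    a_ave, a_max = [], []
    for channel in x:
        flat = [v for row in channel for v in row]  # all H*W positions of channel c
        a_ave.append(sum(flat) / len(flat))         # spatial average
        a_max.append(max(flat))                     # spatial maximum
    return a_ave, a_max
```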
Weighting the computed first average pooling vector and first maximum pooling vector yields a channel attention map, calculated as:

M_c(X) = σ(MLP(A_ave) + MLP(A_max)) = σ(W_1(W_0 · A_ave) + W_1(W_0 · A_max));

where M_c(X) denotes the channel attention map, σ denotes the sigmoid function, A_max is the first maximum pooling vector, A_ave is the first average pooling vector, MLP is a multilayer perceptron, and W_0 and W_1 are its weights.
Through this formula the channel attention map is obtained: the first average pooling vector and the first maximum pooling vector are each passed through a shared network formed by a multilayer perceptron (MLP), and the results are added and squashed by the sigmoid to give the channel attention map. The channel attention map is then applied by element-wise multiplication to complete the feature weighting and obtain the channel features:

W = M_c(X) ⊗ X;

where X is the image data, M_c(X) denotes the channel attention map, ⊗ denotes element-wise (broadcast) multiplication, and W is the channel features.
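A toy numeric sketch of this channel attention computation: the shared MLP is collapsed here to two scalar weights (W0, W1), an intentional simplification made purely for illustration, and the resulting attention scores then weight each channel of the input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(a_ave, a_max, w0=0.5, w1=1.0):
    """Per-channel score: sigmoid(W1*(W0*a_ave) + W1*(W0*a_max)). Toy scalar 'MLP'."""
    return [sigmoid(w1 * (w0 * a) + w1 * (w0 * m)) for a, m in zip(a_ave, a_max)]

def apply_channel_attention(x, attn):
    """Weight each channel's H×W map of x ([C][H][W]) by its attention score."""
    return [[[attn[c] * v for v in row] for row in x[c]] for c in range(len(x))]
```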
In one possible design, determining the feature vector from the channel features includes: performing channel average pooling and channel maximum pooling on the channel characteristics to obtain a second average pooling vector and a second maximum pooling vector; and performing weighted calculation on the second average pooling vector and the second maximum pooling vector to obtain a feature vector.
In this design, channel average pooling and channel maximum pooling are applied to the channel features W along the channel dimension; the pooling results are concatenated and then convolved with a standard convolutional layer. The calculation formula is:

M_s(W) = σ(f^(7×7)([AvgPool(W); MaxPool(W)]));

where M_s(W) denotes the spatial attention map, σ denotes the sigmoid function, f^(7×7) denotes a convolution with a 7×7 kernel, AvgPool(W) is the second average pooling vector, and MaxPool(W) is the second maximum pooling vector.
Finally, the feature weighting is completed by element-wise multiplication, yielding the feature vector:

Out_feature = M_s(W) ⊗ W;

where Out_feature denotes the feature vector, M_s(W) denotes the spatial attention map, and W denotes the channel features.
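The spatial attention step can likewise be sketched in pure Python. The 7×7 convolution is simplified here to a per-position weighted combination (w_a, w_m) of the channel-wise average and maximum maps, an assumption made purely to keep the sketch self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def spatial_attention(w, w_a=0.5, w_m=0.5):
    """w: channel features as [C][H][W] nested lists. Returns M_s(W) ⊗ W."""
    C, H, W_ = len(w), len(w[0]), len(w[0][0])
    # pool across the channel dimension at each spatial position, then squash
    attn = [[sigmoid(w_a * sum(w[c][i][j] for c in range(C)) / C
                     + w_m * max(w[c][i][j] for c in range(C)))
             for j in range(W_)] for i in range(H)]
    # element-wise weighting: Out = M_s(W) ⊗ W
    return [[[attn[i][j] * w[c][i][j] for j in range(W_)]
             for i in range(H)] for c in range(C)]
```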
According to a second aspect of the present invention, there is provided a model training apparatus comprising: an acquisition unit for acquiring N image data and generating a first data set from the N image data, where N is an integer greater than 1; an updating unit for updating the first feature in the N image data to a second feature and generating a second data set from the updated N image data; and a training unit for obtaining a target data set from the first data set and the second data set and training the detection model on the target data set.
The model training device provided by the invention is used in an electronic device. The electronic device acquires N image data, where N is an integer greater than 1; specifically, the electronic device either captures the image data directly through a shooting device or receives image data sent by other electronic devices. The image data are labeled to generate a first data set; generating the first data set from many image data ensures that the data set contains a large number of features, which improves the effect of subsequently training the model. When generating the data set, the image features in the image data must be identified (these include a first feature, a second feature, a third feature, and so on) and each different image feature must be marked separately. After the first data set is generated, the first feature in the image data is updated: the first features in the N image data are all converted into second features, and a second data set is generated from the updated image data. The model is trained with the first data set and the second data set as the target data set, thereby obtaining a detection model. When the first data set contains few first features, training the model on the first data set alone increases the model's false-detection rate on the target features and degrades its detection of key category features.
By updating the first features in the image data to second features, generating a second data set from the updated image data, and combining the first and second data sets into a target data set, the device balances the proportions of the different types of features in the data set used to train the model. Because only the first feature is updated and converted into the second feature, the other image features in the image data are unaffected, which improves the accuracy with which the finally trained detection model identifies the object to be detected.
Specifically, the image data are images of a construction site, and the first data set is obtained by labeling felt, bare soil, green grass and yellow grass in the image data. The green-grass features in the image data are extracted, the green grass is updated to yellow grass through color conversion, the green-grass labels are uniformly changed to yellow-grass labels, and a second data set is obtained from the updated image data. The first data set and the second data set are combined as the target data set to train the detection model, so that the detection model can effectively identify felt, bare soil, green grass and yellow grass, and the false-detection rate of the detection model on uncovered bare soil is reduced.
The green-grass feature in the construction-site image is replaced with a yellow-grass feature by re-quantizing the HSV (Hue, Saturation, Value) color values. Specifically, the HSV range of yellow is obtained: H ranges from 26 to 34, S from 43 to 255, and V from 46 to 255. All pixels at the position of the green-grass feature in the image are replaced with values within the yellow HSV range, and the label at that position is uniformly changed to the label corresponding to yellow grass.
According to a third aspect of the invention, there is provided an electronic device comprising: a memory in which a program or instructions are stored; and a processor that executes the program or instructions stored in the memory to implement the steps of the model training method in any of the possible designs above. The device therefore has all the beneficial technical effects of the model training method in any possible design of the first aspect, which are not repeated here.
According to a fourth aspect of the present invention, there is provided a readable storage medium on which a program or instructions are stored which, when executed by a processor, implement the steps of the model training method in any possible design of the first aspect. The medium therefore has all the beneficial technical effects of the model training method in any possible design of the first aspect, which are not repeated here.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 shows one of the schematic flow diagrams of a model training method in a first embodiment of the invention;
FIG. 2 shows a second schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 3 shows a third schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 4 shows a fourth schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 5 shows a fifth schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 6 shows a sixth schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 7 shows a schematic block diagram of a model training apparatus in a second embodiment of the present invention;
fig. 8 shows a schematic block diagram of an electronic device in a third embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention; however, the present invention may be practiced in ways other than those described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
A model training method, a model training apparatus, an electronic device, and a readable storage medium according to some embodiments of the present invention are described below with reference to fig. 1 to 8.
The first embodiment is as follows:
As shown in fig. 1, a first embodiment of the present invention provides a model training method, including:
Step 102: acquire N image data, and generate a first data set from the N image data, where N is an integer greater than 1;
Step 104: update the first feature in the N image data to a second feature, and generate a second data set from the updated N image data;
Step 106: obtain a target data set from the first data set and the second data set, and train the detection model on the target data set.
The embodiment of the invention provides a model training method for an electronic device. The electronic device acquires N image data, where N is an integer greater than 1; specifically, the electronic device either captures the image data directly through a shooting device or receives image data sent by other electronic devices. The image data are labeled to generate a first data set; generating the first data set from many image data ensures that the data set contains a large number of features, which improves the effect of subsequently training the model. When generating the data set, the image features in the image data must be identified (these include a first feature, a second feature, a third feature, and so on) and each different image feature must be marked separately. After the first data set is generated, the first feature in the image data is updated: the first features in the N image data are all converted into second features, and a second data set is generated from the updated image data. The model is trained with the first data set and the second data set as the target data set, thereby obtaining a detection model. When the first data set contains few first features, training the model on the first data set alone increases the model's false-detection rate on the target features and degrades its detection of key category features.
By updating the first features in the image data to second features, generating a second data set from the updated image data, and combining the first and second data sets into a target data set, the method balances the proportions of the different types of features in the data set used to train the model. Because only the first feature is updated and converted into the second feature, the other image features in the image data are unaffected, which improves the accuracy with which the finally trained detection model identifies the object to be detected.
Specifically, the image data are images of a construction site, and the first data set is obtained by labeling felt, bare soil, green grass and yellow grass in the image data. The green-grass features in the image data are extracted, the green grass is updated to yellow grass through color conversion, the green-grass labels are uniformly changed to yellow-grass labels, and a second data set is obtained from the updated image data. The first data set and the second data set are combined as the target data set to train the detection model, so that the detection model can effectively identify felt, bare soil, green grass and yellow grass, and the false-detection rate of the detection model on uncovered bare soil is reduced.
The green-grass feature in the construction-site image is replaced with a yellow-grass feature by re-quantizing the HSV (Hue, Saturation, Value) color values. Specifically, the HSV range of yellow is obtained: H ranges from 26 to 34, S from 43 to 255, and V from 46 to 255. All pixels at the position of the green-grass feature in the image are replaced with values within the yellow HSV range, and the label at that position is uniformly changed to the label corresponding to yellow grass.
In any of the above embodiments, generating the first data set from the N image data comprises: labeling the first feature and the second feature in the N image data with the first mark and the second mark, respectively, to obtain the first data set.
In this embodiment, a first data set including a plurality of first features and a plurality of second features in a plurality of image data is obtained by labeling the first features and the second features in the plurality of image data.
Specifically, the image data includes image features such as green grass, yellow grass, felt, and bare soil. Green grass is labeled by "g _ grass", yellow grass is labeled by "y _ grass", felt is labeled by "fugai", and bare soil is labeled by "weikugai", wherein green grass is the first feature, yellow grass is the second feature, "g _ grass" is the first mark, and "y _ grass" is the second mark. The image features to be recognized in the image are marked through different marks in the plurality of images, so that the first data set can be guaranteed to comprise a plurality of image features required by training.
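A minimal sketch of how such a labeled record might be represented: the mark strings are the ones given above, while the record layout (filename plus labeled boxes) is an assumption for illustration.

```python
# Mark strings from the text; the record structure itself is illustrative.
LABELS = {"g_grass": "green grass", "y_grass": "yellow grass",
          "fugai": "felt", "weikugai": "bare soil"}

def make_record(filename, boxes):
    """boxes: list of (mark, x1, y1, x2, y2) tuples -> one first-data-set record."""
    for mark, *_ in boxes:
        if mark not in LABELS:
            raise ValueError(f"unknown mark: {mark}")
    return {"file": filename,
            "objects": [{"mark": m, "bbox": (x1, y1, x2, y2)}
                        for m, x1, y1, x2, y2 in boxes]}
```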
As shown in fig. 2, in any of the above embodiments, updating the first feature to the second feature in the N image data includes:
in step 202, identifying M image data to be processed having the first feature in the N image data, where M is less than or equal to N;
in step 204, replacing the first feature in the M image data to be processed with the second feature, and replacing the first mark with the second mark, to obtain M target image data;
in step 206, the N image data are updated with the M target image data.
In this embodiment, in the process of updating the first feature of the N image data to the second feature, the image data to be processed among the N image data need to be identified; the image data to be processed include the first feature, the number of image data to be processed is M, and M is less than or equal to the number of image data. The first feature in the image data to be processed is extracted and replaced with the second feature, the first mark corresponding to the first feature is replaced correspondingly, the updating of the M image data to be processed is completed to obtain M updated target image data, and the second data set is obtained from the M updated target image data. The detection model is trained on the first data set and the second data set, which balances the quantity proportion of the different types of features in the data set used for training the model, reduces the false detection rate of the detection model, and improves the detection accuracy of the detection model for the target features.
In some embodiments, the second data set is generated directly from the updated M target image data.
In some other embodiments, M target image data are substituted for M to-be-processed image data of the N image data, thereby obtaining a second data set.
By identifying the M image data to be processed among the N image data, where only the image data including the first feature are updated, the amount of data processed when updating the image data is reduced. In the process of updating the first features in the image data to be processed, the first marks corresponding to the first features are replaced with the second marks, so that the updated N image data remain in a fully labeled state and do not need to be labeled again, reducing the labeling steps for the data set.
Specifically, the N image data are traversed, and when an image is detected to include the first feature, the first feature in that image is updated to the second feature and the first mark corresponding to the first feature is replaced with the second mark. Finally, the image is stored as updated image data and renamed to distinguish it from the image data before updating.
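The traversal above could be sketched as below, assuming an in-memory dataset (a dict of images and per-image annotation lists); the `upd_` renaming prefix and the `recolor` callback are illustrative assumptions.

```python
def update_dataset(images, annotations, recolor):
    """Traverse the N images; for each image whose annotations contain the
    first mark ('g_grass'), recolor the image, rewrite that mark to the
    second mark ('y_grass'), and rename the result to mark it as updated."""
    updated = []
    for name, img in images.items():
        objs = annotations[name]
        if not any(o["mark"] == "g_grass" for o in objs):
            continue  # only the M images containing the first feature
        new_objs = [dict(o, mark="y_grass") if o["mark"] == "g_grass" else o
                    for o in objs]
        updated.append(("upd_" + name, recolor(img), new_objs))
    return updated
```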
As shown in fig. 3, in any of the above embodiments, replacing the first feature in the M image data to be processed with the second feature includes:
in step 302, acquiring position information and pixel matrices of the first feature in the M target image data;
in step 304, generating the second feature from the pixel matrix according to a set color model;
in step 306, replacing the first feature in the M image data with the second feature according to the position information.
In this embodiment, the position information of the first mark in the target image data is determined, thereby determining the position information of the first feature, and the pixel matrix of the first feature in the image data is acquired, from which the area and the shape of the first feature in the image data can be determined. An updated pixel matrix corresponding to the original pixel matrix is randomly generated through the color channels, the color of the updated pixel matrix differing from that of the original pixel matrix. The original pixel matrix is then replaced with the updated pixel matrix according to the position information, completing the step of replacing the first feature with the second feature. In this way, the color of the first feature can be adjusted to form the second feature while parameters such as the shape of the first feature in the target image data remain unchanged. By identifying the first feature and replacing and updating only the first feature, the invention ensures that the other image features in the image data are not affected.
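The position-based replacement could be sketched as follows, assuming the position information takes the form of an axis-aligned bounding box; the per-channel random regeneration mirrors the description above, and the range values are illustrative.

```python
import numpy as np

def replace_feature_region(img, bbox, channel_ranges, rng=None):
    """Regenerate the pixel matrix inside bbox = (x1, y1, x2, y2) with random
    per-channel values drawn from channel_ranges, leaving the region's area
    and shape (and all pixels outside it) unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    x1, y1, x2, y2 = bbox
    h, w = y2 - y1, x2 - x1
    new = np.stack([rng.integers(lo, hi + 1, (h, w))
                    for lo, hi in channel_ranges], axis=-1).astype(img.dtype)
    out = img.copy()
    out[y1:y2, x1:x2] = new
    return out
```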
As shown in fig. 4, in any of the above embodiments, training the detection model with the target data set includes: extracting a feature vector of each image data in the target data set; and training the detection model with the feature vectors.
In this embodiment, the feature vector of each image data in the target data set is extracted. Since the target data set comprises the first data set and the second data set, the extracted feature vectors include both the first feature and the second feature. Training the detection model on these feature vectors makes the categories in the data set used for training more balanced, improves the detection accuracy of the detection model for the target features, and reduces the false detection rate.
Specifically, if during feature extraction the channel information obtained by the convolution kernels is directly superimposed with equal weights, features carrying important information cannot be highlighted. To address this problem, an attention mechanism that applies average pooling and maximum pooling along the channel dimension can better screen the global information of the target class, while better highlighting the salient features of the target object.
As shown in fig. 5, in any of the above embodiments, acquiring the feature vector of each image data includes: performing spatial compression on the image data through global average pooling and global maximum pooling to obtain a first average pooling vector and a first maximum pooling vector; performing a weighted calculation on the first average pooling vector and the first maximum pooling vector to obtain channel features; and determining the feature vector according to the channel features.
In this embodiment, the image data in the target data set is denoted X ∈ R^(H×W×C), where X is the image data, H is the image height, W is the image width, and C is the number of color channels, and X = [x_1, x_2, …, x_C], where x_c represents the parameter of the c-th convolution kernel.
X is spatially compressed using global average pooling and global maximum pooling respectively. Let the average pooling process and the maximum pooling process be F_ave and F_max, with corresponding outputs A_ave and A_max, where A_ave is the first average pooling vector and A_max is the first maximum pooling vector. The specific calculation formulas are as follows:
A_ave = F_ave(X) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} x_c(i, j);
where A_ave is the first average pooling vector, F_ave is the average pooling process, x_c denotes the parameter of the c-th convolution kernel, H is the height of the image and W is the width of the image.
A_max = F_max(X) = max_{i=1..H, j=1..W} x_c(i, j);
where A_max is the first maximum pooling vector, F_max is the maximum pooling process, x_c denotes the parameter of the c-th convolution kernel, H is the height of the image and W is the width of the image.
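A minimal numpy sketch of the two pooling operations above: per-channel spatial compression of an H×W×C map into two C-dimensional vectors.

```python
import numpy as np

def global_pool(x):
    """x: (H, W, C) feature map -> (A_ave, A_max), each of shape (C,)."""
    a_ave = x.mean(axis=(0, 1))  # F_ave: average over all H*W positions
    a_max = x.max(axis=(0, 1))   # F_max: maximum over all H*W positions
    return a_ave, a_max
```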
A channel attention map can be obtained by weighting the calculated first average pooling vector and first maximum pooling vector. The specific calculation formula is as follows:
M_c(X) = σ(MLP(A_ave) + MLP(A_max))
       = σ(W_1(W_0 * A_ave) + W_1(W_0 * A_max));
where M_c(X) denotes the channel attention map, σ denotes the sigmoid function, A_max is the first maximum pooling vector, A_ave is the first average pooling vector, MLP is a multilayer perceptron, and W_1 and W_0 are weight values.
Through the above formula, the channel attention map is calculated: the first average pooling vector and the first maximum pooling vector are each passed through a shared network formed by a multilayer perceptron (MLP), and the results are added to obtain the channel attention map. The channel attention map is then applied through matrix multiplication to complete the feature weighting operation, yielding the channel features, specifically as follows:
W = M_c(X) ⊗ X;
where X is the image data, M_c(X) denotes the channel attention map, and W is the channel feature.
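The channel-attention weighting can be sketched as below. The shared MLP is realized with plain matrices; the ReLU hidden layer and the reduction-ratio weight shapes are common choices but assumptions here, not details stated in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w0, w1):
    """x: (H, W, C); w0: (C//r, C) and w1: (C, C//r) are the shared MLP
    weights applied to both pooled vectors. Returns the channel features,
    i.e. the channel attention map M_c(X) applied to X by broadcasting."""
    a_ave = x.mean(axis=(0, 1))                       # first average pooling vector
    a_max = x.max(axis=(0, 1))                        # first maximum pooling vector
    mc = sigmoid(w1 @ np.maximum(w0 @ a_ave, 0.0)
                 + w1 @ np.maximum(w0 @ a_max, 0.0))  # channel attention map (C,)
    return x * mc
```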
As shown in fig. 6, in any of the above embodiments, determining the feature vector according to the channel features includes: performing channel average pooling and channel maximum pooling on the channel features to obtain a second average pooling vector and a second maximum pooling vector; and performing a weighted calculation on the second average pooling vector and the second maximum pooling vector to obtain the feature vector.
In this embodiment, channel average pooling and channel maximum pooling are performed on the channel features W along the channel dimension, and the resulting pooled maps are concatenated in series and convolved with a standard convolutional layer. The calculation formula is as follows:
M_s(W) = σ(f^{7×7}([AvgPool(W); MaxPool(W)]));
where M_s(W) denotes the spatial attention map, σ denotes the sigmoid function, f^{7×7} denotes a convolution operation with a 7×7 convolution kernel, AvgPool(W) denotes the second average pooling vector (the average pooling operation F^s_ave), and MaxPool(W) denotes the second maximum pooling vector (the maximum pooling operation F^s_max).
Finally, the feature weighting operation is completed through matrix multiplication to obtain the feature vector, as follows:
Out_feature = M_s(W) ⊗ W;
where Out_feature represents the feature vector, M_s(W) represents the spatial attention map, and W represents the channel features.
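The spatial-attention step and the final weighting can be sketched as follows; the explicit loop convolution stands in for the 7×7 convolutional layer (a smaller kernel is used in the check purely for brevity).

```python
import numpy as np

def spatial_attention(w, kernel):
    """w: (H, W, C) channel features; kernel: (k, k, 2) conv weights applied
    to the concatenated [channel-avg; channel-max] pooled maps. Returns
    Out_feature = M_s(W) applied elementwise to w."""
    pooled = np.stack([w.mean(axis=-1), w.max(axis=-1)], axis=-1)  # (H, W, 2)
    k = kernel.shape[0] // 2
    padded = np.pad(pooled, ((k, k), (k, k), (0, 0)))  # 'same' zero padding
    H, Wd = w.shape[:2]
    ms = np.empty((H, Wd))
    for i in range(H):
        for j in range(Wd):
            ms[i, j] = (padded[i:i + kernel.shape[0],
                               j:j + kernel.shape[1]] * kernel).sum()
    ms = 1.0 / (1.0 + np.exp(-ms))  # sigmoid -> spatial attention map M_s(W)
    return w * ms[..., None]
```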
Example two:
as shown in fig. 7, a second embodiment of the present invention provides a model training apparatus 700, comprising:
an obtaining unit 702, configured to obtain N pieces of image data, and generate a first data set according to the N pieces of image data, where N is an integer greater than 1;
an updating unit 704, configured to update the first feature in the N image data to a second feature, and generate a second data set according to the updated N image data;
a training unit 706, configured to obtain a target data set according to the first data set and the second data set, and train the detection model through the target data set.
The model training device 700 provided in the embodiment of the present application is used for an electronic device, where the electronic device acquires N image data, where N is an integer greater than 1, and specifically, the electronic device directly acquires a plurality of image data through a shooting device, or receives a plurality of image data sent by other electronic devices. The plurality of image data are labeled to generate a first data set, and the first data set generated by the plurality of image data can ensure that the number of features in the first data set is large, so that the effect of a subsequent training model is improved. In the process of generating the data set, it is necessary to identify image features in the plurality of image data, where the image features include a plurality of image features such as a first feature, a second feature, and a third feature, and mark different image features respectively. After the first data set is generated, the first features in the plurality of image data are updated, the first features in the N image data are all converted into the second features, and a second data set is generated through the plurality of updated image data. And training the model by taking the first data set and the second data set as target data sets, thereby obtaining a detection model. Under the condition that the number of the first features in the first data set is small, if the model is trained only through the first data set, the false detection rate of the model on the target features is increased, and the detection effect of the model on the key category features is reduced. 
According to the method and the device, the first features in the plurality of image data are updated to the second features, the second data set is generated from the plurality of updated image data, and the first data set and the second data set are combined to form the target data set, which balances the quantity proportion of the different types of features in the data set used for training the model and guarantees the quantity of first features and second features in the target data set. Because only the first features are updated and converted into the second features, and the other image features in the image data are not affected, the accuracy with which the finally trained detection model identifies the object to be detected is improved.
Specifically, the image data is an image of a construction site, and the first data set is obtained by labeling felt, bare soil, green grass and yellow grass in the image data. Green grass features are extracted from the image data, the green grass is updated to yellow grass through color conversion, the labels of the green grass are uniformly modified to yellow grass, and the second data set is obtained from the updated image data. The first data set and the second data set are combined to serve as the target data set for training the detection model, so that the detection model can effectively identify felt, bare soil, green grass and yellow grass, and the false detection rate of the detection model for bare soil in an uncovered state is reduced.
The green grass feature in the image of the construction site is replaced with the yellow grass feature by re-quantizing the values of the HSV (Hue, Saturation, Value) color model. Specifically, the HSV range of yellow is obtained, where H ranges from 26 to 34, S ranges from 43 to 255, and V ranges from 46 to 255. All elements at the position of the green grass feature in the image of the construction site are replaced with values in the yellow HSV range, and the label of that position is uniformly modified to the label corresponding to yellow grass.
In any of the above embodiments, the model training apparatus 700 further comprises:
and the labeling unit is used for labeling the first feature and the second feature in the N image data through the first mark and the second mark to obtain a first data set.
In this embodiment, a first data set including a plurality of first features and a plurality of second features in a plurality of image data is obtained by labeling the first features and the second features in the plurality of image data.
Specifically, the image data includes image features such as green grass, yellow grass, felt, and bare soil. Green grass is labeled by "g _ grass", yellow grass is labeled by "y _ grass", felt is labeled by "fugai", and bare soil is labeled by "weikugai", wherein green grass is the first feature, yellow grass is the second feature, "g _ grass" is the first mark, and "y _ grass" is the second mark. The image features to be recognized in the image are marked through different marks in the plurality of images, so that the first data set can be guaranteed to comprise a plurality of image features required by training.
In any of the above embodiments, the updating unit 704 is specifically configured to: identifying M image data to be processed with first characteristics in the N image data, wherein M is less than or equal to N; replacing the first features in the M pieces of image data to be processed with second features, and replacing the first marks with second marks to obtain M pieces of target image data; the N image data are updated by the M target image data.
In this embodiment, in the process of updating the first feature of the N image data to the second feature, the image data to be processed among the N image data need to be identified; the image data to be processed include the first feature, the number of image data to be processed is M, and M is less than or equal to the number of image data. The first feature in the image data to be processed is extracted and replaced with the second feature, the first mark corresponding to the first feature is replaced correspondingly, the updating of the M image data to be processed is completed to obtain M updated target image data, and the second data set is obtained from the M updated target image data. The detection model is trained on the first data set and the second data set, which balances the quantity proportion of the different types of features in the data set used for training the model, reduces the false detection rate of the detection model, and improves the detection accuracy of the detection model for the target features.
In some embodiments, the second data set is generated directly from the updated M target image data.
In some other embodiments, M target image data are substituted for M to-be-processed image data of the N image data, thereby obtaining a second data set.
By identifying the M image data to be processed among the N image data, where only the image data including the first feature are updated, the amount of data processed when updating the image data is reduced. In the process of updating the first features in the image data to be processed, the first marks corresponding to the first features are replaced with the second marks, so that the updated N image data remain in a fully labeled state and do not need to be labeled again, reducing the labeling steps for the data set.
Specifically, the N image data are traversed, and when an image is detected to include the first feature, the first feature in that image is updated to the second feature and the first mark corresponding to the first feature is replaced with the second mark. Finally, the image is stored as updated image data and renamed to distinguish it from the image data before updating.
In any of the above embodiments, the updating unit 704 is specifically configured to: acquiring position information and pixel matrixes of a first feature in M pieces of target image data; generating a second characteristic according to the pixel matrix and a set color model; the first feature in the M image data is replaced with the second feature according to the position information.
In this embodiment, the position information of the first mark in the target image data is determined, thereby determining the position information of the first feature, and the pixel matrix of the first feature in the image data is acquired, from which the area and the shape of the first feature in the image data can be determined. An updated pixel matrix corresponding to the original pixel matrix is randomly generated through the color channels, the color of the updated pixel matrix differing from that of the original pixel matrix. The original pixel matrix is then replaced with the updated pixel matrix according to the position information, completing the step of replacing the first feature with the second feature. In this way, the color of the first feature can be adjusted to form the second feature while parameters such as the shape of the first feature in the target image data remain unchanged. By identifying the first feature and replacing and updating only the first feature, the invention ensures that the other image features in the image data are not affected.
In any of the above embodiments, the training unit 706 is specifically configured to: extracting a feature vector of each image data in the target data set; the detection model is trained by the feature vectors.
In this embodiment, the feature vector of each image data in the target data set is extracted. Since the target data set comprises the first data set and the second data set, the extracted feature vectors include both the first feature and the second feature. Training the detection model on these feature vectors makes the categories in the data set used for training more balanced, improves the detection accuracy of the detection model for the target features, and reduces the false detection rate.
Specifically, if during feature extraction the channel information obtained by the convolution kernels is directly superimposed with equal weights, features carrying important information cannot be highlighted. To address this problem, an attention mechanism that applies average pooling and maximum pooling along the channel dimension can better screen the global information of the target class, while better highlighting the salient features of the target object.
In any of the above embodiments, the training unit 706 is specifically configured to: performing spatial compression on the image data through global average pooling and global maximum pooling to obtain a first average pooling vector and a first maximum pooling vector; performing weighted calculation on the first average pooling vector and the first maximum pooling vector to obtain channel characteristics; and determining a feature vector according to the channel features.
In this embodiment, the image data in the target data set is denoted X ∈ R^(H×W×C), where X is the image data, H is the image height, W is the image width, and C is the number of color channels, and X = [x_1, x_2, …, x_C], where x_c represents the parameter of the c-th convolution kernel.
X is spatially compressed using global average pooling and global maximum pooling respectively. Let the average pooling process and the maximum pooling process be F_ave and F_max, with corresponding outputs A_ave and A_max, where A_ave is the first average pooling vector and A_max is the first maximum pooling vector. The specific calculation formulas are as follows:
A_ave = F_ave(X) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} x_c(i, j);
where A_ave is the first average pooling vector, F_ave is the average pooling process, x_c denotes the parameter of the c-th convolution kernel, H is the height of the image and W is the width of the image.
A_max = F_max(X) = max_{i=1..H, j=1..W} x_c(i, j);
where A_max is the first maximum pooling vector, F_max is the maximum pooling process, x_c denotes the parameter of the c-th convolution kernel, H is the height of the image and W is the width of the image.
A channel attention map can be obtained by weighting the calculated first average pooling vector and first maximum pooling vector. The specific calculation formula is as follows:
M_c(X) = σ(MLP(A_ave) + MLP(A_max))
       = σ(W_1(W_0 * A_ave) + W_1(W_0 * A_max));
where M_c(X) denotes the channel attention map, σ denotes the sigmoid function, A_max is the first maximum pooling vector, A_ave is the first average pooling vector, MLP is a multilayer perceptron, and W_1 and W_0 are weight values.
Through the above formula, the channel attention map is calculated: the first average pooling vector and the first maximum pooling vector are each passed through a shared network formed by a multilayer perceptron (MLP), and the results are added to obtain the channel attention map. The channel attention map is then applied through matrix multiplication to complete the feature weighting operation, yielding the channel features, specifically as follows:
W = M_c(X) ⊗ X;
where X is the image data, M_c(X) denotes the channel attention map, and W is the channel feature.
And calculating to obtain a feature vector according to the channel features.
In any of the above embodiments, the training unit 706 is specifically configured to: performing channel average pooling and channel maximum pooling on the channel characteristics to obtain a second average pooling vector and a second maximum pooling vector; and performing weighted calculation on the second average pooling vector and the second maximum pooling vector to obtain a feature vector.
In this embodiment, channel average pooling and channel maximum pooling are performed on the channel features W along the channel dimension, and the resulting pooled maps are concatenated in series and convolved with a standard convolutional layer. The calculation formula is as follows:
M_s(W) = σ(f^{7×7}([AvgPool(W); MaxPool(W)]));
where M_s(W) denotes the spatial attention map, σ denotes the sigmoid function, f^{7×7} denotes a convolution operation with a 7×7 convolution kernel, AvgPool(W) denotes the second average pooling vector (the average pooling operation F^s_ave), and MaxPool(W) denotes the second maximum pooling vector (the maximum pooling operation F^s_max).
Finally, the feature weighting operation is completed through matrix multiplication to obtain the feature vector, as follows:
Out_feature = M_s(W) ⊗ W;
where Out_feature represents the feature vector, M_s(W) represents the spatial attention map, and W represents the channel features.
Example three:
as shown in fig. 8, a third embodiment of the present invention provides an electronic device 800 including: a memory 802, the memory 802 having programs or instructions stored therein; the processor 804 and the processor 804 execute the program or the instructions stored in the memory 802 to implement the steps of the model training method in any of the embodiments described above, so that all the beneficial technical effects of the model training method in any of the embodiments described above are achieved, and redundant description is not repeated herein.
Optionally, the electronic device 800 comprises image acquisition means for acquiring image data.
Optionally, the electronic device 800 comprises a communication means for receiving image data sent by other devices.
Example four:
in a fourth embodiment of the present invention, a readable storage medium is provided, on which a program is stored which, when executed by a processor, implements the model training method of any of the above embodiments, thereby achieving all the advantageous technical effects of the model training method of any of the above embodiments.
The readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It is to be understood that, in the claims, the specification and the drawings of the present invention, the term "plurality" means two or more; unless explicitly defined otherwise, the terms "upper", "lower" and the like indicate orientations or positional relationships based on those shown in the drawings, are used only to describe the present invention more conveniently and to simplify the description, and do not indicate or imply that the device or element referred to must have the specific orientation described, be constructed in a specific orientation, or be operated in a specific orientation; such descriptions therefore should not be construed as limiting the present invention. The terms "connect", "mount", "secure" and the like are to be construed broadly: for example, "connect" may refer to a fixed connection between multiple objects, a removable connection between multiple objects, or an integral connection; the multiple objects may be directly connected to each other or indirectly connected through an intermediary. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the claims, specification and drawings of the present invention, the description of the terms "one embodiment," "some embodiments," "specific embodiments," and so forth means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In the claims, specification and drawings of the present invention, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method of model training, comprising:
acquiring N image data, and generating a first data set according to the N image data, wherein N is an integer greater than 1;
updating the first feature in the N image data into a second feature, and generating a second data set according to the updated N image data;
and obtaining a target data set according to the first data set and the second data set, and training a detection model through the target data set.
2. The model training method of claim 1, wherein the generating a first data set from the N image data comprises:
labeling the first feature and the second feature in the N image data through a first mark and a second mark to obtain the first data set.
3. The model training method of claim 2, wherein the updating a first feature of the N image data to a second feature comprises:
identifying M image data to be processed with the first characteristic in the N image data, wherein M is less than or equal to N;
replacing the first features in the M pieces of image data to be processed with second features, and replacing the first markers with the second markers to obtain M pieces of target image data;
updating the N image data by the M target image data.
4. The model training method of claim 3, wherein the replacing the first feature in the M image data to be processed with the second feature comprises:
acquiring position information and pixel matrixes of the first features in the M pieces of target image data;
generating the second feature from the pixel matrix according to a set color model;
replacing the first feature in the M image data with the second feature according to the position information.
5. The model training method according to any one of claims 1 to 4, wherein the training of the detection model by the target data set comprises:
extracting a feature vector for each of the image data in the target dataset;
and training the detection model through the feature vectors.
6. The model training method of claim 5, wherein the obtaining the feature vector of each of the image data comprises:
performing spatial compression on the image data through global average pooling and global maximum pooling to obtain a first average pooling vector and a first maximum pooling vector;
performing weighted calculation on the first average pooling vector and the first maximum pooling vector to obtain channel characteristics;
and determining the feature vector according to the channel feature.
7. The model training method of claim 6, wherein the determining the feature vector from the channel features comprises:
performing channel average pooling and channel maximum pooling on the channel characteristics to obtain a second average pooling vector and a second maximum pooling vector;
and performing weighted calculation on the second average pooling vector and the second maximum pooling vector to obtain the feature vector.
8. A model training apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring N pieces of image data and generating a first data set according to the N pieces of image data, and N is an integer greater than 1;
the updating unit is used for updating the first feature in the N image data into a second feature and generating a second data set according to the updated N image data;
and the training unit is used for obtaining a target data set according to the first data set and the second data set and training the detection model through the target data set.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the model training method according to any one of claims 1 to 7.
10. A readable storage medium, on which a program or instructions are stored which, when executed by a processor, carry out the steps of the model training method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111069317.7A CN113822348A (en) | 2021-09-13 | 2021-09-13 | Model training method, training device, electronic device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113822348A (en) | 2021-12-21 |
Family
ID=78914436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111069317.7A Pending CN113822348A (en) | 2021-09-13 | 2021-09-13 | Model training method, training device, electronic device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822348A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101072289A (en) * | 2007-06-11 | 2007-11-14 | 北京中星微电子有限公司 | Automatic generating method and device for image special effect |
CN108305296A (en) * | 2017-08-30 | 2018-07-20 | 深圳市腾讯计算机系统有限公司 | Image description generation method, model training method, equipment and storage medium |
CN108509915A (en) * | 2018-04-03 | 2018-09-07 | 百度在线网络技术(北京)有限公司 | The generation method and device of human face recognition model |
CN111401375A (en) * | 2020-03-09 | 2020-07-10 | 苏宁云计算有限公司 | Text recognition model training method, text recognition device and text recognition equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wu et al. | Autonomous detection of plant disease symptoms directly from aerial imagery | |
CN107392091B (en) | Agricultural artificial intelligence crop detection method, mobile terminal and computer readable medium | |
CN109902548B (en) | Object attribute identification method and device, computing equipment and system | |
WO2023015743A1 (en) | Lesion detection model training method, and method for recognizing lesion in image | |
CN111340141A (en) | Crop seedling and weed detection method and system based on deep learning | |
CN109635875A (en) | A kind of end-to-end network interface detection method based on deep learning | |
EP3455783A1 (en) | Recognition of weed in a natural environment | |
CN108615046A (en) | A kind of stored-grain pests detection recognition methods and device | |
CN110147836A (en) | Model training method, device, terminal and storage medium | |
CN108629289B (en) | Farmland identification method and system and agricultural unmanned aerial vehicle | |
CN110363176B (en) | Image analysis method and device | |
CN115880558B (en) | Farming behavior detection method and device, electronic equipment and storage medium | |
CN113052295B (en) | Training method of neural network, object detection method, device and equipment | |
CN108229274A (en) | Multilayer neural network model training, the method and apparatus of roadway characteristic identification | |
CN110826581A (en) | Animal number identification method, device, medium and electronic equipment | |
CN112668675B (en) | Image processing method and device, computer equipment and storage medium | |
CN110298366A (en) | Crops are distributed extracting method and device | |
Menezes et al. | Pseudo-label semi-supervised learning for soybean monitoring | |
CN110751163B (en) | Target positioning method and device, computer readable storage medium and electronic equipment | |
CN113807143A (en) | Crop connected domain identification method and device and operation system | |
CN116739739A (en) | Loan amount evaluation method and device, electronic equipment and storage medium | |
CN113822348A (en) | Model training method, training device, electronic device and readable storage medium | |
CN116246184A (en) | Papaver intelligent identification method and system applied to unmanned aerial vehicle aerial image | |
CN116597246A (en) | Model training method, target detection method, electronic device and storage medium | |
CN113486879B (en) | Image area suggestion frame detection method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||