CN113822348A - Model training method, training device, electronic device and readable storage medium - Google Patents
Model training method, training device, electronic device and readable storage medium
- Publication number
- Publication number: CN113822348A (application number CN202111069317.7A)
- Authority
- CN
- China
- Prior art keywords
- image data
- feature
- data set
- features
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a model training method, a training device, an electronic device and a readable storage medium. The model training method includes: acquiring N image data and generating a first data set from the N image data, where N is an integer greater than 1; updating a first feature in the N image data to a second feature and generating a second data set from the updated N image data; and obtaining a target data set from the first data set and the second data set, and training a detection model on the target data set. The method balances the proportions of the different types of features in the data set used to train the model. Because only the first feature is updated and converted into the second feature, the other image features in the image data are unaffected, which improves the accuracy with which the finally trained detection model identifies the object to be detected.
Description
Technical Field
The invention belongs to the technical field of detection models, and particularly relates to a model training method, a training device, an electronic device and a readable storage medium.
Background
A construction site with a complex, messy environment negatively affects urban construction, and monitoring in real time for uncovered objects that easily raise dust costs considerable manpower and time. With the continuous development of deep learning, automatic video monitoring has replaced traditional human-eye monitoring, saving labor and time costs; however, detection models in the related art cannot accurately identify the object to be detected.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art or the related art.
To this end, a first aspect of the invention proposes a model training method.
A second aspect of the invention proposes a model training apparatus.
A third aspect of the invention proposes an electronic device.
A fourth aspect of the invention proposes a readable storage medium.
In view of this, a model training method according to a first aspect of the present invention is provided, including: acquiring N image data, and generating a first data set according to the N image data, wherein N is an integer greater than 1; updating the first feature in the N image data into a second feature, and generating a second data set according to the updated N image data; and obtaining a target data set according to the first data set and the second data set, and training the detection model through the target data set.
The invention provides a model training method for an electronic device. The electronic device acquires N image data, where N is an integer greater than 1; specifically, the electronic device either captures the image data directly through a shooting device or receives image data sent by other electronic devices. The image data are labeled to generate a first data set; generating the first data set from many image data ensures that the data set contains a large number of features, which improves the effect of subsequently training the model. When generating the data set, the image features in the image data must be identified (these include a first feature, a second feature, a third feature, and so on) and each different image feature must be marked separately. After the first data set is generated, the first feature in the image data is updated: the first features in the N image data are all converted into second features, and a second data set is generated from the updated image data. The model is trained with the first data set and the second data set as the target data set, thereby obtaining a detection model. When the first data set contains few first features, training the model on the first data set alone increases the model's false-detection rate on the target features and degrades its detection of key category features.
By updating the first features in the image data to second features, generating a second data set from the updated image data, and combining the first and second data sets into a target data set, the method balances the proportions of the different types of features in the data set used to train the model. Because only the first feature is updated and converted into the second feature, the other image features in the image data are unaffected, which improves the accuracy with which the finally trained detection model identifies the object to be detected.
Specifically, the image data are images of a construction site, and the first data set is obtained by labeling felt, bare soil, green grass and yellow grass in the image data. The green-grass features in the image data are extracted, the green grass is updated to yellow grass through color conversion, the green-grass labels are uniformly changed to yellow-grass labels, and a second data set is obtained from the updated image data. The first data set and the second data set are combined as the target data set to train the detection model, so that the detection model can effectively identify felt, bare soil, green grass and yellow grass, and the false-detection rate of the detection model on uncovered bare soil is reduced.
The green-grass feature in the construction-site image is replaced with a yellow-grass feature by re-quantizing the HSV (Hue, Saturation, Value) color values. Specifically, the HSV range of yellow is obtained: H ranges from 26 to 34, S from 43 to 255, and V from 46 to 255. All pixels at the position of the green-grass feature in the image are replaced with values within the yellow HSV range, and the label at that position is uniformly changed to the label corresponding to yellow grass.
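The HSV-based replacement described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: it uses the standard-library colorsys module rescaled to OpenCV-style ranges (H on 0 to 179, S and V on 0 to 255, matching the yellow range quoted above); the green hue range and the representative yellow value chosen are assumptions.

```python
import colorsys

# Yellow range from the description (OpenCV-style scale): H 26-34, S 43-255, V 46-255.
# The green hue range below is an assumption for the sketch.
GREEN_H = (35, 77)
YELLOW_HSV = (30, 150, 200)  # a representative value inside the yellow range

def rgb_to_cv_hsv(r, g, b):
    """Convert 0-255 RGB to OpenCV-scaled HSV (H: 0-179, S/V: 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(h * 179), int(s * 255), int(v * 255)

def cv_hsv_to_rgb(h, s, v):
    r, g, b = colorsys.hsv_to_rgb(h / 179.0, s / 255.0, v / 255.0)
    return int(r * 255), int(g * 255), int(b * 255)

def recolor_green_to_yellow(image):
    """image: list of rows of (r, g, b) tuples; returns a recolored copy."""
    out = []
    for row in image:
        new_row = []
        for (r, g, b) in row:
            h, s, v = rgb_to_cv_hsv(r, g, b)
            if GREEN_H[0] <= h <= GREEN_H[1] and s >= 43 and v >= 46:
                new_row.append(cv_hsv_to_rgb(*YELLOW_HSV))  # replace with yellow
            else:
                new_row.append((r, g, b))                   # leave untouched
        out.append(new_row)
    return out
```

On a saturated green pixel this produces a pixel whose hue falls inside the quoted yellow range of 26 to 34, while non-green pixels pass through unchanged.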
In addition, according to the model training method in the above technical solution provided by the present invention, the following additional technical features may also be provided:
in one possible design, generating the first data set from the N image data includes: labeling the first feature and the second feature in the N image data with the first mark and the second mark, respectively, to obtain the first data set.
In this design, a first dataset including a plurality of first features and a plurality of second features in a plurality of image data is obtained by labeling the first features and the second features in the plurality of image data.
Specifically, the image data include image features such as green grass, yellow grass, felt, and bare soil. Green grass is labeled "g_grass", yellow grass "y_grass", felt "fugai", and bare soil "weikugai"; here green grass is the first feature, yellow grass is the second feature, "g_grass" is the first mark, and "y_grass" is the second mark. Marking the image features to be recognized with distinct marks across the images ensures that the first data set includes all the image features required for training.
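As an illustration of this labeling step, the first data set's annotations might be organized as below. The dictionary layout, helper function and bounding boxes are assumptions made for the sketch; only the label strings follow the description above.

```python
# Illustrative annotation structure for the first data set.
FIRST_MARK = "g_grass"   # green grass, the first feature
SECOND_MARK = "y_grass"  # yellow grass, the second feature

def label_image(image_id, regions):
    """regions: list of (mark, bbox) pairs annotating one image."""
    return {"image": image_id,
            "annotations": [{"label": mark, "bbox": box} for mark, box in regions]}

first_data_set = [
    label_image("site_001.jpg", [(FIRST_MARK, (10, 10, 50, 50)),
                                 ("fugai", (60, 0, 120, 40))]),
    label_image("site_002.jpg", [(SECOND_MARK, (0, 0, 30, 30)),
                                 ("weikugai", (40, 40, 90, 90))]),
]
```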
In one possible design, updating the first feature to the second feature in the N image data includes: identifying M image data to be processed that contain the first feature among the N image data, where M is less than or equal to N; replacing the first feature in the M image data to be processed with the second feature, and replacing the first mark with the second mark, to obtain M target image data; and updating the N image data with the M target image data.
In this design, when updating the first feature of the N image data to the second feature, the image data to be processed among the N image data must first be identified: these are the image data that contain the first feature, and their number M is less than or equal to N. The first feature in each image data to be processed is extracted and replaced with the second feature, and the corresponding first mark is replaced accordingly; this completes the update of the M image data to be processed, yielding M updated target image data, from which the second data set is obtained. Training the detection model with the first data set and the second data set balances the proportions of the different types of features in the training data, reduces the detection model's false-detection rate, and improves its detection accuracy on the target features.
In some embodiments, the second data set is generated directly from the updated M target image data.
In some other embodiments, the M target image data replace the M image data to be processed among the N image data, and the second data set is obtained from the resulting N image data.
By identifying the M image data to be processed among the N image data, only the image data that include the first feature are updated, which reduces the amount of data processed during the update. Because the first mark corresponding to the first feature is replaced with the second mark while the first feature is being updated, the updated N image data remain in a fully labeled state and need not be labeled again, reducing the data-set labeling steps.
Specifically, the N image data are traversed; whenever an image data is detected to include the first feature, the first feature in that image data is updated to the second feature, and the corresponding first mark is replaced with the second mark. Finally, the image data is stored as updated image data, that is, it is renamed to distinguish it from the image data before the update.
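The traversal just described can be sketched as follows, operating only on annotation labels for brevity. The function name, the renaming scheme and the dictionary layout are assumptions, and the pixel-level recoloring itself is omitted.

```python
FIRST_MARK, SECOND_MARK = "g_grass", "y_grass"

def update_annotations(dataset):
    """dataset: {filename: [labels]}. Returns the updated (second data set) entries."""
    updated = {}
    for name, labels in dataset.items():
        if FIRST_MARK in labels:  # this is an image data to be processed
            new_labels = [SECOND_MARK if l == FIRST_MARK else l for l in labels]
            # rename so the updated copy is distinguishable from the original
            updated["updated_" + name] = new_labels
    return updated
```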
In one possible design, replacing the first feature in the M image data to be processed with the second feature includes: acquiring the position information and the pixel matrix of the first feature in the M image data to be processed; generating the second feature from the pixel matrix and a set color model; and replacing the first feature in the M image data with the second feature according to the position information.
In this design, the position of the first mark in the image data is determined, thereby determining the position of the first feature, and the pixel matrix of the first feature is acquired; from this matrix the area and shape of the first feature in the image data can be determined. An updated pixel matrix corresponding to the original one is randomly generated through the color channels, with a color different from that of the original. The original pixel matrix is replaced with the updated one according to the position information, completing the replacement of the first feature with the second feature. In this way the color of the first feature can be adjusted to form the second feature while parameters such as its shape in the image data remain unchanged. By identifying the first feature and replacing and updating only it, the other image features in the image data are unaffected.
In one possible design, training the detection model with the target data set includes: extracting a feature vector of each image data in the target data set; and training the detection model with the feature vectors.
In this design, the feature vector of each image data in the target data set is extracted. Because the target data set comprises the first data set and the second data set, the extracted feature vectors cover both the first feature and the second feature. Training the detection model on these feature vectors makes the categories in the training data more balanced, improves the detection model's accuracy on the target features, and reduces the false-detection rate.
Specifically, during feature extraction, the channel information obtained by the convolution kernels is normally superimposed directly with equal weights, so the features carrying important information cannot be emphasized. To address this, an attention mechanism that applies average pooling and maximum pooling along the channel dimension can better screen the global information of the target class while better highlighting the salient features of the target object.
In one possible design, obtaining the feature vector of each image data includes: spatially compressing the image data through global average pooling and global maximum pooling to obtain a first average pooling vector and a first maximum pooling vector; performing a weighted calculation on the first average pooling vector and the first maximum pooling vector to obtain channel features; and determining the feature vector from the channel features.
In this design, an image data in the target data set is denoted X ∈ R^(H×W×C), where H is the image height, W is the image width, and C is the number of color channels, and X = [x_1, x_2, …, x_C], where x_c denotes the feature map produced by the c-th convolution kernel.
X is spatially compressed using global average pooling and global maximum pooling. Let the average-pooling and maximum-pooling operations be F_ave and F_max, with outputs A_ave and A_max respectively, where A_ave is the first average pooling vector and A_max is the first maximum pooling vector. The specific calculation formulas are:

A_ave = F_ave(X), whose c-th component is (1/(H×W)) · Σ_{i=1..H} Σ_{j=1..W} x_c(i, j);

A_max = F_max(X), whose c-th component is max_{i=1..H, j=1..W} x_c(i, j);

where x_c denotes the feature map of the c-th convolution kernel, H is the image height, and W is the image width.
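A minimal pure-Python sketch of these two spatial poolings, made for illustration only: for each channel c, the average pooling takes the mean over all H×W positions and the maximum pooling takes their maximum.

```python
def channel_pool(x):
    """x: [C][H][W] nested lists. Returns (A_ave, A_max), each of length C."""
    a_ave, a_max = [], []
    for channel in x:
        flat = [v for row in channel for v in row]  # all H*W positions of channel c
        a_ave.append(sum(flat) / len(flat))         # spatial average
        a_max.append(max(flat))                     # spatial maximum
    return a_ave, a_max
```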
Weighting the computed first average pooling vector and first maximum pooling vector yields a channel attention map, calculated as:

M_c(X) = σ(MLP(A_ave) + MLP(A_max)) = σ(W_1(W_0 · A_ave) + W_1(W_0 · A_max));

where M_c(X) denotes the channel attention map, σ denotes the sigmoid function, A_max is the first maximum pooling vector, A_ave is the first average pooling vector, MLP is a multilayer perceptron, and W_0 and W_1 are its weights.
Through this formula the channel attention map is obtained: the first average pooling vector and the first maximum pooling vector are each passed through a shared network formed by a multilayer perceptron (MLP), and the results are added and squashed by the sigmoid to give the channel attention map. The channel attention map is then applied by element-wise multiplication to complete the feature weighting and obtain the channel features:

W = M_c(X) ⊗ X;

where X is the image data, M_c(X) denotes the channel attention map, ⊗ denotes element-wise (broadcast) multiplication, and W is the channel features.
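A toy numeric sketch of this channel attention computation: the shared MLP is collapsed here to two scalar weights (W0, W1), an intentional simplification made purely for illustration, and the resulting attention scores then weight each channel of the input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(a_ave, a_max, w0=0.5, w1=1.0):
    """Per-channel score: sigmoid(W1*(W0*a_ave) + W1*(W0*a_max)). Toy scalar 'MLP'."""
    return [sigmoid(w1 * (w0 * a) + w1 * (w0 * m)) for a, m in zip(a_ave, a_max)]

def apply_channel_attention(x, attn):
    """Weight each channel's H×W map of x ([C][H][W]) by its attention score."""
    return [[[attn[c] * v for v in row] for row in x[c]] for c in range(len(x))]
```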
In one possible design, determining the feature vector from the channel features includes: performing channel average pooling and channel maximum pooling on the channel characteristics to obtain a second average pooling vector and a second maximum pooling vector; and performing weighted calculation on the second average pooling vector and the second maximum pooling vector to obtain a feature vector.
In this design, channel average pooling and channel maximum pooling are applied to the channel features W along the channel dimension; the pooling results are concatenated and then convolved with a standard convolutional layer. The calculation formula is:

M_s(W) = σ(f^(7×7)([AvgPool(W); MaxPool(W)]));

where M_s(W) denotes the spatial attention map, σ denotes the sigmoid function, f^(7×7) denotes a convolution with a 7×7 kernel, AvgPool(W) is the second average pooling vector, and MaxPool(W) is the second maximum pooling vector.
Finally, the feature weighting is completed by element-wise multiplication, yielding the feature vector:

Out_feature = M_s(W) ⊗ W;

where Out_feature denotes the feature vector, M_s(W) denotes the spatial attention map, and W denotes the channel features.
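The spatial attention step can likewise be sketched in pure Python. The 7×7 convolution is simplified here to a per-position weighted combination (w_a, w_m) of the channel-wise average and maximum maps, an assumption made purely to keep the sketch self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def spatial_attention(w, w_a=0.5, w_m=0.5):
    """w: channel features as [C][H][W] nested lists. Returns M_s(W) ⊗ W."""
    C, H, W_ = len(w), len(w[0]), len(w[0][0])
    # pool across the channel dimension at each spatial position, then squash
    attn = [[sigmoid(w_a * sum(w[c][i][j] for c in range(C)) / C
                     + w_m * max(w[c][i][j] for c in range(C)))
             for j in range(W_)] for i in range(H)]
    # element-wise weighting: Out = M_s(W) ⊗ W
    return [[[attn[i][j] * w[c][i][j] for j in range(W_)]
             for i in range(H)] for c in range(C)]
```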
According to a second aspect of the present invention, there is provided a model training apparatus comprising: an acquisition unit for acquiring N image data and generating a first data set from the N image data, where N is an integer greater than 1; an updating unit for updating the first feature in the N image data to a second feature and generating a second data set from the updated N image data; and a training unit for obtaining a target data set from the first data set and the second data set and training the detection model on the target data set.
The model training device provided by the invention is used in an electronic device. The electronic device acquires N image data, where N is an integer greater than 1; specifically, the electronic device either captures the image data directly through a shooting device or receives image data sent by other electronic devices. The image data are labeled to generate a first data set; generating the first data set from many image data ensures that the data set contains a large number of features, which improves the effect of subsequently training the model. When generating the data set, the image features in the image data must be identified (these include a first feature, a second feature, a third feature, and so on) and each different image feature must be marked separately. After the first data set is generated, the first feature in the image data is updated: the first features in the N image data are all converted into second features, and a second data set is generated from the updated image data. The model is trained with the first data set and the second data set as the target data set, thereby obtaining a detection model. When the first data set contains few first features, training the model on the first data set alone increases the model's false-detection rate on the target features and degrades its detection of key category features.
By updating the first features in the image data to second features, generating a second data set from the updated image data, and combining the first and second data sets into a target data set, the device balances the proportions of the different types of features in the data set used to train the model. Because only the first feature is updated and converted into the second feature, the other image features in the image data are unaffected, which improves the accuracy with which the finally trained detection model identifies the object to be detected.
Specifically, the image data are images of a construction site, and the first data set is obtained by labeling felt, bare soil, green grass and yellow grass in the image data. The green-grass features in the image data are extracted, the green grass is updated to yellow grass through color conversion, the green-grass labels are uniformly changed to yellow-grass labels, and a second data set is obtained from the updated image data. The first data set and the second data set are combined as the target data set to train the detection model, so that the detection model can effectively identify felt, bare soil, green grass and yellow grass, and the false-detection rate of the detection model on uncovered bare soil is reduced.
The green-grass feature in the construction-site image is replaced with a yellow-grass feature by re-quantizing the HSV (Hue, Saturation, Value) color values. Specifically, the HSV range of yellow is obtained: H ranges from 26 to 34, S from 43 to 255, and V from 46 to 255. All pixels at the position of the green-grass feature in the image are replaced with values within the yellow HSV range, and the label at that position is uniformly changed to the label corresponding to yellow grass.
According to a third aspect of the invention, there is provided an electronic device comprising: a memory in which a program or instructions are stored; and a processor that executes the program or instructions stored in the memory to implement the steps of the model training method in any of the possible designs above. The device therefore has all the beneficial technical effects of the model training method in any possible design of the first aspect, which are not repeated here.
According to a fourth aspect of the present invention, there is provided a readable storage medium on which a program or instructions are stored which, when executed by a processor, implement the steps of the model training method in any possible design of the first aspect. The medium therefore has all the beneficial technical effects of the model training method in any possible design of the first aspect, which are not repeated here.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 shows one of the schematic flow diagrams of a model training method in a first embodiment of the invention;
FIG. 2 shows a second schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 3 shows a third schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 4 shows a fourth schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 5 shows a fifth schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 6 shows a sixth schematic flow chart of a model training method in a first embodiment of the invention;
FIG. 7 shows a schematic block diagram of a model training apparatus in a second embodiment of the present invention;
fig. 8 shows a schematic block diagram of an electronic device in a third embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention; however, the present invention may be practiced in ways other than those described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
A model training method, a model training apparatus, an electronic device, and a readable storage medium according to some embodiments of the present invention are described below with reference to fig. 1 to 8.
The first embodiment is as follows:
As shown in fig. 1, a first embodiment of the present invention provides a model training method, including:
Step 102: acquire N image data, and generate a first data set from the N image data, where N is an integer greater than 1;
Step 104: update the first feature in the N image data to a second feature, and generate a second data set from the updated N image data;
Step 106: obtain a target data set from the first data set and the second data set, and train the detection model on the target data set.
The embodiment of the invention provides a model training method for an electronic device. The electronic device acquires N image data, where N is an integer greater than 1; specifically, the electronic device either captures the image data directly through a shooting device or receives image data sent by other electronic devices. The image data are labeled to generate a first data set; generating the first data set from many image data ensures that the data set contains a large number of features, which improves the effect of subsequently training the model. When generating the data set, the image features in the image data must be identified (these include a first feature, a second feature, a third feature, and so on) and each different image feature must be marked separately. After the first data set is generated, the first feature in the image data is updated: the first features in the N image data are all converted into second features, and a second data set is generated from the updated image data. The model is trained with the first data set and the second data set as the target data set, thereby obtaining a detection model. When the first data set contains few first features, training the model on the first data set alone increases the model's false-detection rate on the target features and degrades its detection of key category features.
By updating the first features in the image data to second features, generating a second data set from the updated image data, and combining the first and second data sets into a target data set, the method balances the proportions of the different types of features in the data set used to train the model. Because only the first feature is updated and converted into the second feature, the other image features in the image data are unaffected, which improves the accuracy with which the finally trained detection model identifies the object to be detected.
Specifically, the image data are images of a construction site, and the first data set is obtained by labeling felt, bare soil, green grass and yellow grass in the image data. The green-grass features in the image data are extracted, the green grass is updated to yellow grass through color conversion, the green-grass labels are uniformly changed to yellow-grass labels, and a second data set is obtained from the updated image data. The first data set and the second data set are combined as the target data set to train the detection model, so that the detection model can effectively identify felt, bare soil, green grass and yellow grass, and the false-detection rate of the detection model on uncovered bare soil is reduced.
The green-grass feature in the construction-site image is replaced with a yellow-grass feature by re-quantizing the HSV (Hue, Saturation, Value) color values. Specifically, the HSV range of yellow is obtained: H ranges from 26 to 34, S from 43 to 255, and V from 46 to 255. All pixels at the position of the green-grass feature in the image are replaced with values within the yellow HSV range, and the label at that position is uniformly changed to the label corresponding to yellow grass.
In any of the above embodiments, generating the first data set from the N image data comprises: labeling the first feature and the second feature in the N image data with the first mark and the second mark, respectively, to obtain the first data set.
In this embodiment, a first data set including a plurality of first features and a plurality of second features in a plurality of image data is obtained by labeling the first features and the second features in the plurality of image data.
Specifically, the image data includes image features such as green grass, yellow grass, felt, and bare soil. Green grass is labeled by "g _ grass", yellow grass is labeled by "y _ grass", felt is labeled by "fugai", and bare soil is labeled by "weikugai", wherein green grass is the first feature, yellow grass is the second feature, "g _ grass" is the first mark, and "y _ grass" is the second mark. The image features to be recognized in the image are marked through different marks in the plurality of images, so that the first data set can be guaranteed to comprise a plurality of image features required by training.
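A minimal sketch of how such a labeled record might be represented: the mark strings are the ones given above, while the record layout (filename plus labeled boxes) is an assumption for illustration.

```python
# Mark strings from the text; the record structure itself is illustrative.
LABELS = {"g_grass": "green grass", "y_grass": "yellow grass",
          "fugai": "felt", "weikugai": "bare soil"}

def make_record(filename, boxes):
    """boxes: list of (mark, x1, y1, x2, y2) tuples -> one first-data-set record."""
    for mark, *_ in boxes:
        if mark not in LABELS:
            raise ValueError(f"unknown mark: {mark}")
    return {"file": filename,
            "objects": [{"mark": m, "bbox": (x1, y1, x2, y2)}
                        for m, x1, y1, x2, y2 in boxes]}
```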
As shown in fig. 2, in any of the above embodiments, updating the first feature to the second feature in the N image data includes:
in step 202, identifying M image data to be processed having the first feature in the N image data, where M is less than or equal to N;
in step 204, replacing the first feature in the M image data to be processed with the second feature, and replacing the first mark with the second mark, to obtain M target image data;
in step 206, the N image data are updated with the M target image data.
In this embodiment, in the process of updating the first feature of the N image data to the second feature, the image data to be processed among the N image data need to be identified; the image data to be processed include the first feature, the number of image data to be processed is M, and M is less than or equal to the number of image data. The first feature in the image data to be processed is extracted and replaced with the second feature, the first mark corresponding to the first feature is replaced correspondingly, the updating of the M image data to be processed is completed to obtain M updated target image data, and the second data set is obtained from the M updated target image data. The detection model is trained on the first data set and the second data set, which balances the quantity proportion of the different types of features in the data set used for training the model, reduces the false detection rate of the detection model, and improves the detection accuracy of the detection model for the target features.
In some embodiments, the second data set is generated directly from the updated M target image data.
In some other embodiments, M target image data are substituted for M to-be-processed image data of the N image data, thereby obtaining a second data set.
By identifying the M image data to be processed among the N image data, where only the image data including the first feature are updated, the amount of data processed when updating the image data is reduced. In the process of updating the first features in the image data to be processed, the first marks corresponding to the first features are replaced with the second marks, so that the updated N image data remain in a fully labeled state and do not need to be labeled again, reducing the labeling steps for the data set.
Specifically, the N image data are traversed, and when an image is detected to include the first feature, the first feature in that image is updated to the second feature and the first mark corresponding to the first feature is replaced with the second mark. Finally, the image is stored as updated image data and renamed to distinguish it from the image data before updating.
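The traversal above could be sketched as below, assuming an in-memory dataset (a dict of images and per-image annotation lists); the `upd_` renaming prefix and the `recolor` callback are illustrative assumptions.

```python
def update_dataset(images, annotations, recolor):
    """Traverse the N images; for each image whose annotations contain the
    first mark ('g_grass'), recolor the image, rewrite that mark to the
    second mark ('y_grass'), and rename the result to mark it as updated."""
    updated = []
    for name, img in images.items():
        objs = annotations[name]
        if not any(o["mark"] == "g_grass" for o in objs):
            continue  # only the M images containing the first feature
        new_objs = [dict(o, mark="y_grass") if o["mark"] == "g_grass" else o
                    for o in objs]
        updated.append(("upd_" + name, recolor(img), new_objs))
    return updated
```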
As shown in fig. 3, in any of the above embodiments, replacing the first feature in the M image data to be processed with the second feature includes:
in step 302, acquiring position information and pixel matrices of the first feature in the M target image data;
in step 304, generating the second feature from the pixel matrix according to a set color model;
in step 306, replacing the first feature in the M image data with the second feature according to the position information.
In this embodiment, the position information of the first mark in the target image data is determined, thereby determining the position information of the first feature, and the pixel matrix of the first feature in the image data is acquired, from which the area and the shape of the first feature in the image data can be determined. An updated pixel matrix corresponding to the original pixel matrix is randomly generated through the color channels, the color of the updated pixel matrix differing from that of the original pixel matrix. The original pixel matrix is then replaced with the updated pixel matrix according to the position information, completing the step of replacing the first feature with the second feature. In this way, the color of the first feature can be adjusted to form the second feature while parameters such as the shape of the first feature in the target image data remain unchanged. By identifying the first feature and replacing and updating only the first feature, the invention ensures that the other image features in the image data are not affected.
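The position-based replacement could be sketched as follows, assuming the position information takes the form of an axis-aligned bounding box; the per-channel random regeneration mirrors the description above, and the range values are illustrative.

```python
import numpy as np

def replace_feature_region(img, bbox, channel_ranges, rng=None):
    """Regenerate the pixel matrix inside bbox = (x1, y1, x2, y2) with random
    per-channel values drawn from channel_ranges, leaving the region's area
    and shape (and all pixels outside it) unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    x1, y1, x2, y2 = bbox
    h, w = y2 - y1, x2 - x1
    new = np.stack([rng.integers(lo, hi + 1, (h, w))
                    for lo, hi in channel_ranges], axis=-1).astype(img.dtype)
    out = img.copy()
    out[y1:y2, x1:x2] = new
    return out
```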
As shown in fig. 4, in any of the above embodiments, training the detection model with the target data set includes: extracting a feature vector of each image data in the target data set; and training the detection model with the feature vectors.
In this embodiment, the feature vector of each image data in the target data set is extracted. Since the target data set comprises the first data set and the second data set, the extracted feature vectors include both the first feature and the second feature. Training the detection model on these feature vectors makes the categories in the data set used for training more balanced, improves the detection accuracy of the detection model for the target features, and reduces the false detection rate.
Specifically, if during feature extraction the channel information obtained by the convolution kernels is directly superimposed with equal weights, features carrying important information cannot be highlighted. To address this problem, an attention mechanism that applies average pooling and maximum pooling along the channel dimension can better screen the global information of the target class, while better highlighting the salient features of the target object.
As shown in fig. 5, in any of the above embodiments, acquiring the feature vector of each image data includes: performing spatial compression on the image data through global average pooling and global maximum pooling to obtain a first average pooling vector and a first maximum pooling vector; performing a weighted calculation on the first average pooling vector and the first maximum pooling vector to obtain channel features; and determining the feature vector according to the channel features.
In this embodiment, the image data in the target data set is denoted X ∈ R^(H×W×C), where X is the image data, H is the image height, W is the image width, and C is the number of color channels, and X = [x_1, x_2, …, x_C], where x_c represents the parameter of the c-th convolution kernel.
X is spatially compressed using global average pooling and global maximum pooling respectively. Let the average pooling process and the maximum pooling process be F_ave and F_max, with corresponding outputs A_ave and A_max, where A_ave is the first average pooling vector and A_max is the first maximum pooling vector. The specific calculation formulas are as follows:
A_ave = F_ave(X) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} x_c(i, j);
where A_ave is the first average pooling vector, F_ave is the average pooling process, x_c denotes the parameter of the c-th convolution kernel, H is the height of the image and W is the width of the image.
A_max = F_max(X) = max_{i=1..H, j=1..W} x_c(i, j);
where A_max is the first maximum pooling vector, F_max is the maximum pooling process, x_c denotes the parameter of the c-th convolution kernel, H is the height of the image and W is the width of the image.
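A minimal numpy sketch of the two pooling operations above: per-channel spatial compression of an H×W×C map into two C-dimensional vectors.

```python
import numpy as np

def global_pool(x):
    """x: (H, W, C) feature map -> (A_ave, A_max), each of shape (C,)."""
    a_ave = x.mean(axis=(0, 1))  # F_ave: average over all H*W positions
    a_max = x.max(axis=(0, 1))   # F_max: maximum over all H*W positions
    return a_ave, a_max
```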
A channel attention map can be obtained by weighting the calculated first average pooling vector and first maximum pooling vector. The specific calculation formula is as follows:
M_c(X) = σ(MLP(A_ave) + MLP(A_max))
       = σ(W_1(W_0 * A_ave) + W_1(W_0 * A_max));
where M_c(X) denotes the channel attention map, σ denotes the sigmoid function, A_max is the first maximum pooling vector, A_ave is the first average pooling vector, MLP is a multilayer perceptron, and W_1 and W_0 are weight values.
Through the above formula, the channel attention map is calculated: the first average pooling vector and the first maximum pooling vector are each passed through a shared network formed by a multilayer perceptron (MLP), and the results are added to obtain the channel attention map. The channel attention map is then applied through matrix multiplication to complete the feature weighting operation, yielding the channel features, specifically as follows:
W = M_c(X) ⊗ X;
where X is the image data, M_c(X) denotes the channel attention map, and W is the channel feature.
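The channel-attention weighting can be sketched as below. The shared MLP is realized with plain matrices; the ReLU hidden layer and the reduction-ratio weight shapes are common choices but assumptions here, not details stated in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w0, w1):
    """x: (H, W, C); w0: (C//r, C) and w1: (C, C//r) are the shared MLP
    weights applied to both pooled vectors. Returns the channel features,
    i.e. the channel attention map M_c(X) applied to X by broadcasting."""
    a_ave = x.mean(axis=(0, 1))                       # first average pooling vector
    a_max = x.max(axis=(0, 1))                        # first maximum pooling vector
    mc = sigmoid(w1 @ np.maximum(w0 @ a_ave, 0.0)
                 + w1 @ np.maximum(w0 @ a_max, 0.0))  # channel attention map (C,)
    return x * mc
```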
As shown in fig. 6, in any of the above embodiments, determining the feature vector according to the channel features includes: performing channel average pooling and channel maximum pooling on the channel features to obtain a second average pooling vector and a second maximum pooling vector; and performing a weighted calculation on the second average pooling vector and the second maximum pooling vector to obtain the feature vector.
In this embodiment, channel average pooling and channel maximum pooling are performed on the channel features W along the channel dimension, and the resulting pooled maps are concatenated in series and convolved with a standard convolutional layer. The calculation formula is as follows:
M_s(W) = σ(f^{7×7}([AvgPool(W); MaxPool(W)]));
where M_s(W) denotes the spatial attention map, σ denotes the sigmoid function, f^{7×7} denotes a convolution operation with a 7×7 convolution kernel, AvgPool(W) denotes the second average pooling vector (the average pooling operation F^s_ave), and MaxPool(W) denotes the second maximum pooling vector (the maximum pooling operation F^s_max).
Finally, the feature weighting operation is completed through matrix multiplication to obtain the feature vector, as follows:
Out_feature = M_s(W) ⊗ W;
where Out_feature represents the feature vector, M_s(W) represents the spatial attention map, and W represents the channel features.
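The spatial-attention step and the final weighting can be sketched as follows; the explicit loop convolution stands in for the 7×7 convolutional layer (a smaller kernel is used in the check purely for brevity).

```python
import numpy as np

def spatial_attention(w, kernel):
    """w: (H, W, C) channel features; kernel: (k, k, 2) conv weights applied
    to the concatenated [channel-avg; channel-max] pooled maps. Returns
    Out_feature = M_s(W) applied elementwise to w."""
    pooled = np.stack([w.mean(axis=-1), w.max(axis=-1)], axis=-1)  # (H, W, 2)
    k = kernel.shape[0] // 2
    padded = np.pad(pooled, ((k, k), (k, k), (0, 0)))  # 'same' zero padding
    H, Wd = w.shape[:2]
    ms = np.empty((H, Wd))
    for i in range(H):
        for j in range(Wd):
            ms[i, j] = (padded[i:i + kernel.shape[0],
                               j:j + kernel.shape[1]] * kernel).sum()
    ms = 1.0 / (1.0 + np.exp(-ms))  # sigmoid -> spatial attention map M_s(W)
    return w * ms[..., None]
```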
Example two:
as shown in fig. 7, a second embodiment of the present invention provides a model training apparatus 700, comprising:
an obtaining unit 702, configured to obtain N pieces of image data, and generate a first data set according to the N pieces of image data, where N is an integer greater than 1;
an updating unit 704, configured to update the first feature in the N image data to a second feature, and generate a second data set according to the updated N image data;
a training unit 706, configured to obtain a target data set according to the first data set and the second data set, and train the detection model through the target data set.
The model training device 700 provided in the embodiment of the present application is used for an electronic device, where the electronic device acquires N image data, where N is an integer greater than 1, and specifically, the electronic device directly acquires a plurality of image data through a shooting device, or receives a plurality of image data sent by other electronic devices. The plurality of image data are labeled to generate a first data set, and the first data set generated by the plurality of image data can ensure that the number of features in the first data set is large, so that the effect of a subsequent training model is improved. In the process of generating the data set, it is necessary to identify image features in the plurality of image data, where the image features include a plurality of image features such as a first feature, a second feature, and a third feature, and mark different image features respectively. After the first data set is generated, the first features in the plurality of image data are updated, the first features in the N image data are all converted into the second features, and a second data set is generated through the plurality of updated image data. And training the model by taking the first data set and the second data set as target data sets, thereby obtaining a detection model. Under the condition that the number of the first features in the first data set is small, if the model is trained only through the first data set, the false detection rate of the model on the target features is increased, and the detection effect of the model on the key category features is reduced. 
According to the method and the device, the first features in the plurality of image data are updated to the second features, the second data set is generated from the plurality of updated image data, and the first data set and the second data set are combined to form the target data set, which balances the quantity proportion of the different types of features in the data set used for training the model and guarantees the quantity of first features and second features in the target data set. Because only the first features are updated and converted into the second features, and the other image features in the image data are not affected, the accuracy with which the finally trained detection model identifies the object to be detected is improved.
Specifically, the image data is an image of a construction site, and the first data set is obtained by labeling felt, bare soil, green grass and yellow grass in the image data. Green grass features are extracted from the image data, the green grass is updated to yellow grass through color conversion, the labels of the green grass are uniformly modified to yellow grass, and the second data set is obtained from the updated image data. The first data set and the second data set are combined to serve as the target data set for training the detection model, so that the detection model can effectively identify felt, bare soil, green grass and yellow grass, and the false detection rate of the detection model for bare soil in an uncovered state is reduced.
The green grass feature in the image of the construction site is replaced with the yellow grass feature by re-quantizing the values of the HSV (Hue, Saturation, Value) color model. Specifically, the HSV range of yellow is obtained, where H ranges from 26 to 34, S ranges from 43 to 255, and V ranges from 46 to 255. All elements at the position of the green grass feature in the image of the construction site are replaced with values in the yellow HSV range, and the label of that position is uniformly modified to the label corresponding to yellow grass.
In any of the above embodiments, the model training apparatus 700 further comprises:
and the labeling unit is used for labeling the first feature and the second feature in the N image data through the first mark and the second mark to obtain a first data set.
In this embodiment, a first data set including a plurality of first features and a plurality of second features in a plurality of image data is obtained by labeling the first features and the second features in the plurality of image data.
Specifically, the image data includes image features such as green grass, yellow grass, felt, and bare soil. Green grass is labeled by "g _ grass", yellow grass is labeled by "y _ grass", felt is labeled by "fugai", and bare soil is labeled by "weikugai", wherein green grass is the first feature, yellow grass is the second feature, "g _ grass" is the first mark, and "y _ grass" is the second mark. The image features to be recognized in the image are marked through different marks in the plurality of images, so that the first data set can be guaranteed to comprise a plurality of image features required by training.
In any of the above embodiments, the updating unit 704 is specifically configured to: identifying M image data to be processed with first characteristics in the N image data, wherein M is less than or equal to N; replacing the first features in the M pieces of image data to be processed with second features, and replacing the first marks with second marks to obtain M pieces of target image data; the N image data are updated by the M target image data.
In this embodiment, in the process of updating the first feature of the N image data to the second feature, the image data to be processed among the N image data need to be identified; the image data to be processed include the first feature, the number of image data to be processed is M, and M is less than or equal to the number of image data. The first feature in the image data to be processed is extracted and replaced with the second feature, the first mark corresponding to the first feature is replaced correspondingly, the updating of the M image data to be processed is completed to obtain M updated target image data, and the second data set is obtained from the M updated target image data. The detection model is trained on the first data set and the second data set, which balances the quantity proportion of the different types of features in the data set used for training the model, reduces the false detection rate of the detection model, and improves the detection accuracy of the detection model for the target features.
In some embodiments, the second data set is generated directly from the updated M target image data.
In some other embodiments, M target image data are substituted for M to-be-processed image data of the N image data, thereby obtaining a second data set.
By identifying the M image data to be processed among the N image data, where only the image data including the first feature are updated, the amount of data processed when updating the image data is reduced. In the process of updating the first features in the image data to be processed, the first marks corresponding to the first features are replaced with the second marks, so that the updated N image data remain in a fully labeled state and do not need to be labeled again, reducing the labeling steps for the data set.
Specifically, the N image data are traversed, and when an image is detected to include the first feature, the first feature in that image is updated to the second feature and the first mark corresponding to the first feature is replaced with the second mark. Finally, the image is stored as updated image data and renamed to distinguish it from the image data before updating.
In any of the above embodiments, the updating unit 704 is specifically configured to: acquiring position information and pixel matrixes of a first feature in M pieces of target image data; generating a second characteristic according to the pixel matrix and a set color model; the first feature in the M image data is replaced with the second feature according to the position information.
In this embodiment, the position information of the first mark in the target image data is determined, thereby determining the position information of the first feature, and the pixel matrix of the first feature in the image data is acquired, from which the area and the shape of the first feature in the image data can be determined. An updated pixel matrix corresponding to the original pixel matrix is randomly generated through the color channels, the color of the updated pixel matrix differing from that of the original pixel matrix. The original pixel matrix is then replaced with the updated pixel matrix according to the position information, completing the step of replacing the first feature with the second feature. In this way, the color of the first feature can be adjusted to form the second feature while parameters such as the shape of the first feature in the target image data remain unchanged. By identifying the first feature and replacing and updating only the first feature, the invention ensures that the other image features in the image data are not affected.
In any of the above embodiments, the training unit 706 is specifically configured to: extracting a feature vector of each image data in the target data set; the detection model is trained by the feature vectors.
In this embodiment, the feature vector of each image data in the target data set is extracted. Since the target data set comprises the first data set and the second data set, the extracted feature vectors include both the first feature and the second feature. Training the detection model on these feature vectors makes the categories in the data set used for training more balanced, improves the detection accuracy of the detection model for the target features, and reduces the false detection rate.
Specifically, if during feature extraction the channel information obtained by the convolution kernels is directly superimposed with equal weights, features carrying important information cannot be highlighted. To address this problem, an attention mechanism that applies average pooling and maximum pooling along the channel dimension can better screen the global information of the target class, while better highlighting the salient features of the target object.
In any of the above embodiments, the training unit 706 is specifically configured to: performing spatial compression on the image data through global average pooling and global maximum pooling to obtain a first average pooling vector and a first maximum pooling vector; performing weighted calculation on the first average pooling vector and the first maximum pooling vector to obtain channel characteristics; and determining a feature vector according to the channel features.
In this embodiment, the image data in the target data set is denoted X ∈ R^(H×W×C), where X is the image data, H is the image height, W is the image width, and C is the number of color channels, and X = [x_1, x_2, …, x_C], where x_c represents the parameter of the c-th convolution kernel.
X is spatially compressed using global average pooling and global maximum pooling respectively. Let the average pooling process and the maximum pooling process be F_ave and F_max, with corresponding outputs A_ave and A_max, where A_ave is the first average pooling vector and A_max is the first maximum pooling vector. The specific calculation formulas are as follows:
A_ave = F_ave(X) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} x_c(i, j);
where A_ave is the first average pooling vector, F_ave is the average pooling process, x_c denotes the parameter of the c-th convolution kernel, H is the height of the image and W is the width of the image.
A_max = F_max(X) = max_{i=1..H, j=1..W} x_c(i, j);
where A_max is the first maximum pooling vector, F_max is the maximum pooling process, x_c denotes the parameter of the c-th convolution kernel, H is the height of the image and W is the width of the image.
A channel attention map can be obtained by weighting the calculated first average pooling vector and first maximum pooling vector. The specific calculation formula is as follows:
M_c(X) = σ(MLP(A_ave) + MLP(A_max))
       = σ(W_1(W_0 * A_ave) + W_1(W_0 * A_max));
where M_c(X) denotes the channel attention map, σ denotes the sigmoid function, A_max is the first maximum pooling vector, A_ave is the first average pooling vector, MLP is a multilayer perceptron, and W_1 and W_0 are weight values.
Through the above formula, the channel attention map is calculated: the first average pooling vector and the first maximum pooling vector are each passed through a shared network formed by a multilayer perceptron (MLP), and the results are added to obtain the channel attention map. The channel attention map is then applied through matrix multiplication to complete the feature weighting operation, yielding the channel features, specifically as follows:
W = M_c(X) ⊗ X;
where X is the image data, M_c(X) denotes the channel attention map, and W is the channel feature.
And calculating to obtain a feature vector according to the channel features.
In any of the above embodiments, the training unit 706 is specifically configured to: performing channel average pooling and channel maximum pooling on the channel characteristics to obtain a second average pooling vector and a second maximum pooling vector; and performing weighted calculation on the second average pooling vector and the second maximum pooling vector to obtain a feature vector.
In this embodiment, channel average pooling and channel maximum pooling are performed on the channel features W along the channel dimension, and the resulting pooled maps are concatenated in series and convolved with a standard convolutional layer. The calculation formula is as follows:
M_s(W) = σ(f^{7×7}([AvgPool(W); MaxPool(W)]));
where M_s(W) denotes the spatial attention map, σ denotes the sigmoid function, f^{7×7} denotes a convolution operation with a 7×7 convolution kernel, AvgPool(W) denotes the second average pooling vector (the average pooling operation F^s_ave), and MaxPool(W) denotes the second maximum pooling vector (the maximum pooling operation F^s_max).
Finally, the feature weighting operation is completed through matrix multiplication to obtain the feature vector, as follows:
Out_feature = M_s(W) ⊗ W;
where Out_feature represents the feature vector, M_s(W) represents the spatial attention map, and W represents the channel features.
Example three:
as shown in fig. 8, a third embodiment of the present invention provides an electronic device 800 including: a memory 802, the memory 802 having programs or instructions stored therein; the processor 804 and the processor 804 execute the program or the instructions stored in the memory 802 to implement the steps of the model training method in any of the embodiments described above, so that all the beneficial technical effects of the model training method in any of the embodiments described above are achieved, and redundant description is not repeated herein.
Optionally, the electronic device 800 comprises image acquisition means for acquiring image data.
Optionally, the electronic device 800 comprises a communication means for receiving image data sent by other devices.
Example four:
in a fourth embodiment of the present invention, a readable storage medium is provided, on which a program is stored which, when executed by a processor, implements the model training method of any of the above embodiments, thereby achieving all the advantageous technical effects of the model training method of any of the above embodiments.
The readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It is to be understood that, in the claims, the specification and the drawings of the present invention, the term "plurality" means two or more; unless explicitly defined otherwise, the terms "upper", "lower" and the like indicate orientations or positional relationships based on those shown in the drawings, are used only to describe the present invention more conveniently and to simplify the description, and do not indicate or imply that the device or element referred to must have the specific orientation described, be constructed in a specific orientation, or be operated in a specific orientation; such descriptions therefore should not be construed as limiting the present invention. The terms "connect", "mount", "secure" and the like are to be construed broadly: for example, "connect" may refer to a fixed connection between multiple objects, a removable connection between multiple objects, or an integral connection; the multiple objects may be directly connected to each other or indirectly connected through an intermediary. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the claims, specification and drawings of the present invention, the description of the terms "one embodiment," "some embodiments," "specific embodiments," and so forth means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In the claims, specification and drawings of the present invention, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method of model training, comprising:
acquiring N image data, and generating a first data set according to the N image data, wherein N is an integer greater than 1;
updating the first feature in the N image data into a second feature, and generating a second data set according to the updated N image data;
and obtaining a target data set according to the first data set and the second data set, and training a detection model through the target data set.
2. The model training method of claim 1, wherein the generating a first data set from the N image data comprises:
labeling the first feature and the second feature in the N image data through a first mark and a second mark to obtain the first data set.
3. The model training method of claim 2, wherein the updating a first feature of the N image data to a second feature comprises:
identifying M image data to be processed with the first characteristic in the N image data, wherein M is less than or equal to N;
replacing the first features in the M pieces of image data to be processed with second features, and replacing the first markers with the second markers to obtain M pieces of target image data;
updating the N image data by the M target image data.
4. The model training method of claim 3, wherein the replacing the first feature in the M image data to be processed with the second feature comprises:
acquiring position information and pixel matrixes of the first features in the M pieces of target image data;
generating the second feature from the pixel matrix according to a set color model;
replacing the first feature in the M image data with the second feature according to the position information.
5. The model training method according to any one of claims 1 to 4, wherein the training of the detection model by the target data set comprises:
extracting a feature vector for each of the image data in the target dataset;
and training the detection model through the feature vectors.
6. The model training method of claim 5, wherein the obtaining the feature vector of each of the image data comprises:
performing spatial compression on the image data through global average pooling and global maximum pooling to obtain a first average pooling vector and a first maximum pooling vector;
performing weighted calculation on the first average pooling vector and the first maximum pooling vector to obtain channel characteristics;
and determining the feature vector according to the channel feature.
7. The model training method of claim 6, wherein the determining the feature vector from the channel features comprises:
performing channel average pooling and channel maximum pooling on the channel characteristics to obtain a second average pooling vector and a second maximum pooling vector;
and performing weighted calculation on the second average pooling vector and the second maximum pooling vector to obtain the feature vector.
8. A model training apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring N pieces of image data and generating a first data set according to the N pieces of image data, and N is an integer greater than 1;
the updating unit is used for updating the first feature in the N image data into a second feature and generating a second data set according to the updated N image data;
and the training unit is used for obtaining a target data set according to the first data set and the second data set and training the detection model through the target data set.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the model training method according to any one of claims 1 to 7.
10. A readable storage medium, on which a program or instructions are stored which, when executed by a processor, carry out the steps of the model training method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111069317.7A CN113822348A (en) | 2021-09-13 | 2021-09-13 | Model training method, training device, electronic device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113822348A (en) | 2021-12-21 |
Family
ID=78914436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111069317.7A Pending CN113822348A (en) | 2021-09-13 | 2021-09-13 | Model training method, training device, electronic device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822348A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101072289A (en) * | 2007-06-11 | 2007-11-14 | 北京中星微电子有限公司 | Automatic generating method and device for image special effect |
CN108305296A (en) * | 2017-08-30 | 2018-07-20 | 深圳市腾讯计算机系统有限公司 | Image description generation method, model training method, equipment and storage medium |
CN108509915A (en) * | 2018-04-03 | 2018-09-07 | 百度在线网络技术(北京)有限公司 | The generation method and device of human face recognition model |
CN111401375A (en) * | 2020-03-09 | 2020-07-10 | 苏宁云计算有限公司 | Text recognition model training method, text recognition device and text recognition equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wu et al. | Autonomous detection of plant disease symptoms directly from aerial imagery | |
CN107392091B (en) | Agricultural artificial intelligence crop detection method, mobile terminal and computer readable medium | |
CN109902548B (en) | Object attribute identification method and device, computing equipment and system | |
WO2023015743A1 (en) | Lesion detection model training method, and method for recognizing lesion in image | |
CN111340141A (en) | Crop seedling and weed detection method and system based on deep learning | |
CN109635875A (en) | A kind of end-to-end network interface detection method based on deep learning | |
EP3455783A1 (en) | Recognition of weed in a natural environment | |
CN108615046A (en) | A kind of stored-grain pests detection recognition methods and device | |
CN110147836A (en) | Model training method, device, terminal and storage medium | |
CN108629289B (en) | Farmland identification method and system and agricultural unmanned aerial vehicle | |
CN110363176B (en) | Image analysis method and device | |
CN115880558B (en) | Farming behavior detection method and device, electronic equipment and storage medium | |
CN113052295B (en) | Training method of neural network, object detection method, device and equipment | |
CN108229274A (en) | Multilayer neural network model training, the method and apparatus of roadway characteristic identification | |
CN110826581A (en) | Animal number identification method, device, medium and electronic equipment | |
CN112668675B (en) | Image processing method and device, computer equipment and storage medium | |
CN110298366A (en) | Crops are distributed extracting method and device | |
Menezes et al. | Pseudo-label semi-supervised learning for soybean monitoring | |
CN110751163B (en) | Target positioning method and device, computer readable storage medium and electronic equipment | |
CN113807143A (en) | Crop connected domain identification method and device and operation system | |
CN116739739A (en) | Loan amount evaluation method and device, electronic equipment and storage medium | |
CN113822348A (en) | Model training method, training device, electronic device and readable storage medium | |
CN116246184A (en) | Papaver intelligent identification method and system applied to unmanned aerial vehicle aerial image | |
CN116597246A (en) | Model training method, target detection method, electronic device and storage medium | |
CN113486879B (en) | Image area suggestion frame detection method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||