CN114445691A - Model training method and device, electronic equipment and storage medium - Google Patents

Model training method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN114445691A
CN114445691A · Application CN202111654018.XA
Authority
CN
China
Prior art keywords
human body
body sample
pictures
training
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111654018.XA
Other languages
Chinese (zh)
Inventor
何烨林
魏新明
肖嵘
王孝宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202111654018.XA priority Critical patent/CN114445691A/en
Publication of CN114445691A publication Critical patent/CN114445691A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses a model training method, an apparatus, an electronic device and a storage medium. The model training method comprises: performing pooling processing on a feature map to obtain a target feature vector, wherein the feature map is obtained by performing feature extraction on a plurality of human body sample pictures in a feature extraction network model; determining classification categories corresponding to the plurality of human body sample pictures according to the target feature vector; and calculating loss values against the label categories corresponding to the plurality of human body sample pictures according to the classification categories, so as to train the feature extraction network model according to the loss values, wherein each label category serves as a label pointing to the corresponding human body sample picture. The expression capability of the model is thereby enhanced, and the accuracy of human body re-identification is improved.

Description

Model training method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of deep neural networks, in particular to a model training method and device, electronic equipment and a storage medium.
Background
With the development and progress of artificial intelligence technology, human body re-identification technology can track, match and identify target persons across time and space. It is widely applied in social life and has been one of the research hotspots in the field of computer vision in recent years. Although existing methods that extract human body image features with a feature extraction network model have advantages such as fast inference and mature deployment tools, the expression capability of the model is insufficient and its accuracy in human body re-identification is low.
Disclosure of Invention
In a first aspect, the present invention provides a model training method, including:
performing pooling processing on a feature map to obtain a target feature vector, wherein the feature map is obtained by performing feature extraction on a plurality of human body sample pictures in a feature extraction network model;
determining classification categories corresponding to the plurality of human body sample pictures according to the target feature vector;
calculating loss values of label categories corresponding to the human body sample pictures according to the classification categories so as to train the feature extraction network model according to the loss values; wherein the label category is used as a label to point to the corresponding human body sample picture.
Optionally, the pooling the feature map to obtain the target feature vector includes:
dividing the feature map into a plurality of pixel point regions;
respectively carrying out generalized average calculation on each pixel point region to obtain a generalized average value corresponding to each pixel point region;
and combining the generalized average values corresponding to the pixel point regions to obtain the target feature vector.
Optionally, the calculating the loss values of the label categories corresponding to the multiple human body sample pictures according to the classification categories to train the feature extraction network model according to the loss values includes:
judging the training times of the human body sample pictures in the feature extraction network model, wherein each training iteration comprises a fixed number of batches of pictures;
when the training times are odd numbers, determining a plurality of first batch of pictures from the training set of the feature extraction network model; wherein the plurality of first batch pictures are obtained by random extraction in the training set;
calculating first loss values corresponding to the first batch of pictures according to a first loss function and a second loss function;
and updating parameters of the feature extraction network model according to the first loss value.
Optionally, the calculating loss values of the label categories corresponding to the multiple human body sample pictures according to the classification categories to train the feature extraction network model according to the loss values further includes:
when the training times are even numbers, determining a plurality of second batch of pictures from the training set of the feature extraction network model; wherein the plurality of second batch of pictures are pictures corresponding to a plurality of different human identity information in the training set;
calculating second loss values corresponding to the second batches of pictures according to the second loss function and a third loss function;
and updating parameters of the feature extraction network model according to the second loss value.
Optionally, the method further comprises:
after the plurality of human body sample pictures are input into the feature extraction network model for first training, calculating the feature amplitude of the human body sample pictures in a training set of the feature extraction network model;
judging whether the characteristic amplitude corresponding to each human body sample picture meets a preset condition or not;
when the human body sample picture is a first type human body sample picture meeting a preset condition, deleting the first type human body sample picture;
and when the human body sample picture is a second type human body sample picture which does not meet the preset condition, reserving the second type human body sample picture, and inputting the feature extraction network model again in the next training process for training.
Optionally, the method further comprises:
acquiring a scene picture acquired by image acquisition equipment;
detecting the scene picture to determine a human body coordinate area in the scene picture;
cutting the human body coordinate area to obtain a human body sample picture;
and inputting the human body sample picture into a database to construct an image base.
Optionally, the method further comprises:
determining a reference characteristic vector corresponding to the picture to be retrieved according to the input picture to be retrieved;
querying a plurality of target feature vectors corresponding to the reference feature vectors in the image base;
calculating feature similarity between a reference feature vector and the plurality of target feature vectors;
and determining a human body sample picture corresponding to the picture to be retrieved according to the characteristic similarity.
In a second aspect, an embodiment of the present invention provides a model training apparatus, including:
the pooling module is used for pooling the feature map to obtain a target feature vector; the characteristic graph is obtained by performing characteristic extraction on a plurality of human body sample pictures in a characteristic extraction network model;
the determining module is used for determining classification categories corresponding to the plurality of human body sample pictures according to the target feature vector;
the calculation module is used for calculating loss values of the label categories corresponding to the human body sample pictures according to the classification categories, and the training module is used for training the feature extraction network model according to the loss values; wherein the label category is used as a label to point to the corresponding human body sample picture.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement the steps of the model training method as described above.
In a fourth aspect, the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the model training method as described above.
The scheme of the invention at least comprises the following beneficial effects:
firstly, pooling processing is performed on a feature map to obtain a target feature vector, wherein the feature map is obtained by performing feature extraction on a plurality of human body sample pictures in a feature extraction network model; classification categories corresponding to the plurality of human body sample pictures are determined according to the target feature vector; loss values against the label categories corresponding to the plurality of human body sample pictures are calculated according to the classification categories, and finally the feature extraction network model is trained according to the loss values, wherein each label category serves as a label pointing to the corresponding human body sample picture. Therefore, the expression capability of the model is enhanced, and the accuracy of human body re-identification is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from the structures shown in these drawings without creative effort.
FIG. 1 is a schematic overall flow chart of a model training method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of step S10 according to an embodiment of the present invention;
FIG. 3 is another schematic flow chart diagram of a model training method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a model training method according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of constructing an image base according to an embodiment of the present invention;
fig. 6 is a schematic flow chart of a human body re-identification method according to an embodiment of the present invention;
FIG. 7 is a block diagram of a model training apparatus according to an embodiment of the present invention;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and "third," etc. in the description and claims of the present invention and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprises" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The following embodiments of the present application will be described by way of example with reference to the accompanying drawings.
As shown in fig. 1, a specific embodiment of the present invention provides a model training method, including:
s10, performing pooling treatment on the feature map to obtain a target feature vector; the characteristic graph is obtained by performing characteristic extraction on a plurality of human body sample pictures in a characteristic extraction network model.
In this embodiment, a human body sample picture may be obtained by capturing a picture of a scene with an image capture device, such as a camera or a video camera, and cropping the human body area from it. Each human body sample picture has corresponding image features, including color features, edge features, shape features, texture features and the like. Because a human body sample picture contains a large amount of data, computing on it directly is very slow, so feature extraction is performed on the human body sample picture in the feature extraction network model. Feature extraction means extracting the effective features of the human body sample picture; each feature can be expressed as a numerical value, and the larger the absolute value of the numerical value, the more obvious the feature it represents. Extracting features from the human body sample picture yields a feature map, on which pooling processing is then performed. It will be appreciated that pooling performs dimension-reduction compression on the input feature map so as to speed up computation.
As shown in fig. 2, a specific implementation manner of the step S10 includes:
s11, dividing the feature map into a plurality of pixel point regions;
s12, respectively carrying out generalized average calculation on each pixel point region to obtain a generalized average value corresponding to each pixel point region;
and S13, combining the generalized average values corresponding to the pixel point regions to obtain a target feature vector.
In this embodiment, the feature map is obtained from a human body sample picture by extraction in the feature extraction network model. During pooling, each feature map can first be divided into a plurality of pixel point regions, each containing a plurality of pixel points, and each pixel point corresponds to a numerical value. Performing the generalized average calculation on the numerical values of each pixel point region yields the generalized average value corresponding to that region, and combining the plurality of generalized average values yields the target feature vector, which can represent a plurality of features of the human body sample picture. For example, if the target feature vector is a 512-dimensional feature vector, the corresponding human body sample picture is projected into a 512-dimensional Euclidean space, so that whether two human body sample pictures belong to the same person can be distinguished by the distance between their target feature vectors.
For example, suppose the feature extraction network model extracts a feature map composed of 20 pixel point regions, each containing 4 × 4 = 16 pixel points. The generalized average of the 16 pixel points in each region is calculated; since there are 20 pixel point regions, 20 generalized average values are obtained, and combining these 20 generalized average values yields the corresponding target feature vector.
In an alternative embodiment, the feature extraction network model may employ a ResNet50 residual neural network, and generalized average pooling (GeM) with an exponent parameter p may be used to convert the feature map to a fixed size during pooling; the generalized average is calculated as follows:

$$ f^{(g)} = \left[ f^{(g)}_1, \ldots, f^{(g)}_K \right], \qquad f^{(g)}_k = \left( \frac{1}{|\chi_k|} \sum_{x \in \chi_k} x^{p} \right)^{1/p} $$

wherein f^{(g)} denotes the pooled feature obtained from the feature map extracted by the feature extraction network model, K denotes the number of regions into which the feature map is decomposed, x denotes the value of each pixel point of the feature map within a region, p is the pooling adjustment factor, χ_k denotes the k-th region, and |χ_k| denotes the number of pixel points in region χ_k. The generalized average pooling strategy divides the feature map into K regions, raises the values in each region to the power p, averages them, applies the exponent 1/p to the result, and finally combines the generalized average values obtained from the different regions into the target feature vector. Information loss is thereby reduced to the greatest extent, and accuracy is higher when performing human body re-identification.
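For illustration, the following PyTorch sketch implements generalized-mean pooling as given by the formula above, in the common formulation where each channel's spatial map plays the role of one region χ_k (so K equals the number of channels) and the exponent p is learnable. The class name and default values are illustrative assumptions, not values fixed by this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean (GeM) pooling: mean of x^p over a region, then ^(1/p).

    p = 1 recovers average pooling; p -> infinity approaches max pooling.
    """
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)  # pooling adjustment factor p
        self.eps = eps                            # guards against 0^p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) feature map from the backbone
        x = x.clamp(min=self.eps).pow(self.p)          # x^p per pixel point
        x = F.avg_pool2d(x, kernel_size=x.shape[-2:])  # average over each region
        return x.pow(1.0 / self.p).flatten(1)          # (batch, channels) target feature vector
```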
And S20, determining classification categories corresponding to the plurality of human body sample pictures according to the target feature vector.
In this embodiment, each human body sample picture can be represented by one target feature vector. The target feature vector can be a high-dimensional feature vector, through which human body features can be accurately described. The probability that different human body images carry the same human body label can be judged by comparing the distances between their feature vectors: the shorter the feature distance, the higher the probability that they depict the same person.
S30, calculating loss values of label types corresponding to the human body sample pictures according to the classification types, and training the feature extraction network model according to the loss values; wherein the label category is used as a label to point to a corresponding human sample picture.
In this embodiment, human body sample pictures can be labeled manually or by machine before training, and each human body sample picture has a plurality of label categories. The target feature vectors obtained after the human body sample pictures undergo composite pooling in the pooling layer can be input into a fully connected classification layer to determine the classification categories, and the classification categories are then compared with the label categories to calculate a loss value, so that the feature extraction network model is trained and updated by gradient back-propagation. This reduces the loss value, enhances the expression capability of the model, and yields higher accuracy in human body re-identification. In gradient back-propagation, the loss value is transmitted backwards layer by layer; each layer of the network computes its gradient from the transmitted error, and the parameters of the model are then updated, improving the accuracy and stability of the model.
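For concreteness, a minimal sketch of one such training step follows, assuming a ResNet50 backbone feeding the GeM layer sketched above and a fully connected classification layer. The wiring, the class count and all hyperparameters are illustrative assumptions, not values taken from this disclosure.

```python
import torch
import torch.nn as nn
import torchvision.models as models

num_classes = 751  # hypothetical number of label categories in the training set
# Keep the convolutional feature extractor; drop ResNet50's own avgpool and fc.
backbone = nn.Sequential(*list(models.resnet50(weights=None).children())[:-2])
pool = GeM()                               # GeM layer from the sketch above
classifier = nn.Linear(2048, num_classes)  # fully connected classification layer
criterion = nn.CrossEntropyLoss()
params = list(backbone.parameters()) + list(pool.parameters()) + list(classifier.parameters())
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    feature_maps = backbone(images)    # (B, 2048, H, W) feature maps
    vectors = pool(feature_maps)       # (B, 2048) target feature vectors
    logits = classifier(vectors)       # score per classification category
    loss = criterion(logits, labels)   # compare against the label categories
    optimizer.zero_grad()
    loss.backward()                    # gradient back-propagation, layer by layer
    optimizer.step()                   # update the model parameters
    return loss.item()
```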
The model training method provided by the invention first performs pooling processing on a feature map to obtain a target feature vector, wherein the feature map is obtained by performing feature extraction on a plurality of human body sample pictures in a feature extraction network model; determines classification categories corresponding to the plurality of human body sample pictures according to the target feature vector; calculates loss values against the label categories corresponding to the plurality of human body sample pictures according to the classification categories; and finally trains the feature extraction network model according to the loss values, wherein each label category serves as a label pointing to the corresponding human body sample picture. Therefore, the expression capability of the model is enhanced, and the accuracy of human body re-identification is improved.
As shown in fig. 3, the steps after training the feature extraction network model include:
101. judging the training times of a plurality of human body sample pictures in the feature extraction network model, wherein each training iteration comprises a fixed number of batch pictures;
102. when the training times are odd, determining a plurality of first batch pictures from the training set of the feature extraction network model; wherein the plurality of first batch pictures are randomly extracted from the training set;
103. calculating first loss values corresponding to a plurality of first batch of pictures according to the first loss function and the second loss function;
104. when the training times are even numbers, determining a plurality of second batch of pictures from the training set of the feature extraction network model; the plurality of second batch of pictures are pictures corresponding to a plurality of different human body identity labels in the training set;
105. calculating second loss values corresponding to the second batches of pictures according to the second loss function and the third loss function;
106. and updating parameters of the feature extraction network model according to the first loss value and the second loss value.
In this embodiment, the first loss function may be an ArcFace (Additive Angular Margin Loss) loss function; ArcFace is an additive angular margin loss that can be used to enlarge the distance between different categories. The second loss function may be the cross-entropy loss function, which is used to calculate the probability of each class. The third loss function may be a triplet loss function, which learns to make the distance between samples within a class smaller than the distance between samples of different classes.
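The following sketch illustrates the three loss functions under their common formulations; the ArcFace scale s and margin m are conventional illustrative defaults, not values specified in this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def arcface_logits(embeddings: torch.Tensor, weight: torch.Tensor,
                   labels: torch.Tensor, s: float = 64.0, m: float = 0.5) -> torch.Tensor:
    """Additive angular margin logits: cos(theta + m) for the true class."""
    cos = F.linear(F.normalize(embeddings), F.normalize(weight))   # (B, C) cosines
    theta = torch.acos(cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
    one_hot = F.one_hot(labels, num_classes=weight.size(0)).bool()
    cos_margin = torch.where(one_hot, torch.cos(theta + m), cos)   # margin on true class only
    return s * cos_margin                                          # scaled logits for cross-entropy

ce_loss = nn.CrossEntropyLoss()                   # second loss function
triplet_loss = nn.TripletMarginLoss(margin=0.3)   # third loss function
# e.g. loss_odd  = ce_loss(arcface_logits(v, W, y), y) + ce_loss(logits, y)
# e.g. loss_even = ce_loss(logits, y) + triplet_loss(anchor, positive, negative)
```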
Because the label categories may suffer from problems such as missing labels, the model is trained and updated by determining a loss value between the classification category and the label category. The first loss value is obtained by computing the losses of the first loss function and the second loss function separately and summing them, and the second loss value is obtained by computing the losses of the second loss function and the third loss function separately and summing them. After each training iteration, whether the iteration count is odd or even can be judged. When the count is odd, n first batch pictures can be randomly extracted from the training set of the feature extraction network model for further training; the first loss values between the classification categories and the label categories of the n first batch pictures are calculated with the first loss function and the second loss function, and the parameters of the feature extraction network model are then updated with the first loss values. When the count is even, since the human body sample pictures in the training set correspond to a plurality of different human identity labels, a plurality of second batch pictures can be determined through the human identity labels; for example, k second batch pictures are selected for each of p human identity labels for training. Meanwhile, the second loss values between the classification categories and the label categories of these second batch pictures are calculated with the second loss function and the third loss function, and the parameters of the feature extraction network model are then updated with the second loss values. It can be understood that after an odd iteration completes, n first batch pictures can be randomly extracted from the training set for the next round, and after an even iteration completes, k second batch pictures for each of p human identity labels can be determined from the training set, so that the model is trained continuously and its accuracy becomes higher.
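A minimal sketch of this alternating sampling strategy might look as follows; the function name and the numeric defaults for n, p and k are illustrative assumptions written out from the description above.

```python
import random

def make_batch(iteration: int, train_set: list, by_identity: dict,
               n: int = 64, p: int = 16, k: int = 4):
    """Pick the batch and the loss pair for one training iteration.

    train_set: list of all training pictures.
    by_identity: dict mapping a human identity label -> list of its pictures.
    """
    if iteration % 2 == 1:                      # odd: random first batch pictures
        batch = random.sample(train_set, n)
        losses = ("arcface", "cross_entropy")   # first + second loss functions
    else:                                       # even: k pictures for each of p identities
        ids = random.sample(list(by_identity), p)
        batch = [pic for i in ids for pic in random.sample(by_identity[i], k)]
        losses = ("cross_entropy", "triplet")   # second + third loss functions
    return batch, losses
```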
As shown in fig. 4, in an alternative embodiment, the steps of the model training method provided by the present invention further include:
201. after a plurality of human body sample pictures are input into the feature extraction network model for first training, calculating the feature amplitude of the human body sample pictures in a training set of the feature extraction network model;
202. judging whether the characteristic amplitude corresponding to each human body sample picture meets a preset condition or not;
203. when the human body sample picture is a first type human body sample picture meeting a preset condition, deleting the first type human body sample picture;
204. and when the human body sample picture is a second type human body sample picture which does not meet the preset condition, reserving the second type human body sample picture, and inputting the feature extraction network model again in the next training process for training.
In this embodiment, because the training set contains a large number of pictures collected in various ways, noise pictures inevitably appear in it, for example pictures with strong light or low light, truncated human body pictures, or non-human pictures. The features extracted from such pictures differ greatly from those extracted from normal human body sample pictures, which manifests as a different amplitude distribution of the feature vectors. After the feature extraction network model has been trained for the first time, the feature amplitudes of all human body sample pictures in the training set can be calculated. The preset condition is that the feature amplitude lies in the top 5%: a first type human body sample picture is one whose feature amplitude is in the top 5%, and a second type human body sample picture is one whose feature amplitude is below the top 5%. That is, the larger the feature amplitude, the lower the quality of the human body sample picture, so the first type human body sample pictures in the top 5% of feature amplitudes can be deleted, and the second type human body sample pictures are input into the feature extraction network model again for training.
It can be understood that the feature amplitude obtained from the feature extraction network model can be used to represent the quality of a human body sample picture. The feature amplitude is calculated as follows:

$$ \|A\| = \sqrt{\sum_{i} x_i^{2}} $$

wherein ‖A‖ denotes the feature amplitude and x_i denotes the response value of each pixel point in the feature map. This application finds that the feature amplitude of low-quality noise pictures is generally large, so after the first round of model training, the 5% of training pictures with the largest feature amplitudes are removed before a new round of training, thereby improving the stability of the model.
For example, suppose the training set contains 3000 human body sample pictures. After the first round of training is completed, the feature amplitudes of the 3000 human body sample pictures are calculated; the 150 pictures with the largest feature amplitudes are deleted, and the remaining 2850 human body sample pictures are input into the model again for training. This greatly improves the stability of the model and makes person re-identification through the model more accurate.
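As an illustration, the filtering step can be sketched as follows, taking the feature amplitude to be the L2 norm of each picture's pooled feature vector as in the formula above; the function and variable names are illustrative.

```python
import torch

@torch.no_grad()
def filter_noisy_samples(features: torch.Tensor, pictures: list,
                         drop_ratio: float = 0.05) -> list:
    """Drop the pictures whose feature amplitude is in the top `drop_ratio`.

    features: (N, D) tensor, one pooled feature vector per training picture.
    pictures: the N training pictures, in the same order as `features`.
    """
    amplitudes = features.norm(p=2, dim=1)             # ||A|| for each picture
    n_drop = int(len(pictures) * drop_ratio)           # e.g. 150 out of 3000
    drop_idx = set(amplitudes.topk(n_drop).indices.tolist())
    return [pic for i, pic in enumerate(pictures) if i not in drop_idx]
```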
As shown in fig. 5, the steps of the model training method provided by the present invention further include:
301. acquiring a scene picture acquired by image acquisition equipment;
302. detecting the scene picture to determine a human body coordinate area in the scene picture;
303. cutting the human body coordinate area to obtain a human body sample picture;
304. and inputting the human body sample picture into a database to construct an image base.
In this embodiment, the image capture device can be a camera or a snapshot machine. After the image capture device captures a video stream or snapshot scene pictures, the scene pictures can be detected by a human body detection model to determine the human body coordinate areas. By cropping the human body coordinate areas, human body sample pictures are determined and input into the database, thereby constructing the image base. When the feature extraction network model needs to be trained, human body sample pictures can be extracted from the image base and input into the feature extraction network model, then pooled and alternately sampled under the multiple loss functions, and finally used to train the feature extraction network model, thereby improving the stability and accuracy of the model.
For example, a scene picture may contain a square, several pedestrians, trees and the like. The human body detection model can identify the pedestrians as foreground and the square and trees as background, so the pedestrian areas can be located and cropped, the human body sample pictures of the different pedestrians determined, and these human body sample pictures input into the database to construct the image base.
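A minimal sketch of the image base construction might look as follows; `detect_bodies` stands in for an arbitrary human body detection model and is an assumption, not an API from this disclosure.

```python
from PIL import Image

def build_image_base(scene_paths: list, detect_bodies, database: list) -> list:
    """Detect, crop and store human body sample pictures from scene pictures.

    detect_bodies(scene) is assumed to return a list of (x1, y1, x2, y2)
    human body coordinate areas for the given scene picture.
    """
    for path in scene_paths:
        scene = Image.open(path).convert("RGB")
        for (x1, y1, x2, y2) in detect_bodies(scene):  # human coordinate areas
            sample = scene.crop((x1, y1, x2, y2))      # cut out the body region
            database.append(sample)                    # add to the image base
    return database
```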
As shown in fig. 6, in an alternative embodiment, after the model is trained, the human body re-identification method provided in the embodiment of the present invention includes:
401. determining a reference characteristic vector corresponding to the picture to be retrieved according to the input picture to be retrieved;
402. querying a plurality of target feature vectors corresponding to the reference feature vectors in an image base;
403. calculating feature similarity between the reference feature vector and the plurality of target feature vectors;
404. and determining the human body sample picture corresponding to the picture to be retrieved according to the feature similarity.
In this embodiment, human body re-identification is mainly used to judge whether a specific pedestrian exists in an image or a video sequence, so as to make up for the visual limitations of fixed cameras. The picture to be retrieved can be captured by an image capture device and pooled to determine a reference feature vector; the reference feature vector is then compared with the plurality of target feature vectors in the image base to determine the feature similarity between them. When a certain target feature vector is closest to the reference feature vector in feature similarity, the human body sample picture corresponding to that target feature vector matches the picture to be retrieved, so that this human body sample picture can be recommended and the person information and other characteristics corresponding to the picture to be retrieved determined; accuracy in human body re-identification is thereby higher. Optionally, the plurality of target feature vectors may be ranked against the reference feature vector according to feature similarity, so as to determine the corresponding target feature vector.
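The retrieval step can be sketched as follows, using cosine similarity as one common choice of feature similarity; the names and the top-k ranking are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(reference: torch.Tensor, gallery: torch.Tensor,
             gallery_pictures: list, top_k: int = 5) -> list:
    """Rank image-base pictures by feature similarity to the query.

    reference: (D,) reference feature vector of the picture to be retrieved.
    gallery:   (N, D) target feature vectors of the image base.
    """
    sims = F.cosine_similarity(reference.unsqueeze(0), gallery, dim=1)  # (N,)
    scores, idx = sims.topk(min(top_k, sims.numel()))  # rank by similarity
    return [(gallery_pictures[i], s) for i, s in zip(idx.tolist(), scores.tolist())]
```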
As shown in fig. 7, an embodiment of the present invention provides a model training apparatus 10, including:
the pooling module 11 is used for pooling the feature map to obtain a target feature vector; the characteristic graph is obtained by performing characteristic extraction on a plurality of human body sample pictures in a characteristic extraction network model;
the determining module 12 is configured to determine classification categories corresponding to the multiple human body sample pictures according to the target feature vector;
the calculation module 13 is used for calculating loss values of label categories corresponding to the human body sample pictures according to the classification categories, and the training module 14 is used for training the feature extraction network model according to the loss values; wherein the label category is used as a label to point to the corresponding human body sample picture.
The model training apparatus 10 provided by the invention first performs pooling processing on a feature map to obtain a target feature vector, wherein the feature map is obtained by performing feature extraction on a plurality of human body sample pictures in a feature extraction network model; determines classification categories corresponding to the plurality of human body sample pictures according to the target feature vector; calculates loss values against the label categories corresponding to the plurality of human body sample pictures according to the classification categories; and finally trains the feature extraction network model according to the loss values, wherein each label category serves as a label pointing to the corresponding human body sample picture. Therefore, the expression capability of the model is enhanced, and the accuracy of human body re-identification is improved.
It should be noted that the model training apparatus 10 provided in the embodiment of the present invention is the apparatus corresponding to the model training method described above. All embodiments of the model training method are applicable to the model training apparatus 10, and the modules in the embodiments of the model training apparatus 10 correspond to the steps of the model training method, so the same or similar beneficial effects can be achieved. To avoid excessive repetition, each module of the model training apparatus 10 is not described in detail herein.
As shown in fig. 8, the embodiment of the present invention further provides an electronic device 20, which includes a memory 202, a processor 201, and a computer program stored in the memory 202 and executable on the processor 201, wherein the processor 201 implements the steps of the model training method when executing the computer program.
Specifically, the processor 201 is configured to call the computer program stored in the memory 202, and execute the following steps:
pooling the characteristic diagram to obtain a target characteristic vector; the characteristic graph is obtained by performing characteristic extraction on a plurality of human body sample pictures in a characteristic extraction network model;
determining classification categories corresponding to a plurality of human body sample pictures according to the target characteristic vector;
calculating loss values of label categories corresponding to the multiple human body sample pictures according to the classification categories, and training the feature extraction network model according to the loss values; wherein the label category is used as a label to point to the corresponding human body sample picture.
Optionally, the pooling of the feature map performed by the processor 201 to obtain the target feature vector includes:
dividing the characteristic graph into a plurality of pixel point regions;
respectively carrying out generalized average calculation on each pixel point region to obtain a generalized average value corresponding to each pixel point region;
and combining the generalized average values corresponding to the pixel point regions to obtain a target feature vector.
Optionally, the calculating, by the processor 201, the loss values of the label categories corresponding to the multiple human body sample pictures according to the classification categories to train the feature extraction network model according to the loss values includes:
judging the training times of a plurality of human body sample pictures in the feature extraction network model, wherein each training iteration comprises a fixed number of batches of pictures;
when the training times are odd, determining a plurality of first batch of pictures from the training set of the feature extraction network model; wherein, a plurality of first batch pictures are obtained by random extraction in a training set;
calculating first loss values corresponding to a plurality of first batch of pictures according to the first loss function and the second loss function;
and updating parameters of the feature extraction network model according to the first loss value.
Optionally, the label categories include corresponding human identity labels; the calculating, by the processor 201, of the loss values of the label categories corresponding to the multiple human body sample pictures according to the classification categories, so as to train the feature extraction network model according to the loss values, further includes:
when the training times are even numbers, determining a plurality of second batch of pictures from the training set of the feature extraction network model; the plurality of second batch of pictures are pictures corresponding to a plurality of different human body identity labels in the training set;
calculating second loss values corresponding to a plurality of second batches of pictures according to the second loss function and the third loss function;
and updating parameters of the feature extraction network model according to the second loss value.
Optionally, the method executed by the processor 201 further includes:
after a plurality of human body sample pictures are input into the feature extraction network model for first training, calculating the feature amplitude of the human body sample pictures in a training set of the feature extraction network model;
judging whether the characteristic amplitude corresponding to each human body sample picture meets a preset condition or not;
when the human body sample picture is a first type human body sample picture meeting the preset condition, deleting the first type human body sample picture;
and when the human body sample picture is a second type human body sample picture which does not meet the preset condition, reserving the second type human body sample picture, and inputting the feature extraction network model again in the next training process to train.
Optionally, the method executed by the processor 201 further includes:
acquiring a scene picture acquired by image acquisition equipment;
detecting the scene picture to determine a human body coordinate area in the scene picture;
cutting the human body coordinate area to obtain a human body sample picture;
and inputting the human body sample picture into a database to construct an image base.
Optionally, the method executed by the processor 201 further includes:
determining a reference characteristic vector corresponding to the picture to be retrieved according to the input picture to be retrieved;
querying a plurality of target feature vectors corresponding to the reference feature vectors in an image base;
calculating feature similarity between the reference feature vector and the plurality of target feature vectors;
and determining the human body sample picture corresponding to the picture to be retrieved according to the feature similarity.
That is, in the embodiment of the present invention, the processor 201 of the electronic device 20 implements the steps of the model training method when executing the computer program, so as to enhance the expression capability of the model and improve the accuracy of human body re-identification.
It should be noted that, since the steps of the model training method described above are implemented when the processor 201 of the electronic device 20 executes the computer program, all embodiments of the model training method described above are applicable to the electronic device 20, and can achieve the same or similar beneficial effects.
The computer-readable storage medium provided in the embodiments of the present invention stores a computer program thereon, and when the computer program is executed by a processor, the computer program implements each process of the model training method or the application-side model training method provided in the embodiments of the present invention, and can achieve the same technical effect, and is not described herein again to avoid repetition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method of model training, comprising:
performing pooling treatment on the feature map to obtain a target feature vector; the characteristic graph is obtained by performing characteristic extraction on a plurality of human body sample pictures in a characteristic extraction network model;
determining classification categories corresponding to the plurality of human body sample pictures according to the target feature vector;
calculating loss values of label categories corresponding to the human body sample pictures according to the classification categories so as to train the feature extraction network model according to the loss values; wherein the label category is used as a label to point to the corresponding human body sample picture.
2. The model training method of claim 1, wherein pooling the feature maps to obtain the target feature vector comprises:
dividing the feature map into a plurality of pixel point regions;
respectively carrying out generalized average calculation on each pixel point region to obtain a generalized average value corresponding to each pixel point region;
and combining the generalized average values corresponding to the pixel point areas to obtain the target characteristic vector.
3. The model training method according to claim 1, wherein the calculating the loss values of the label classes corresponding to the plurality of human body sample pictures according to the classification classes to train the feature extraction network model according to the loss values comprises:
judging the training times of the plurality of human body sample pictures in the feature extraction network model, wherein each training iteration comprises a fixed number of batches of pictures;
when the training times are odd numbers, determining a plurality of first batch of pictures from the training set of the feature extraction network model; wherein the plurality of first batch pictures are obtained by random extraction in the training set;
calculating first loss values corresponding to the first batch of pictures according to a first loss function and a second loss function;
and updating parameters of the feature extraction network model according to the first loss value.
4. The model training method of claim 3, wherein the label categories include corresponding human identity labels;
the calculating the loss values of the label categories corresponding to the human body sample pictures according to the classification categories so as to train the feature extraction network model according to the loss values, further comprising:
when the training times are even numbers, determining a plurality of second batch of pictures from the training set of the feature extraction network model; wherein the plurality of second batch of pictures are pictures corresponding to a plurality of different human identity labels in the training set;
calculating second loss values corresponding to the second batches of pictures according to the second loss function and a third loss function;
and updating parameters of the feature extraction network model according to the second loss value.
5. The model training method of claim 1, further comprising:
after the plurality of human body sample pictures are input into the feature extraction network model for first training, calculating the feature amplitude of the human body sample pictures in a training set of the feature extraction network model;
judging whether the characteristic amplitude corresponding to each human body sample picture meets a preset condition or not;
when the human body sample picture is a first type human body sample picture meeting a preset condition, deleting the first type human body sample picture;
and when the human body sample picture is a second type human body sample picture which does not meet the preset condition, reserving the second type human body sample picture, and inputting the feature extraction network model again in the next training process for training.
6. The model training method of claim 1, further comprising:
acquiring a scene picture acquired by image acquisition equipment;
detecting the scene picture to determine a human body coordinate area in the scene picture;
cutting the human body coordinate area to obtain a human body sample picture;
and inputting the human body sample picture into a database to construct an image base.
7. The model training method of claim 6, further comprising:
determining a reference characteristic vector corresponding to an input picture to be retrieved according to the input picture to be retrieved;
querying a plurality of target feature vectors corresponding to the reference feature vectors in the image base;
calculating feature similarity between a reference feature vector and the plurality of target feature vectors;
and determining a human body sample picture corresponding to the picture to be retrieved according to the feature similarity.
8. A model training apparatus, comprising:
the pooling module is used for pooling the feature map to obtain a target feature vector; the characteristic graph is obtained by performing characteristic extraction on a plurality of human body sample pictures in a characteristic extraction network model;
the determining module is used for determining classification categories corresponding to the plurality of human body sample pictures according to the target feature vector;
the calculation module is used for calculating loss values of the label categories corresponding to the human body sample pictures according to the classification categories, and the training module is used for training the feature extraction network model according to the loss values; wherein the label category is used as a label to point to the corresponding human body sample picture.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the model training method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the model training method according to any one of claims 1 to 7.
CN202111654018.XA 2021-12-30 2021-12-30 Model training method and device, electronic equipment and storage medium Pending CN114445691A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111654018.XA CN114445691A (en) 2021-12-30 2021-12-30 Model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111654018.XA CN114445691A (en) 2021-12-30 2021-12-30 Model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114445691A true CN114445691A (en) 2022-05-06

Family

ID=81364951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111654018.XA Pending CN114445691A (en) 2021-12-30 2021-12-30 Model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114445691A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116052094A (en) * 2023-03-07 2023-05-02 浙江华是科技股份有限公司 Ship detection method, system and computer storage medium
CN116052094B (en) * 2023-03-07 2023-06-09 浙江华是科技股份有限公司 Ship detection method, system and computer storage medium

Similar Documents

Publication Publication Date Title
CN110348376B (en) Pedestrian real-time detection method based on neural network
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN107833213B (en) Weak supervision object detection method based on false-true value self-adaptive method
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN107730553B (en) Weak supervision object detection method based on false-true value search method
CN107169106A (en) Video retrieval method, device, storage medium and processor
CN112487886A (en) Method and device for identifying face with shielding, storage medium and terminal
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN113963032A (en) Twin network structure target tracking method fusing target re-identification
CN112836625A (en) Face living body detection method and device and electronic equipment
CN110555420A (en) fusion model network and method based on pedestrian regional feature extraction and re-identification
WO2023124278A1 (en) Image processing model training method and apparatus, and image classification method and apparatus
CN111444816A (en) Multi-scale dense pedestrian detection method based on fast RCNN
CN114445691A (en) Model training method and device, electronic equipment and storage medium
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
CN111666813B (en) Subcutaneous sweat gland extraction method of three-dimensional convolutional neural network based on non-local information
CN111723688B (en) Human body action recognition result evaluation method and device and electronic equipment
CN116630749A (en) Industrial equipment fault detection method, device, equipment and storage medium
CN115909398A (en) Cross-domain pedestrian re-identification method based on feature enhancement
CN114926635A (en) Method for segmenting target in multi-focus image combined with deep learning method
CN114022905A (en) Attribute-aware domain expansion pedestrian re-identification method and system
CN113837236A (en) Method and device for identifying target object in image, terminal equipment and storage medium
CN113850166A (en) Ship image identification method and system based on convolutional neural network
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination