CN112861689A

CN112861689A - Searching method and device of coordinate recognition model based on NAS technology

Info

Publication number: CN112861689A
Application number: CN202110137913.8A
Authority: CN
Inventors: 王蔚; 田晓玮; 聂学成
Original assignee: Shanghai Yitu Network Science and Technology Co Ltd
Current assignee: Shanghai Yitu Network Science and Technology Co Ltd
Priority date: 2021-02-01
Filing date: 2021-02-01
Publication date: 2021-05-28

Abstract

The application relates to the technical field of computer vision, in particular to a coordinate recognition model searching method and device based on NAS (neural architecture search) technology, and specifically to a training method and device for a coordinate recognition model. The value range of each hyper-parameter of a baseline model is obtained, and the hyper-parameters under different values are combined according to those value ranges to generate a plurality of value combinations. For each value combination, each hyper-parameter of the baseline model is set to the corresponding value in that combination to obtain a candidate coordinate recognition model. For each candidate coordinate recognition model, an image sample set is input into the model for training, and an error value of the model is calculated. The candidate coordinate recognition model that meets a preset error value condition is taken as the finally optimized coordinate recognition model, thereby realizing automatic design of the coordinate recognition model by combining NAS.

Description

Searching method and device of coordinate recognition model based on NAS technology

Technical Field

The application relates to the technical field of computer vision, in particular to a coordinate recognition model searching method and device based on NAS technology. The application particularly provides a training method and device of a coordinate recognition model.

Background

At present, when recognizing the motion type of a human body included in an image to be recognized, the recognition is generally realized by a neural network model which is manually designed and trained in advance. However, since the manual model design method lacks targeted optimization of the human motion category, the accuracy and speed of the neural network model in recognizing the human motion are reduced.

In order to solve the above problem, in the related art, the network structure of a neural network model may be designed automatically by Neural Architecture Search (NAS). NAS typically includes three main modules: a search space, a search strategy, and a performance evaluation strategy. Different scenarios generally place different requirements on the search space, the search strategy and the performance evaluation strategy, so if NAS is to be applied to coordinate recognition, its three main modules need to be designed specifically for the actual requirements of the coordinate recognition scenario. Therefore, how to realize automatic design of the coordinate recognition model by combining NAS has become a problem to be solved urgently.

Disclosure of Invention

The embodiment of the application provides a training method and a training device for a coordinate recognition model, which are used for realizing automatic design of the coordinate recognition model by combining NAS (neural architecture search).

The embodiment of the application provides the following specific technical scheme:

a training method of a coordinate recognition model comprises the following steps:

acquiring the value range of each hyper-parameter of the baseline model, and combining the hyper-parameters under different values according to the value range of each hyper-parameter to generate a plurality of value combinations, wherein the value combinations comprise each hyper-parameter and the value of each hyper-parameter;

respectively setting each hyper-parameter of the baseline model to each value in any one numerical combination aiming at each numerical combination to obtain a candidate coordinate identification model under the numerical combination;

respectively inputting an image sample set into any candidate coordinate recognition model for training aiming at each candidate coordinate recognition model, and calculating an error value of the candidate coordinate recognition model, wherein the image sample set comprises each image sample and a corresponding sample label, and the sample label represents a real two-dimensional coordinate of each human body key point contained in the image sample;

and taking the candidate coordinate recognition model meeting the preset error value condition as the finally optimized coordinate recognition model.

Optionally, respectively setting each hyper-parameter of the baseline model to each value in any one numerical combination for each numerical combination, and after obtaining the candidate coordinate identification model under the numerical combination, further includes:

acquiring decision information, wherein the decision information is search strategy information specifying a random-sampling search strategy, a search strategy based on reinforcement learning, or a search strategy based on an evolutionary algorithm;

and determining each candidate coordinate recognition model for final training from each candidate coordinate recognition model by adopting the decision information.

Optionally, before obtaining the value range of each hyper-parameter of the baseline model, the method further includes:

acquiring target operation conditions input by a user, wherein the target operation conditions at least comprise target speed conditions and/or target precision conditions;

and searching the baseline model meeting the target operation condition from the candidate baseline models contained in the preset model database.

Optionally, the inputting the image sample set into any candidate coordinate recognition model for training, and calculating an error value of the candidate coordinate recognition model specifically includes:

respectively inputting any image sample into any candidate coordinate identification model aiming at each image sample in the obtained image sample set, identifying and obtaining each human body key point contained in the human body in the image sample and a predicted two-dimensional coordinate corresponding to each human body key point, and respectively calculating a Euclidean distance value between each predicted two-dimensional coordinate and the corresponding real two-dimensional coordinate;

and determining the error value of the candidate coordinate recognition model according to the calculated Euclidean distance values, the area of the image sample and a preset recognition difficulty coefficient.

Optionally, the step of using the candidate coordinate recognition model meeting the preset error value condition as the finally optimized coordinate recognition model specifically includes:

and taking the candidate coordinate identification model corresponding to the minimum error value as the finally optimized coordinate identification model.

Optionally, the hyper-parameter includes at least one or any combination of the following: the number of convolution channels, the number of convolution layers, and the type of convolution.

Optionally, after the candidate coordinate recognition model corresponding to the minimum error value is used as the finally optimized coordinate recognition model, the method further includes:

acquiring an image to be recognized, wherein the image to be recognized comprises a human body;

based on the optimized coordinate recognition model, recognizing each human body key point contained in the image to be recognized by taking the image to be recognized as an input parameter, and acquiring two-dimensional coordinates of each human body key point;

and identifying the human body action category of the human body contained in the image to be identified according to the two-dimensional coordinates of the key points of the human body.

A training apparatus of a coordinate recognition model, comprising:

the first acquisition module is used for acquiring the value range of each hyper-parameter of the baseline model, and combining the hyper-parameters under different values according to the value range of each hyper-parameter to generate a plurality of value combinations, wherein each value combination comprises each hyper-parameter and the value of each hyper-parameter;

the combination module is used for setting each hyper-parameter of the baseline model to each value in any numerical combination respectively aiming at each numerical combination to obtain a candidate coordinate identification model under the numerical combination;

the training module is used for inputting the image sample set into any one candidate coordinate recognition model for training respectively aiming at each candidate coordinate recognition model and calculating an error value of the candidate coordinate recognition model, wherein the image sample set comprises each image sample and a corresponding sample label, and the sample label represents a real two-dimensional coordinate of each human body key point contained in the image sample;

and the selection module is used for taking the candidate coordinate recognition model meeting the preset error value condition as the finally optimized coordinate recognition model.

the second acquisition module is used for acquiring decision information, wherein the decision information is search strategy information specifying a random-sampling search strategy, a search strategy based on reinforcement learning, or a search strategy based on an evolutionary algorithm;

and the determining module is used for determining each candidate coordinate recognition model for final training from each candidate coordinate recognition model by adopting the decision information.

the third acquisition module is used for acquiring target operation conditions input by a user, wherein the target operation conditions at least comprise a target speed condition and/or a target precision condition;

and the searching module is used for searching the baseline model meeting the target operation condition from the candidate baseline models contained in the preset model database.

Optionally, when the image sample set is input into any candidate coordinate recognition model for training, and an error value of the candidate coordinate recognition model is calculated, the training module is specifically configured to:

Optionally, the selection module is specifically configured to:

the fourth acquisition module is used for acquiring an image to be identified, wherein the image to be identified comprises a human body;

the first identification module is used for identifying each human body key point contained in the image to be identified based on the optimized coordinate identification model by taking the image to be identified as an input parameter and acquiring a two-dimensional coordinate of each human body key point;

and the second identification module is used for identifying the human body action category of the human body contained in the image to be identified according to the two-dimensional coordinates of the key points of the human body.

An electronic device comprises a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor implements the steps of the training method of the coordinate recognition model when executing the program.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned training method of the coordinate recognition model.

In the embodiment of the application, the value range of each hyper-parameter of the baseline model is obtained, and the hyper-parameters under different values are combined according to those value ranges to generate a plurality of value combinations. For each value combination, the hyper-parameters of the baseline model are set to the values in that combination to obtain a candidate coordinate recognition model. For each candidate coordinate recognition model, the image sample set is input into the model for training, and the error value of the model is calculated. The candidate coordinate recognition model meeting the preset error value condition is taken as the finally optimized coordinate recognition model. Because the finally optimized model is obtained by combining values of the model's hyper-parameters, the performance requirement on the baseline model is reduced and the time spent manually designing the baseline model can be saved. In addition, the baseline model is optimized in a targeted manner according to the target data and the error value, so that automatic design of the coordinate recognition model is realized by combining NAS, and the precision and speed of the model can be further improved on the basis of the manually designed baseline model.

Drawings

FIG. 1 is a diagram illustrating a posture estimation of a human body in the related art;

FIG. 2 is a flowchart illustrating a method for training a coordinate recognition model according to an embodiment of the present disclosure;

FIG. 3 is a diagram illustrating identification of key points of a human body according to an embodiment of the present disclosure;

FIG. 4 is another flowchart of a method for training a coordinate recognition model according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a training apparatus for a coordinate recognition model according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Currently, human pose estimation is one of the most challenging research directions in the field of computer vision.

In the related art, the human body posture contained in an image is generally estimated by inputting the original image into a neural network model that has been manually designed and trained in advance. Starting from a mainstream network structure, the depth, width, skip connections and up-sampling mode of the neural network model are adjusted and the model is trained, and human body posture estimation is then performed with the trained model. FIG. 1 is a schematic diagram of human body posture estimation in the related art. Moreover, the expression capability of the backbone network of the neural network model is the main factor determining the performance of human body posture estimation. Because the approach in the related art lacks targeted optimization for human body posture estimation, the accuracy and speed of the coordinate recognition model are reduced.

In order to solve the above problem, in the related art, the network structure of a neural network model may be designed automatically by Neural Architecture Search (NAS). NAS is a technology that automatically designs a network structure for a specific task objective, and is generally used to replace the hyper-parameters in a manually designed model. A complete NAS algorithm typically includes three main modules: a search space, a search strategy and a performance evaluation strategy. The search space defines the hyper-parameters available for searching and their candidate value ranges; each combination of hyper-parameter values constitutes a complete network configuration. The search strategy defines how the optimal model is sought within the search space. The performance evaluation strategy is the performance indicator used to evaluate a network configuration. Different scenarios generally place different requirements on the search space, the search strategy and the performance evaluation strategy, so if NAS is to be applied to generating a coordinate recognition model, its three main modules need to be designed specifically for the actual requirements of the coordinate recognition scenario. Therefore, how to realize automatic design of the coordinate recognition model by combining NAS has become a problem to be solved urgently.

In the embodiment of the application, the value range of each hyper-parameter of the baseline model is obtained, and the hyper-parameters under different values are combined according to those value ranges to generate a plurality of value combinations. For each value combination, the hyper-parameters of the baseline model are set to the values in that combination to obtain a candidate coordinate recognition model. For each candidate coordinate recognition model, the image sample set is input into the model for training, and the error value of the model is calculated. The candidate coordinate recognition model meeting the preset error value condition is taken as the finally optimized coordinate recognition model. Model search is thus used to optimize the network structure of the coordinate recognition model. On the one hand, the hyper-parameters of the coordinate recognition model are obtained mainly through the value combinations, which lowers the performance requirement on the baseline model and saves the time spent manually designing the model. On the other hand, automatic design of the coordinate recognition model can be realized by combining NAS, the model search can be optimized in a targeted manner according to the target operation information, and the resulting coordinate recognition model can be further improved over the manually designed model.

Based on the foregoing embodiment, the present application relates to a search of a coordinate recognition model based on the NAS technology, and in particular, to a training method of a coordinate recognition model, which is shown in fig. 2 and is a flowchart of the training method of the coordinate recognition model in the present application embodiment, and specifically includes:

Step 200: Acquire the value range of each hyper-parameter of the baseline model, and combine the hyper-parameters under different values according to the value range of each hyper-parameter to generate a plurality of value combinations.

Each value combination comprises every hyper-parameter and the value of that hyper-parameter.

In the embodiment of the application, a baseline model is obtained, the value range of each hyper-parameter of the baseline model is obtained, then the hyper-parameters under different values are randomly combined according to the value range of each hyper-parameter, and a plurality of value combinations are generated, wherein each value combination comprises each hyper-parameter and the value of each hyper-parameter.

For example, assuming that the hyper-parameters in the baseline model are the number of convolution channels and the number of convolution layers, the range of values of the number of convolution channels is 2 to 4, and the range of values of the number of convolution layers is 3 to 4, the hyper-parameters under different values are randomly combined to generate a plurality of value combinations, where the value combinations are: the number of convolution channels is 2 and the number of convolution layers is 3, the number of convolution channels is 3 and the number of convolution layers is 3, the number of convolution channels is 4 and the number of convolution layers is 3, the number of convolution channels is 2 and the number of convolution layers is 4, the number of convolution channels is 3 and the number of convolution layers is 4, the number of convolution channels is 4 and the number of convolution layers is 4.
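The enumeration in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys `conv_channels` and `conv_layers` and the helper `enumerate_combinations` are hypothetical names introduced here, and the sketch lists every combination (a Cartesian product) rather than any particular sampling order.

```python
from itertools import product

# Hypothetical search space mirroring the example: channels 2 to 4, layers 3 to 4.
search_space = {
    "conv_channels": [2, 3, 4],
    "conv_layers": [3, 4],
}

def enumerate_combinations(space):
    """Return every value combination as a dict of hyper-parameter name -> value."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

combinations = enumerate_combinations(search_space)
for combo in combinations:
    print(combo)
```

With three channel values and two layer values, the sketch yields the six value combinations listed in the example.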

Of course, the type and number of the hyper-parameters in the embodiment of the present application are not limited.

It should be noted that the training method of the coordinate recognition model in the embodiment of the present application may be implemented based on NAS.

Further, since the coordinate identification model is obtained by optimizing the baseline model, in order to improve the operation speed and processing accuracy of the coordinate identification model, it is also required to ensure that the operation speed and processing accuracy of the baseline model can meet the requirements of the user. Therefore, in the embodiment of the present application, a baseline model that can substantially meet the user requirements needs to be obtained first. The following describes in detail a manner of obtaining the baseline model in the embodiment of the present application, which specifically includes:

S1: Acquire the target operating conditions input by a user.

Wherein the target operating condition comprises at least a target speed condition and/or a target accuracy condition.

In the embodiment of the application, a user inputs the target operating conditions which can be reached by the expected baseline model into the server, so that the server obtains the target operating conditions input by the user.

The content included in the target operating condition can be divided into at least the following three types:

The first type: a target speed condition.

In the embodiment of the application, the target speed condition of the baseline model is preset, so that the speed of the baseline model is more likely to be close to the speed range of the coordinate recognition model, and each candidate coordinate recognition model within the hyper-parameter value range is more likely to reach the target speed.

The target speed condition represents the operation speed range that the baseline model should meet when identifying the two-dimensional coordinates of the human body key points.

The second type: a target precision condition.

In the embodiment of the application, the target precision condition of the baseline model is preset, which raises the probability that each candidate coordinate recognition model within the hyper-parameter value range of the baseline model reaches the target precision, and thus improves the probability of obtaining a better coordinate recognition model.

The target precision condition represents the processing precision range that the baseline model should meet when identifying the two-dimensional coordinates of the human body key points.

The third type: a target speed condition and a target precision condition.

In the embodiment of the application, the target operating conditions comprise both a target speed condition and a target precision condition, so that it can be ensured that the candidate coordinate recognition models within the hyper-parameter value range of the baseline model reach both the target precision and the target speed.

It should be noted that, to further strengthen the optimization of the model, the speed of the baseline model needs to be close to the target speed range of the coordinate recognition model, and the accuracy of the candidate coordinate recognition models should likewise vary around that of the baseline model.

S2: Search the candidate baseline models contained in a preset model database for a baseline model meeting the target operating conditions.

In the embodiment of the application, according to the preset target operating conditions, a baseline model meeting the target operating conditions is found among the candidate baseline models contained in the preset model database.
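As a rough illustration of steps S1 and S2, the sketch below filters a model database by a target speed condition and/or a target precision condition. Everything in it is an assumption made for illustration: the text does not specify how speed or precision are measured, so `latency_ms` and `accuracy` are hypothetical fields, the entries in `model_database` are placeholder values rather than measured results, and `find_baselines` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BaselineEntry:
    name: str
    latency_ms: float  # assumed speed metric: inference time per image
    accuracy: float    # assumed precision metric in [0, 1]

@dataclass(frozen=True)
class TargetCondition:
    max_latency_ms: Optional[float] = None  # target speed condition (optional)
    min_accuracy: Optional[float] = None    # target precision condition (optional)

def find_baselines(database, target):
    """Return the database entries that satisfy every condition that is set."""
    matches = []
    for entry in database:
        if target.max_latency_ms is not None and entry.latency_ms > target.max_latency_ms:
            continue
        if target.min_accuracy is not None and entry.accuracy < target.min_accuracy:
            continue
        matches.append(entry)
    return matches

# Placeholder entries, not measured results.
model_database = [
    BaselineEntry("baseline_a", latency_ms=12.0, accuracy=0.68),
    BaselineEntry("baseline_b", latency_ms=25.0, accuracy=0.74),
    BaselineEntry("baseline_c", latency_ms=40.0, accuracy=0.77),
]

target = TargetCondition(max_latency_ms=30.0, min_accuracy=0.70)
selected = find_baselines(model_database, target)
print([entry.name for entry in selected])
```

Leaving either field of `TargetCondition` as `None` corresponds to the first or second type of target operating condition; setting both corresponds to the third type.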

Alternatively, the baseline model may be supplied by the user in advance: the basic network layers of each level input by the user are obtained, and the baseline model is generated from those layers.

The baseline model is generally composed of convolution blocks, each made up of a plurality of convolution layers, a Batch Normalization (BN) layer, and an activation layer. The convolution blocks may also include other common convolution blocks such as residual convolution layers and depthwise separable convolution layers, which is not limited in the embodiment of the present application.

It should be noted that each layer of each convolution block of the baseline model includes a plurality of hyper-parameters, and each hyper-parameter corresponds to one value.

Step 210: For each value combination, set each hyper-parameter of the baseline model to the corresponding value in that combination to obtain the candidate coordinate recognition model under that value combination.

In the embodiment of the application, after each numerical combination is obtained, the values of the hyper-parameters in the numerical combination are correspondingly filled into the hyper-parameters of the baseline model respectively for each numerical combination, so that the candidate coordinate identification model under the numerical combination is obtained. Thus, candidate coordinate recognition models for each combination of numerical values can be obtained.

Thus, the configuration of the network architecture included in the coordinate recognition model can be determined according to the neural network parameters given based on the numerical combination, and then a coordinate recognition model corresponding to the neural network architecture and the neural network parameters can be defined, thereby obtaining each candidate coordinate recognition model.
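Filling value combinations into the baseline model can be pictured as overriding fields of a baseline configuration. The following is a minimal sketch under the assumption that a model is fully described by a small configuration object; `ModelConfig`, its fields and defaults, and `make_candidates` are hypothetical names, and building the actual network from the configuration is left out.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical hyper-parameters of the baseline model.
    conv_channels: int = 3
    conv_layers: int = 3
    conv_type: str = "standard"

def make_candidates(baseline, combinations):
    """Create one candidate config per value combination; the baseline is unchanged."""
    return [replace(baseline, **combo) for combo in combinations]

baseline = ModelConfig()
combos = [
    {"conv_channels": 2, "conv_layers": 4},
    {"conv_channels": 4, "conv_layers": 3},
]
candidates = make_candidates(baseline, combos)
for candidate in candidates:
    print(candidate)
```

Hyper-parameters not mentioned in a combination (here `conv_type`) keep the baseline value, which matches the idea of using the baseline model as a template.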

Further, in order to reduce the amount of computation as far as possible while still ensuring the accuracy and speed of the coordinate recognition model and the selection of the optimal coordinate recognition model, in this embodiment of the present application, after the candidate coordinate recognition models are obtained, decision information may be used to determine, from the candidate coordinate recognition models, those that need to be finally trained. This specifically includes:

S1: Acquire decision information.

The decision information is search strategy information specifying a random-sampling search strategy, a search strategy based on reinforcement learning, or a search strategy based on an evolutionary algorithm.

In the embodiment of the present application, the decision information is used to help determine the next numerical combination to be trained, and therefore, the decision information input by the user is obtained.

The decision information may be a randomly sampled search strategy, that is, N sets of numerical combinations are randomly selected from each numerical combination for training.
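A random-sampling strategy of this kind might look like the sketch below. The helper name `sample_combinations` and the `seed` parameter are assumptions added for reproducibility; the text only states that N groups are selected at random.

```python
import random

def sample_combinations(combinations, n, seed=None):
    """Pick n distinct value combinations uniformly at random (all of them if n is larger)."""
    rng = random.Random(seed)
    n = min(n, len(combinations))
    return rng.sample(combinations, n)

all_combinations = [
    {"conv_channels": c, "conv_layers": l}
    for c in (2, 3, 4)
    for l in (3, 4)
]
chosen = sample_combinations(all_combinations, n=3, seed=0)
print(chosen)
```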

The decision information may also be a search strategy based on reinforcement learning. A group of value combinations is randomly selected from the value combinations as the initial state of the coordinate recognition model. While the candidate coordinate recognition model in the initial state is being trained, random parameters are applied to it, and the model is perturbed iteratively. After a random parameter perturbs the candidate coordinate recognition model, the precision of the model is evaluated, a value score for that perturbation is calculated from the change in the model's precision, and the probability of selecting that random parameter to perturb the model in subsequent iterations is determined from the calculated value score. After multiple rounds of iterative training, the candidate coordinate recognition model with the highest precision obtained during the iterative training is selected as the candidate coordinate recognition model for final training.

It should be noted that the value score characterizes the probability that the random parameter is selected to perturb the candidate coordinate recognition model in subsequent iterative training.
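The perturb, evaluate and re-weight loop can be illustrated with a deliberately simplified, bandit-style sketch. It is not a full reinforcement learning method and is not taken from the application: the perturbations (moving one hyper-parameter one step within its range), the softmax over value scores, the update step size, and the stand-in `toy_accuracy` function are all assumptions chosen to keep the example short. In practice `evaluate` would train and evaluate a candidate model.

```python
import math
import random

def perturbation_search(space, evaluate, rounds=50, step=0.5, seed=0):
    """Perturb a configuration repeatedly, favouring perturbations that raised accuracy."""
    rng = random.Random(seed)
    names = list(space)

    def to_config(indices):
        return {n: space[n][i] for n, i in indices.items()}

    # Initial state: one randomly chosen value combination, stored as indices.
    state = {n: rng.randrange(len(space[n])) for n in names}
    accuracy = evaluate(to_config(state))
    best_state, best_accuracy = dict(state), accuracy

    # Each perturbation moves one hyper-parameter one step down or up.
    actions = [(n, d) for n in names for d in (-1, 1)]
    scores = {a: 0.0 for a in actions}  # value score per perturbation

    for _ in range(rounds):
        # A softmax over value scores gives the selection probabilities.
        weights = [math.exp(scores[a]) for a in actions]
        action = rng.choices(actions, weights=weights, k=1)[0]
        name, delta = action
        trial = dict(state)
        trial[name] = min(max(state[name] + delta, 0), len(space[name]) - 1)
        trial_accuracy = evaluate(to_config(trial))
        # Move the value score toward the observed change in accuracy.
        scores[action] += step * ((trial_accuracy - accuracy) - scores[action])
        if trial_accuracy >= accuracy:
            state, accuracy = trial, trial_accuracy
        if accuracy > best_accuracy:
            best_state, best_accuracy = dict(state), accuracy
    return to_config(best_state), best_accuracy

def toy_accuracy(config):
    # Stand-in for training and evaluating a model; peaks at 4 channels, 4 layers.
    return 1.0 - 0.1 * abs(config["conv_channels"] - 4) - 0.1 * abs(config["conv_layers"] - 4)

toy_space = {"conv_channels": [2, 3, 4], "conv_layers": [3, 4]}
best_config, best_accuracy = perturbation_search(toy_space, toy_accuracy)
print(best_config, best_accuracy)
```

The function returns the most accurate configuration seen during the iterations, mirroring the final selection step described above.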

The decision information may also be a search strategy based on an evolutionary algorithm. The search strategy based on the evolutionary algorithm in the embodiment of the present application is described in detail below by using a specific example.

First, each hyper-parameter in the candidate coordinate recognition model is binary-coded. Several groups of value combinations are randomly selected from the value combinations and used as an initial population N. The candidate coordinate recognition models generated from the value combinations in population N are trained, and the candidate coordinate recognition models meeting the precision condition are selected from the trained models. The hyper-parameter configurations of the selected candidate coordinate recognition models are crossed over in pairs according to their binary codes. The value combinations corresponding to the lower-precision candidate coordinate recognition models are randomly mutated, and a new population is generated for further precision evaluation. After T iterations, the candidate coordinate recognition model with the highest precision obtained during the iterations is selected as the search result.
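The evolutionary procedure can be sketched as follows. This is a toy illustration under several assumptions not stated in the text: each hyper-parameter is coded as a fixed-width bit field that is decoded modulo the number of options, the "precision condition" is taken to mean the better-scoring half of the population, crossover is single-point, mutation flips one random bit, and `toy_accuracy` stands in for training and evaluating a real model. The names `decode` and `evolve` are hypothetical.

```python
import random

def decode(genome, space, bits):
    """Map a bit list to a configuration; each hyper-parameter uses `bits` bits."""
    config = {}
    for k, (name, options) in enumerate(space.items()):
        chunk = genome[k * bits:(k + 1) * bits]
        index = int("".join(str(b) for b in chunk), 2) % len(options)
        config[name] = options[index]
    return config

def evolve(space, fitness, pop_size=6, generations=10, bits=2, seed=0):
    rng = random.Random(seed)
    length = bits * len(space)  # assumed >= 2 so that a crossover point exists

    def score(genome):
        return fitness(decode(genome, space, bits))

    # Initial population of randomly chosen binary-coded configurations.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        half = pop_size // 2
        parents, weaker = ranked[:half], ranked[half:]
        # Pairwise single-point crossover among the better-scoring genomes.
        children = []
        for a, b in zip(parents, parents[1:] + parents[:1]):
            cut = rng.randrange(1, length)
            children.append(a[:cut] + b[cut:])
        # Random one-bit mutation of the lower-scoring genomes.
        mutants = []
        for genome in weaker:
            mutated = list(genome)
            mutated[rng.randrange(length)] ^= 1
            mutants.append(mutated)
        population = children + mutants
        best = max(population + [best], key=score)
    return decode(best, space, bits), score(best)

def toy_accuracy(config):
    # Stand-in for training and evaluating a model.
    return 1.0 - 0.1 * abs(config["conv_channels"] - 4) - 0.1 * abs(config["conv_layers"] - 4)

toy_space = {"conv_channels": [2, 3, 4], "conv_layers": [3, 4]}
best_config, best_fitness = evolve(toy_space, toy_accuracy)
print(best_config, best_fitness)
```

Here `generations` plays the role of T, and the best genome seen across all generations is kept so that the returned result is the highest-precision candidate found during the iterations.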

S2: Using the decision information, determine from the candidate coordinate recognition models those to be finally trained.

In the embodiment of the application, the search strategy is used to determine, from the candidate coordinate recognition models, those that need to be finally trained.

For example, the type of convolution module, the number of layers of convolution and the number of convolution channels designed for neural network search, and the type of convolution module, the range of values of the number of layers of convolution and the range of values of the number of convolution channels are designed. The method comprises the steps of taking a baseline model as a template, selecting hyper-parameters in the baseline model as objects of neural network search, obtaining each numerical combination, filling each numerical combination into the baseline model to obtain a coordinate recognition model of each candidate, and searching the coordinate recognition model of each candidate based on a search strategy of an evolutionary algorithm, wherein the search strategy has the function of helping to judge the next parameter combination to be tried.

It should be noted that, for each value combination evaluated during the search, a smaller number of training iterations is usually selected to save training time, provided that the precision of the models at that number of iterations preserves well the precision ordering of the models when they are completely trained.
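One possible way to check whether a shortened training schedule is a usable proxy is to compare how it orders a few candidates against the order obtained under complete training. The sketch below uses the Kendall rank correlation (tau-a) for this purpose; the choice of this statistic and the scores in the usage lines are assumptions made for illustration and are not taken from the application.

```python
from itertools import combinations


def kendall_tau(short_scores, full_scores):
    """Kendall rank correlation (tau-a) between two equally long score lists.

    A value of 1.0 means both schedules rank the candidates identically;
    -1.0 means the rankings are exactly reversed.
    """
    n = len(short_scores)
    if n != len(full_scores) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (short_scores[i] - short_scores[j]) * (full_scores[i] - full_scores[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical precision values for four candidates under the two schedules.
short_run = [0.41, 0.47, 0.52, 0.44]
full_run = [0.63, 0.70, 0.74, 0.69]
print(kendall_tau(short_run, full_run))
```

A correlation close to 1 on a small pilot set would suggest that the reduced number of iterations ranks candidates in roughly the same order as complete training.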

Step 220: For each candidate coordinate recognition model, inputting the image sample set into the candidate coordinate recognition model for training, and calculating the error value of the candidate coordinate recognition model, wherein the image sample set comprises image samples and corresponding sample labels, and each sample label represents the real two-dimensional coordinates of each human body key point contained in the image sample.

In the embodiment of the present application, first, an image sample set is obtained.

It should be noted that the image sample set includes each image sample and a sample label corresponding to each image sample, and the sample label represents the real two-dimensional coordinates of each human body key point included in the image sample.

Then, after the image sample set is obtained, for each candidate coordinate recognition model, the image sample set is input into the candidate coordinate recognition model for training, the human body key points contained in each image sample and the predicted two-dimensional coordinate corresponding to each human body key point are obtained, and the error value of the candidate coordinate recognition model is obtained based on the predicted two-dimensional coordinates.

The following describes in detail a manner of obtaining an error value of each candidate coordinate recognition model in the embodiment of the present application, which specifically includes:

S1: For each image sample in the obtained image sample set, inputting the image sample into the candidate coordinate recognition model, identifying each human body key point of the human body in the image sample and the predicted two-dimensional coordinate corresponding to each human body key point, and calculating the Euclidean distance value between each predicted two-dimensional coordinate and the corresponding real two-dimensional coordinate.

The following operation steps are executed respectively for each image sample contained in the image sample set:

First, the image sample is input into the candidate coordinate recognition model, and each human body key point of the human body in the image sample and the predicted two-dimensional coordinate corresponding to each human body key point are identified.

Then, the Euclidean distance between each predicted two-dimensional coordinate and the real two-dimensional coordinate in the corresponding sample label is calculated.

S2: Determining the error value of the candidate coordinate recognition model according to the calculated Euclidean distance values, the area of the image sample and a preset recognition difficulty coefficient.

In the embodiment of the application, for each image sample in the image sample set, the average of the Euclidean distance values corresponding to the human body key points in the image sample is calculated to obtain the Euclidean distance value of the image sample, and the ratio between this Euclidean distance value and the area of the image sample is calculated, so that the ratio of each image sample is obtained. The error value of the candidate coordinate recognition model is then determined from the product of the ratio corresponding to each image sample and the recognition difficulty coefficient.

It should be noted that the error value of the candidate coordinate recognition model is a calculated numerical value: the smaller the error value, the better the recognition effect of the candidate coordinate recognition model is determined to be, and the larger the error value, the worse the recognition effect is determined to be. This is consistent with selecting the candidate coordinate recognition model with the minimum error value as the optimal model.
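The error value calculation of S1 and S2 can be sketched as follows. The per-sample part (mean Euclidean distance, divided by the image area, multiplied by the recognition difficulty coefficient) follows the description above; averaging the per-sample values over the image sample set and using a single global difficulty coefficient are assumptions, since the description does not state how the per-sample products are aggregated. The sample layout with `pred`, `truth` and `area` keys is hypothetical.

```python
import math


def sample_error(pred, truth, area, difficulty):
    """Mean key point distance, divided by image area, times the difficulty coefficient."""
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equally long")
    if area <= 0:
        raise ValueError("image area must be positive")
    # Euclidean distance between each predicted and real two-dimensional coordinate.
    distances = [math.dist(p, t) for p, t in zip(pred, truth)]
    mean_distance = sum(distances) / len(distances)
    return mean_distance / area * difficulty


def model_error(samples, difficulty=1.0):
    """Average the per-sample errors over the sample set (the averaging is an assumption)."""
    if not samples:
        raise ValueError("the image sample set is empty")
    errors = [sample_error(s["pred"], s["truth"], s["area"], difficulty)
              for s in samples]
    return sum(errors) / len(errors)


# Hypothetical sample with two key points in an image of area 100.
samples = [{"pred": [(3.0, 4.0), (0.0, 0.0)],
            "truth": [(0.0, 0.0), (0.0, 0.0)],
            "area": 100.0}]
print(model_error(samples, difficulty=2.0))
```

Dividing by the image area makes errors comparable across images of different sizes, so a fixed pixel offset counts for less in a large image than in a small one.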

Step 230: Taking the candidate coordinate recognition model meeting the preset error value condition as the finally optimized coordinate recognition model.

In this embodiment, step 230 specifically includes:

After the error value of each candidate coordinate recognition model is obtained, the candidate coordinate recognition model corresponding to the minimum error value is taken as the finally selected coordinate recognition model, and this coordinate recognition model is the optimal model.
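Selecting the candidate with the minimum error value can be sketched as follows. The candidate names and error values are hypothetical placeholders, not results from the application.

```python
def select_optimized(error_by_candidate):
    """Return the (name, error) pair with the smallest error value."""
    if not error_by_candidate:
        raise ValueError("no candidate error values were provided")
    name = min(error_by_candidate, key=error_by_candidate.get)
    return name, error_by_candidate[name]


# Hypothetical error values of three trained candidates.
errors = {"candidate_a": 0.031, "candidate_b": 0.024, "candidate_c": 0.040}
print(select_optimized(errors))
```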

Further, after obtaining the finally optimized coordinate recognition model, the human body motion recognition can be performed according to the optimized coordinate recognition model, which specifically includes:

S1: Acquiring an image to be recognized.

The image to be recognized contains a human body.

In the embodiment of the application, after the image acquisition equipment acquires the image to be identified, the image to be identified is sent to the server, and thus the server can receive the image to be identified sent by the image acquisition equipment.

It should be noted that the image to be recognized includes a human body, and the image to be recognized may include one human body or a plurality of human bodies, which is not limited in this embodiment of the application.

The image capturing device may be, for example, a camera, which is not limited in the embodiment of the present application.

It should be further noted that the image to be recognized in the embodiment of the present application may be an image containing only a human body, or an image containing a human body and other objects. If the image to be recognized also contains other objects, human body detection needs to be performed on it first: the human body is marked in the image by a bounding rectangle, and the image is cropped so that an image containing only the human body is obtained. The cropped image is then used as the image to be recognized in the subsequent steps.

The image to be recognized may be an RGB image, for example.
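The cropping step for images that contain other objects can be sketched as follows. The human body detector itself is not shown; the bounding rectangle `(left, top, right, bottom)` is assumed to come from such a detector, and the image is represented as a plain list of pixel rows purely for illustration.

```python
def crop_to_box(image, box):
    """Crop a row-major image (list of pixel rows) to box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    height, width = len(image), len(image[0])
    # Clamp the rectangle to the image so a slightly oversized box still works.
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("bounding rectangle does not overlap the image")
    return [row[left:right] for row in image[top:bottom]]


# Hypothetical 4 x 4 image whose pixel value encodes (row, column) as row * 10 + column.
image = [[r * 10 + c for c in range(4)] for r in range(4)]
print(crop_to_box(image, (1, 1, 3, 3)))  # [[11, 12], [21, 22]]
```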

S2: Based on the optimized coordinate recognition model, recognizing each human body key point contained in the image to be recognized by taking the image to be recognized as an input parameter, and acquiring the two-dimensional coordinates of each human body key point.

In the embodiment of the application, the image to be recognized is input into the optimized coordinate recognition model, and feature extraction is performed on the image to be recognized to obtain its key point features. The probability that each human body key point occurs at each position of the image to be recognized is predicted, each human body key point of the human body in the image to be recognized is obtained according to the probability values, and the two-dimensional coordinates of each human body key point are obtained. Fig. 3 is a schematic diagram of the human body key points in the image to be recognized.
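Obtaining two-dimensional coordinates from the predicted probabilities can be sketched as follows. The description does not specify the exact decoding, so this sketch assumes one probability map per human body key point and takes the position with the highest probability as the key point coordinate; the map contents are hypothetical.

```python
def decode_keypoints(heatmaps):
    """For each key point probability map, return (x, y, probability) of its peak."""
    keypoints = []
    for heatmap in heatmaps:
        best = (0, 0, heatmap[0][0])
        for y, row in enumerate(heatmap):
            for x, prob in enumerate(row):
                if prob > best[2]:
                    best = (x, y, prob)
        keypoints.append(best)
    return keypoints


# Two hypothetical 2 x 3 probability maps, one per key point.
heatmaps = [
    [[0.0, 0.1, 0.0],
     [0.2, 0.9, 0.1]],
    [[0.7, 0.1, 0.0],
     [0.0, 0.2, 0.1]],
]
print(decode_keypoints(heatmaps))  # [(1, 1, 0.9), (0, 0, 0.7)]
```

The returned coordinates are in the resolution of the probability map; if the map is smaller than the input image, they would still have to be scaled back to image coordinates.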

S3: Identifying the human body action category of the human body contained in the image to be recognized according to the two-dimensional coordinates of each human body key point.

In the related art, the design process of a human body posture estimation network usually selects a model structure from an existing mainstream open source framework for a data set collected in a batch of actual scenes, and trains and tests it with the new data set. However, with this approach the model precision often fails to reach expectations or the speed does not meet requirements, so the basic model has to be adjusted with model optimization strategies, and multiple rounds of iterative experiments are often needed before a model meeting the target requirements is obtained. To solve this problem, in the embodiment of the application, the value range of each hyper-parameter of the baseline model is obtained, and the hyper-parameters under different values are combined according to the value ranges to generate a plurality of value combinations. For each value combination, each hyper-parameter of the baseline model is set to the corresponding value in the value combination to obtain the candidate coordinate recognition model under that value combination. For each candidate coordinate recognition model, the image sample set is input into the candidate coordinate recognition model for training and the error value of the candidate coordinate recognition model is calculated, and the candidate coordinate recognition model meeting the preset error value condition is taken as the finally optimized coordinate recognition model. Therefore, a user only needs to provide a simply designed baseline model, a batch of training and verification data and an evaluation index, and a target model whose performance meets the requirements can be searched automatically. Compared with manually designing the network structure, automatic network structure search can effectively shorten development time and has important practical value.

Based on the foregoing embodiment, referring to fig. 4, another flowchart of a method for training a coordinate recognition model in the embodiment of the present application is shown, which specifically includes:

Step 400: Acquiring target operating conditions input by a user.

Wherein the target operating conditions include at least a target speed condition and a target accuracy condition.

Step 401: Searching the candidate baseline models contained in the preset model database for the baseline model meeting the target operating conditions.

Step 402: Acquiring the value range of each hyper-parameter of the baseline model, and combining the hyper-parameters under different values according to the value range of each hyper-parameter to generate a plurality of value combinations.

Step 403: For each value combination, setting each hyper-parameter of the baseline model to the corresponding value in the value combination, and obtaining the candidate coordinate recognition model under the value combination.

Step 404: Acquiring decision information.

Step 405: Determining, by using the decision information, the candidate coordinate recognition models to be finally trained from the candidate coordinate recognition models.

In the embodiment of the present application, determining the candidate coordinate recognition models to be finally trained from the candidate coordinate recognition models may be implemented by neural network search, which is a technique for automatically designing a network structure for a specific task target and is generally used to replace manual design of network parameters. A complete neural network search algorithm generally includes three main modules: a search space, a search strategy and a performance evaluation strategy. The search space defines the network parameters that can be searched and their selectable ranges, and one choice for each selectable parameter together constitutes a complete network configuration. The search strategy defines how to find an optimal model in the search space. The performance evaluation strategy defines the performance index used to evaluate a network configuration.

Step 406: For each image sample in the obtained image sample set, inputting the image sample into the candidate coordinate recognition model, identifying each human body key point of the human body in the image sample and the predicted two-dimensional coordinate corresponding to each human body key point, and calculating the Euclidean distance value between each predicted two-dimensional coordinate and the corresponding real two-dimensional coordinate.

Step 407: Determining the error value of the candidate coordinate recognition model according to the calculated Euclidean distance values, the area of the image sample and a preset recognition difficulty coefficient.

Step 408: Taking the candidate coordinate recognition model corresponding to the minimum error value as the finally optimized coordinate recognition model.

Step 409: Acquiring an image to be recognized.

The image to be recognized contains a human body.

Step 410: Based on the optimized coordinate recognition model, recognizing each human body key point contained in the image to be recognized by taking the image to be recognized as an input parameter, and acquiring the two-dimensional coordinates of each human body key point.

Step 411: Identifying the human body action category of the human body contained in the image to be recognized according to the two-dimensional coordinates of each human body key point.
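The baseline lookup at the start of this flow, in which the preset model database is searched for a baseline model meeting the target speed condition and the target accuracy condition, can be sketched as follows. The record fields, the latency-based speed measure, the example entries and the rule of preferring the most accurate eligible entry are all assumptions made for illustration; the application does not define them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaselineEntry:
    name: str
    latency_ms: float  # hypothetical speed measure: lower is faster
    accuracy: float    # hypothetical accuracy measure in [0, 1]


def find_baseline(database: List[BaselineEntry],
                  max_latency_ms: float,
                  min_accuracy: float) -> Optional[BaselineEntry]:
    """Return the most accurate entry meeting both conditions, or None if none does."""
    eligible = [entry for entry in database
                if entry.latency_ms <= max_latency_ms
                and entry.accuracy >= min_accuracy]
    return max(eligible, key=lambda entry: entry.accuracy, default=None)


# Hypothetical contents of the preset model database.
model_database = [
    BaselineEntry("small", latency_ms=8.0, accuracy=0.62),
    BaselineEntry("medium", latency_ms=15.0, accuracy=0.71),
    BaselineEntry("large", latency_ms=40.0, accuracy=0.78),
]

chosen = find_baseline(model_database, max_latency_ms=20.0, min_accuracy=0.60)
print(chosen.name if chosen else "no baseline meets the target operating conditions")
```

If no entry satisfies both conditions, the function returns `None`, and the caller would have to relax the target operating conditions or extend the database.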

In the related art, a human body posture estimation network usually takes an image to be recognized as an input parameter, and a fully convolutional neural network is used to extract key point features and obtain the two-dimensional coordinates of each key point. The fully convolutional neural network in the related art usually adopts a model structure selected from an existing mainstream open source framework and is trained and tested with a new data set. However, because the existing model and the data set are not completely matched, the model precision often fails to reach expectations or the speed does not meet requirements. The basic model then has to be adjusted with model optimization strategies, and a model meeting the target requirements is obtained only after multiple rounds of iterative experiments. To solve the above problem, the embodiment of the present application provides a method that does not require manual optimization of the coordinate recognition model and can also optimize the baseline model, thereby further improving the precision and speed of the model.

Based on the same inventive concept, the embodiment of the present application provides a training device of a coordinate recognition model, and the training device of the coordinate recognition model may be a hardware structure, a software module, or a hardware structure plus a software module. Based on the above embodiment, referring to fig. 5, a schematic structural diagram of a training device for a coordinate recognition model in the embodiment of the present application is shown, which specifically includes:

a first obtaining module 500, configured to obtain value ranges of the hyper-parameters of the baseline model, and combine the hyper-parameters under different values according to the value ranges of the hyper-parameters to generate a plurality of value combinations, where the value combinations include the hyper-parameters and values of the hyper-parameters;

a combination module 501, configured to set each hyper-parameter of the baseline model to each value in any one numerical combination, respectively, to obtain a candidate coordinate identification model under the numerical combination;

a training module 502, configured to input an image sample set into any one candidate coordinate recognition model for training, and calculate an error value of the candidate coordinate recognition model, respectively for each candidate coordinate recognition model, where the image sample set includes each image sample and a corresponding sample label, and the sample label represents a real two-dimensional coordinate of each human body key point included in the image sample;

a selecting module 503, configured to use the candidate coordinate recognition model meeting the preset error value condition as the finally optimized coordinate recognition model;

a second obtaining module 504, configured to obtain decision information, where the decision information is a search strategy of random sampling, a search strategy based on reinforcement learning, or a search strategy based on an evolutionary algorithm;

a determining module 505, configured to determine, by using the decision information, the candidate coordinate recognition models to be finally trained from the candidate coordinate recognition models;

a third obtaining module 506, configured to obtain a target operation condition input by a user, where the target operation condition at least includes a target speed condition and/or a target accuracy condition;

the searching module 507 is configured to search, from the candidate baseline models included in the preset model database, a baseline model that meets the target operating condition.

Optionally, when the image sample set is input into any candidate coordinate recognition model for training, and an error value of the candidate coordinate recognition model is calculated, the training module 502 is specifically configured to:

Optionally, the selecting module 503 is specifically configured to:

a fourth obtaining module 508, configured to obtain an image to be recognized, where the image to be recognized includes a human body;

a first identification module 509, configured to identify, based on the optimized coordinate identification model, each human body key point included in the image to be identified by using the image to be identified as an input parameter, and obtain a two-dimensional coordinate of each human body key point;

and a second identifying module 510, configured to identify a human body motion category of a human body included in the image to be identified according to the two-dimensional coordinates of the key points of the human body.

Based on the above embodiments, referring to fig. 6, a schematic structural diagram of an electronic device in an embodiment of the present application is shown.

An embodiment of the present application provides an electronic device, which may include a processor 610 (Central Processing Unit, CPU), a memory 620, an input device 630, an output device 640, and the like. The input device 630 may include a keyboard, a mouse, a touch screen, and the like, and the output device 640 may include a display device, such as a Liquid Crystal Display (LCD) or a Cathode Ray Tube (CRT).

Memory 620 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides processor 610 with program instructions and data stored in memory 620. In the embodiment of the present application, the memory 620 may be used to store a program of any one of the training methods of the coordinate recognition model in the embodiment of the present application.

The processor 610 calls the program instructions stored in the memory 620 and, according to the obtained program instructions, executes any one of the training methods of the coordinate recognition model in the embodiment of the present application.

Based on the above embodiments, in the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the training method of the coordinate recognition model in any of the above method embodiments.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method for training a coordinate recognition model, comprising:

2. The method of claim 1, wherein for each value combination, setting each hyper-parameter of the baseline model to each value in any one value combination, and after obtaining the candidate coordinate recognition model under the value combination, further comprises:

3. The method of claim 1, wherein prior to obtaining the range of values for each hyper-parameter of the baseline model, further comprising:

4. The method as claimed in claim 3, wherein inputting the image sample set into any candidate coordinate recognition model for training, and calculating an error value of the candidate coordinate recognition model specifically comprises:

5. The method according to claim 4, wherein the step of using the candidate coordinate recognition model satisfying the preset error value condition as the finally optimized coordinate recognition model specifically comprises:

6. The method according to any of claims 1-5, wherein the hyper-parameters comprise at least one or any combination of: the number of convolution channels, the number of convolution layers, and the type of convolution.

7. The method of claim 5, wherein after the candidate coordinate recognition model corresponding to the minimum error value is used as the final optimized coordinate recognition model, the method further comprises:

8. An apparatus for training a coordinate recognition model, comprising:

9. The apparatus of claim 8, wherein the setting of the hyper-parameters of the baseline model to values in any one of the value combinations respectively for each value combination, and after obtaining the candidate coordinate recognition model under the value combination, further comprises:

10. The apparatus of claim 8, wherein prior to obtaining the range of values for each hyper-parameter of the baseline model, further comprising:

11. The apparatus of claim 10, wherein when the set of image samples is input to any one of the candidate coordinate recognition models for training, and an error value of the candidate coordinate recognition model is calculated, the training module is specifically configured to:

12. The apparatus of claim 11, wherein the selection module is specifically configured to:

13. The apparatus according to any of claims 8-12, wherein the hyper-parameters comprise at least one or any combination of: the number of convolution channels, the number of convolution layers, and the type of convolution.

14. The apparatus as claimed in claim 12, wherein after the candidate coordinate recognition model corresponding to the minimum error value is used as the final optimized coordinate recognition model, further comprising:

15. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1-7 are implemented when the program is executed by the processor.

16. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.