CN109117753B - Part recognition method, device, terminal and storage medium - Google Patents

Part recognition method, device, terminal and storage medium

Info

Publication number
CN109117753B
CN109117753B (application CN201810820840.0A)
Authority
CN
China
Prior art keywords
user
convolution
modules
module
submodel
Prior art date
Legal status
Active
Application number
CN201810820840.0A
Other languages
Chinese (zh)
Other versions
CN109117753A (en)
Inventor
曾梓华
Current Assignee
Guangzhou Huya Information Technology Co Ltd
Original Assignee
Guangzhou Huya Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Information Technology Co Ltd filed Critical Guangzhou Huya Information Technology Co Ltd
Priority to CN202110312615.8A priority Critical patent/CN112911393B/en
Priority to CN201810820840.0A priority patent/CN109117753B/en
Publication of CN109117753A publication Critical patent/CN109117753A/en
Application granted granted Critical
Publication of CN109117753B publication Critical patent/CN109117753B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44218Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The embodiment of the invention discloses a part identification method, a part identification device, a terminal and a storage medium. The part identification method comprises the following steps: acquiring a posture image on which at least one part of a user is displayed; inputting the posture image into a part recognition model; and acquiring position information of at least one part of the user output by the part recognition model. The part identification model comprises a feature extraction submodel and a part detection submodel. The feature extraction submodel comprises a plurality of groups of convolution modules and down-sampling modules connected in sequence. The part detection submodel comprises detection modules and a convolution module connected in sequence, and each detection module comprises a down-sampling module, a plurality of groups of convolution modules and an up-sampling module connected in sequence. The part identification method provided by the embodiment of the invention involves a small amount of computation and a simple structure, meets the real-time requirement of live broadcast, and can run on a terminal.

Description

Part recognition method, device, terminal and storage medium
Technical Field
The embodiment of the invention relates to a computer vision technology, in particular to a part identification method, a device, a terminal and a storage medium.
Background
With the development of live broadcast technology, more and more users acquire information and participate in activities through live video, for example receiving posture correction or action guidance through live video.
An important step in posture correction or motion guidance is recognizing the parts of the user. The parts of the user include limbs, joints, and organs such as the eyes, ears, mouth and nose. With the development of computer vision technology, the position, size and other information of each part in an image can be identified through an image recognition algorithm, and posture correction or action guidance suggestions can then be provided to the user.
However, existing image recognition algorithms have complex structures and a large amount of computation, require large storage space and long computation time, and are therefore difficult to run on a terminal while meeting the real-time requirement of live broadcasting.
Disclosure of Invention
The embodiment of the invention provides a part identification method, a part identification device, a terminal and a storage medium, so as to realize rapid identification of user parts, meet the real-time requirement of live broadcast, and run on the terminal.
In a first aspect, an embodiment of the present invention provides a method for identifying a part, including:
acquiring a posture image on which at least one part of a user is displayed;
inputting the posture image into a part recognition model;
acquiring position information of at least one part of the user output by the part recognition model;
the part identification model comprises a feature extraction submodel and a part detection submodel, wherein the feature extraction submodel comprises: a plurality of groups of convolution modules and down-sampling modules connected in sequence, configured to receive the posture image as input and output the features of the posture image;
the part detection submodel comprises: a plurality of detection modules and a convolution module connected in sequence, configured to receive the features of the posture image as input and output the position information of the corresponding parts;
the detection module comprises: a down-sampling module, a plurality of groups of convolution modules and an up-sampling module connected in sequence.
In a second aspect, an embodiment of the present invention further provides a device for identifying a part, where the device includes:
the first acquisition module is used for acquiring a posture image on which at least one part of a user is displayed;
the input module is used for inputting the posture image into a part recognition model;
the second acquisition module is used for acquiring the position information of at least one part of the user output by the part identification model;
the part identification model comprises a feature extraction submodel and a part detection submodel, wherein the feature extraction submodel comprises: a plurality of groups of convolution modules and down-sampling modules connected in sequence, configured to receive the posture image as input and output the features of the posture image;
the part detection submodel comprises: a plurality of detection modules and a convolution module connected in sequence, configured to receive the features of the posture image as input and output the position information of the corresponding parts;
the detection module comprises: a down-sampling module, a plurality of groups of convolution modules and an up-sampling module connected in sequence.
In a third aspect, an embodiment of the present invention further provides a terminal, including:
one or more processors;
a memory for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the part recognition method according to any of the embodiments.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the part identification method according to any embodiment.
The embodiment of the invention provides a part identification model, and a part identification algorithm is realized based on the model. The part identification model comprises a feature extraction submodel for feature extraction and a part detection submodel for part identification. The feature extraction submodel comprises a plurality of groups of convolution modules and down-sampling modules connected in sequence; the down-sampling modules scale the extracted features, thereby reducing the feature dimension and the data redundancy. The part detection submodel comprises detection modules and a convolution module connected in sequence; the recognition result is corrected step by step through the sequentially connected detection modules, so that a better recognition effect is achieved. Each detection module comprises a down-sampling module, a plurality of groups of convolution modules and an up-sampling module connected in sequence; the up-sampling module converts a received low-resolution image into a high-resolution image, so that position information with sufficient resolution can be acquired. Therefore, the part recognition model has a simple structure and few parameters, can be stored on the terminal, and has a high recognition speed; a part recognition algorithm based on this model can thus run on the terminal and meet the real-time requirement of live broadcasting.
Drawings
Fig. 1a is a flowchart of a method for identifying a part according to an embodiment of the present invention;
FIG. 1b is a schematic structural diagram of a part identification model according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a part identification model according to a second embodiment of the present invention;
fig. 3a is a flowchart of a method for identifying a part according to a third embodiment of the present invention;
FIG. 3b is a thermodynamic diagram provided by a third embodiment of the present invention;
FIG. 3c is a schematic diagram of the posture information of the user provided by the third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a part identification apparatus according to a sixth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal according to a seventh embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a flowchart of a method for recognizing a part according to an embodiment of the present invention. The embodiment is applicable to a situation in which a terminal acquires a posture image of a user and recognizes parts of the user from the posture image. The method may be executed by a part recognition device, which may be implemented in hardware and/or software and integrated in the terminal, and specifically includes the following steps:
s110, acquiring a posture image of at least one part of the user.
Alternatively, the posture image may be acquired in real time or may be pre-stored.
The manner of real-time acquisition is described in detail below: the terminal is provided with a camera facing the user, and the user is photographed by the camera to acquire the posture image of the user. Alternatively, the posture image may be acquired at a preset time, acquired upon a photographing instruction, or acquired periodically. There may be one, two or more posture images.
The user parts displayed in the posture image include, but are not limited to, the head, neck, left/right shoulders, left/right elbows, left/right wrists, left/right hips, left/right knees and left/right ankles. In some embodiments, the user parts may also be referred to as key points of the user.
And S120, inputting the posture image into the part recognition model.
And S130, acquiring the position information of at least one part of the user output by the part identification model.
The input of the part recognition model is a posture image, and the output is position information of at least one part in the posture image. Based on this, the posture image is input to the part recognition model, and the position information of the at least one part output by the part recognition model is acquired. Alternatively, the position information of the at least one part may be position coordinates or an area range, such as a thermodynamic diagram (heat map).
Fig. 1b is a schematic structural diagram of a part identification model according to an embodiment of the present invention. As shown in fig. 1b, includes: a feature extraction submodel 11 and a part detection submodel 12, wherein the feature extraction submodel 11 comprises: and the plurality of groups of convolution modules 13 and down-sampling modules 14 which are connected in sequence are used for inputting the attitude image and outputting the characteristics in the attitude image.
The convolution module 13 is used for feature extraction of the input content. The convolution module 13 comprises at least one convolution layer for extracting features. The down-sampling module 14 is configured to down-sample the features output by the convolution module 13; optionally, the features output by the convolution module 13 are down-sampled by a factor of 2 using max pooling. The purpose of down-sampling is to scale the received features, thereby reducing the feature dimension and reducing data redundancy.
The part detection submodel 12 includes: a plurality of detection modules 15 (shown in the dashed box in fig. 1b) and a convolution module 13 connected in sequence, for receiving the features of the posture image as input and outputting the position information of the corresponding parts.
The detection module 15 includes: a down-sampling module 14, a plurality of groups of convolution modules 13 and an up-sampling module 16 connected in sequence. The recognition result is corrected step by step by the plurality of connected detection modules 15, thereby achieving a better recognition effect. Fig. 1b shows 2 detection modules 15 connected in sequence, but the part detection submodel 12 may also include more detection modules 15 connected in sequence; experiments show that the recognition effect achieved with 5 detection modules 15 is ideal.
The up-sampling module 16 is configured to up-sample the features output by the convolution module 13; optionally, the features output by the convolution module 13 are up-sampled by a factor of 2 using nearest-neighbor interpolation or bilinear interpolation. The purpose of up-sampling is to convert a received low-resolution image into a high-resolution image in order to acquire an image with sufficient resolution.
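For illustration only, the following is a minimal PyTorch-style sketch of the structure described above (a feature extraction submodel followed by sequentially connected detection modules and a final convolution module). It is not the patented implementation: the channel counts, number of module groups, number of detection modules and input size are assumptions chosen for the example.

```python
# Minimal sketch of the described structure; all sizes below are assumptions.
import torch
import torch.nn as nn

def conv_group(in_ch, out_ch):
    # stands in for one "group of convolution modules" (at least one conv layer)
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class DetectionModule(nn.Module):
    """Down-sampling module -> several convolution modules -> up-sampling module."""
    def __init__(self, ch):
        super().__init__()
        self.down = nn.MaxPool2d(2)                       # down-sample by a factor of 2
        self.convs = nn.Sequential(conv_group(ch, ch), conv_group(ch, ch))
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, x):
        return self.up(self.convs(self.down(x)))

class PartRecognitionModel(nn.Module):
    def __init__(self, num_parts=14, ch=32):
        super().__init__()
        # feature extraction submodel: groups of (convolution module + down-sampling module)
        self.features = nn.Sequential(
            conv_group(3, ch), nn.MaxPool2d(2),
            conv_group(ch, ch), nn.MaxPool2d(2),
        )
        # part detection submodel: detection modules in sequence + a final convolution module
        self.detect = nn.Sequential(DetectionModule(ch), DetectionModule(ch))
        self.head = nn.Conv2d(ch, num_parts, 1)           # one heat-map channel per part

    def forward(self, posture_image):
        return self.head(self.detect(self.features(posture_image)))

heatmaps = PartRecognitionModel()(torch.randn(1, 3, 256, 256))   # -> (1, 14, 64, 64)
```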
The embodiment of the invention provides a part identification model, and a part identification algorithm is realized based on the model. The part identification model comprises a feature extraction submodel for feature extraction and a part detection submodel for part identification. The feature extraction submodel comprises a plurality of groups of convolution modules and down-sampling modules connected in sequence; the down-sampling modules scale the extracted features, thereby reducing the feature dimension and the data redundancy. The part detection submodel comprises detection modules and a convolution module connected in sequence; the recognition result is corrected step by step through the sequentially connected detection modules, so that a better recognition effect is achieved. Each detection module comprises a down-sampling module, a plurality of groups of convolution modules and an up-sampling module connected in sequence; the up-sampling module converts the received low-resolution features into high-resolution ones, so that position information with sufficient resolution can be acquired. Therefore, the part recognition model has a simple structure and correspondingly few parameters, can be stored on the terminal, and has a high recognition speed; a part recognition algorithm based on this model can thus run on the terminal and meet the real-time requirement of live broadcasting.
Example two
In this embodiment, the part recognition model in the above embodiment is further optimized, and specifically, a skip layer is added to the part recognition model. Fig. 2 is a schematic structural diagram of a part identification model according to a second embodiment of the present invention, and based on fig. 1b, the part identification model further includes a skip layer 17.
A large number of experiments show that if the part recognition model comprises only the feature extraction submodel and the part detection submodel, the recognition result is not ideal enough; for example, part recognition errors occur. Based on this, a skip layer 17 is added between the feature extraction submodel and the part detection submodel. The skip layer 17 is used for concatenating (splicing) the features output by at least one down-sampling module 14 in the feature extraction submodel 11 and then concatenating the result with the features output by the corresponding up-sampling module 16 in the part detection submodel 12.
The skip layer 17 concatenates the features output by different down-sampling modules 14 in the feature extraction submodel 11, and then concatenates the result with the features of the same resolution output by the up-sampling module 16 in the part detection submodel 12. It is worth noting that the resolutions of the features concatenated with each other should be the same; if the resolutions differ, an up-sampling module 16 or a down-sampling module 14 may be used to adjust them to the same resolution. In fig. 2, 3 resolution-adjusting modules are shown, and their fill color is black.
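As an illustration of this concatenation step, the following sketch (with assumed tensor shapes) resizes one extraction-submodel feature map to match another, concatenates them, and then concatenates the result with a detection-submodel feature map of the same resolution; it is not the patented implementation.

```python
# Sketch of the skip-layer concatenation; shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

f1 = torch.randn(1, 16, 64, 64)   # output of an early down-sampling module
f2 = torch.randn(1, 32, 32, 32)   # output of a later down-sampling module
det = torch.randn(1, 64, 32, 32)  # output of an up-sampling module in the detection submodel

f1_small = F.max_pool2d(f1, kernel_size=2)   # adjust 64x64 -> 32x32 so resolutions match
skip = torch.cat([f1_small, f2], dim=1)      # concatenate extraction-submodel features
fused = torch.cat([skip, det], dim=1)        # concatenate with detection-submodel features
print(fused.shape)                           # torch.Size([1, 112, 32, 32])
```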
In this embodiment, by adding the skip layer 17, the part detection submodel 12 learns multi-scale and global features, so that the part detection submodel 12 can acquire sufficient global information, and the part recognition model can accurately recognize the position information of the corresponding parts from a complex posture image.
In some embodiments, each convolution module 13 includes M bottlenecks 18, each bottleneck 18 including an expansion layer, a convolution layer and a compression layer which are connected. This is because the features extracted by the convolution layer are limited by the number of input channels: if the channels are "compressed" before the features are extracted, fewer features can be extracted. Therefore, the channels are first expanded, features are then extracted by convolution, and the result is finally compressed, so that a sufficient number of features can be extracted. As in the bottleneck of the MobileNet-V2 network structure, in this embodiment the expansion layer uses a 1×1 convolution kernel, the convolution layer uses a 3×3 convolution kernel, and the compression layer uses a 1×1 convolution kernel. Optionally, a large number of experiments show that the identification accuracy is higher when 3 ≤ M ≤ 5.
The number of output channels of bottleneck 18 is N, with 16 ≤ N ≤ 96, which is smaller than the number of output channels in the MobileNet-V2 network structure; this reduces the network parameters and improves computational efficiency, and also saves the space required to store the network, so that the part recognition model can run on the terminal.
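The following is a minimal sketch of such a bottleneck (1×1 expansion, 3×3 convolution, 1×1 compression) and of a convolution module built from repeated bottlenecks; the expansion factor and channel counts are illustrative assumptions, not the patented parameters.

```python
# Sketch of the bottleneck described above; expansion factor and channels assumed.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch, out_ch, expansion=4):
        super().__init__()
        mid = in_ch * expansion
        self.expand = nn.Sequential(nn.Conv2d(in_ch, mid, 1, bias=False),
                                    nn.BatchNorm2d(mid), nn.ReLU6(inplace=True))
        self.conv = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1, bias=False),
                                  nn.BatchNorm2d(mid), nn.ReLU6(inplace=True))
        self.compress = nn.Sequential(nn.Conv2d(mid, out_ch, 1, bias=False),
                                      nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return self.compress(self.conv(self.expand(x)))

# a convolution module with M = 3 repeated bottlenecks and N = 32 output channels
convolution_module = nn.Sequential(Bottleneck(32, 32), Bottleneck(32, 32), Bottleneck(32, 32))
out = convolution_module(torch.randn(1, 32, 64, 64))   # -> (1, 32, 64, 64)
```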
The channel expansion factor, convolution kernel size, number of output channels, and repetition count (i.e., the number of bottlenecks) of bottleneck 18 are labeled above the corresponding convolution module, as shown in fig. 2. Of course, the parameters of the convolution modules 13 in fig. 2 are only examples, and those skilled in the art can adjust the parameters of the convolution modules to obtain a better recognition effect.
It should be noted that, in general, the more convolution modules there are in the feature extraction submodel 11 and the part detection submodel 12, and the more bottlenecks each convolution module contains, the deeper the extracted features and the better the recognition effect. However, once the network exceeds a certain depth, performance degrades. Those skilled in the art can adjust the number of detection modules, the number of convolution modules, and the number of bottlenecks contained in each convolution module based on the network structure shown in fig. 2 to achieve a better recognition effect.
In this embodiment, at least one bottleneck forms a convolution module for feature extraction, so that multiple channels and a sufficient number of features can be extracted. The number of output channels in the convolution module is smaller than that in the MobileNet-V2 network structure, which reduces the network parameters and improves computational efficiency, and also saves the space required to store the network, so that image recognition can be performed on the terminal. The features output by the convolution module are down-sampled, that is, scaled, thereby reducing the feature dimension and the data redundancy. Through the skip layer, the part detection submodel learns multi-scale and global features, so that it can acquire sufficient global information, and the part recognition model can accurately recognize the position information of the corresponding parts from a complex posture image.
In some embodiments, to increase the running speed of the part recognition model on the terminal, the convolution in the bottleneck is replaced with a separable convolution module (a depthwise convolution followed by a pointwise convolution).
The specific demonstration process is as follows:
Assume the input of the convolution has size (D_F, D_F, M), the convolution kernel K has size (D_K, D_K, M, N), the stride is 1, and the output feature map G has size (D_G, D_G, N), where M is the number of input channels and N is the number of output channels. The computation amount of the conventional convolution is then:
D_K · D_K · M · N · D_F · D_F
The separable convolution splits the conventional convolution into a depthwise convolution and a pointwise convolution, with a corresponding computation amount of:
D_K · D_K · M · D_F · D_F + M · N · D_F · D_F
The ratio of the two computation amounts is:
(D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_K · D_K · M · N · D_F · D_F) = 1/N + 1/(D_K · D_K)
As this ratio shows, under the same hyper-parameters the computation amount of the separable convolution is 1/N + 1/D_K² times that of the conventional convolution. Moreover, the precision differs little: with the same model and input data, the PCKh (head-normalized Percentage of Correct Keypoints) of the conventional convolution is 98, while the PCKh with the separable convolution is 97.2.
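A small numeric sketch of this comparison (the layer sizes are assumptions chosen for the example, not values from the patent):

```python
# Sketch verifying the computation-amount ratio above; the sizes are assumed.
def standard_conv_flops(DF, DK, M, N):
    return DK * DK * M * N * DF * DF

def separable_conv_flops(DF, DK, M, N):
    depthwise = DK * DK * M * DF * DF
    pointwise = M * N * DF * DF
    return depthwise + pointwise

DF, DK, M, N = 56, 3, 32, 64
ratio = separable_conv_flops(DF, DK, M, N) / standard_conv_flops(DF, DK, M, N)
print(ratio, 1 / N + 1 / (DK * DK))   # both ≈ 0.1267
```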
EXAMPLE III
Fig. 3a is a flowchart of a part identification method according to a third embodiment of the present invention, including the following steps:
s310, acquiring a posture image of at least one part of the user.
And S320, inputting the posture image into the part recognition model.
And S330, acquiring the thermodynamic diagram of the corresponding part of the user output by the part recognition model.
In this embodiment, the posture image is input to the part recognition model, and the part recognition model outputs a thermodynamic diagram for each corresponding part, with one thermodynamic diagram showing one part. Of course, one thermodynamic diagram may also show two or more parts. The thermodynamic diagram in this embodiment is an image in which the region where the corresponding part is located is shown in a specially highlighted form.
As shown in fig. 3b, it is assumed that the part recognition model outputs 3 thermodynamic diagrams in total, the first thermodynamic diagram corresponding to the left wrist, the second thermodynamic diagram corresponding to the right wrist, and the third thermodynamic diagram corresponding to the neck.
And S340, respectively determining the position coordinates of each part from the thermodynamic diagrams corresponding to each part.
Optionally, a point (x, y) is selected from the highlighted region of the thermodynamic diagram of each part as the position coordinates of the corresponding part. Preferably, the center point or the brightest point of the highlighted region is selected as the position coordinates of the corresponding part. Specifically, the thermodynamic diagram of each part is filtered (for example, with Gaussian filtering, Wiener filtering or mean filtering), and the coordinates of the point with the highest gray level or brightness in the filtered image are selected as the position coordinates of the corresponding part.
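For illustration, a minimal sketch (with a made-up heat map) of this step, which smooths the map and then takes the brightest point as the part's coordinates:

```python
# Sketch of turning a per-part heat map into position coordinates; data assumed.
import numpy as np
from scipy.ndimage import gaussian_filter

heatmap = np.random.rand(64, 64)              # stand-in for one part's heat map
smoothed = gaussian_filter(heatmap, sigma=2)  # Gaussian filtering
y, x = np.unravel_index(np.argmax(smoothed), smoothed.shape)
print((x, y))                                 # position coordinates of the part
```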
And S350, calculating the posture information of the user according to the position coordinates of at least one part.
When the user's parts are in different positions, the user exhibits different postures; for example, the posture with the left wrist above the right wrist differs from the posture with the right wrist above the left wrist. In this embodiment, the posture information of the user includes information such as the angle and position of a limb formed by at least one part, and/or the relative angle and relative distance between limbs. Of course, the requirements on the posture differ according to the type of the user's motion, and any posture information that can be obtained from the position information of the parts is within the scope of the present application.
Alternatively, the position information of the at least one part may be a thermodynamic diagram of the corresponding part or position coordinates of the respective parts determined from the thermodynamic diagram.
It is worth mentioning that if the position information is position coordinates, the posture information of the user is calculated directly from the position information of the at least one part. Specifically, according to the position information of the at least one part, the bending angle between the connected limbs of the user is calculated; and/or the azimuth angle of each limb of the user relative to a preset direction is calculated according to the position information of the at least one part. If the position information is a thermodynamic diagram, S340 is first executed to determine the corresponding position coordinates from the position information of the at least one part, and then the posture information of the user is calculated from the position coordinates of the at least one part. Specifically, according to the position coordinates of the at least one part, the bending angle between the connected limbs of the user is calculated; and/or the azimuth angle of each limb of the user relative to the preset direction is calculated according to the position coordinates of the at least one part.
The connected limbs are, for example, upper and lower arms, head and neck, thigh and calf, etc. The bending angle between the connected limbs is the angle between the two connected limbs. The preset direction may be a horizontal direction and/or a vertical direction.
Fig. 3c is a schematic diagram of the posture information of a user according to the third embodiment of the present invention, and fig. 3c shows parts including the left shoulder, the left elbow and the left wrist. The upper arm is represented by the vector a = (left shoulder coordinates) − (left elbow coordinates), and the lower arm by the vector b = (left wrist coordinates) − (left elbow coordinates). The bending angle between the upper and lower arms is θ = arccos( (a · b) / (|a| · |b|) ). The method for calculating the azimuth angle of the upper arm is as follows: assuming that the preset direction is the vertical direction and v is a vector along that direction, the azimuth angle h of the upper arm is the angle between the vector a and the vector v.
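A minimal sketch of these two calculations (the keypoint coordinates and the choice of vertical direction are assumptions for illustration):

```python
# Sketch of the bending-angle and azimuth-angle calculations; coordinates assumed.
import numpy as np

left_shoulder = np.array([120.0, 80.0])
left_elbow    = np.array([130.0, 140.0])
left_wrist    = np.array([180.0, 150.0])

a = left_shoulder - left_elbow            # upper-arm vector
b = left_wrist - left_elbow               # lower-arm vector
bend = np.degrees(np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))

v = np.array([0.0, -1.0])                 # vertical direction (image y-axis points down)
azimuth = np.degrees(np.arccos(a @ v / np.linalg.norm(a)))
print(bend, azimuth)
```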
And S360, prompting posture correction information to the user according to the posture information of the user and the reference posture information.
Reference posture information corresponding to the posture information of the user is stored in advance; for example, if the user's posture is the tree pose, the reference posture information is the reference tree-pose posture information.
Posture correction information is obtained by comparing the posture information of the user with the reference posture information, and the posture correction information is displayed on a terminal display screen and/or is subjected to voice prompt, so that the posture is corrected automatically and timely.
Illustratively, the reference attitude information includes: a reference bending angle between the connected limbs, and/or a reference orientation angle of each limb relative to a predetermined direction.
In this embodiment, the posture correction information is presented to the user according to the posture information of the user and the reference posture information, which includes the following two implementation manners.
The first embodiment: first, a first score for the connected limbs is calculated based on the bending angle between the connected limbs and the corresponding reference bending angle range. A reference angle range is defined by 3 thresholds: std, min and max, where std denotes the standard (median) value, min the minimum value and max the maximum value. Optionally, the first score of the connected limbs is calculated by a piecewise function of x, where x represents the bending angle between the connected limbs; the piecewise function itself is given as a formula image in the original publication and is defined by the thresholds std, min and max. Of course, the first score may also be obtained directly from the difference between the bending angle and std.
Then, a second score of each limb is calculated based on the azimuth angle of each limb with respect to the preset direction and the corresponding reference azimuth angle range.
Alternatively, as with the calculation of the first score, the reference azimuth angle range is also defined by 3 thresholds: std, min and max. The second score of each limb is calculated according to the same piecewise function; in this case, x in the piecewise function represents the azimuth angle of each limb.
Then, carrying out weighted summation on the first score and the second score to obtain a comprehensive score of the user; and prompting the posture correction information corresponding to the comprehensive score to the user.
The composite score of the user is calculated by the formula S = Σ_{i=1}^{n} w_i · f(x_i), where n is the sum of the number of azimuth angles of the limbs and the number of bending angles between connected limbs, x_i is the azimuth angle of the i-th limb or the bending angle of the i-th pair of connected limbs, and f(x_i) is the score of the i-th limb or pair of connected limbs; for example, f(x_1) represents the score of the azimuth angle of the upper arm and f(x_2) the score of the bending angle between the upper and lower arms. w_i is the weight corresponding to f(x_i), and the weights satisfy Σ_{i=1}^{n} w_i = 1.
the weight of the first score for each connected limb and the weight of the second score for each limb may be the same or different. The score for the limb with the highest requirement is given a first weight and the score for the limb with the lowest requirement is given a second weight, taking into account the different requirements of the different poses for the different limbs. The first weight is greater than the second weight.
Different composite scores correspond to different posture correction information, and the posture correction information here is a correction at the whole-body level. For example, if the composite score of the user is greater than a first threshold, e.g. 80 points, the user is prompted with "good flexibility, limbs well stretched, overall at a high level"; if the composite score of the user is greater than a second threshold, e.g. 60 points, but not greater than the first threshold, the user is prompted with "normal flexibility, limbs stretched, overall at the standard level"; if the composite score of the user is smaller than the second threshold, the user is prompted with "average flexibility, limbs not stretched enough, overall at a general level".
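For illustration, a minimal sketch of this scoring and prompting logic; the weights, example scores and prompt wording are assumptions, not values from the patent:

```python
# Sketch of the weighted composite score and prompt selection; values assumed.
def composite_score(scores, weights):
    assert abs(sum(weights) - 1.0) < 1e-6     # weights are assumed to sum to 1
    return sum(w * s for w, s in zip(weights, scores))

def correction_prompt(score, first=80, second=60):
    if score > first:
        return "Good flexibility, limbs well stretched, overall at a high level"
    if score > second:
        return "Normal flexibility, limbs stretched, overall at the standard level"
    return "Average flexibility, limbs not stretched enough, overall at a general level"

scores  = [92, 75, 60]          # e.g. upper-arm azimuth, elbow bend, knee bend scores
weights = [0.5, 0.3, 0.2]       # higher weight on the limb the pose emphasizes
print(correction_prompt(composite_score(scores, weights)))
```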
The second embodiment: calculating a first difference value between the bending angle between the connected limbs and the corresponding reference bending angle, and prompting posture correction information corresponding to the first difference value to a user; and/or calculating a second difference value between the azimuth angle of each limb relative to the preset direction and the corresponding reference azimuth angle, and prompting posture correction information corresponding to the second difference value to the user.
The posture correction information in the present embodiment is correction of the user limb dimension. For example, a first difference between the flexion angle between the connected limbs minus the corresponding reference flexion angle is greater than 0, prompting the user to tighten the corresponding limb; the first difference is less than 0, prompting the user to expand the corresponding limb. For another example, if the second difference between the orientation angle of the limb relative to the preset direction and the corresponding reference orientation angle is greater than 0, prompting the user to move the limb in the direction of reducing the second difference; the second difference is less than 0, prompting the user to move the limb in a direction that increases the second difference.
In this embodiment, the posture image of the user is acquired, a part recognition model is used to recognize the thermodynamic diagram of each part from the posture image, and the posture information of the user is calculated, so that the posture information of the user is obtained through the part recognition model. By prompting posture correction information to the user according to the user's posture information and the reference posture information, the user's non-standard posture can be automatically corrected against the reference posture; the user does not need to compare and correct by themselves, and a live-stream anchor is not needed to correct the user online in real time, which saves labor and operation costs and avoids wrong or untimely corrections.
Furthermore, the posture of the user is obtained by calculating the bending angle between the connected limbs and/or the azimuth angle of the limbs relative to the preset direction, and the posture is consistent with the requirement on the posture in real life, so that the posture correction is more reasonable and effective. And then, calculating the score of the user by combining the reference bending angle range and/or the reference azimuth angle range, scoring the posture from the dimensions of the bending angle and the azimuth angle, accurately evaluating the accuracy of the posture of the user, and achieving the effect of effectively correcting the posture.
Example four
The embodiment of the invention provides an application scene, which describes the process of recognizing the part and correcting the posture of a user through a terminal in detail.
In an application scenario of the embodiment, the user turns on the posture correction function of the terminal and selects one posture, for example, a pigeon posture. The terminal starts the camera and collects the area image in the camera collecting area. The camera acquisition area can be an area in front of the camera, and no blocking object or obstacle is required in the area, so that the camera acquisition area is suitable for the movement of a user. And identifying the user part in the preset area of the area image by adopting a part identification model. The preset region may be a central region of the region image, and the waist and the chest are preferentially identified. When the user is identified within a preset area, for example, the waist and chest of the user, it is determined that the user is located at a suitable position and the activity can be started. On the contrary, if the user is not identified in the preset area, the moving direction is prompted to the user according to the identification result in the whole area image, so that the user is located in the preset area.
And when the user is identified in the preset area of the area image, playing a teaching video corresponding to the reference posture information through the display screen. The reference attitude information refers to reference attitude information corresponding to the attitude selected by the user. The reference attitude information is stored in a configuration file of the terminal.
After the teaching video is played or in the playing process, the terminal prompts a user to start intelligent correction, and the camera collects the posture image of the user in real time, projects the posture image of the user onto a display screen of the terminal and displays the posture image of the user and the teaching video in a regional mode.
Then, the part recognition device acquires a posture image on which at least one part of the user is displayed, and inputs the posture image into the part recognition model; acquires the position information of at least one part of the user output by the part recognition model; calculates the posture information of the user from the position information of the at least one part; and, based on the posture information of the user and the reference posture information, presents posture correction information to the user, such as "raise the arm a little higher" or "hold it well".
After the posture correction is finished, the terminal displays the posture correction information of the previous time and the corresponding comprehensive score for the user to compare and check so as to obtain the correction effect.
EXAMPLE five
The embodiment further optimizes the above embodiment, and specifically defines the training process of the part recognition model.
Specifically, before acquiring the posture image on which at least one part of the user is displayed, a training process of the part recognition model is further included. The training process of the part recognition model comprises the following steps: the method comprises the steps of obtaining a part recognition model to be trained, obtaining a sample image displaying at least one part of a user and position label information corresponding to the at least one part, and optionally, the position label information corresponding to the at least one part refers to position coordinates or an area range corresponding to the part. Wherein the area range can be represented by a coordinate range, a geometric mark or a thermodynamic diagram.
Then, a part recognition model is trained according to the sample image and the position label information corresponding to at least one part. Specifically, a sample image is input to a part recognition model to be trained, and parameters in the part recognition model are iterated, so that the model outputs position label information approximate to the corresponding position of at least one part in the input sample image.
Further, training a part recognition model according to the position label information corresponding to the sample image and the at least one part, comprising: generating a thermodynamic diagram corresponding to each part according to the position label information corresponding to at least one part; and training the part recognition model according to the sample image and the thermodynamic diagrams corresponding to all parts.
Illustratively, 10,000 sample images are collected through Baidu crowdsourcing, and each part is then annotated on the sample images. The annotated sample images are processed with a Gaussian blur algorithm to obtain the thermodynamic diagram corresponding to each part, such as a thermodynamic diagram corresponding to the head, one corresponding to the left shoulder, one corresponding to the right shoulder, and so on. The unannotated sample image is input into the part recognition model to be trained, and the parameters in the part recognition model are iterated so that the model output approaches the thermodynamic diagram corresponding to at least one part in the input sample image.
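For illustration, a minimal sketch (with an assumed map size and sigma) of producing a Gaussian heat-map label for one annotated keypoint, in the spirit of the Gaussian-blur labelling step described above:

```python
# Sketch of generating a Gaussian heat-map label for one keypoint; sizes assumed.
import numpy as np

def keypoint_heatmap(x, y, height=64, width=64, sigma=2.0):
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

label = keypoint_heatmap(20, 30)   # heat-map label for a part annotated at (20, 30)
print(label.shape, label.max())    # (64, 64) 1.0
```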
EXAMPLE six
Fig. 4 is a schematic structural diagram of a position identification apparatus according to a sixth embodiment of the present invention, including a first obtaining module 41, an input module 42, and a second obtaining module 43.
A first obtaining module 41, configured to obtain a posture image in which at least one part of a user is displayed;
an input module 42, configured to input the pose image into a part recognition model;
a second obtaining module 43, configured to obtain position information of at least one part of the user output by the part recognition model;
the part identification model comprises a feature extraction submodel and a part detection submodel, wherein the feature extraction submodel comprises: a plurality of groups of convolution modules and down-sampling modules connected in sequence, configured to receive the posture image as input and output the features of the posture image;
the part detection submodel comprises: a plurality of detection modules and a convolution module connected in sequence, configured to receive the features of the posture image as input and output the position information of the corresponding parts;
the detection module comprises: a down-sampling module, a plurality of groups of convolution modules and an up-sampling module connected in sequence.
The embodiment of the invention provides a part identification model, and a part identification algorithm is realized based on the model. The part identification model comprises a feature extraction submodel for feature extraction and a part detection submodel for part identification. The feature extraction submodel comprises a plurality of groups of convolution modules and down-sampling modules connected in sequence; the down-sampling modules scale the extracted features, thereby reducing the feature dimension and the data redundancy. The part detection submodel comprises detection modules and a convolution module connected in sequence; the recognition result is corrected step by step through the sequentially connected detection modules, so that a better recognition effect is achieved. Each detection module comprises a down-sampling module, a plurality of groups of convolution modules and an up-sampling module connected in sequence; the up-sampling module converts a received low-resolution image into a high-resolution image, so that position information with sufficient resolution can be acquired. Therefore, the part recognition model has a simple structure and few parameters, can be stored on the terminal, and has a high recognition speed; a part recognition algorithm based on this model can thus run on the terminal and meet the real-time requirement of live broadcasting.
Optionally, the part recognition model further comprises a skip layer; the skip layer concatenates the features output by at least one down-sampling module in the feature extraction submodel and then concatenates the result with the features output by the corresponding up-sampling module in the part detection submodel.
Optionally, the convolution module includes M bottlenecks, each of which includes an expansion layer, a convolution layer and a compression layer connected to each other, and the number of output channels is N, where M and N are natural numbers, 3 ≤ M ≤ 5, and 16 ≤ N ≤ 96. Each bottleneck includes a depthwise convolution and a pointwise convolution.
Optionally, the second obtaining module 43, when obtaining the position information of at least one part of the user output by the part recognition model, is specifically configured to: and acquiring the thermodynamic diagram of the corresponding part of the user output by the part identification model.
Optionally, the apparatus further comprises a determining module configured to: after acquiring the thermodynamic diagrams of the user-corresponding parts output by the part recognition model, the position coordinates of each part are specified from the thermodynamic diagrams corresponding to each part.
Optionally, the device further includes a calculation module and a prompt module, where the calculation module is configured to calculate, after obtaining the position information of at least one part of the user output by the part recognition model, the posture information of the user according to the position information of the at least one part. And the prompting module is used for prompting the posture correction information to the user according to the posture information of the user and the reference posture information.
Optionally, when the calculation module calculates the posture information of the user according to the position information of the at least one part, the calculation module is specifically configured to: calculating the bending angle between the limbs connected by the user according to the position information of the at least one part; and/or calculating the azimuth angle of each limb of the user relative to the preset direction according to the position information of the at least one part.
Optionally, the reference posture information includes: a reference bending angle range between the connected limbs and a reference azimuth angle range of each limb relative to a preset direction; correspondingly, when the prompting module prompts the posture correction information to the user according to the posture information of the user and the reference posture information, the prompting module is specifically configured to: calculating a first score of the connected limbs according to the bending angle between the connected limbs and the corresponding reference bending angle range; calculating a second score of each limb according to the azimuth angle of each limb relative to the preset direction and the corresponding reference azimuth angle range; carrying out weighted summation on the first score and the second score to obtain a comprehensive score of the user; and prompting the posture correction information corresponding to the comprehensive score to the user.
Optionally, the apparatus further comprises an acquisition module and a playing module. The acquisition module is used for acquiring an area image in the acquisition area of the camera before the posture image showing at least one part of the user is acquired. The playing module is used for playing the teaching video corresponding to the reference posture information when the user is identified in the preset area of the area image.
Optionally, the apparatus further includes a training module, configured to obtain a part recognition model to be trained, obtain a sample image showing at least one part of a user and position label information corresponding to the at least one part, and train the part recognition model according to the sample image and the position label information corresponding to the at least one part.
The part recognition device provided by the embodiment of the invention can execute the part recognition method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE seven
Fig. 5 is a schematic structural diagram of a terminal according to a seventh embodiment of the present invention, and as shown in fig. 5, the terminal includes a processor 50, a memory 51, an input device 52, and an output device 53; the number of the processors 50 in the terminal may be one or more, and one processor 50 is taken as an example in fig. 5; the processor 50, the memory 51, the input device 52 and the output device 53 in the terminal may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 5.
The memory 51 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the part recognition method in the embodiment of the present invention (for example, the first obtaining module 41, the input module 42, and the second obtaining module 43 in the part recognition apparatus). The processor 50 executes various functional applications and data processing of the terminal, that is, implements the above-described part recognition method, by executing software programs, instructions, and modules stored in the memory 51.
The memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 51 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 51 may further include memory located remotely from the processor 50, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 52 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the terminal, such as user selection of a gesture through the input device 52. The output device 53 may include a display device such as a display screen and an audio device such as a speaker.
Optionally, the terminal may further include a camera for acquiring a posture image on which at least one part of the user is displayed. The camera may be a front-facing camera on the terminal.
Example eight
An eighth embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program, which when executed by a computer processor is configured to perform a method for part identification, the method including:
acquiring a posture image on which at least one part of a user is displayed;
inputting the posture image into a part recognition model;
acquiring position information of at least one part of the user output by the part recognition model;
the part identification model comprises a feature extraction submodel and a part detection submodel, wherein the feature extraction submodel comprises: a plurality of groups of convolution modules and down-sampling modules connected in sequence, configured to receive the posture image as input and output the features of the posture image;
the part detection submodel comprises: a plurality of detection modules and a convolution module connected in sequence, configured to receive the features of the posture image as input and output the position information of the corresponding parts;
the detection module comprises: a down-sampling module, a plurality of groups of convolution modules and an up-sampling module connected in sequence.
Of course, the computer program provided by the embodiments of the present invention is not limited to the above method operations, and may also perform related operations in the part identification method provided by any embodiments of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods of the embodiments of the present invention.
It should be noted that, in the embodiment of the above-mentioned part identification apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above-mentioned division, as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments illustrated herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A part recognition method, comprising:
acquiring a posture image in which at least one part of a user is displayed;
inputting the posture image into a part recognition model; and
acquiring position information of at least one part of the user output by the part recognition model;
wherein the part recognition model comprises a feature extraction submodel and a part detection submodel, the feature extraction submodel comprising: a plurality of groups of convolution modules and down-sampling modules connected in sequence, configured to receive the posture image and output features of the posture image;
the part detection submodel comprises: a plurality of detection modules and convolution modules connected in sequence, configured to receive the features of the posture image and output position information of the corresponding parts;
the detection module comprises: a down-sampling module, a plurality of groups of convolution modules, and an up-sampling module connected in sequence;
the part recognition model further comprises a skip layer; and
the skip layer concatenates features output by at least one down-sampling module in the feature extraction submodel with features output by the corresponding up-sampling module in the part detection submodel.
2. The method according to claim 1, wherein the convolution module comprises M bottlenecks, each bottleneck comprising an expansion layer, a convolution layer, and a compression layer which are connected, with N output channels, wherein M and N are natural numbers, M is greater than or equal to 3 and less than or equal to 5, and N is greater than or equal to 16 and less than or equal to 96.
3. The method according to claim 2, wherein each bottleneck comprises a depthwise convolution and a pointwise convolution.
4. The method according to claim 1, wherein the acquiring of the position information of at least one part of the user output by the part recognition model comprises:
acquiring a heat map of the corresponding part of the user output by the part recognition model.
5. The method according to claim 4, further comprising, after acquiring the heat map of the corresponding part of the user output by the part recognition model:
determining position coordinates of each part from the heat map corresponding to that part.
6. The method according to any one of claims 1-5, further comprising, after the acquiring of the position information of at least one part of the user output by the part recognition model:
calculating posture information of the user according to the position information of the at least one part; and
prompting the user with posture correction information according to the posture information of the user and reference posture information.
7. A part recognition apparatus, comprising:
a first acquisition module, configured to acquire a posture image in which at least one part of a user is displayed;
an input module, configured to input the posture image into a part recognition model; and
a second acquisition module, configured to acquire position information of at least one part of the user output by the part recognition model;
wherein the part recognition model comprises a feature extraction submodel and a part detection submodel, the feature extraction submodel comprising: a plurality of groups of convolution modules and down-sampling modules connected in sequence, configured to receive the posture image and output features of the posture image;
the part detection submodel comprises: a plurality of detection modules and convolution modules connected in sequence, configured to receive the features of the posture image and output position information of the corresponding parts;
the detection module comprises: a down-sampling module, a plurality of groups of convolution modules, and an up-sampling module connected in sequence;
the part recognition model further comprises a skip layer; and
the skip layer concatenates features output by at least one down-sampling module in the feature extraction submodel with features output by the corresponding up-sampling module in the part detection submodel.
8. A terminal, comprising:
one or more processors; and
a memory for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the part recognition method according to any one of claims 1-6.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the part recognition method according to any one of claims 1-6.
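The bottleneck recited in claims 2 and 3 (an expansion layer, a convolution layer, and a compression layer, with the convolution split into depthwise and pointwise parts) resembles an inverted-residual block. The sketch below is not taken from the patent; the expansion factor, channel counts, and repetition count are assumptions chosen only to illustrate the claimed bounds on M and N.

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Expansion layer -> depthwise convolution -> pointwise compression layer (illustrative)."""
    def __init__(self, in_ch, out_ch, expansion=4):
        super().__init__()
        hidden = in_ch * expansion
        self.expand = nn.Sequential(      # expansion layer: 1x1 conv widens the channels
            nn.Conv2d(in_ch, hidden, 1), nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True))
        self.depthwise = nn.Sequential(   # depthwise 3x3 convolution, one filter per channel
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True))
        self.compress = nn.Sequential(    # compression layer: pointwise 1x1 projection
            nn.Conv2d(hidden, out_ch, 1), nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return self.compress(self.depthwise(self.expand(x)))

def make_convolution_module(in_ch, out_ch, m=3):
    """A convolution module of M bottlenecks; claim 2 bounds M to 3..5 and N (output channels) to 16..96."""
    assert 3 <= m <= 5 and 16 <= out_ch <= 96
    layers = [Bottleneck(in_ch, out_ch)] + [Bottleneck(out_ch, out_ch) for _ in range(m - 1)]
    return nn.Sequential(*layers)

module = make_convolution_module(16, 32, m=3)
print(module(torch.randn(1, 16, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])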
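Claims 4 to 6 describe acquiring a heat map per part, determining position coordinates from each heat map, and comparing the resulting posture information with reference posture information to prompt a correction. A minimal sketch of that post-processing follows; the argmax decoding, the part indices, and the elbow-angle comparison are assumptions for illustration, not the patent's prescribed computation.

import numpy as np

def heatmaps_to_coordinates(heatmaps):
    """heatmaps: array of shape (num_parts, H, W); returns (num_parts, 2) coordinates as (x, y).
    Decoding by per-map argmax is an assumption; the claims only state that coordinates
    are determined from the heat map corresponding to each part."""
    coords = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        coords.append((x, y))
    return np.array(coords)

def joint_angle(a, b, c):
    """Angle in degrees at point b formed by points a-b-c, used as simple posture information."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Illustrative usage: compare the user's elbow angle against a reference and prompt a correction.
heatmaps = np.random.rand(14, 32, 32)                    # stand-in for model output, one map per part
parts = heatmaps_to_coordinates(heatmaps)
shoulder, elbow, wrist = parts[0], parts[1], parts[2]    # index assignment is hypothetical
user_angle, reference_angle = joint_angle(shoulder, elbow, wrist), 90.0
if abs(user_angle - reference_angle) > 15.0:
    print(f"Adjust your elbow: {user_angle:.0f} degrees vs. reference {reference_angle:.0f} degrees")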
CN201810820840.0A 2018-07-24 2018-07-24 Part recognition method, device, terminal and storage medium Active CN109117753B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110312615.8A CN112911393B (en) 2018-07-24 2018-07-24 Method, device, terminal and storage medium for identifying part
CN201810820840.0A CN109117753B (en) 2018-07-24 2018-07-24 Part recognition method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810820840.0A CN109117753B (en) 2018-07-24 2018-07-24 Part recognition method, device, terminal and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110312615.8A Division CN112911393B (en) 2018-07-24 2018-07-24 Method, device, terminal and storage medium for identifying part

Publications (2)

Publication Number Publication Date
CN109117753A CN109117753A (en) 2019-01-01
CN109117753B true CN109117753B (en) 2021-04-20

Family

ID=64862491

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110312615.8A Active CN112911393B (en) 2018-07-24 2018-07-24 Method, device, terminal and storage medium for identifying part
CN201810820840.0A Active CN109117753B (en) 2018-07-24 2018-07-24 Part recognition method, device, terminal and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110312615.8A Active CN112911393B (en) 2018-07-24 2018-07-24 Method, device, terminal and storage medium for identifying part

Country Status (1)

Country Link
CN (2) CN112911393B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110139115B (en) * 2019-04-30 2020-06-09 广州虎牙信息科技有限公司 Method and device for controlling virtual image posture based on key points and electronic equipment
CN110264539A (en) * 2019-06-18 2019-09-20 北京字节跳动网络技术有限公司 Image generating method and device
CN110443148B (en) * 2019-07-10 2021-10-22 广州市讯码通讯科技有限公司 Action recognition method, system and storage medium
CN110942005A (en) * 2019-11-21 2020-03-31 网易(杭州)网络有限公司 Object recognition method and device
CN111683263B (en) * 2020-06-08 2022-06-03 腾讯科技(深圳)有限公司 Live broadcast guiding method, device, equipment and computer readable storage medium
CN113177432B (en) * 2021-03-16 2023-08-29 重庆兆光科技股份有限公司 Head posture estimation method, system, equipment and medium based on multi-scale lightweight network
TWI789974B (en) * 2021-11-04 2023-01-11 財團法人資訊工業策進會 Assistance system and method for guiding exercise postures in live broadcast

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989331A (en) * 2015-02-11 2016-10-05 佳能株式会社 Facial feature extraction apparatus, facial feature extraction method, image processing equipment and image processing method
CN107392201A (en) * 2017-06-09 2017-11-24 中国科学院自动化研究所 The pillar recognition methods of catenary mast, storage medium, processing equipment
CN107403168A (en) * 2017-08-07 2017-11-28 青岛有锁智能科技有限公司 A kind of facial-recognition security systems
CN108062536A (en) * 2017-12-29 2018-05-22 纳恩博(北京)科技有限公司 A kind of detection method and device, computer storage media

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102385695A (en) * 2010-09-01 2012-03-21 索尼公司 Human body three-dimensional posture identifying method and device
JP5777367B2 (en) * 2011-03-29 2015-09-09 キヤノン株式会社 Pattern identification device, pattern identification method and program
JP2013164834A (en) * 2012-01-13 2013-08-22 Sony Corp Image processing device, method thereof, and program
KR101907077B1 (en) * 2012-03-06 2018-10-11 삼성전자주식회사 Method and apparatus for motion recognition
US9400925B2 (en) * 2013-11-15 2016-07-26 Facebook, Inc. Pose-aligned networks for deep attribute modeling
US20150243031A1 (en) * 2014-02-21 2015-08-27 Metaio Gmbh Method and device for determining at least one object feature of an object comprised in an image
CN104200491A (en) * 2014-08-15 2014-12-10 浙江省新华医院 Motion posture correcting system for human body
US9501716B2 (en) * 2014-12-11 2016-11-22 Intel Corporation Labeling component parts of objects and detecting component properties in imaging data
WO2018033154A1 (en) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 Gesture control method, device, and electronic apparatus
CN106372651B (en) * 2016-08-22 2018-03-06 平安科技(深圳)有限公司 The detection method and device of picture quality
CN107953329B (en) * 2016-10-17 2021-06-15 中国科学院深圳先进技术研究院 Object recognition and attitude estimation method and device and mechanical arm grabbing system
US10902243B2 (en) * 2016-10-25 2021-01-26 Deep North, Inc. Vision based target tracking that distinguishes facial feature targets
CN108229268A (en) * 2016-12-31 2018-06-29 商汤集团有限公司 Expression Recognition and convolutional neural networks model training method, device and electronic equipment
CN107609541B (en) * 2017-10-17 2020-11-10 哈尔滨理工大学 Human body posture estimation method based on deformable convolution neural network
CN107704838B (en) * 2017-10-19 2020-09-25 北京旷视科技有限公司 Target object attribute identification method and device
CN107798313A (en) * 2017-11-22 2018-03-13 杨晓艳 A kind of human posture recognition method, device, terminal and storage medium
CN107895161B (en) * 2017-12-22 2020-12-11 北京奇虎科技有限公司 Real-time attitude identification method and device based on video data and computing equipment
CN108038474B (en) * 2017-12-28 2020-04-14 深圳励飞科技有限公司 Face detection method, convolutional neural network parameter training method, device and medium
CN108305283B (en) * 2018-01-22 2020-12-08 清华大学 Human behavior recognition method and device based on depth camera and basic gesture
CN108182456B (en) * 2018-01-23 2022-03-18 哈工大机器人(合肥)国际创新研究院 Target detection model based on deep learning and training method thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989331A (en) * 2015-02-11 2016-10-05 佳能株式会社 Facial feature extraction apparatus, facial feature extraction method, image processing equipment and image processing method
CN107392201A (en) * 2017-06-09 2017-11-24 中国科学院自动化研究所 The pillar recognition methods of catenary mast, storage medium, processing equipment
CN107403168A (en) * 2017-08-07 2017-11-28 青岛有锁智能科技有限公司 A kind of facial-recognition security systems
CN108062536A (en) * 2017-12-29 2018-05-22 纳恩博(北京)科技有限公司 A kind of detection method and device, computer storage media

Also Published As

Publication number Publication date
CN112911393B (en) 2023-08-01
CN109117753A (en) 2019-01-01
CN112911393A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN109117753B (en) Part recognition method, device, terminal and storage medium
CN109274883B (en) Posture correction method, device, terminal and storage medium
CN110139115B (en) Method and device for controlling virtual image posture based on key points and electronic equipment
CN111191599B (en) Gesture recognition method, device, equipment and storage medium
JP5873442B2 (en) Object detection apparatus and object detection method
JP4951498B2 (en) Face image recognition device, face image recognition method, face image recognition program, and recording medium recording the program
US9183431B2 (en) Apparatus and method for providing activity recognition based application service
CN110544301A (en) Three-dimensional human body action reconstruction system, method and action training system
CN110688929B (en) Human skeleton joint point positioning method and device
CN110427900B (en) Method, device and equipment for intelligently guiding fitness
CN110544302A (en) Human body action reconstruction system and method based on multi-view vision and action training system
CN109815776A (en) Action prompt method and apparatus, storage medium and electronic device
JP2006228061A (en) Face tracing program and face tracing method
CN110147737B (en) Method, apparatus, device and storage medium for generating video
CN113255522B (en) Personalized motion attitude estimation and analysis method and system based on time consistency
CN112163479A (en) Motion detection method, motion detection device, computer equipment and computer-readable storage medium
CN115131879B (en) Action evaluation method and device
CN108875506B (en) Face shape point tracking method, device and system and storage medium
CN114783001A (en) Swimming posture evaluation method, system, device and computer readable storage medium
CN114519727A (en) Image driving method, device, equipment and medium
JP2012113438A (en) Posture estimation apparatus and posture estimation program
CN113255429A (en) Method and system for estimating and tracking human body posture in video
Sunney et al. A real-time machine learning framework for smart home-based yoga teaching system
CN111428609A (en) Human body posture recognition method and system based on deep learning
CN110321008B (en) Interaction method, device, equipment and storage medium based on AR model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant