WO2023071806A1 - Apriori space generation method and apparatus, and computer device, storage medium, computer program and computer program product


Info

Publication number
WO2023071806A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
motion
sample
target
motion data
Application number
PCT/CN2022/124931
Other languages
French (fr)
Chinese (zh)
Inventor
许嘉晨
汪旻
刘文韬
钱晨
马利庄
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023071806A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • The present disclosure relates to the technical field of computer vision, and in particular to a method, apparatus, computer device, storage medium, computer program and computer program product for generating a prior space.
  • Three-dimensional human motion requires not only that each posture of the motion be plausible, but also that the transitions between consecutive postures be plausible, so as to ensure the plausibility of the overall three-dimensional human motion.
  • Using a prior space to constrain the plausibility of motion can make the 3D human motion reconstructed by a neural network more plausible.
  • Embodiments of the present disclosure provide at least a method, apparatus, computer device, storage medium, computer program, and computer program product for generating a prior space.
  • In a first aspect, an embodiment of the present disclosure provides a method for generating a prior space, including: acquiring three-dimensional motion data respectively corresponding to at least two types of motion of a target object, where the three-dimensional motion data includes posture data corresponding to at least two postures of the motion; performing encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, to obtain target motion data corresponding to each type of motion; and generating a target prior space based on the target motion data corresponding to the at least two types of motion.
  • In this way, the 3D motion data corresponding to each type of motion is encoded to remove the global orientation, generating target motion data that can represent the posture characteristics of the motion; removing the global orientation information from the data space reduces the complexity of the data space. The target prior space generated based on the target motion data is therefore more plausible and accurate; in turn, using the target prior space to constrain the plausibility of motion can reduce the difficulty of modeling motion data with a neural network.
  • In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a prior space, including: an acquisition module configured to acquire three-dimensional motion data corresponding to at least two types of motion of a target object, where the three-dimensional motion data includes posture data corresponding to at least two postures of the motion; an encoding module configured to perform encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, to obtain target motion data corresponding to each type of motion; and a determination module configured to generate a target prior space based on the target motion data corresponding to the at least two types of motion.
  • In a third aspect, an embodiment of the present disclosure provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are performed.
  • In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed, the steps of the above first aspect, or of any possible implementation of the first aspect, are performed.
  • In a fifth aspect, the embodiments of the present disclosure provide a computer program, including computer-readable code; when the computer-readable code runs on a device, a processor in the device performs the steps of the above first aspect, or of any possible implementation of the first aspect.
  • In a sixth aspect, an embodiment of the present disclosure provides a computer program product configured to store computer-readable instructions; when the computer-readable instructions are executed, the computer performs the steps of the above first aspect, or of any possible implementation of the first aspect.
  • FIG. 1 shows a flowchart of a method for generating a priori space provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic structural diagram of a network structure for determining second 3D motion data corresponding to a scale provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic structural diagram of a network structure for performing feature extraction on the first 3D motion data at multiple scales to obtain second 3D motion data corresponding to multiple scales according to an embodiment of the present disclosure
  • FIG. 4 shows a flow chart of a method for training an encoding neural network provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of obtaining target motion data by using an encoding neural network and a decoding neural network to process acquired original three-dimensional motion data provided by an embodiment of the present disclosure
  • FIG. 6 shows a schematic diagram of an apparatus for generating a priori space provided by an embodiment of the present disclosure
  • FIG. 7 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • The environmental information in 3D motion data determines the orientation of the motion: for the same motion under different environmental information, the orientations of the postures corresponding to the motion differ, and so do the 3D motion data, even though the postures themselves are the same apart from their orientation. This variation in motion orientation caused by environmental information leads to high data-space complexity, which increases the difficulty of modeling human motion data.
  • In view of this, the present disclosure provides a method, apparatus, computer device, storage medium, computer program, and computer program product for generating a prior space. Generating target motion data that can represent the posture characteristics of motion removes the global orientation information from the data space and reduces the complexity of the data space; the target prior space generated based on the target motion data is therefore more plausible and accurate; in turn, using the target prior space to constrain the plausibility of motion can reduce the difficulty of modeling motion data with a neural network.
  • Yaw: taking the posture corresponding to the first frame of posture data in the 3D motion data, the bottom-to-top direction of the posture is the y-axis, the left-to-right direction is the x-axis, and the back-to-front direction is the z-axis; the yaw is the rotation angle of the posture about the y-axis.
  • DCT (Discrete Cosine Transform): a transform that converts data from the spatial domain to the frequency domain, enabling data or image compression.
  • The execution subject of the method for generating a prior space provided by the embodiments of the present disclosure is generally a computer device with certain computing power. The computer device includes, for example, a terminal device, a server, or another processing device; the terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
  • the method for generating the prior space may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • As shown in FIG. 1, a flowchart of a method for generating a prior space provided by an embodiment of the present disclosure, the method may include the following steps:
  • S101: Acquire three-dimensional motion data respectively corresponding to at least two types of motion of a target object; the three-dimensional motion data includes posture data respectively corresponding to at least two postures of the motion.
  • the target object may include a moving object such as a target person or a target animal.
  • the three-dimensional motion data can be the posture data corresponding to each posture generated when the target object performs a certain movement.
  • the posture data can represent the posture of the target object, and can include posture and orientation information corresponding to the posture, wherein the orientation information corresponding to the posture is the orientation information affected by the global orientation of the target object.
  • To obtain the 3D motion data corresponding to the target object, for example, multiple frames of images of the target object in motion can be collected, and the 3D human posture can be recovered from these images to obtain one frame of posture data corresponding to each frame of image; the posture data corresponding to the multiple frames of images then constitute one set of three-dimensional motion data of the target object.
  • the 3D motion data corresponding to each motion in the present disclosure includes a preset number of frames of pose data, for example, 128 frames of pose data.
  • the 3D motion data may be 128 frames*72 dimensional motion data.
  • The 72-dimensional data includes, for example, a 3-dimensional pose together with the 3-dimensional position information of 23 human-body key points (3 + 23 × 3 = 72).
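  • As an illustration, a minimal sketch of this data layout in Python (the exact ordering of the 72 dimensions is an assumption made for illustration):

      import numpy as np

      T, D = 128, 72                               # 128 frames, 72 dimensions per frame
      motion = np.zeros((T, D))                    # one clip of 3D motion data

      root_pose = motion[:, :3]                    # 3-dimensional pose per frame
      keypoints = motion[:, 3:].reshape(T, 23, 3)  # 23 key points, 3-dim position each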
  • three-dimensional motion data corresponding to various motions of the target object may be acquired in the original prior space.
  • the original prior space may include three-dimensional motion data corresponding to at least one type of motion corresponding to the target object, and the three-dimensional motion data corresponding to each type of motion includes, for example, multiple groups.
  • Different target objects have different corresponding motions; for example, when the target object is a person, the corresponding motions include running, jumping, walking, leg raising, turning around, and so on.
  • In implementation, the posture data corresponding to each posture can be determined based on the postures of the target object collected by a sensor while it is moving, and the determined posture data can be used as the three-dimensional motion data corresponding to the target object; that is, the posture at each moment is determined, and thereby the posture data corresponding to each motion is determined.
  • the original prior space can also be obtained in the following manner:
  • the three-dimensional motion data is obtained based on the various postures of the target object collected by the sensor when it is moving;
  • the original prior space can be formed based on the three-dimensional motion data corresponding to the various motions of the target object.
  • Alternatively, an existing 3D motion data set can be used, from which the 3D motion data corresponding to various motions of the target object is selected to obtain the original prior space; or, the original prior space can be pre-generated and stored in a preset storage space, and when the target prior space is generated, the original prior space is read directly from the preset storage space.
  • S102 Perform encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, to obtain target motion data corresponding to each type of motion.
  • The target motion data may be the motion data that corresponds to each type of motion of the target object and does not include the global orientation.
  • the orientation corresponding to the pose of the first frame in each target motion data is a predetermined target direction.
  • the global orientation can represent the orientation of the motion corresponding to the three-dimensional motion data.
  • By encoding the 3D motion data, the present disclosure can further compress the 3D motion data into a smaller data space, so that 3D motion data that is originally sparsely distributed becomes more densely distributed in the compressed data space, thereby providing better supervision for motion modeling and reducing the implausible and incoherent 3D motion data obtained by motion modeling.
  • In implementation, encoding the 3D motion data to remove the global orientation means adjusting the orientation corresponding to the first-frame posture of the 3D motion data to a target direction, where the target direction may be defined as the first-frame posture having a yaw of 0 degrees; this yields the target motion data corresponding to the 3D motion data, in which the relative angles and directions between the postures remain unchanged. Based on the above steps, the target motion data corresponding to each type of motion can be determined.
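  • A minimal sketch of this first-frame yaw normalization in Python (assuming, as in the layout sketch above, that the first three dimensions of each frame hold the root orientation in axis-angle form; the helper name is hypothetical):

      import numpy as np
      from scipy.spatial.transform import Rotation as R

      def remove_global_yaw(motion):
          # motion: (T, 72) array; columns 0:3 assumed to be per-frame
          # root orientation in axis-angle form.
          root = R.from_rotvec(motion[:, :3])
          fwd = root[0].apply([0.0, 0.0, 1.0])      # forward axis of the first frame
          yaw = np.arctan2(fwd[0], fwd[2])          # its yaw about the y-axis
          fix = R.from_euler("y", -yaw)             # steer first-frame yaw to 0 degrees
          out = motion.copy()
          out[:, :3] = (fix * root).as_rotvec()     # same rotation on every frame, so
          return out                                # relative angles stay unchanged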
  • In this way, the disclosure generates target motion data that can represent the posture characteristics of the motion by performing encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, and removes the global orientation information from the data space, reducing the complexity of the data space; the target prior space generated based on the target motion data is more plausible and accurate, and in turn, using the target prior space to constrain the plausibility of motion can reduce the difficulty of modeling motion data with a neural network.
  • The target prior space may be formed directly from the target motion data corresponding to the various motions, used as multiple sets of prior data in the target prior space.
  • Compared with the data in the original prior space, the target motion data in the target prior space is further compressed and no longer affected by the global orientation, so the data space is simpler and more compact, which is more conducive to motion modeling.
  • When a neural network models motion data, the target prior space can be used to constrain the plausibility of the motion modeling.
  • The disclosure generates the target prior space based on the target motion data, which improves the plausibility of the generated target prior space; further, using the target prior space to constrain the plausibility of motion reduces the difficulty of modeling motion data with a neural network, thereby improving the plausibility and accuracy of the motion data reconstructed by the neural network.
  • After the target prior space is generated, the motion type of the target object may also be identified based on the target prior space.
  • That is, among the posture data corresponding to each piece of target motion data in the target prior space, the posture data matching the posture data of the target object in a motion video is determined, and the motion type corresponding to the matched posture data is used as the motion type of the target object in the motion video.
  • the motion type of the target object can be identified as follows:
  • Step 1: Obtain a motion video of the target object while it is in motion.
  • In implementation, an acquisition device, such as a camera, may be used to capture a motion video of the target object while it is in motion.
  • Step 2: Perform feature extraction on the motion video to obtain motion feature data.
  • The motion feature data can represent the posture data corresponding to each posture of the target object while it is moving, where each posture does not include orientation information.
  • In implementation, feature extraction is performed on each frame of image in the motion video, the posture feature of the target object in each frame is determined, and the posture feature is used as the motion feature data corresponding to that frame; thereby, the motion feature data corresponding to each frame of image can be obtained.
  • Step 3: Based on the motion feature data, determine the target motion data matching the motion feature data from the target prior space.
  • In implementation, the posture feature corresponding to each frame in the motion feature data can be matched for consistency against the posture data of each posture in each piece of target motion data in the target prior space, and the target motion data whose included posture data respectively match the posture features of the motion feature data is used as the target motion data matching the motion feature data.
  • Step 4: Determine the motion type of the target object based on the motion type corresponding to the target motion data matching the motion feature data.
  • In implementation, the motion type corresponding to the target motion data matching the motion feature data may be used as the motion type of the target object in the motion video.
  • In this way, the extracted motion feature data, which represents the postures of the target object during motion, is matched against the target motion data in the target prior space to determine the matching target motion data; further, based on the motion type corresponding to the matched target motion data, the motion type of the target object can be accurately determined.
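  • A minimal sketch of such matching as a nearest-neighbor lookup over the prior space (a simplification; the Euclidean metric and all names are illustrative assumptions):

      import numpy as np

      def identify_motion_type(motion_feat, prior_data, prior_labels):
          # motion_feat:  feature vector extracted from the motion video
          # prior_data:   (N, dim) target motion data in the target prior space
          # prior_labels: N motion-type names, aligned with prior_data rows
          dists = np.linalg.norm(prior_data - motion_feat, axis=1)
          return prior_labels[int(np.argmin(dists))]  # type of the closest match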
  • For each type of motion, a pre-trained target encoding neural network can be used to perform the encoding processing for removing the global orientation on the three-dimensional motion data corresponding to that type of motion, to obtain the target motion data corresponding to that type of motion.
  • The target encoding neural network is a pre-trained encoding network that can encode the input 3D motion data to remove the global orientation, so as to obtain more accurate target motion data.
  • S102-1: Determine first frequency-domain data corresponding to the three-dimensional motion data in the frequency domain.
  • Each piece of posture data in the three-dimensional motion data is data in the spatial domain. The first frequency-domain data is used to represent the fusion coefficients of the posture data on a first number of frequency-domain components, where the first number is a preset value; a frequency-domain component is used to represent the amount of information corresponding to the posture data.
  • The data dimension of the first frequency-domain data may be preset; for example, it may be a first number m. Taking 128 frames of posture data as an example, the first frequency-domain data may be the m-dimensional fusion coefficients of the 128 frames of posture data on m frequency-domain components, where m is a positive integer.
  • DCT may be used to transform the three-dimensional motion data to obtain first frequency-domain data corresponding to the three-dimensional motion data in the frequency domain.
  • In implementation, the overall position change corresponding to each key point of the target object can be determined based on the three-dimensional motion data, and DCT is then used to convert the three-dimensional motion data, which represents these overall position changes, into the first frequency-domain data.
  • Fourier transform may also be used to transform the three-dimensional motion data, so as to obtain first frequency domain data corresponding to the three-dimensional motion data in the frequency domain.
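  • A minimal sketch of this spatial-to-frequency conversion in Python (using scipy's DCT; the helper name and the choice of keeping the m lowest-frequency components are assumptions):

      from scipy.fft import dct

      def motion_to_freq(motion, m):
          # motion: (T, D) posture data in the spatial domain
          coeffs = dct(motion, axis=0, norm="ortho")  # transform along the time axis
          return coeffs[:m]                           # fusion coefficients on m components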
  • S102-2: Divide the three-dimensional motion data into at least two groups of three-dimensional motion sub-data, and determine second frequency-domain data respectively corresponding to the at least two groups in the frequency domain.
  • The second frequency-domain data is used to characterize the values of the posture data of each group of three-dimensional motion sub-data on a second number of frequency-domain components; the second number is also preset.
  • The data dimension corresponding to the second frequency-domain data may be preset; for example, it may be n.
  • In implementation, the multi-frame posture data in the three-dimensional motion data can be divided into groups according to the order of the postures corresponding to the frames, and the posture data of each group is used as one set of three-dimensional motion sub-data, thereby obtaining multiple sets of three-dimensional motion sub-data; that is, the motion corresponding to the 3D motion data is divided into multiple segments, and each segment corresponds to one group of 3D motion sub-data.
  • For each set of three-dimensional motion sub-data, the position changes of each key point of the target object in the corresponding motion segment may be determined.
  • DCT or the Fourier transform can then be used to convert the three-dimensional motion data characterizing these position changes into second frequency-domain data, obtaining the second frequency-domain data corresponding to that set of three-dimensional motion sub-data.
  • Taking 128 frames of posture data as an example, the 128 frames can first be divided into S groups to obtain S sets of 3D motion sub-data, each containing 128/S frames of posture data. DCT may then be used to convert each set of three-dimensional motion sub-data into n-dimensional second frequency-domain data; converting every group thus yields S pieces of n-dimensional second frequency-domain data.
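  • A minimal sketch of the group-wise conversion under the same assumptions (names are hypothetical; T is assumed divisible by S):

      from scipy.fft import dct

      def grouped_motion_to_freq(motion, S, n):
          # motion: (T, D) posture data, e.g. T = 128 frames
          T = motion.shape[0]
          segments = motion.reshape(S, T // S, -1)      # S segments of T/S frames each
          coeffs = dct(segments, axis=1, norm="ortho")  # DCT over time within each segment
          return coeffs[:, :n]                          # (S, n, D): n components per segment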
  • S102-3: Based on the first frequency-domain data and the second frequency-domain data, perform compression processing for removing the global orientation on the three-dimensional motion data to obtain the target motion data.
  • In implementation, the 3D motion data corresponding to each group of 3D motion sub-data can be compressed, and the compressed motion data can then be fused with the first frequency-domain data, so as to complete the compression processing for removing the global orientation from the three-dimensional motion data and obtain the target motion data.
  • After the 3D motion data is obtained, it can be input into the target encoding neural network; the DCT conversion module in the network outputs the first frequency-domain data corresponding to the three-dimensional motion data and the second frequency-domain data corresponding to each group of three-dimensional motion sub-data, and the first and second frequency-domain data are then used to perform compression processing for removing the global orientation on the three-dimensional motion data and output the target motion data.
  • The frequency-domain data can represent, in the frequency domain, the change information of each key point of the target object in the three-dimensional motion data, and the key points of the target object accurately reflect its posture. By converting the three-dimensional motion data to the frequency domain, the first frequency-domain data is obtained, reflecting the overall change information of each key point during motion and the overall amount of information of each key point during motion. By dividing the three-dimensional motion data into multiple groups of three-dimensional motion sub-data, the overall motion of the target object is segmented and can be analyzed in detail, yielding second frequency-domain data that reflects the change information of the key points in each motion segment and the amount of information corresponding to each key point in each segment. Based on the first and second frequency-domain data, the global orientation of the three-dimensional motion data can then be removed accurately, obtaining accurate and plausible target motion data.
  • S102-3-1: Based on the first frequency-domain data, obtain frequency-domain feature data of the three-dimensional motion data.
  • In implementation, a fully connected layer in the target encoding neural network can be used to perform mapping processing on the first frequency-domain data to obtain the frequency-domain feature data corresponding to the three-dimensional motion data.
  • For example, the m-dimensional first frequency-domain data may be converted into 512-dimensional frequency-domain feature data.
  • S102-3-2: Based on the second frequency-domain data, determine the weights corresponding to the at least two groups of three-dimensional motion sub-data; based on these weights, perform weighting processing on the three-dimensional motion data to obtain first 3D motion data.
  • In implementation, the second frequency-domain data corresponding to the multiple sets of three-dimensional motion sub-data can first be fused to obtain fused frequency-domain data, where the dimension of the fused frequency-domain data matches the number of groups of three-dimensional motion sub-data.
  • For example, the S pieces of n-dimensional second frequency-domain data can be fused into S*n fused frequency-domain data: for the n-dimensional second frequency-domain data corresponding to each of the S groups of three-dimensional motion sub-data, a weight vector can first be determined, and the determined weight vectors corresponding to the second frequency-domain data of all groups can then be fused to obtain the S*n fused frequency-domain data.
  • normalization processing can be performed on the fused frequency domain data to obtain weights corresponding to multiple sets of three-dimensional motion sub-data.
  • the softmax function may be used to normalize the S*n fused frequency domain data, and output S normalized weight vectors.
  • each normalized weight vector corresponds to each pose data in a set of three-dimensional motion sub-data.
  • the weight vector corresponding to each group of three-dimensional motion sub-data may be used as the weight corresponding to this group of three-dimensional motion sub-data.
  • Afterwards, based on the weight corresponding to each set of three-dimensional motion sub-data, weighting can be performed on each frame of posture data in that set, and the weighted 3D motion data corresponding to the sets of three-dimensional motion sub-data is used as the first 3D motion data.
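  • A minimal sketch of the weighting step (a simplification: it reduces each segment's coefficients to one scalar weight, whereas the text above describes per-group weight vectors; names are hypothetical):

      import numpy as np

      def weight_segments(motion, seg_coeffs):
          # motion:     (T, D) posture data, T divisible by S
          # seg_coeffs: (S, n, D) second frequency-domain data per segment
          S, T = seg_coeffs.shape[0], motion.shape[0]
          logits = seg_coeffs.reshape(S, -1).mean(axis=1)   # one logit per segment
          w = np.exp(logits - logits.max())
          w /= w.sum()                                      # softmax over the S segments
          segments = motion.reshape(S, T // S, -1) * w[:, None, None]
          return segments.reshape(T, -1)                    # first 3D motion data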
  • S102-3-3: Perform feature extraction at at least two scales on the first three-dimensional motion data to obtain second three-dimensional motion data respectively corresponding to the at least two scales.
  • A scale can correspond to one convolutional layer and at least one fully connected layer in the target encoding neural network.
  • In implementation, the convolutional layers and fully connected layers corresponding to the multiple scales deployed in the target encoding neural network can be used to sequentially extract features of the first 3D motion data at the multiple scales, obtaining the second three-dimensional motion data corresponding to each scale.
  • For each scale, the convolutional layer corresponding to that scale can be used to perform convolution processing on the input three-dimensional motion data for the scale, and the convolution result is then subjected to fully connected mapping processing to obtain the second three-dimensional motion data corresponding to the scale.
  • The input 3D motion data is either the second 3D motion data corresponding to the preceding scale or, for the first of the at least two scales, the first 3D motion data.
  • As shown in FIG. 2, a schematic structural diagram of a network structure for determining the second 3D motion data corresponding to one scale provided by an embodiment of the present disclosure, the scale corresponds to a convolutional layer L101 and two fully connected layers L102 and L103.
  • the convolutional layer L101 performs convolution processing on the input 3D motion data
  • the fully connected layers L102 and L103 perform fully connected mapping processing on the result of the convolution processing to obtain the second 3D motion data corresponding to a scale.
  • As shown in FIG. 3, a schematic structural diagram of a network structure for performing feature extraction on the first 3D motion data at multiple scales to obtain second 3D motion data corresponding to the multiple scales, according to an embodiment of the present disclosure.
  • FIG. 3 includes three extraction modules corresponding to three scales: the first extraction module L201, the second extraction module L202, and the third extraction module L203; each extraction module includes one convolutional layer L101 and two fully connected layers L102 and L103 as shown in FIG. 2.
  • The first extraction module L201, the second extraction module L202, and the third extraction module L203 sequentially perform feature extraction on the first three-dimensional motion data at different scales, outputting the second three-dimensional motion data corresponding to the first scale, the second scale, and the third scale, respectively.
  • the extraction module may be a residual block (Residual Block).
  • The number of extraction modules can be set as needed and is not limited here; the embodiments of the present disclosure take three extraction modules as an example for illustration.
  • For the first scale, the 3D motion data input to the corresponding convolutional layer is the first 3D motion data. The convolutional layer convolves the first 3D motion data to obtain a convolution processing result; the convolution processing result is input to the first fully connected layer, which performs a first fully connected mapping to obtain a first mapping processing result; the first mapping processing result is input to the second fully connected layer, which performs a further fully connected mapping to obtain a second mapping processing result; finally, the convolution processing result is fused with the second mapping processing result to obtain the second 3D motion data corresponding to the first scale.
  • For each subsequent scale, the second three-dimensional motion data output by the previous scale is used as the input of the current scale. The convolutional layer corresponding to the scale convolves the second three-dimensional motion data output by the previous scale to obtain the convolution processing result for the scale; the convolution processing result is input to the first fully connected layer, which performs a first fully connected mapping to obtain the first mapping processing result for the scale; the first mapping processing result is input to the second fully connected layer, which performs a further fully connected mapping to obtain the second mapping processing result for the scale; finally, the convolution processing result for the scale is fused with the second mapping processing result for the scale to obtain the second 3D motion data corresponding to the scale.
  • the second three-dimensional motion data corresponding to the three scales can be obtained.
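  • A minimal sketch of one extraction module in PyTorch (channel and feature sizes are illustrative assumptions; the fusion of the convolution result with the second mapping result is modeled as a residual addition):

      import torch
      import torch.nn as nn

      class ExtractionModule(nn.Module):
          def __init__(self, channels=8, feat=64):
              super().__init__()
              self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
              self.fc1 = nn.Linear(feat, feat)
              self.fc2 = nn.Linear(feat, feat)

          def forward(self, x):                # x: (batch, channels, feat)
              c = self.conv(x)                 # convolution processing result
              h = torch.relu(self.fc1(c))      # first mapping processing result
              h = self.fc2(h)                  # second mapping processing result
              return c + h                     # fuse convolution result and second mapping

      # Three modules in sequence, as in FIG. 3:
      blocks = nn.Sequential(ExtractionModule(), ExtractionModule(), ExtractionModule())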
  • S102-3-4: Fuse the frequency-domain feature data with the second three-dimensional motion data respectively corresponding to the at least two scales to obtain the target motion data.
  • the fully connected layer in the target encoding neural network can be used to fuse the frequency domain feature data and the second three-dimensional motion data corresponding to multiple scales to obtain the target motion data.
  • In this way, the present disclosure can determine the weight of each group of three-dimensional motion sub-data based on the amount of information reflected by its second frequency-domain data, and then weight the three-dimensional motion data based on the weights of the multiple groups, achieving high-precision compression of each motion segment corresponding to the three-dimensional motion data and improving the accuracy of the first three-dimensional motion data. Feature extraction at multiple scales yields second three-dimensional motion data corresponding to the first three-dimensional motion data at different depths; fusing the second 3D motion data of the multiple scales with the frequency-domain feature data can therefore improve the accuracy of the target motion data, thanks to the richness of the second 3D motion data in the depth dimension.
  • In implementation, the frequency-domain feature data and the second 3D motion data respectively corresponding to the at least two scales may first be spliced to obtain spliced third 3D motion data. The spliced third three-dimensional motion data is input to a fully connected layer in the target encoding neural network, which performs fully connected mapping processing on it to obtain the target motion data.
  • The target motion data can be feature data of a target dimension; for example, it can be 1*256-dimensional feature data, that is, the 128*72-dimensional three-dimensional motion data is compressed into 1*256-dimensional target motion data.
  • The splicing unifies the frequency-domain feature data and the second three-dimensional motion data into the third three-dimensional motion data; the subsequent fully connected mapping then restores the third three-dimensional motion data from the hidden-layer feature space to the sample initial space corresponding to the three-dimensional motion data, obtaining target motion data matching the sample initial space.
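  • A minimal sketch of this splice-and-map step in PyTorch (feature sizes are illustrative assumptions):

      import torch
      import torch.nn as nn

      class FusionHead(nn.Module):
          def __init__(self, freq_dim=512, scale_dims=(64, 64, 64), out_dim=256):
              super().__init__()
              self.fc = nn.Linear(freq_dim + sum(scale_dims), out_dim)

          def forward(self, freq_feat, scale_feats):
              third = torch.cat([freq_feat, *scale_feats], dim=-1)  # third 3D motion data
              return self.fc(third)                                 # 1*256 target motion data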
  • the embodiments of the present disclosure also provide a method for training a target coding neural network.
  • In implementation, sample data may be obtained first, where the sample data includes sample posture data corresponding to at least two sample poses; the sample posture data can represent a sample pose and may include, for example, the sample pose and the orientation information corresponding to the sample pose.
  • the multiple sample poses are multiple continuous poses corresponding to one motion, and the acquired sample data is the motion data with the global orientation removed.
  • the sample data may include posture data corresponding to multiple sample movements, and the posture data corresponding to each sample movement is sample posture data corresponding to multiple sample postures of the sample movement.
  • the step of obtaining sample data can be implemented according to the following steps:
  • P1: Obtain original 3D motion data corresponding to at least two sample motions; the original 3D motion data includes posture data corresponding to at least two sample poses.
  • The original 3D motion data is motion data that includes the global orientation; the posture data in each piece of original 3D motion data may include a posture and its orientation information, where the orientation information corresponding to the posture is orientation information affected by the global orientation of the target object.
  • the original three-dimensional motion data respectively corresponding to various sample motions may be obtained first.
  • In implementation, the orientation information corresponding to the first sample pose can be determined based on the posture data of the first of the multiple sample poses included in the original three-dimensional motion data, and the yaw corresponding to the first sample pose can then be determined from this orientation information. Then, taking a yaw of 0 degrees for the first sample pose as the target, the steering angle and steering direction (clockwise or counterclockwise) about the y-axis are determined for the first sample pose; that is, after the first sample pose is rotated about the y-axis by the steering angle in the steering direction, the yaw corresponding to the rotated first sample pose is 0 degrees.
  • The steering angle and steering direction may then be used as the steering angle and steering direction corresponding to every sample pose included in the original three-dimensional motion data.
  • the steering angle and steering direction can be represented in the form of a rotation matrix.
  • the steering angle corresponding to each original three-dimensional motion data can be determined respectively.
  • P3: Based on the steering angle, perform steering processing on each sample pose in the original 3D motion data to obtain the sample data.
  • In implementation, the steering angle and steering direction corresponding to each piece of original 3D motion data can be used to sequentially perform steering processing on each sample pose in that original 3D motion data to obtain the corresponding sample data.
  • The yaw corresponding to the first sample pose in the sample data is thus 0 degrees, normalizing the first sample pose. Since the relative angles and directions between the sample poses in the sample data are the same as those between the sample poses in the corresponding original 3D motion data, every sample pose in the original 3D motion data is thereby normalized.
  • In this way, the sample poses of each piece of original 3D motion data undergo steering processing and are normalized, and the sample data corresponding to each piece of original 3D motion data is obtained.
  • the redundant information in the original 3D motion data can be reduced, thereby improving the reconstruction accuracy of the prior space.
  • As shown in FIG. 4, a flowchart of a method for training an encoding neural network provided by an embodiment of the present disclosure, the method may include the following steps:
  • S401: Perform random global-orientation steering processing on the sample data to obtain first intermediate sample data; the first intermediate sample data includes first pose data respectively corresponding to at least two sample poses.
  • In implementation, a random rotation module can be used to uniformly sample a random rotation angle within a preset range for the sample data, and this random rotation angle is used to perform random global-orientation steering processing on each sample pose in the sample data to obtain the first intermediate sample data.
  • For example, each sample pose can be rotated clockwise or counterclockwise about the y-axis by the random rotation angle, yielding the rotated first sample pose corresponding to that sample pose and the first pose data corresponding to the first sample pose.
  • The first sample pose and the first pose data corresponding to every sample pose can be obtained in this way.
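  • A minimal sketch of this random steering in Python (assuming, as above, that the first three dimensions of each frame hold the root orientation in axis-angle form; the sampling range is illustrative):

      import numpy as np
      from scipy.spatial.transform import Rotation as R

      def random_global_rotation(sample, rng):
          angle = rng.uniform(-np.pi, np.pi)   # uniformly sampled rotation angle
          steer = R.from_euler("y", angle)     # random steering about the y-axis
          out = sample.copy()
          out[:, :3] = (steer * R.from_rotvec(sample[:, :3])).as_rotvec()
          return out

      noisy = random_global_rotation(np.zeros((128, 72)), np.random.default_rng(0))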
  • S402: Use the encoding neural network to perform encoding processing for removing the global orientation on the first intermediate sample data to obtain encoded motion data.
  • In implementation, the first intermediate sample data can be input into the encoding neural network to be trained, which performs encoding processing for removing the global orientation on the first intermediate sample data according to the encoding methods described in the above embodiments, obtaining encoded motion data; during this processing, the encoding neural network eliminates the random rotation angle corresponding to the first intermediate sample data and outputs encoded motion data carrying no information related to the random rotation angle.
  • S403: Use the decoding neural network to decode the encoded motion data to obtain second intermediate sample data; the second intermediate sample data includes second pose data corresponding to at least two sample poses.
  • The decoding neural network is a neural network matched with the encoding neural network and is used to decode and restore data encoded by the encoding neural network.
  • In implementation, the encoded motion data may be input into the decoding neural network to be trained and decoded by the decoding neural network, thereby outputting the decoded second intermediate sample data.
  • The second intermediate sample data includes the second pose data corresponding to the multiple sample poses predicted by the decoding neural network; since the encoded motion data carries no information related to the random rotation angle, neither does the second intermediate sample data.
  • As shown in FIG. 5, a schematic diagram, provided by an embodiment of the present disclosure, of processing the acquired original 3D motion data with the encoding neural network and the decoding neural network to obtain the second intermediate sample data.
  • In implementation, the steering angle corresponding to the original 3D motion data can be determined first, and steering processing is performed on each sample pose in the original 3D motion data to obtain the sample data; the random rotation module L301 then performs random global-orientation steering processing on the sample data to obtain the first intermediate sample data; the encoding neural network L302 then performs encoding processing for removing the global orientation on the first intermediate sample data and outputs the encoded motion data.
  • The encoded motion data is decoded by the decoding neural network L303, and the second intermediate sample data is output.
  • the decoding neural network may include a decoding module L3031 and a restoration module L3032.
  • the decoding module L3031 includes multiple fully connected layers
  • The restoration module L3032 includes a first decoding network and a second decoding network.
  • For the encoding operation in FIG. 5, the first intermediate sample data can first be divided into a plurality of sets of intermediate sub-sample data, and the second frequency-domain data respectively corresponding to these sets is determined; the weights respectively corresponding to the sets of intermediate sub-sample data are determined based on the second frequency-domain data, and the obtained weights are then used to perform weighting processing on the first intermediate sample data to obtain the first sample three-dimensional motion data. In addition, based on the DCT transformation performed on the first intermediate sample data, the first frequency-domain data corresponding to the first intermediate sample data is determined, and the sample frequency-domain feature data is then determined based on the first frequency-domain data. Afterwards, the first extraction module, the second extraction module, and the third extraction module perform feature extraction at multiple scales on the first sample three-dimensional motion data to obtain second sample three-dimensional motion data corresponding to the multiple scales; finally, the sample frequency-domain feature data and the second sample three-dimensional motion data are fused to obtain the encoded motion data.
  • For the operation of outputting the second intermediate sample data in FIG. 5, after the encoded motion data is obtained, it can be input into the decoding module L3031 of the decoding neural network L303, which uses multiple fully connected layers (including a first fully connected layer, a second fully connected layer, a third fully connected layer, and a fourth fully connected layer) to restore the encoded motion data at multiple scales, finally outputting the predicted orientation feature data and posture feature data corresponding to each sample pose. The restoration module L3032 of the decoding neural network L303 can then perform feature splicing on the predicted orientation feature data and the posture feature data corresponding to each sample pose to obtain the second intermediate sample data corresponding to the encoded motion data.
  • In implementation, the encoded motion data can be input to the first fully connected layer of the decoding module L3031, which performs fully connected mapping processing on it to obtain the first output feature data. The first output feature data is input to the second fully connected layer, which performs fully connected mapping processing to obtain the second output feature data. The second output feature data and the first output feature data are input together to the third fully connected layer, which performs fully connected mapping processing on them to obtain the third output feature data. The third output feature data and the second output feature data are input together to the fourth fully connected layer, which performs fully connected mapping processing on them and outputs the orientation feature data θ_g corresponding to each sample pose and the posture feature data θ_l describing the specific pose shape corresponding to each sample pose.
  • θ_g can be 128*6-dimensional data.
  • θ_l can be 128*32-dimensional data.
  • Since the obtained θ_g and θ_l are implicit feature data and cannot be output directly, the first decoding network D_cont in the restoration module is used to decode θ_g into explicit first target feature data, and the second decoding network D_vp in the restoration module is used to decode θ_l into explicit second target feature data; the first target feature data and the second target feature data are then spliced to obtain the second intermediate sample data.
  • For example, the second target feature data can be 128*69-dimensional data, and 128*72-dimensional second intermediate sample data can be obtained after splicing.
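  • A minimal sketch of the decoding module's fully connected chain in PyTorch (hidden sizes are illustrative assumptions; the restoration networks D_cont and D_vp are omitted):

      import torch
      import torch.nn as nn

      class DecodingModule(nn.Module):
          def __init__(self, code_dim=256, hidden=512, T=128):
              super().__init__()
              self.T = T
              self.fc1 = nn.Linear(code_dim, hidden)
              self.fc2 = nn.Linear(hidden, hidden)
              self.fc3 = nn.Linear(2 * hidden, hidden)        # takes fc2 + fc1 outputs
              self.fc4 = nn.Linear(2 * hidden, T * (6 + 32))  # takes fc3 + fc2 outputs

          def forward(self, z):                               # z: encoded motion data
              h1 = torch.relu(self.fc1(z))
              h2 = torch.relu(self.fc2(h1))
              h3 = torch.relu(self.fc3(torch.cat([h2, h1], dim=-1)))
              out = self.fc4(torch.cat([h3, h2], dim=-1)).view(-1, self.T, 38)
              theta_g, theta_l = out[..., :6], out[..., 6:]   # 128*6 and 128*32
              return theta_g, theta_l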
  • In this way, by restoring the encoded motion data at multiple scales, the gradient of the encoded motion data can be prevented from vanishing, and full fusion of the encoded motion data can be realized.
  • S404: Based on the sample data and the second intermediate sample data, the model loss corresponding to the encoding neural network and the decoding neural network can be determined, and the encoding neural network and the decoding neural network can be trained using the model loss (shown in FIG. 5), obtaining the encoding neural network and decoding neural network for this round of training.
  • In implementation, multiple rounds of training may be performed on the encoding neural network and the decoding neural network based on the above S401-S404.
  • S405: Determine the encoding neural network that has undergone at least two rounds of training as the target encoding neural network.
  • In implementation, the encoding neural network that has undergone multiple rounds of training can be determined as the target encoding neural network, and the decoding neural network that has undergone multiple rounds of training can be determined as the target decoding neural network.
  • The number of training rounds can be a preset value; alternatively, it can be determined according to the accuracy of the trained encoding neural network and decoding neural network, and when the accuracy of the trained networks meets a preset accuracy, the target encoding neural network and the target decoding neural network are obtained.
  • In this way, the random steering adds noise to the sample data, and encoding the noise-added first intermediate sample data with the encoding neural network can improve the denoising ability of the encoding neural network. Decoding the encoded motion data with the decoding neural network restores the encoded motion data; if the encoded motion data is accurate, the second intermediate sample data output by the decoding neural network will also be relatively accurate and close to the sample data. Therefore, the model losses corresponding to the encoding neural network and the decoding neural network can be determined based on the sample data and the second intermediate sample data, and performing multiple rounds of training on the encoding and decoding neural networks based on the model loss yields a target encoding neural network and a decoding neural network of reliable accuracy.
  • the model loss can be determined using the following steps:
  • S1: Based on the sample data and the second intermediate sample data, determine at least one of the following: a sample data reconstruction loss, and a similarity loss between the encoded motion data and a normal distribution.
  • The sample data reconstruction loss is used to characterize the loss of the decoding neural network when decoding the encoded motion data and the loss of the encoding neural network when determining the encoded motion data; the similarity loss is used to characterize the similarity loss between the conditional probability distribution of the encoded motion data given the first intermediate sample data and a normal distribution.
  • In implementation, the sample data reconstruction loss in this step can be determined based on the orientation feature data and posture feature data corresponding to each frame of sample pose data in the sample data, together with the orientation feature data and posture feature data corresponding to each frame of second pose data in the second intermediate sample data.
  • Each frame of sample pose data in the sample data may include orientation feature data characterizing the orientation corresponding to the sample pose and posture feature data characterizing the specific pose shape corresponding to the sample pose; correspondingly, the second pose data corresponding to the multiple sample poses included in the second intermediate sample data also include the orientation feature data and posture feature data corresponding to each frame of second pose data.
  • In implementation, the sample data reconstruction loss can be determined based on the following formula (1), in which:
  • L_rec represents the sample data reconstruction loss;
  • M represents the encoding neural network and decoding neural network;
  • β represents the shape parameter;
  • θ_l represents the posture feature data corresponding to the sample pose.
  • Through the above formula, the sample data reconstruction loss can be determined.
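  • The formula image itself is not reproduced in this text. Consistent with the symbol definitions above, formula (1) plausibly takes a per-frame reconstruction form such as L_{rec} = \sum_t \| M(\hat{\theta}_g^t, \hat{\theta}_l^t, \beta) - M(\theta_g^t, \theta_l^t, \beta) \|_2^2, where the hatted features are those restored by the decoding neural network; this form is an assumption rather than the verbatim patent formula.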
  • The similarity loss can be determined based on formula (2), in which:
  • L_KL represents the similarity loss;
  • KL represents the KL divergence;
  • z_mot represents the encoded motion data;
  • N(0, I) represents a normal distribution with a mean of 0 and a standard deviation of 1.
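  • Consistent with these definitions, formula (2) plausibly has the standard variational form L_{KL} = KL( q(z_{mot} \mid X) \,\|\, N(0, I) ), where q(z_{mot} \mid X) is the conditional distribution of the encoded motion data given the first intermediate sample data X; this form is an assumption rather than the verbatim patent formula.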
  • the disclosure determines the model loss based on at least one of the following: sample data reconstruction loss, similarity loss between encoded motion data and normal distribution.
  • model loss can be determined according to the following formula (3):
  • ⁇ rec represents the first loss coefficient corresponding to the sample data reconstruction loss
  • ⁇ KL represents the second loss coefficient corresponding to the similarity loss
  • In implementation, the product of the first loss coefficient and the sample data reconstruction loss may be added to the product of the second loss coefficient and the similarity loss, and the model loss is determined based on the result of the addition.
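  • Written out, formula (3) then reads L = \lambda_{rec} \cdot L_{rec} + \lambda_{KL} \cdot L_{KL}, which matches the description above of multiplying each loss by its coefficient and summing the results.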
  • In this way, the loss of the decoding neural network when restoring the orientation information corresponding to the pose data can be determined based on the orientation feature data, and the loss of the decoding neural network when restoring the pose of each frame can be determined based on the posture feature data; training the encoding neural network and decoding neural network with these two losses as the sample data reconstruction loss can improve the accuracy of the output orientation feature data and posture feature data.
  • Based on the same inventive concept, an embodiment of the present disclosure also provides an apparatus for generating a prior space corresponding to the method for generating a prior space; since the problem-solving principle of the apparatus in the embodiments of the present disclosure is similar to that of the method, the implementation of the apparatus can refer to the implementation of the method.
  • FIG. 6 is a schematic diagram of a device for generating a priori space provided by an embodiment of the present disclosure, including:
  • an acquisition module 601 configured to acquire three-dimensional motion data respectively corresponding to at least two kinds of motion of a target object, the three-dimensional motion data including posture data respectively corresponding to at least two postures of the motion; an encoding module 602 configured to perform encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, to obtain target motion data corresponding to each type of motion; and a determination module 603 configured to generate a target prior space based on the target motion data respectively corresponding to the at least two types of motion.
  • the device further includes an identification module 604 configured to, after the target prior space is generated based on the target motion data respectively corresponding to the at least two types of motion: acquire a motion video of the target object; perform feature extraction on the motion video to obtain motion feature data; determine, based on the motion feature data, target motion data in the target prior space that matches the motion feature data; and determine the motion type of the target object based on the motion type corresponding to the matching target motion data.
  • the encoding module 602 is configured to determine the first frequency-domain data corresponding to the three-dimensional motion data in the frequency domain, divide the three-dimensional motion data into at least two groups of three-dimensional motion sub-data, determine the second frequency-domain data respectively corresponding to the at least two groups in the frequency domain, and, based on the first frequency-domain data and the second frequency-domain data, perform compression processing for removing the global orientation on the three-dimensional motion data to obtain the target motion data.
  • the encoding module 602 is configured to obtain the frequency-domain feature data of the three-dimensional motion data based on the first frequency-domain data; determine, based on the second frequency-domain data, the weights respectively corresponding to the at least two groups of three-dimensional motion sub-data; perform weighting processing on the three-dimensional motion data based on those weights to obtain first three-dimensional motion data; perform feature extraction, at at least two scales, on the first three-dimensional motion data to obtain second three-dimensional motion data respectively corresponding to the at least two scales; and fuse the frequency-domain feature data with the second three-dimensional motion data of the at least two scales to obtain the target motion data.
  • the encoding module 602 is configured to perform fusion processing on the second frequency-domain data respectively corresponding to the at least two groups of three-dimensional motion sub-data to obtain fused frequency-domain data, the dimension of the fused frequency-domain data being the same as the number of groups of three-dimensional motion sub-data, and to normalize the fused frequency-domain data to obtain the weights respectively corresponding to the at least two groups of three-dimensional motion sub-data.
  • the encoding module 602 is configured to, for each of the at least two scales, perform convolution processing on the input three-dimensional motion data corresponding to that scale and perform fully connected mapping processing on the result of the convolution to obtain the second three-dimensional motion data corresponding to that scale; the input three-dimensional motion data corresponding to a scale includes the second three-dimensional motion data of the preceding scale among the at least two scales or, for the first scale, the first three-dimensional motion data.
  • the encoding module 602 is configured to concatenate the frequency-domain feature data and the second three-dimensional motion data respectively corresponding to the at least two scales to obtain third three-dimensional motion data, and to perform fully connected mapping processing on the third three-dimensional motion data to obtain the target motion data.
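  • a minimal sketch of this multi-scale extraction and fusion is given below; the channel sizes, the two-scale depth, and the pooling choices are assumptions used only for illustration:

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Per-scale Conv1d + fully connected mapping, then fusion with frequency features."""

    def __init__(self, dims=72, hidden=256, freq_dim=512, out_dim=256):
        super().__init__()
        self.conv1 = nn.Conv1d(dims, hidden, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(hidden, hidden, kernel_size=3, stride=2, padding=1)
        self.fc1 = nn.Linear(hidden, hidden)   # mapping after scale 1
        self.fc2 = nn.Linear(hidden, hidden)   # mapping after scale 2
        self.fuse = nn.Linear(freq_dim + 2 * hidden, out_dim)

    def forward(self, first_motion, freq_feat):
        # first_motion: [batch, frames, dims]; freq_feat: [batch, freq_dim]
        x = first_motion.transpose(1, 2)                # [batch, dims, frames]
        s1 = torch.relu(self.conv1(x))                  # scale-1 convolution
        f1 = self.fc1(s1.mean(dim=2))                   # second 3D motion data, scale 1
        s2 = torch.relu(self.conv2(s1))                 # scale 2 takes scale 1 as input
        f2 = self.fc2(s2.mean(dim=2))                   # second 3D motion data, scale 2
        third = torch.cat([freq_feat, f1, f2], dim=1)   # third 3D motion data
        return self.fuse(third)                         # target motion data
```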
  • the encoding module 602 is configured to, for the three-dimensional motion data corresponding to each type of motion, use a pre-trained target encoding neural network to perform encoding processing for removing the global orientation on that three-dimensional motion data, obtaining the target motion data corresponding to each type of motion.
  • the device further includes a training module 605 configured to train the target encoding neural network in the following manner: obtain sample data, the sample data including sample posture data respectively corresponding to at least two sample postures; perform at least two rounds of training, executing the following process in each round: perform random global-orientation steering processing on the sample data to obtain first intermediate sample data, the first intermediate sample data including first posture data respectively corresponding to the at least two sample postures; use an encoding neural network to perform encoding processing for removing the global orientation on the first intermediate sample data to obtain encoded motion data; use a decoding neural network to decode the encoded motion data to obtain second intermediate sample data, the second intermediate sample data including second pose data respectively corresponding to the at least two sample poses; and, based on the sample data and the second intermediate sample data, perform the current round of training on the encoding neural network and the decoding neural network; the encoding neural network that has undergone the at least two rounds of training is determined as the target encoding neural network.
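  • a minimal sketch of this training loop is given below, reusing the model_loss sketch above; random_global_yaw is a hypothetical helper standing in for the random global-orientation steering, and all hyperparameters are assumptions:

```python
import torch

def train_target_encoder(encoder, decoder, sample_motions, rounds=2, lr=1e-4):
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(rounds):
        for poses in sample_motions:              # poses: [frames, dims] sample data
            steered = random_global_yaw(poses)    # first intermediate sample data (hypothetical helper)
            mu, logvar = encoder(steered)         # orientation-free encoding
            z_mot = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            recon = decoder(z_mot)                # second intermediate sample data
            loss = model_loss(recon, poses, mu, logvar)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder  # the target encoding neural network
```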
  • the training module 605 is configured to determine a model loss based on the sample data and the second intermediate sample data, and to train the encoding neural network and the decoding neural network for the current round based on the model loss.
  • the training module 605 is configured to determine, based on the sample data and the second intermediate sample data, at least one of the following: the sample data reconstruction loss, and the similarity loss between the encoded motion data and the normal distribution; and to determine the model loss based on at least one of the sample data reconstruction loss and the similarity loss between the encoded motion data and the normal distribution.
  • the training module 605 is configured to determine the sample data reconstruction loss based on the orientation feature data and pose feature data corresponding to each frame of sample pose data in the sample data, and the orientation feature data and pose feature data corresponding to each frame of second pose data in the second intermediate sample data.
  • the training module 605 is configured to acquire original three-dimensional motion data respectively corresponding to at least two sample motions, the original three-dimensional motion data including posture data respectively corresponding to at least two sample postures; determine the steering angle of the original three-dimensional motion data based on the orientation information of the posture data corresponding to the first sample posture among the at least two sample postures; and perform steering processing on each sample posture in the original three-dimensional motion data according to the steering angle, to obtain the sample data.
  • FIG. 7 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure, including:
  • a processor 71 and a memory 72; the memory 72 stores machine-readable instructions executable by the processor 71, and the processor 71 is used to execute the machine-readable instructions stored in the memory 72; when the machine-readable instructions are executed by the processor 71, the processor 71 performs the following steps: obtain the original prior space, the original prior space including three-dimensional motion data corresponding to various motions of the target object, and the three-dimensional motion data including posture data; perform encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, to obtain target motion data corresponding to each type of motion; and generate the target prior space based on the target motion data respectively corresponding to the multiple types of motion.
  • the memory 72 comprises an internal memory 721 and an external memory 722; the internal memory 721, also called memory, temporarily stores computing data for the processor 71 and data exchanged with an external memory 722 such as a hard disk; the processor 71 exchanges data with the external memory 722 through the internal memory 721.
  • embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the method for generating a priori space described in the above method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • the computer program product of the method for generating a priori space provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the method for generating a priori space described in the above method embodiments, and reference may be made to the foregoing method embodiments.
  • the computer program product can be realized by hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) or the like.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions described above are implemented in the form of software functional units and sold or used as independent products, they can be stored in a volatile or non-volatile computer-readable storage medium executable by a processor.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and other media that can store program code.

Abstract

Provided in the present disclosure are an apriori space generation method and apparatus, and a computer device, a storage medium, a computer program and a computer program product. The method comprises: acquiring three-dimensional motion data respectively corresponding to at least two motions of a target object, wherein the three-dimensional motion data comprises posture data respectively corresponding to at least two postures of a corresponding motion; performing encoding processing, for removing global orientation, on the three-dimensional motion data corresponding to each motion, so as to obtain target motion data corresponding to each motion; and generating a target apriori space on the basis of the target motion data respectively corresponding to the at least two motions.

Description

Method, device, computer equipment, storage medium, computer program, and computer program product for generating a priori space
Cross-Reference to Related Applications
This disclosure is based on, and claims priority to, the Chinese patent application with application number 202111275623.6, filed on October 29, 2021 and titled "Method, device, computer equipment and storage medium for generating prior space"; the entire content of that Chinese patent application is incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of computer vision, and in particular to a method, device, computer equipment, storage medium, computer program, and computer program product for generating a priori space.
Background
Three-dimensional human motion requires not only that each posture of the motion is reasonable, but also that the transitions between consecutive postures are reasonable, so as to ensure the rationality of the overall three-dimensional human motion. In the process of reconstructing 3D human motion with a neural network, using a prior space to constrain the rationality of the motion makes the reconstructed 3D human motion more reasonable.
However, existing prior spaces lose the context information in the human motion data, which harms the rationality and accuracy of the human motion data reconstructed by the neural network.
Summary
Embodiments of the present disclosure at least provide a method, device, computer equipment, storage medium, computer program, and computer program product for generating a priori space.

In a first aspect, an embodiment of the present disclosure provides a method for generating a priori space, including:

acquiring three-dimensional motion data respectively corresponding to at least two kinds of motion of a target object, the three-dimensional motion data including posture data respectively corresponding to at least two postures of the motion;

performing encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, to obtain target motion data corresponding to each type of motion;

generating a target prior space based on the target motion data respectively corresponding to the at least two types of motion.

In this method, encoding the three-dimensional motion data of each type of motion to remove the global orientation produces target motion data that represents the posture characteristics of the motion, removes the global orientation information from the data space, and thereby reduces the complexity of the data space. A target prior space generated from such target motion data is more reasonable and accurate, and using it to constrain the rationality and accuracy of motion reduces the difficulty for a neural network to model motion data.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a priori space, including:

an acquisition module configured to acquire three-dimensional motion data respectively corresponding to at least two kinds of motion of a target object, the three-dimensional motion data including posture data respectively corresponding to at least two postures of the motion; an encoding module configured to perform encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, to obtain target motion data corresponding to each type of motion; and a determination module configured to generate a target prior space based on the target motion data respectively corresponding to the at least two types of motion.
In a third aspect, an embodiment of the present disclosure provides a computer device, including a processor and a memory, the memory storing machine-readable instructions executable by the processor; the processor is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the processor, the processor performs the steps of the above first aspect or of any possible implementation of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is run, the steps of the above first aspect or of any possible implementation of the first aspect are executed.

In a fifth aspect, an embodiment of the present disclosure provides a computer program, including computer-readable code; when the computer-readable code is read and executed by a computer, a processor in the device performs the steps of the above first aspect or of any possible implementation of the first aspect.

In a sixth aspect, an embodiment of the present disclosure provides a computer program product configured to store computer-readable instructions; when the computer-readable instructions are executed, the computer performs the steps of the above first aspect or of any possible implementation of the first aspect.

For descriptions of the effects of the above apparatus, computer device, computer-readable storage medium, computer program, and computer program product for generating a priori space, refer to the description of the above method for generating a priori space.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the following briefly introduces the drawings used in the embodiments. The drawings here are incorporated into and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings only show some embodiments of the present disclosure and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.

FIG. 1 shows a flowchart of a method for generating a priori space provided by an embodiment of the present disclosure;

FIG. 2 shows a schematic diagram of a network structure for determining second three-dimensional motion data corresponding to one scale provided by an embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of a network structure for performing feature extraction on first three-dimensional motion data at multiple scales to obtain second three-dimensional motion data respectively corresponding to the multiple scales, provided by an embodiment of the present disclosure;

FIG. 4 shows a flowchart of a method for training an encoding neural network provided by an embodiment of the present disclosure;

FIG. 5 shows a schematic diagram of processing acquired original three-dimensional motion data with an encoding neural network and a decoding neural network to obtain target motion data, provided by an embodiment of the present disclosure;

FIG. 6 shows a schematic diagram of a device for generating a priori space provided by an embodiment of the present disclosure;

FIG. 7 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. The components of the embodiments, as generally described and illustrated herein, can be arranged and designed in a variety of different configurations. Accordingly, the following detailed description is not intended to limit the scope of the claimed disclosure but merely represents selected embodiments. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present disclosure.

In addition, the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein.

Research has found that reasonable three-dimensional human motion requires not only that each posture of the motion is reasonable, but also that the transitions between consecutive postures are reasonable, so as to ensure the rationality of the overall three-dimensional human motion. In the process of reconstructing 3D human motion with a neural network, using a reasonable prior space to constrain the rationality of the motion makes the reconstructed 3D human motion more reasonable.

The environmental information in three-dimensional motion data (including 3D human motion) determines the orientation of the motion. For one and the same motion under different environmental information, the orientations of the postures of the motion differ and so does the three-dimensional motion data, even though every posture other than its orientation is identical. The motion orientation introduced by this environmental information leads to a high data-space complexity and increases the difficulty of modeling human motion data. At present, to make human motion data easier for a neural network to model, the influence of environmental information is usually reduced by reducing the number of frames contained in a motion, thereby reducing the complexity of the corresponding data space; however, the prior space obtained in this way loses the context information in the human motion data, which harms the rationality and accuracy of the human motion data reconstructed by the neural network.

Based on the above research, the present disclosure provides a method, device, computer equipment, storage medium, computer program, and computer program product for generating a priori space. By encoding the three-dimensional motion data of each type of motion to remove the global orientation, target motion data representing the posture characteristics of the motion is generated; the global orientation information is removed from the data space, reducing its complexity. The target prior space generated from the target motion data is therefore more reasonable and accurate, and using it to constrain the rationality and accuracy of motion reduces the difficulty for a neural network to model motion data.

The defects of the above solutions are the results obtained by the inventors after practice and careful study; therefore, the discovery process of the above problems and the solutions proposed below in the present disclosure should all be regarded as the inventors' contribution to the present disclosure.

It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.
It should be noted that specific terms mentioned in the embodiments of the present disclosure include the following:
Yaw: the angle of rotation about the y axis in a three-dimensional coordinate system established for the posture of the first frame of posture data in the three-dimensional motion data, where the y axis points from the bottom to the top of that posture, the x axis from its left to its right, and the z axis from its front to its back;

DCT: Discrete Cosine Transform, which transforms data from the spatial domain to the frequency domain and enables compression of data or images.
To facilitate understanding of this embodiment, a method for generating a priori space disclosed in an embodiment of the present disclosure is first introduced in detail. The execution subject of the method is generally a computer device with certain computing power, which includes, for example, a terminal device, a server, or other processing device; the terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method for generating a priori space may be implemented by a processor invoking computer-readable instructions stored in a memory.

The method for generating a priori space provided by the embodiments of the present disclosure is described in detail below.
As shown in FIG. 1, a flowchart of a method for generating a priori space provided by an embodiment of the present disclosure may include the following steps:
S101: Acquire three-dimensional motion data respectively corresponding to at least two kinds of motion of a target object; the three-dimensional motion data includes posture data respectively corresponding to at least two postures of the motion.

Here, the target object may include a target person, a target animal, or another object capable of moving. The three-dimensional motion data may be the posture data of the individual postures produced when the target object performs a certain motion; the posture data can represent the posture of the target object and may include the posture and the orientation information of the posture, where the orientation information of a posture is affected by the global orientation of the target object.

When acquiring the three-dimensional motion data of the target object, for example, multiple frames of images of the target object in motion may be collected, and the three-dimensional human pose may be recovered from each frame to obtain one frame of posture data per image; the posture data of the multiple frames together constitute one set of three-dimensional motion data of the target object. When recovering the three-dimensional human pose from the frames, each frame's pose is affected by the orientation of the human body during the motion, and this orientation is the global orientation.

In the present disclosure, the three-dimensional motion data of each motion includes a preset number of frames of posture data, for example, 128 frames. In implementation, the three-dimensional motion data may be 128 frames of 72-dimensional motion data, where the 72 dimensions include, for example, a 3-dimensional pose and the three-dimensional position information of 23 key points of the human body.
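For concreteness, a minimal NumPy sketch of this data layout is given below; the array names and the exact ordering of the 72 dimensions follow the example above and are otherwise assumptions.

```python
import numpy as np

FRAMES, KEYPOINTS = 128, 23

# One motion: 128 frames, each 72-dimensional
# (3 pose dims + 23 keypoints * 3 coordinates).
motion = np.zeros((FRAMES, 3 + KEYPOINTS * 3), dtype=np.float32)

pose = motion[:, :3]                                     # per-frame 3-dim pose
keypoints = motion[:, 3:].reshape(FRAMES, KEYPOINTS, 3)  # per-frame joint positions
```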
In some embodiments, the three-dimensional motion data corresponding to the various motions of the target object may be obtained from an original prior space. The original prior space may include three-dimensional motion data corresponding to at least one kind of motion of the target object, and the three-dimensional motion data of each kind of motion may include multiple sets. Depending on the image-processing task, the target object differs and so do its motions; for example, the target object may be a "person", and the corresponding motions include running, jumping, walking, raising a leg, turning around, and so on.

In implementation, the posture data corresponding to each posture produced by the target object during motion may be determined from sensor measurements, and the determined posture data is used as the three-dimensional motion data of the target object; based on the postures the target object assumes while performing several kinds of motion, the posture data corresponding to each kind of motion is determined.

In addition, in the case where the three-dimensional motion data of the target object is obtained from the original prior space, the original prior space may be obtained in the following manner:

the three-dimensional motion data is obtained based on the postures produced by the target object during motion as collected by sensors;

after the three-dimensional motion data of each kind of motion is obtained, the original prior space may be formed from the three-dimensional motion data respectively corresponding to the various motions of the target object.

Alternatively, an existing three-dimensional motion dataset may be used, from which the three-dimensional motion data corresponding to the various motions of the target object are selected to obtain the original prior space; or the original prior space may be generated in advance and stored in a preset storage space, and read directly from that storage space when the target prior space is generated.
S102: Perform encoding processing for removing the global orientation on the three-dimensional motion data corresponding to each type of motion, to obtain target motion data corresponding to each type of motion.

In this embodiment, the target motion data may be motion data, for each kind of motion of the target object, that contains no global orientation; for example, the orientation of the first-frame posture in every set of target motion data is a predetermined target direction. The global orientation represents the orientation of the motion described by the three-dimensional motion data.

When modeling the motion of the target object, every posture in the motion must be reasonable and coherent. In a data space containing three-dimensional motion data, however, not every spatial point corresponds to reasonable three-dimensional motion data, and reasonable three-dimensional motion data is sparsely distributed in that space; as a result, three-dimensional motion data obtained by motion modeling over the original prior space can be unreasonable or incoherent. By encoding the three-dimensional motion data, the present disclosure further compresses it into a smaller data space, so that three-dimensional motion data that was sparsely distributed becomes more densely distributed in the compressed space; motion modeling can then be supervised better, reducing unreasonable and incoherent results.

In implementation, after the original prior space is obtained, for the three-dimensional motion data of each kind of motion in it, encoding processing for removing the global orientation may be performed on that data; that is, the orientation of the first-frame posture in the three-dimensional motion data is adjusted to the target direction, where the target direction may be a yaw of 0 degrees for the first-frame posture.

Then, the direction of every posture other than the first-frame posture in the three-dimensional motion data is adjusted accordingly, as sketched below, and encoding processing is performed on the adjusted postures to obtain the target motion data of that three-dimensional motion data; the relative angles and directions between the postures in the target motion data remain unchanged. Based on the above steps, the target motion data of each kind of motion can be determined.
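A minimal sketch of this yaw canonicalization is given below; the joint-based estimate of the facing direction and the data layout are assumptions, since the disclosure does not fix a particular pose parameterization.

```python
import numpy as np

def remove_global_yaw(motion):
    """Rotate a motion so the first frame's yaw becomes 0 degrees.

    motion: [frames, joints, 3] joint positions; the facing direction is
    estimated here from two hypothetical hip joints (indices 0 and 1).
    """
    fwd = np.cross(motion[0, 1] - motion[0, 0], np.array([0.0, 1.0, 0.0]))
    yaw = np.arctan2(fwd[0], fwd[2])             # rotation about the y axis
    c, s = np.cos(-yaw), np.sin(-yaw)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    # One global rotation applied to every frame keeps all relative
    # angles and directions between postures unchanged.
    return motion @ rot.T
```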
By encoding the three-dimensional motion data of each kind of motion to remove the global orientation, the present disclosure generates target motion data representing the posture characteristics of the motion, removes the global orientation information from the data space, and reduces its complexity; the target prior space generated from the target motion data is more reasonable and accurate, and using it to constrain the rationality and accuracy of motion reduces the difficulty for a neural network to model motion data.
S103: Generate a target prior space based on the target motion data respectively corresponding to the at least two types of motion.

In the embodiments of the present disclosure, the target motion data of the various motions may directly constitute the target prior space as its multiple sets of prior data. Compared with the data in the original prior space, the target motion data in the target prior space is further compressed in data space and the influence of the global orientation is reduced, so the data space is simpler and more compact and better suited to motion modeling.

After the target prior space is obtained, it can be used to impose rationality constraints when a neural network models motion data, so as to realize motion modeling.

Generating the target prior space from the target motion data improves the rationality of the generated prior space; in turn, using the target prior space to constrain motion rationality lowers the difficulty of neural-network motion modeling and thus improves the rationality and accuracy of the motion data the network reconstructs.

In some embodiments, after the target prior space is generated, the motion type of the target object may further be identified based on the target prior space.

In the embodiments of the present disclosure, for the task of identifying the motion type of the target object, after a motion video of the target object is acquired, the posture data of each target-object posture in the motion video may be matched against the posture data of each set of target motion data included in the target prior space; the motion type of the matching target motion data is taken as the motion type of the target object in the motion video.
In implementation, the motion type of the target object can be identified through the following steps (see the sketch after this list):

Step 1: Acquire a motion video of the target object in motion.

Here, a capture device such as a camera may be used to record the motion video of the target object.

Step 2: Perform feature extraction on the motion video to obtain motion feature data.

Here, the motion feature data represents the posture data of each posture of the target object during motion, with the postures containing no orientation information.

In implementation, feature extraction is performed on every frame of the motion video to determine the posture feature of the target object in that frame, and this posture feature is used as the motion feature data of the frame; thus, motion feature data for every frame is obtained.

Step 3: Based on the motion feature data, determine from the target prior space the target motion data that matches the motion feature data.

In this step, the posture feature of each frame in the motion feature data may be matched for consistency against the posture data of each posture in every set of target motion data in the target prior space; the target motion data whose postures respectively match the posture features of the motion feature data is taken as the matching target motion data.

Step 4: Determine the motion type of the target object based on the motion type corresponding to the matching target motion data.

Here, the motion type corresponding to the matching target motion data may be used as the motion type of the target object in the motion video.

By matching the extracted motion feature data, which represents the postures of the target object during motion, against the target motion data in the target prior space, the matching target motion data can be determined; based on the motion type of that matching target motion data, the motion type of the target object can then be determined accurately.
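As an illustration of step 3, a minimal nearest-neighbor matching sketch is given below; the distance measure and all names are assumptions rather than the disclosure's prescribed implementation.

```python
import numpy as np

def identify_motion_type(motion_features, prior_space):
    """Match orientation-free motion features against the target prior space.

    motion_features: [frames, feat_dim] array extracted from the video.
    prior_space: list of (target_motion_data, motion_type) pairs, where
    target_motion_data is a [frames, feat_dim] array.
    """
    best_type, best_dist = None, float("inf")
    for target_motion, motion_type in prior_space:
        # Frame-wise consistency: mean distance over aligned frames.
        n = min(len(motion_features), len(target_motion))
        dist = np.linalg.norm(motion_features[:n] - target_motion[:n], axis=1).mean()
        if dist < best_dist:
            best_type, best_dist = motion_type, dist
    return best_type
```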
In one implementation, for S102, for the three-dimensional motion data of each kind of motion, a pre-trained target encoding neural network may be used to perform encoding processing for removing the global orientation on that data, obtaining the target motion data of that kind of motion.

The target encoding neural network is a pre-trained encoding network that can perform encoding processing for removing the global orientation on the input three-dimensional motion data, so as to obtain more accurate target motion data.

In some embodiments, S102 may be implemented through the following steps:
S102-1: Determine the first frequency-domain data corresponding to the three-dimensional motion data in the frequency domain.

Here, each item of posture data in the three-dimensional motion data is spatial-domain data, and the first frequency-domain data represents the fusion coefficients of all posture data over a first number of frequency-domain components, where the first number is preset and a frequency-domain component represents the amount of information in the posture data. The data dimension of the first frequency-domain data may be preset, for example, a first number m. Taking three-dimensional motion data containing 128 frames of posture data and a first number of m as an example, the first frequency-domain data may be the m-dimensional fusion coefficients of the 128 frames of posture data over m frequency-domain components, where m is a positive integer.

In this step, for the three-dimensional motion data of each kind of motion, the DCT may be used to transform the data to obtain its first frequency-domain data. In implementation, the overall position changes of the key points of the target object may be determined from the three-dimensional motion data, and the DCT is then used to convert the three-dimensional motion data representing these overall position changes into the first frequency-domain data.

Alternatively, a Fourier transform may be used to transform the three-dimensional motion data to obtain its first frequency-domain data.
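A minimal sketch of this transform using SciPy's DCT is given below; truncating to the first m coefficients is an assumption about how the preset first number is applied.

```python
import numpy as np
from scipy.fftpack import dct

def first_frequency_domain_data(motion, m):
    """DCT over the time axis of a [frames, dims] motion, keeping m coefficients."""
    coeffs = dct(motion, type=2, axis=0, norm="ortho")  # [frames, dims]
    return coeffs[:m]                                   # m fusion coefficients per dim
```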
S102-2: Divide the three-dimensional motion data into at least two groups of three-dimensional motion sub-data, and determine the second frequency-domain data respectively corresponding to the at least two groups in the frequency domain.

Here, the second frequency-domain data represents the values of the posture data of each group of three-dimensional motion sub-data over a second number of frequency-domain components, the second number also being preset. The data dimension of the second frequency-domain data may be preset, for example, n.

In implementation, according to the second number, the multiple frames of posture data in the three-dimensional motion data may be divided into the second number of groups in the order in which their postures occur in the motion, each group of posture data forming one group of three-dimensional motion sub-data; multiple groups of three-dimensional motion sub-data are thus obtained. That is, the motion corresponding to the three-dimensional motion data is divided into segments, and one motion segment corresponds to one group of three-dimensional motion sub-data.

Then, for each group of three-dimensional motion sub-data, the position changes of the key points of the target object within the corresponding motion segment may be determined from that group. The DCT or a Fourier transform can then convert the three-dimensional motion data representing these position changes into the second frequency-domain data of that group.

For example, when the three-dimensional motion data includes 128 frames of posture data and the second number is S, the 128 frames may first be divided into S groups of three-dimensional motion sub-data, each containing 128/S frames. The DCT is then used to convert each group into its n-dimensional second frequency-domain data, yielding S items of n-dimensional second frequency-domain data.
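Continuing the sketch above, the per-segment transform might look as follows; the reshape into S equal segments assumes the frame count divides evenly, as in the 128-frame example.

```python
import numpy as np
from scipy.fftpack import dct

def second_frequency_domain_data(motion, s, n):
    """Split a [frames, dims] motion into s segments and DCT each segment."""
    frames, dims = motion.shape
    segments = motion.reshape(s, frames // s, dims)       # s segments in time order
    coeffs = dct(segments, type=2, axis=1, norm="ortho")  # DCT within each segment
    return coeffs[:, :n]                                  # s items of n coefficients
```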
S102-3: Based on the first frequency-domain data and the second frequency-domain data, perform compression processing for removing the global orientation on the three-dimensional motion data to obtain the target motion data.

Here, the three-dimensional motion data corresponding to the groups of three-dimensional motion sub-data may be compressed based on each group's second frequency-domain data, and the compressed motion data is then fused with the first frequency-domain data, completing the compression processing that removes the global orientation and yielding the target motion data.

In addition, S102-1 through S102-3 may be executed by the target encoding neural network: after the three-dimensional motion data is obtained, it is input to the target encoding neural network, whose DCT conversion module outputs the first frequency-domain data of the three-dimensional motion data and the second frequency-domain data of each group of three-dimensional motion sub-data; the network then uses the first and second frequency-domain data to perform compression processing for removing the global orientation and outputs the target motion data.

In this way, the frequency-domain data represents how the key points of the target object in the three-dimensional motion data vary in the frequency domain, and those key points accurately reflect the target object's posture. Converting the three-dimensional motion data to the frequency domain yields first frequency-domain data reflecting the overall change information and the overall information content of the key points during the motion. Dividing the three-dimensional motion data into groups of sub-data enables segment-wise processing and a finer analysis of the overall motion, and converting the groups to the frequency domain yields second frequency-domain data reflecting the change information and information content of the key points within each motion segment. Based on the first and second frequency-domain data, the global orientation of the three-dimensional motion data can be removed accurately, producing accurate and reasonable target motion data.
In some embodiments, S102-3 may be implemented through the following steps:

S102-3-1: Obtain the frequency-domain feature data of the three-dimensional motion data based on the first frequency-domain data.

In implementation, after the first frequency-domain data is obtained, a fully connected layer in the target encoding neural network may map the first frequency-domain data to obtain the frequency-domain feature data of the three-dimensional motion data; for example, the m-dimensional first frequency-domain data may be converted into 512-dimensional frequency-domain feature data.
S102-3-2: determining, based on the second frequency-domain data, weights respectively corresponding to the at least two groups of three-dimensional motion sub-data; and weighting the three-dimensional motion data based on the weights respectively corresponding to the at least two groups of three-dimensional motion sub-data to obtain first three-dimensional motion data.
In some embodiments, when the weights respectively corresponding to the groups of three-dimensional motion sub-data are determined based on the second frequency-domain data, the second frequency-domain data corresponding to the groups of sub-data may first be fused to obtain fused frequency-domain data, where the dimension of the fused frequency-domain data matches the number of groups of three-dimensional motion sub-data. Continuing the above example of dividing 128 frames of pose data into S groups to obtain S groups of three-dimensional motion sub-data: after S pieces of n-dimensional second frequency-domain data are obtained, they may be fused into S*n fused frequency-domain data. In the present disclosure, for the second frequency-domain data (the n-dimensional second frequency-domain data) corresponding to each of the S groups of three-dimensional motion sub-data, a weight vector may first be determined from that second frequency-domain data; the weight vectors determined for all the pieces of second frequency-domain data may then be fused to obtain the S*n fused frequency-domain data.
Then, the fused frequency-domain data may be normalized to obtain the weights respectively corresponding to the groups of three-dimensional motion sub-data. For example, a softmax function may be used to normalize the S*n fused frequency-domain data and output S normalized weight vectors, where each normalized weight vector corresponds to the pose data in one group of three-dimensional motion sub-data.
Furthermore, the weight vector corresponding to each group of three-dimensional motion sub-data may be taken as the weight of that group. In the present disclosure, after the weights of the groups are determined, each frame of pose data in a group may be weighted by the group's weight, obtaining the weighted three-dimensional motion data corresponding to that group; the weighted three-dimensional motion data of all the groups is then taken as the first three-dimensional motion data.
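The following sketch illustrates one way the weighting could work, assuming each group's second frequency-domain data has already been reduced to an n-dimensional vector and that the softmax is applied over the fused S*n data; the linear layer used to derive the weight vectors is a hypothetical stand-in.

```python
import torch
import torch.nn as nn

S, n, D = 8, 16, 72              # assumed: 8 groups of 16 frames, 72-dim pose data
weight_fc = nn.Linear(n, n)      # hypothetical mapping to per-group weight vectors

def weight_motion(motion, second_freq):
    """motion: (S*n, D) pose frames; second_freq: (S, n) per-group frequency data."""
    fused = weight_fc(second_freq).reshape(-1)           # fused frequency data, S*n values
    weights = torch.softmax(fused, dim=0).reshape(S, n)  # S normalized weight vectors
    weighted = motion.reshape(S, n, D) * weights.unsqueeze(-1)
    return weighted.reshape(S * n, D)                    # first 3D motion data

first_motion = weight_motion(torch.randn(S * n, D), torch.randn(S, n))
```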
S102-3-3: performing feature extraction on the first three-dimensional motion data at at least two scales to obtain second three-dimensional motion data respectively corresponding to the at least two scales.
Here, one scale may correspond to one convolutional layer and at least one fully connected layer in the target encoding neural network. After the first three-dimensional motion data is obtained, the convolutional layers and fully connected layers deployed in the target encoding neural network for the respective scales may be used in sequence to extract features from the first three-dimensional motion data at multiple scales, obtaining the second three-dimensional motion data respectively corresponding to the scales.
In some embodiments, for S102-3-3, for each of the at least two scales, the convolutional layer corresponding to that scale may be used to convolve the input three-dimensional motion data of the scale, and the convolution result may then be subjected to fully connected mapping to obtain the second three-dimensional motion data of the scale. The input three-dimensional motion data includes the second three-dimensional motion data corresponding to the preceding scale among the at least two scales, or the first three-dimensional motion data. Fig. 2 is a schematic diagram of a network structure for determining the second three-dimensional motion data of one scale according to an embodiment of the present disclosure, where the scale corresponds to one convolutional layer L101 and two fully connected layers L102 and L103. In Fig. 2, the convolutional layer L101 convolves the input three-dimensional motion data, and the fully connected layers L102 and L103 apply fully connected mapping to the convolution result, obtaining the second three-dimensional motion data of the scale.
Fig. 3 is a schematic diagram of a network structure for extracting features from the first three-dimensional motion data at multiple scales to obtain the second three-dimensional motion data of each scale according to an embodiment of the present disclosure. Fig. 3 includes three extraction modules corresponding to three scales: a first extraction module L201, a second extraction module L202, and a third extraction module L203, each of which includes one convolutional layer L101 and two fully connected layers L102 and L103 as in Fig. 2. In Fig. 3, the first extraction module L201, the second extraction module L202, and the third extraction module L203 successively extract features from the first three-dimensional motion data at different scales, outputting the second three-dimensional motion data corresponding to the first, second, and third scales respectively.
During implementation, an extraction module may be a residual block (Residual Block). The number of extraction modules may be set as required and is not limited here; the embodiments of the present disclosure take three extraction modules as an example.
During implementation, as shown in Fig. 3, for the first scale, the three-dimensional motion data input into the convolutional layer of the first scale is the first three-dimensional motion data. The convolutional layer convolves the first three-dimensional motion data to obtain a convolution result; the convolution result is input into the first fully connected layer, which applies a first fully connected mapping to obtain a first mapping result; the first mapping result is input into the second fully connected layer, which applies a further fully connected mapping to obtain a second mapping result; finally, the convolution result and the second mapping result are fused to obtain the second three-dimensional motion data of the first scale.
For each scale other than the first, the second three-dimensional motion data output by the preceding scale is taken as the input of that scale, and the same processing is applied: the convolutional layer of the scale convolves the input to obtain the convolution result of the scale; the convolution result passes through the scale's first and second fully connected layers to obtain the scale's first and second mapping results; finally, the convolution result of the scale is fused with the second mapping result of the scale to obtain the second three-dimensional motion data of that scale.
Thus, based on the three extraction modules shown in Fig. 3, the second three-dimensional motion data respectively corresponding to the three scales can be obtained.
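A sketch of one possible extraction module and the three-scale cascade of Fig. 3 follows; the convolution kernel size, the (channels, frames) tensor layout, and the absence of activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ExtractionModule(nn.Module):
    """One scale: a convolutional layer followed by two fully connected
    layers, with the convolution result fused back residually (Fig. 2)."""
    def __init__(self, channels=72, frames=128):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(frames, frames)
        self.fc2 = nn.Linear(frames, frames)

    def forward(self, x):              # x: (batch, channels, frames)
        c = self.conv(x)               # convolution result
        h = self.fc2(self.fc1(c))      # first and second mapping results
        return c + h                   # fuse conv result with second mapping result

# Three cascaded scales as in Fig. 3: each module consumes the previous output.
modules = nn.ModuleList(ExtractionModule() for _ in range(3))
x = torch.randn(1, 72, 128)            # assumed layout of the first 3D motion data
scale_outputs = []
for m in modules:
    x = m(x)
    scale_outputs.append(x)            # second 3D motion data for each scale
```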
In this way, through the convolution processing and fully connected mapping of the three-dimensional motion data, the three-dimensional motion data can be encoded reasonably, improving the rationality of the second three-dimensional motion data.
S102-3-4: fusing the frequency-domain feature data with the second three-dimensional motion data respectively corresponding to the at least two scales to obtain the target motion data.
In this step, a fully connected layer in the target encoding neural network may be used to fuse the frequency-domain feature data with the second three-dimensional motion data of the multiple scales, obtaining the target motion data.
In this way, since second frequency-domain data reflecting different amounts of information correspond to different degrees of data compression, the present disclosure can determine the weight of each group of three-dimensional motion sub-data based on the amount of information reflected by that group's second frequency-domain data, and then weight the three-dimensional motion data based on the weights of all the groups, achieving high-precision compression of each motion segment of the three-dimensional motion data and improving the rationality of the first three-dimensional motion data. Further, extracting features from the first three-dimensional motion data at multiple scales yields the second three-dimensional motion data corresponding to the first three-dimensional motion data at different depths; by fusing the second three-dimensional motion data of the multiple scales with the frequency-domain feature data, the accuracy of the target motion data can be improved thanks to the richness of the second three-dimensional motion data in the depth dimension.
In some embodiments, for S102-3-4, the frequency-domain feature data and the second three-dimensional motion data respectively corresponding to the at least two scales may first be spliced to obtain spliced third three-dimensional motion data.
Then, the spliced third three-dimensional motion data is input into a fully connected layer in the target encoding neural network, and the fully connected layer applies fully connected mapping to the third three-dimensional motion data, obtaining the target motion data.
The target motion data obtained in the present disclosure may be feature data of a target number of dimensions; for example, the target motion data may be 1*256-dimensional feature data, i.e., the 128*72-dimensional three-dimensional motion data is compressed into 1*256-dimensional target motion data.
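A sketch of the splicing-and-mapping step, assuming the per-scale features are flattened before concatenation; the sizes follow the 512-dim frequency features and 1*256 target dimensions mentioned above, while everything else is illustrative.

```python
import torch
import torch.nn as nn

scale_dims = [72 * 128] * 3                       # assumed flattened per-scale sizes
fuse_fc = nn.Linear(512 + sum(scale_dims), 256)   # maps spliced data to 1*256

def fuse(freq_feat, scale_feats):
    """freq_feat: (1, 512); scale_feats: three (1, 72, 128) tensors."""
    third = torch.cat([freq_feat] + [f.flatten(1) for f in scale_feats], dim=1)
    return fuse_fc(third)                          # target motion data, (1, 256)

target = fuse(torch.randn(1, 512), [torch.randn(1, 72, 128) for _ in range(3)])
```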
In this way, by splicing the frequency-domain feature data with the second three-dimensional motion data of the multiple scales, the two kinds of data can be unified into the third three-dimensional motion data; the fully connected mapping of the third three-dimensional motion data can then restore the third three-dimensional motion data, which lies in a hidden-layer feature space, to the initial sample space corresponding to the three-dimensional motion data, yielding target motion data that matches the initial sample space.
In some embodiments, since obtaining the target motion data corresponding to each type of motion may be performed by a target encoding neural network, the embodiments of the present disclosure further provide a method for training the target encoding neural network.
During implementation, sample data may first be acquired, where the sample data includes sample pose data respectively corresponding to at least two sample poses. The sample pose data can represent a sample pose and may include, for example, the sample pose and the orientation information corresponding to the sample pose. The multiple sample poses are multiple consecutive poses corresponding to one motion, and the acquired sample data is motion data from which the global orientation has been removed. For example, the sample data may include pose data corresponding to multiple sample motions, the pose data of each sample motion being the sample pose data corresponding to the multiple sample poses of that sample motion.
In some embodiments, the step of acquiring the sample data may be implemented as follows:
P1: acquiring original three-dimensional motion data respectively corresponding to at least two sample motions, the original three-dimensional motion data including pose data respectively corresponding to at least two sample poses.
Here, the original three-dimensional motion data is motion data that still contains the global orientation; the pose data in each piece of original three-dimensional motion data may include a pose and the pose's orientation information, where the orientation information of a pose is affected by the global orientation of the target object.
During implementation, the original three-dimensional motion data respectively corresponding to the multiple sample motions may be acquired first.
P2: determining a steering angle corresponding to the original three-dimensional motion data based on the orientation information of the pose data corresponding to the first sample pose among the pose data respectively corresponding to the at least two sample poses.
Here, for the original three-dimensional motion data of each sample motion, the orientation information of the first sample pose may be determined from the pose data corresponding to the first sample pose among the pose data of the multiple sample poses included in that original three-dimensional motion data; the yaw of the first sample pose can then be determined from its orientation information. Taking a yaw of 0 degrees for the first sample pose as the target, the steering angle about the y axis and the steering direction (clockwise or counterclockwise) of the first sample pose are determined; that is, after the first sample pose is rotated about the y axis by this steering angle in this steering direction, the yaw of the rotated first sample pose is 0 degrees.
Afterwards, this steering angle and steering direction may be taken as the steering angle and steering direction of every sample pose included in the original three-dimensional motion data. The steering angle and steering direction may be represented in the form of a rotation matrix.
Based on the above steps, the steering angle corresponding to each piece of original three-dimensional motion data can be determined.
P3: steering each sample pose in the original three-dimensional motion data based on the steering angle to obtain the sample data.
In this step, for each piece of original three-dimensional motion data, the corresponding steering angle and steering direction may be used to steer each sample pose in turn, obtaining the sample data corresponding to that piece of original three-dimensional motion data. The yaw of the first sample pose in the sample data is 0 degrees, which normalizes the first sample pose. Since the relative angles and directions between the sample poses in the sample data are consistent with the relative angles and directions between the sample poses in the corresponding original three-dimensional motion data, every sample pose of the original three-dimensional motion data is thereby normalized.
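P1 to P3 could look roughly as follows for the global-orientation part of the pose data, assuming each frame's orientation is stored as a 3x3 rotation matrix and that yaw is read off with a y-first Euler decomposition (both are assumptions):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def normalize_sequence(root_rots):
    """root_rots: (T, 3, 3) per-frame global-orientation rotation matrices.
    Steers the whole sequence about the y axis so the first frame's yaw is 0."""
    yaw = R.from_matrix(root_rots[0]).as_euler("yxz")[0]   # yaw of the first pose
    steer = R.from_euler("y", -yaw).as_matrix()            # shared steering rotation
    return np.einsum("ij,tjk->tik", steer, root_rots)      # steer every sample pose

sample = normalize_sequence(np.stack([np.eye(3)] * 128))   # toy sequence
```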
In this way, based on the steering angle and steering direction corresponding to each piece of original three-dimensional motion data, each sample pose of that original three-dimensional motion data can be steered and thus normalized, yielding the sample data corresponding to each piece of original three-dimensional motion data. This scheme reduces the redundant information in the original three-dimensional motion data and thereby improves the reconstruction accuracy of the prior space.
Multiple rounds of training may then be performed on the encoding neural network based on the sample data (the trained encoding neural network being the target encoding neural network), with the process shown in Fig. 4 executed in each round. Fig. 4 is a flowchart of a method for training the encoding neural network according to an embodiment of the present disclosure, which may include the following steps:
S401: performing random global-orientation steering on the sample data to obtain first intermediate sample data, the first intermediate sample data including first pose data respectively corresponding to the at least two sample poses.
Here, for the acquired sample data, a random rotation module may be used to uniformly sample a random rotation angle within a preset range for the sample data, and the random rotation angle is used to apply random global-orientation steering to each sample pose in the sample data, obtaining the first intermediate sample data. For example, each sample pose may be rotated clockwise or counterclockwise about the y axis by the random rotation angle, after which the rotated first sample pose corresponding to that sample pose, and the first pose data corresponding to the first sample pose, are obtained.
In this way, based on the random global-orientation steering applied to each sample pose, the first sample pose and the first pose data corresponding to each sample pose can be obtained.
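A sketch of the random rotation module, under the same rotation-matrix assumption as above; the preset sampling range is an illustrative choice.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def random_global_steering(root_rots, max_angle=np.pi):
    """Uniformly samples one rotation angle within a preset range and steers
    every sample pose about the y axis by it, producing the first
    intermediate sample data."""
    angle = np.random.uniform(-max_angle, max_angle)
    rot = R.from_euler("y", angle).as_matrix()
    return np.einsum("ij,tjk->tik", rot, root_rots)
```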
S402: performing, with an encoding neural network, encoding processing that removes the global orientation on the first intermediate sample data to obtain encoded motion data.
During implementation, the first intermediate sample data may be input into the encoding neural network to be trained, and the encoding neural network encodes the first intermediate sample data to remove the global orientation, in accordance with the encoding methods mentioned in the above embodiments, obtaining the encoded motion data. Moreover, while processing the first intermediate sample data, the encoding neural network eliminates the random rotation angle corresponding to the first intermediate sample data and outputs encoded motion data that carries no information related to the random rotation angle.
S403: decoding the encoded motion data with a decoding neural network to obtain second intermediate sample data, the second intermediate sample data including second pose data respectively corresponding to the at least two sample poses.
Here, the decoding neural network is a neural network matching the encoding neural network and is used to decode and restore the data encoded by the encoding neural network. For example, the encoded motion data may be input into the decoding neural network to be trained, which decodes the encoded motion data and thereby outputs the decoded second intermediate sample data.
The second intermediate sample data includes the second pose data respectively corresponding to the multiple sample poses predicted by the decoding neural network; since the encoded motion data is output without any information related to the random rotation angle, the second intermediate sample data likewise carries no such information.
Fig. 5 is a schematic diagram, according to an embodiment of the present disclosure, of processing the acquired original three-dimensional motion data with the encoding neural network and the decoding neural network to obtain the second intermediate sample data. In Fig. 5, after the original three-dimensional motion data corresponding to the sample motions is acquired, the steering angle corresponding to the original three-dimensional motion data may first be determined and each sample pose in the original three-dimensional motion data steered, obtaining the sample data; the random rotation module L301 then applies random global-orientation steering to the sample data to obtain the first intermediate sample data; the encoding neural network L302 then encodes the first intermediate sample data to remove the global orientation and outputs the encoded motion data. Afterwards, the decoding neural network L303 decodes the encoded motion data and outputs the second intermediate sample data.
As shown in Fig. 5, the decoding neural network may include a decoding module L3031 and a restoration module L3032; the decoding module L3031 includes multiple fully connected layers, and the restoration module L3032 further includes a first decoding network and a second decoding network.
In Fig. 5, for the operation of outputting the encoded motion data: after the first intermediate sample data is obtained, it may be input into the encoding neural network L302. A DCT transform may first be applied to the first intermediate sample data to obtain the first frequency-domain data and the second frequency-domain data respectively corresponding to the groups of intermediate sub-sample data; the weights respectively corresponding to the groups of intermediate sub-sample data are determined based on the pieces of second frequency-domain data, and the obtained weights are used to weight the first intermediate sample data, obtaining first sample three-dimensional motion data. Based on the DCT transform of the first intermediate sample data, the first frequency-domain data corresponding to the first intermediate sample data is determined, and the sample frequency-domain feature data is then determined from the first frequency-domain data. Afterwards, the first extraction module, the second extraction module, and the third extraction module extract features from the first sample three-dimensional motion data at multiple scales, obtaining second sample three-dimensional motion data respectively corresponding to the scales. Finally, the sample frequency-domain feature data is fused with the second sample three-dimensional motion data of the multiple scales, obtaining the encoded motion data.
In Fig. 5, for the operation of outputting the second intermediate sample data: after the encoded motion data is obtained, it may be input into the decoding module L3031 of the decoding neural network L303, where multiple fully connected layers (including a first fully connected layer, a second fully connected layer, a third fully connected layer, and a fourth fully connected layer) restore the encoded motion data at multiple scales, and finally the predicted orientation feature data corresponding to each sample pose and the pose feature data corresponding to each sample pose are output. The restoration module L3032 of the decoding neural network L303 may then feature-splice the predicted orientation feature data corresponding to each sample pose with the pose feature data corresponding to each sample pose, obtaining the second intermediate sample data corresponding to the encoded motion data.
The restoration process of the decoding module is described in detail below with the four fully connected layers shown in Fig. 5:
First, the obtained encoded motion data may be input into the first fully connected layer of the decoding module L3031, which applies fully connected mapping to the encoded motion data to obtain first output feature data corresponding to the first fully connected layer. The first output feature data is then input into the second fully connected layer, which applies fully connected mapping to the first output feature data to obtain second output feature data. The second output feature data is input, together with the first output feature data, into the third fully connected layer, which applies fully connected mapping to the second and first output feature data to obtain third output feature data. The third output feature data is input, together with the second output feature data, into the fourth fully connected layer, which applies fully connected mapping to the second and third output feature data and outputs the orientation feature data ψ_g corresponding to each sample pose and the pose feature data ψ_l describing the specific pose shape corresponding to each sample pose.
Exemplarily, in the case where the encoded motion data is 1*256-dimensional data and the sample data is 128*72-dimensional data, ψ_g may be 128*6-dimensional data and ψ_l may be 128*32-dimensional data.
Here, since the obtained ψ_g and ψ_l are implicit feature data and cannot be output directly, the first decoding network D_cont in the restoration module is further used to decode ψ_g into explicit first target feature data, and the second decoding network D_vp in the restoration module is used to decode ψ_l into explicit second target feature data; the first target feature data and the second target feature data are then feature-spliced, obtaining the second intermediate sample data.
For example, in the case where the first target feature data is 128*3-dimensional data and the second target feature data is 128*69-dimensional data, 128*72-dimensional second intermediate sample data is obtained after splicing.
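A sketch of the decoding module and restoration module under the example sizes above (1*256 in, ψ_g of 128*6, ψ_l of 128*32, spliced to 128*72); the hidden width, the ReLU activations, and the single-layer D_cont/D_vp stand-ins are assumptions.

```python
import torch
import torch.nn as nn

class DecodingModule(nn.Module):
    """Four fully connected layers with the skip connections of Fig. 5."""
    def __init__(self, z_dim=256, hidden=512):
        super().__init__()
        self.fc1 = nn.Linear(z_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(2 * hidden, hidden)           # takes f2 together with f1
        self.fc4 = nn.Linear(2 * hidden, 128 * (6 + 32))   # takes f3 together with f2

    def forward(self, z):
        f1 = torch.relu(self.fc1(z))
        f2 = torch.relu(self.fc2(f1))
        f3 = torch.relu(self.fc3(torch.cat([f2, f1], dim=-1)))
        out = self.fc4(torch.cat([f3, f2], dim=-1))
        psi_g, psi_l = out.split([128 * 6, 128 * 32], dim=-1)
        return psi_g.view(-1, 128, 6), psi_l.view(-1, 128, 32)

# Hypothetical restoration networks mapping implicit to explicit features.
D_cont, D_vp = nn.Linear(6, 3), nn.Linear(32, 69)

psi_g, psi_l = DecodingModule()(torch.randn(1, 256))
second_intermediate = torch.cat([D_cont(psi_g), D_vp(psi_l)], dim=-1)  # (1, 128, 72)
```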
In this way, using the above multiple fully connected layers to restore the encoded motion data can prevent the gradient of the encoded motion data from vanishing and achieve full fusion of the encoded motion data.
For the loss model in Fig. 5, refer to the description of the loss model in step S404 below.
S404: performing the current round of training on the encoding neural network and the decoding neural network based on the sample data and the second intermediate sample data.
In this step, the model loss corresponding to the encoding neural network and the decoding neural network may be determined based on the sample data and the second intermediate sample data, and the current round of training may be performed on the encoding neural network and the decoding neural network using the model loss (shown in Fig. 5), obtaining the encoding neural network and the decoding neural network of the completed round.
Furthermore, multiple rounds of training may be performed on the encoding neural network and the decoding neural network based on the above S401 to S404.
S405: determining the encoding neural network that has undergone at least two rounds of training as the target encoding neural network.
Here, the encoding neural network that has undergone multiple rounds of training may be determined as the target encoding neural network, and the decoding neural network that has undergone multiple rounds of training may be determined as the target decoding neural network.
The number of training rounds may be a preset value; alternatively, it may be determined according to the accuracy of the trained encoding neural network and decoding neural network: when their accuracy meets a preset accuracy, the target encoding neural network and the target decoding neural network are obtained.
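Putting S401 to S405 together, a schematic training round might look like the sketch below; the linear layers are placeholders for the encoding and decoding networks sketched earlier, the random steering is elided, and the mean-squared error stands in for the model loss of formula (3) described next.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(128 * 72, 256)     # placeholder encoding neural network
decoder = nn.Linear(256, 128 * 72)     # placeholder decoding neural network
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-4)

samples = [torch.randn(1, 128 * 72) for _ in range(4)]   # toy sample data
for _ in range(2):                      # at least two training rounds
    for x in samples:
        x_tilde = x                     # stand-in for random global steering (S401)
        z_mot = encoder(x_tilde)        # orientation-free encoding (S402)
        recon = decoder(z_mot)          # second intermediate sample data (S403)
        loss = nn.functional.mse_loss(recon, x)  # stand-in model loss (S404)
        opt.zero_grad()
        loss.backward()
        opt.step()
```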
In this way, applying global-orientation steering to the sample data amounts to adding noise to the sample data, and encoding the noise-added first intermediate sample data with the encoding neural network improves the denoising capability of the encoding neural network. Decoding the encoded motion data with the decoding neural network restores the encoded motion data; when the encoding neural network and the decoding neural network are sufficiently accurate, the second intermediate sample data output by the decoding neural network will also be relatively accurate, i.e., close to the sample data. Therefore, the model loss corresponding to the encoding neural network and the decoding neural network can be determined from the sample data and the second intermediate sample data, and multiple rounds of training based on the model loss yield a target encoding neural network of reliable accuracy and a decoding neural network of reliable accuracy. In some embodiments, the model loss may be determined through the following steps:
S1: determining, based on the sample data and the second intermediate sample data, at least one of the following: a sample data reconstruction loss, and a similarity loss between the encoded motion data and a normal distribution.
Here, the sample data reconstruction loss characterizes the loss of the decoding neural network in decoding the encoded motion data and the loss of the encoding neural network in determining the encoded motion data; the similarity loss characterizes the loss of similarity between the probability of the encoded motion data conditioned on the first intermediate sample data and a normal distribution.
In some embodiments, the sample data reconstruction loss in this step may be determined based on the orientation feature data corresponding to each frame of sample pose data in the sample data, the pose feature data corresponding to each frame of sample pose data, the orientation feature data corresponding to each frame of second pose data in the second intermediate sample data, and the pose feature data corresponding to each frame of second pose data.
Here, each frame of sample pose data in the sample data may include orientation feature data that characterizes the orientation of each sample pose, and pose feature data that characterizes the specific pose shape of each sample pose. As can be seen from the above embodiments, the second pose data respectively corresponding to the multiple sample poses included in the second intermediate sample data likewise include the orientation feature data corresponding to each frame of second pose data and the pose feature data corresponding to each frame of second pose data. The sample data reconstruction loss may then be determined based on the following formula (1):
$L_{rec} = \left\| M(\theta_g, \theta_l, \beta) - (\theta_g, \theta_l) \right\|_2^2$    (1)
where $L_{rec}$ denotes the sample data reconstruction loss, $M$ denotes the encoding neural network and the decoding neural network, $\beta$ denotes the shape parameters, $\theta_g$ denotes the orientation feature data corresponding to the sample poses, and $\theta_l$ denotes the pose feature data corresponding to the sample poses.
Afterwards, the sample data reconstruction loss can be determined based on the above formula (1), the sample data, and the second intermediate sample data.
The similarity loss in this step may be determined using the following formula (2):
$L_{KL} = \mathrm{KL}\left( q(z_{mot} \mid \tilde{X}) \,\|\, N(0, I) \right)$    (2)
where $L_{KL}$ denotes the similarity loss, $\mathrm{KL}$ denotes the divergence, $\tilde{X}$ denotes the first intermediate sample data that has undergone the random global-orientation steering, $z_{mot}$ denotes the encoded motion data, $q(z_{mot} \mid \tilde{X})$ denotes the probability of the encoded motion data conditioned on the first intermediate sample data, and $N(0, I)$ denotes the normal distribution with mean 0 and standard deviation $I$.
In this way, the similarity loss can be determined based on the above formula (2).
S2: after determining the similarity loss, the present disclosure determines the model loss based on at least one of the following: the sample data reconstruction loss, and the similarity loss between the encoded motion data and the normal distribution.
During implementation, the model loss may be determined according to the following formula (3):
$L = \lambda_{rec} L_{rec} + \lambda_{KL} L_{KL}$    (3)
where $\lambda_{rec}$ denotes the first loss coefficient corresponding to the sample data reconstruction loss, and $\lambda_{KL}$ denotes the second loss coefficient corresponding to the similarity loss.
For example, the product of the first loss coefficient and the sample data reconstruction loss may be added to the product of the second loss coefficient and the similarity loss, and the model loss is determined based on the result of the addition.
In this way, the loss of the decoding neural network in restoring the orientation information corresponding to the pose data can be determined from the orientation feature data, and the loss of the decoding neural network in restoring each frame's pose can be determined from the pose feature data; training the encoding neural network and the decoding neural network with these two losses as the sample data reconstruction loss improves the accuracy of the output orientation feature data and pose feature data.
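As a sketch of formulas (2) and (3), assuming the encoder outputs the mean and log-variance of a diagonal Gaussian over $z_{mot}$ (a common VAE parameterization that the text does not spell out) and using placeholder loss coefficients:

```python
import torch

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(q(z_mot | x~) || N(0, I)) for a diagonal Gaussian,
    matching formula (2) under the assumed parameterization."""
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar)

def model_loss(sample, recon, mu, logvar, lam_rec=1.0, lam_kl=1e-3):
    """Formula (3): L = lam_rec * L_rec + lam_kl * L_KL; the coefficient
    values here are placeholders, not the patent's."""
    l_rec = torch.mean((recon - sample) ** 2)   # sample data reconstruction loss
    l_kl = kl_to_standard_normal(mu, logvar)    # similarity loss, formula (2)
    return lam_rec * l_rec + lam_kl * l_kl
```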
Those skilled in the art will understand that, in the above methods of the implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide an apparatus for generating a prior space corresponding to the method for generating a prior space. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to the above method for generating a prior space in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method.
Fig. 6 is a schematic diagram of an apparatus for generating a prior space according to an embodiment of the present disclosure, including:
an acquisition module 601, configured to acquire three-dimensional motion data respectively corresponding to at least two types of motion of a target object, the three-dimensional motion data including pose data respectively corresponding to at least two poses of the corresponding motion; an encoding module 602, configured to perform, on the three-dimensional motion data corresponding to each type of motion, encoding processing that removes the global orientation to obtain target motion data corresponding to each type of motion; and a determining module 603, configured to generate a target prior space based on the target motion data respectively corresponding to the at least two types of motion.
In a possible implementation manner, the apparatus further includes an identification module 604, configured to: after the target prior space is generated based on the target motion data respectively corresponding to the at least two types of motion, acquire a motion video of the target object in motion; perform feature extraction on the motion video to obtain motion feature data; determine, based on the motion feature data, target motion data matching the motion feature data from the target prior space; and determine the motion type of the target object based on the motion type corresponding to the target motion data matching the motion feature data.
In a possible implementation manner, the encoding module 602 is configured to determine first frequency-domain data corresponding to the three-dimensional motion data in the frequency domain, divide the three-dimensional motion data into at least two groups of three-dimensional motion sub-data, and determine second frequency-domain data respectively corresponding to the at least two groups of three-dimensional motion sub-data in the frequency domain; and to compress the three-dimensional motion data while removing the global orientation based on the first frequency-domain data and the second frequency-domain data, obtaining the target motion data.
In a possible implementation manner, the encoding module 602 is configured to obtain frequency-domain feature data of the three-dimensional motion data based on the first frequency-domain data; determine, based on the second frequency-domain data, weights respectively corresponding to the at least two groups of three-dimensional motion sub-data; weight the three-dimensional motion data based on the weights respectively corresponding to the at least two groups of three-dimensional motion sub-data to obtain first three-dimensional motion data; perform feature extraction on the first three-dimensional motion data at at least two scales to obtain second three-dimensional motion data respectively corresponding to the at least two scales; and fuse the frequency-domain feature data with the second three-dimensional motion data respectively corresponding to the at least two scales, obtaining the target motion data.
In a possible implementation manner, the encoding module 602 is configured to fuse the second frequency-domain data respectively corresponding to the at least two groups of three-dimensional motion sub-data to obtain fused frequency-domain data, where the dimension of the fused frequency-domain data is the same as the number of groups of the three-dimensional motion sub-data; and to normalize the fused frequency-domain data, obtaining the weights respectively corresponding to the at least two groups of three-dimensional motion sub-data.
In a possible implementation manner, the encoding module 602 is configured to, for each of the at least two scales, convolve the input three-dimensional motion data corresponding to the scale and apply fully connected mapping to the convolution result, obtaining the second three-dimensional motion data corresponding to the scale, where the input three-dimensional motion data corresponding to the scale includes the second three-dimensional motion data corresponding to the preceding scale among the at least two scales, or the first three-dimensional motion data.
In a possible implementation manner, the encoding module 602 is configured to splice the frequency-domain feature data with the second three-dimensional motion data respectively corresponding to the at least two scales to obtain third three-dimensional motion data, and to apply fully connected mapping to the third three-dimensional motion data, obtaining the target motion data.
In a possible implementation manner, the encoding module 602 is configured to, for the three-dimensional motion data respectively corresponding to each type of motion, perform, with a pre-trained target encoding neural network, encoding processing that removes the global orientation on the three-dimensional motion data corresponding to each type of motion, obtaining the target motion data corresponding to each type of motion.
In a possible implementation manner, the apparatus further includes a training module 605, configured to train the target encoding neural network in the following manner: acquiring sample data, the sample data including sample pose data respectively corresponding to at least two sample poses; performing at least two rounds of training, with the following process executed in each round: performing random global-orientation steering on the sample data to obtain first intermediate sample data, the first intermediate sample data including first pose data respectively corresponding to the at least two sample poses; performing, with an encoding neural network, encoding processing that removes the global orientation on the first intermediate sample data to obtain encoded motion data; decoding the encoded motion data with a decoding neural network to obtain second intermediate sample data, the second intermediate sample data including second pose data respectively corresponding to the at least two sample poses; and performing the current round of training on the encoding neural network and the decoding neural network based on the sample data and the second intermediate sample data; and determining the encoding neural network that has undergone at least two rounds of training as the target encoding neural network.
In a possible implementation manner, the training module 605 is configured to determine a model loss based on the sample data and the second intermediate sample data, and to perform the current round of training on the encoding neural network and the decoding neural network based on the model loss.
In a possible implementation manner, the training module 605 is configured to determine, based on the sample data and the second intermediate sample data, at least one of the following: a sample data reconstruction loss, and a similarity loss between the encoded motion data and a normal distribution; and to determine the model loss based on at least one of the sample data reconstruction loss and the similarity loss between the encoded motion data and the normal distribution.
In a possible implementation manner, the training module 605 is configured to determine the sample data reconstruction loss based on the orientation feature data corresponding to each frame of sample pose data in the sample data, the pose feature data corresponding to each frame of sample pose data, the orientation feature data corresponding to each frame of second pose data in the second intermediate sample data, and the pose feature data corresponding to each frame of second pose data.
In a possible implementation manner, the training module 605 is configured to acquire original three-dimensional motion data respectively corresponding to at least two sample motions, the original three-dimensional motion data including pose data respectively corresponding to at least two sample poses; determine a steering angle corresponding to the original three-dimensional motion data based on the orientation information of the pose data corresponding to the first sample pose among the pose data respectively corresponding to the at least two sample poses; and steer each sample pose in the original three-dimensional motion data based on the steering angle, obtaining the sample data.
关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明。For a description of the processing flow of each module in the device and the interaction flow between the modules, reference may be made to the relevant descriptions in the foregoing method embodiments.
本公开实施例还提供了一种计算机设备,如图7所示,为本公开实施例提供的一种计算机设备结构示意图,包括:An embodiment of the present disclosure also provides a computer device, as shown in FIG. 7 , which is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure, including:
处理器71和存储器72;所述存储器72存储有处理器71可执行的机器可读指令,处理器71用于执行存储器72中存储的机器可读指令,所述机器可读指令被处理器71执行时,处理器71执行下述步骤:获取原始先验空间;原始先验空间中包括目标对象的多种运动分别对应的三维运动数据;三维运动数据包括:对应运动的多个姿态分别对应的姿态数据;对每种运动对应的三维运动数据进行去除全局朝向的编码处理,得到每种运动对应的目标运动数据以及基于多种运动分别对应的目标运动数据,生成目标先验空间。 Processor 71 and memory 72; the memory 72 stores machine-readable instructions executable by the processor 71, the processor 71 is used to execute the machine-readable instructions stored in the memory 72, and the machine-readable instructions are executed by the processor 71 During execution, the processor 71 performs the following steps: obtain the original prior space; the original prior space includes three-dimensional motion data corresponding to various movements of the target object; the three-dimensional motion data includes: Attitude data: The three-dimensional motion data corresponding to each type of motion is encoded to remove the global orientation, and the target motion data corresponding to each type of motion and the target motion data corresponding to multiple types of motion are obtained to generate the target prior space.
上述存储器72包括内存721和外部存储器722;这里的内存721也称内存储器,用于暂时存放处理器71中的运算数据,以及与硬盘等外部存储器722交换的数据,处理器71通过内存721与外部存储器722进行数据交换。Above-mentioned memory 72 comprises memory 721 and external memory 722; Memory 721 here is also called internal memory, is used for temporarily storing computing data in processor 71, and the data exchanged with external memory 722 such as hard disk, processor 71 communicates with memory 721 through memory 721. The external memory 722 performs data exchange.
上述指令的执行过程可以参考本公开实施例中所述的先验空间的生成方法的步骤。For the execution process of the above instructions, reference may be made to the steps of the method for generating a priori space described in the embodiments of the present disclosure.
本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法实施例中所述的先验空间的生成方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。本公开实施例所提供的先验空间的生成方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令可用于执行上述方法实施例中所述的先验空间的生成方法的步骤,可参见上述方法实施例。Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the method for generating a priori space described in the above-mentioned method embodiments is executed. step. Wherein, the storage medium may be a volatile or non-volatile computer-readable storage medium. The computer program product of the method for generating a priori space provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program codes, and the instructions included in the program code can be used to execute the priori described in the above method embodiments. For the steps of the method for generating the space, refer to the foregoing method embodiments.
The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the working process of the apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for another example, multiple units or components may be combined, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a volatile or non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present disclosure essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are merely implementations of the present disclosure, which are used to illustrate rather than limit the technical solutions of the present disclosure, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may, within the technical scope disclosed in the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features thereof; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (18)

  1. A method for generating a prior space, comprising:
    acquiring three-dimensional motion data respectively corresponding to at least two motions of a target object, wherein the three-dimensional motion data comprises pose data respectively corresponding to at least two poses of the corresponding motion;
    performing encoding processing for removing a global orientation on three-dimensional motion data corresponding to each motion to obtain target motion data corresponding to the motion; and
    generating a target prior space based on the target motion data respectively corresponding to the at least two motions.
  2. The method according to claim 1, wherein after the generating the target prior space based on the target motion data respectively corresponding to the at least two motions, the method further comprises:
    acquiring a motion video of the target object in motion;
    performing feature extraction on the motion video to obtain motion feature data;
    determining, from the target prior space based on the motion feature data, target motion data matching the motion feature data; and
    determining a motion type of the target object based on a motion type corresponding to the target motion data matching the motion feature data.
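    For illustration, a minimal sketch of the matching step in claim 2, assuming the prior space is a mapping from motion type to a target-motion feature vector and that matching is nearest-neighbor under Euclidean distance (the claim does not fix the matching metric; all names here are hypothetical):

    import numpy as np

    def classify_motion(motion_feature, prior_space):
        # prior_space: dict mapping motion type -> target motion data (1D vector).
        # Match by the smallest distance between the extracted motion feature
        # and each entry of the target prior space.
        return min(prior_space,
                   key=lambda t: np.linalg.norm(prior_space[t] - motion_feature))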
  3. The method according to claim 1 or 2, wherein the performing encoding processing for removing a global orientation on three-dimensional motion data corresponding to each motion to obtain target motion data corresponding to the motion comprises:
    determining first frequency domain data corresponding to the three-dimensional motion data in a frequency domain, dividing the three-dimensional motion data into at least two groups of three-dimensional motion sub-data, and determining second frequency domain data respectively corresponding to the at least two groups of three-dimensional motion sub-data in the frequency domain; and
    performing, based on the first frequency domain data and the second frequency domain data, compression processing for removing the global orientation on the three-dimensional motion data to obtain the target motion data.
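    A minimal sketch of the frequency-domain step in claim 3, assuming the pose sequence is a (T, C) array, that the transform is a type-II DCT along the time axis, and that the sub-data grouping is a split along time (the claim fixes none of these choices):

    import numpy as np
    from scipy.fft import dct

    def frequency_domain_data(motion, num_groups=4):
        # motion: (T, C) array, one row of flattened pose data per frame.
        first_fdd = dct(motion, axis=0, norm="ortho")  # whole-sequence transform
        groups = np.array_split(motion, num_groups, axis=0)
        second_fdd = [dct(g, axis=0, norm="ortho") for g in groups]
        return first_fdd, second_fdd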
  4. The method according to claim 3, wherein the performing, based on the first frequency domain data and the second frequency domain data, compression processing for removing the global orientation on the three-dimensional motion data to obtain the target motion data comprises:
    obtaining frequency domain feature data of the three-dimensional motion data based on the first frequency domain data;
    determining, based on the second frequency domain data, weights respectively corresponding to the at least two groups of three-dimensional motion sub-data;
    performing weighting processing on the three-dimensional motion data based on the weights respectively corresponding to the at least two groups of three-dimensional motion sub-data to obtain first three-dimensional motion data;
    performing feature extraction on the first three-dimensional motion data at each of at least two scales to obtain second three-dimensional motion data respectively corresponding to the at least two scales; and
    fusing the frequency domain feature data and the second three-dimensional motion data respectively corresponding to the at least two scales to obtain the target motion data.
  5. The method according to claim 4, wherein the determining, based on the second frequency domain data, weights respectively corresponding to the at least two groups of three-dimensional motion sub-data comprises:
    performing fusion processing on the second frequency domain data respectively corresponding to the at least two groups of three-dimensional motion sub-data to obtain fused frequency domain data, wherein a dimension of the fused frequency domain data is the same as the number of groups of the three-dimensional motion sub-data; and
    performing normalization processing on the fused frequency domain data to obtain the weights respectively corresponding to the at least two groups of three-dimensional motion sub-data.
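    A minimal sketch of the weighting recited in claims 4 and 5, assuming the fusion is a single learned linear map and the normalization is a softmax (both are assumptions; the claims only require that the fused data have one dimension per group and be normalized):

    import torch
    import torch.nn as nn

    class GroupWeighting(nn.Module):
        def __init__(self, fdd_dim, num_groups):
            super().__init__()
            # Assumed fusion: a linear layer mapping the stacked second
            # frequency domain data to one scalar per group.
            self.fuse = nn.Linear(num_groups * fdd_dim, num_groups)

        def forward(self, second_fdd, motion_groups):
            # second_fdd: (B, G, fdd_dim); motion_groups: (B, G, L, C)
            fused = self.fuse(second_fdd.flatten(1))      # (B, G)
            weights = torch.softmax(fused, dim=-1)        # normalization
            weighted = motion_groups * weights[:, :, None, None]
            return weighted.flatten(1, 2)                 # first 3D motion data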
  6. The method according to claim 4 or 5, wherein the performing feature extraction on the first three-dimensional motion data at each of the at least two scales to obtain second three-dimensional motion data respectively corresponding to the at least two scales comprises:
    for each of the at least two scales, performing convolution processing on input three-dimensional motion data corresponding to the scale, and performing fully connected mapping processing on a result of the convolution processing to obtain second three-dimensional motion data corresponding to the scale,
    wherein the input three-dimensional motion data corresponding to the scale comprises the second three-dimensional motion data corresponding to a previous scale of the at least two scales and the first three-dimensional motion data.
  7. The method according to any one of claims 4 to 6, wherein the fusing the frequency domain feature data and the second three-dimensional motion data respectively corresponding to the at least two scales to obtain the target motion data comprises:
    splicing the frequency domain feature data and the second three-dimensional motion data respectively corresponding to the at least two scales to obtain third three-dimensional motion data; and
    performing fully connected mapping processing on the third three-dimensional motion data to obtain the target motion data.
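    A minimal sketch combining claims 6 and 7, assuming 1D convolutions over time and mean pooling before fusion (the pooling and layer sizes are assumptions, not recited in the claims):

    import torch
    import torch.nn as nn

    class MultiScaleEncoder(nn.Module):
        def __init__(self, in_ch, feat_dim, fdd_dim, out_dim, num_scales=3):
            super().__init__()
            self.convs, self.fcs = nn.ModuleList(), nn.ModuleList()
            for s in range(num_scales):
                # Each scale after the first consumes the first 3D motion data
                # plus the previous scale's second 3D motion data (claim 6).
                ch = in_ch if s == 0 else in_ch + feat_dim
                self.convs.append(nn.Conv1d(ch, feat_dim, kernel_size=3, padding=1))
                self.fcs.append(nn.Linear(feat_dim, feat_dim))
            self.out = nn.Linear(fdd_dim + num_scales * feat_dim, out_dim)

        def forward(self, first_motion, fdd_feat):
            # first_motion: (B, in_ch, T); fdd_feat: (B, fdd_dim)
            feats, prev = [], None
            for conv, fc in zip(self.convs, self.fcs):
                x = first_motion if prev is None else torch.cat([first_motion, prev], dim=1)
                h = conv(x)                                   # convolution processing
                prev = fc(h.transpose(1, 2)).transpose(1, 2)  # fully connected mapping
                feats.append(prev.mean(dim=-1))               # pool time for fusion
            # Claim 7: splice frequency features with per-scale features, then FC.
            third = torch.cat([fdd_feat] + feats, dim=-1)
            return self.out(third)                            # target motion data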
  8. The method according to any one of claims 1 to 7, wherein the performing encoding processing for removing a global orientation on three-dimensional motion data corresponding to each motion to obtain target motion data corresponding to the motion comprises:
    for the three-dimensional motion data respectively corresponding to each motion, performing, by using a pre-trained target encoding neural network, the encoding processing for removing the global orientation on the three-dimensional motion data corresponding to the motion to obtain the target motion data corresponding to the motion.
  9. The method according to claim 8, wherein the target encoding neural network is obtained by training in the following manner:
    acquiring sample data, wherein the sample data comprises sample pose data respectively corresponding to at least two sample poses; and
    performing at least two rounds of training, wherein each round of training comprises the following process:
    performing random global orientation steering processing on the sample data to obtain first intermediate sample data, wherein the first intermediate sample data comprises first pose data respectively corresponding to the at least two sample poses;
    performing, by using an encoding neural network, encoding processing for removing the global orientation on the first intermediate sample data to obtain encoded motion data;
    performing decoding processing on the encoded motion data by using a decoding neural network to obtain second intermediate sample data, wherein the second intermediate sample data comprises second pose data respectively corresponding to the at least two sample poses; and
    performing a current round of training on the encoding neural network and the decoding neural network based on the sample data and the second intermediate sample data; and
    determining an encoding neural network that has undergone the at least two rounds of training as the target encoding neural network.
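    A minimal sketch of one training round from claim 9, assuming a VAE-style encoder that outputs a mean and log-variance, an L1 reconstruction term, and a KL term against a standard normal (these choices, and the callables encoder, decoder, and rotate_global, are assumptions for illustration):

    import torch
    import torch.nn.functional as F

    def kl_to_standard_normal(mu, logvar):
        # Similarity loss between the encoded motion data and a normal distribution.
        return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    def train_round(sample, encoder, decoder, optimizer, rotate_global):
        angle = torch.rand(()) * 2 * torch.pi
        first_intermediate = rotate_global(sample, angle)     # random steering
        mu, logvar = encoder(first_intermediate)              # remove global orientation
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        second_intermediate = decoder(z)
        # Train on the sample data and second intermediate sample data (claim 10).
        loss = F.l1_loss(second_intermediate, sample) + kl_to_standard_normal(mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()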
  10. The method according to claim 9, wherein the performing a current round of training on the encoding neural network and the decoding neural network based on the sample data and the second intermediate sample data comprises:
    determining a model loss based on the sample data and the second intermediate sample data; and
    performing the current round of training on the encoding neural network and the decoding neural network based on the model loss.
  11. The method according to claim 10, wherein the determining a model loss based on the sample data and the second intermediate sample data comprises:
    determining, based on the sample data and the second intermediate sample data, at least one of: a sample data reconstruction loss, or a similarity loss between the encoded motion data and a normal distribution; and
    determining the model loss based on at least one of the sample data reconstruction loss or the similarity loss between the encoded motion data and the normal distribution.
  12. The method according to claim 11, wherein determining the sample data reconstruction loss based on the sample data and the second intermediate sample data comprises:
    determining the sample data reconstruction loss based on orientation feature data corresponding to each frame of sample pose data in the sample data, pose feature data corresponding to each frame of sample pose data, orientation feature data corresponding to each frame of second pose data in the second intermediate sample data, and pose feature data corresponding to each frame of second pose data.
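    A minimal sketch of claim 12's reconstruction loss, assuming each frame's pose data splits into an orientation part and a body-pose part along the channel axis (the split index and the L1 distance are assumptions):

    import torch
    import torch.nn.functional as F

    def reconstruction_loss(sample, second, orient_dims=3):
        # sample, second: (T, D) per-frame pose data; the first orient_dims
        # channels are treated as the orientation feature data.
        orient_s, pose_s = sample[:, :orient_dims], sample[:, orient_dims:]
        orient_r, pose_r = second[:, :orient_dims], second[:, orient_dims:]
        # Per-frame orientation term plus per-frame pose term.
        return F.l1_loss(orient_r, orient_s) + F.l1_loss(pose_r, pose_s)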
  13. The method according to any one of claims 9 to 12, wherein the acquiring sample data comprises:
    acquiring original three-dimensional motion data respectively corresponding to at least two sample motions, wherein the original three-dimensional motion data comprises pose data respectively corresponding to at least two sample poses;
    determining a steering angle corresponding to the original three-dimensional motion data based on orientation information of the pose data corresponding to a first sample pose among the pose data respectively corresponding to the at least two sample poses; and
    performing steering processing on each sample pose in the original three-dimensional motion data based on the steering angle to obtain the sample data.
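    A minimal sketch of claim 13's steering, assuming axis-angle root orientations and a steering angle taken as the first pose's heading about the vertical axis (the vertical-axis convention and SciPy tooling are assumptions):

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    def canonicalize(root_orients):
        # root_orients: (T, 3) axis-angle root orientations of one sample motion.
        yaw = R.from_rotvec(root_orients[0]).as_euler("zyx")[1]  # heading of first pose
        undo = R.from_euler("y", -yaw)                           # steering angle
        # Apply the same steering to every sample pose in the motion.
        return (undo * R.from_rotvec(root_orients)).as_rotvec()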
  14. An apparatus for generating a prior space, comprising: an acquisition module configured to acquire three-dimensional motion data respectively corresponding to at least two motions of a target object, wherein the three-dimensional motion data comprises pose data respectively corresponding to at least two poses of the corresponding motion; an encoding module configured to perform encoding processing for removing a global orientation on three-dimensional motion data corresponding to each motion to obtain target motion data corresponding to the motion; and a determination module configured to generate a target prior space based on the target motion data respectively corresponding to the at least two motions.
  15. A computer device, comprising a processor and a memory, wherein the memory stores machine-readable instructions executable by the processor, the processor is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the processor, the processor executes the steps of the method for generating a prior space according to any one of claims 1 to 13.
  16. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is run by a computer device, the computer device executes the steps of the method for generating a prior space according to any one of claims 1 to 13.
  17. A computer program, comprising computer-readable code, wherein when the computer-readable code is read and executed by a computer, a processor in a device executes steps for implementing the method for generating a prior space according to any one of claims 1 to 13.
  18. A computer program product configured to store computer-readable instructions, wherein when the computer-readable instructions are executed, a computer is caused to execute the steps of the method for generating a prior space according to any one of claims 1 to 13.
PCT/CN2022/124931 2021-10-29 2022-10-12 Apriori space generation method and apparatus, and computer device, storage medium, computer program and computer program product WO2023071806A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111275623.6 2021-10-29
CN202111275623.6A CN113920466A (en) 2021-10-29 2021-10-29 Priori space generation method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023071806A1 true WO2023071806A1 (en) 2023-05-04

Family

ID=79243890

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/124931 WO2023071806A1 (en) 2021-10-29 2022-10-12 Apriori space generation method and apparatus, and computer device, storage medium, computer program and computer program product

Country Status (2)

Country Link
CN (1) CN113920466A (en)
WO (1) WO2023071806A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118521719A (en) * 2024-07-23 2024-08-20 浙江核新同花顺网络信息股份有限公司 Virtual person three-dimensional model determining method, device, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920466A (en) * 2021-10-29 2022-01-11 上海商汤智能科技有限公司 Priori space generation method and device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013020578A (en) * 2011-07-14 2013-01-31 Nippon Telegr & Teleph Corp <Ntt> Three-dimensional posture estimation device, three-dimensional posture estimation method and program
CN111047548A (en) * 2020-03-12 2020-04-21 腾讯科技(深圳)有限公司 Attitude transformation data processing method and device, computer equipment and storage medium
CN111401230A (en) * 2020-03-13 2020-07-10 深圳市商汤科技有限公司 Attitude estimation method and apparatus, electronic device, and storage medium
CN112200165A (en) * 2020-12-04 2021-01-08 北京软通智慧城市科技有限公司 Model training method, human body posture estimation method, device, equipment and medium
US20210319629A1 (en) * 2019-07-23 2021-10-14 Shenzhen University Generation method of human body motion editing model, storage medium and electronic device
CN113920466A (en) * 2021-10-29 2022-01-11 上海商汤智能科技有限公司 Priori space generation method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN113920466A (en) 2022-01-11


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885685

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE