WO2021128611A1 - Motion prediction method based on deep neural network, and intelligent terminal - Google Patents

Motion prediction method based on deep neural network, and intelligent terminal

Info

Publication number
WO2021128611A1
WO2021128611A1 (PCT/CN2020/080091, CN2020080091W)
Authority
WO
WIPO (PCT)
Prior art keywords
motion
neural network
deep neural
point cloud
error
Application number
PCT/CN2020/080091
Other languages
French (fr)
Chinese (zh)
Inventor
胡瑞珍
黄惠
闫子豪
Original Assignee
深圳大学
Application filed by 深圳大学 (Shenzhen University)
Publication of WO2021128611A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • the invention relates to the technical field of deep learning, in particular to a motion prediction method based on a deep neural network, an intelligent terminal and a storage medium.
  • GANs Generative Adversarial Networks
  • work in computer graphics has also addressed the problem of inferring the motion of three-dimensional objects.
  • the movement of a mechanical assembly is explained by predicting the possible motion of its components and of the entire assembly from the geometric arrangement of the parts, for example to create diagram animations from concept sketches.
  • interaction landscapes are introduced: motion representations of objects being used in a certain way, for example a cup being used to drink water.
  • This representation can then be used to classify motion into different types of interactions and also to predict the interactions supported by the object within a few seconds of its motion.
  • a structure called a motion tree is used to obtain the relative motion of objects in the scene. The tree is inferred from different instances of objects found in different geometric configurations.
  • the possible motions and motion parameters of an object's parts are predicted from a model learned on a small data set containing a few static motion states of each object.
  • This model effectively relates the geometry of an object to its possible movement. From two unsegmented, functionally similar instances, or from objects with the same motion in different motion states, the possible motion of the object's parts is predicted. Although such methods can infer the motion of objects in a scene, they are limited by the assumption that multiple instances of the object appear in the scene. A disadvantage of data-driven methods is that the object must be well segmented. Another shortcoming is that the designed network requires as input a pair of objects in the same motion state but at different rotation angles. When functional predictions must be obtained directly in a three-dimensional scene, for example in robot navigation, it is unrealistic to expect either pre-segmented objects or rotated object pairs.
  • the present invention provides a motion prediction method based on a deep neural network, an intelligent terminal and a storage medium.
  • a method for motion prediction based on a deep neural network wherein the method for motion prediction based on a deep neural network includes:
  • the deep neural network outputs the first part and the second part of the three-dimensional point cloud, using the first part as a motion subunit, and the second part as a reference part of the motion unit;
  • the network prediction is completed according to the output of the three-dimensional point cloud, and the motion information is output.
  • the motion information includes the motion segmentation, the motion axis, and the motion type.
  • the loss function used when training the deep neural network is:
  • L(D_t, S, M) = L_rec + L_disp + L_seg + L_mob
  • where D_t represents the displacement map, S represents the segmentation, M represents the fitted motion parameters, L_rec is the reconstruction error, L_disp is the displacement error, L_seg is the segmentation error, and L_mob is the regression error of the motion parameters;
  • the reconstruction error represents the degree of distortion of the shape, the displacement error represents the accuracy of the moving part, and the segmentation error and regression error describe the correctness of the motion information, including the division into moving and static parts, the position and direction of the motion axis, and the motion type.
  • L_rec describes the geometric error between the predicted point cloud after motion and the real point cloud after motion; the point cloud P_0 is divided into a reference part and a moving part, and after the motion the reference part remains static while the moving part moves rigidly, so L_rec is the sum of a reference-part error (the sum of squared per-point error distances) and a moving-part error, whose composition is:
  • L_rec^mov = L_shape + L_density
  • L_shape is used to penalize points that do not match the target shape
  • L_density compares the local point density of the predicted point cloud and the target point cloud
  • gt is the abbreviation of ground truth, i.e., the correct value.
  • the difference between the predicted motion information and the target motion information is measured by an error loss function; the motion types include rotational motion and translational motion.
  • for rotational motion, the loss function combines a perpendicularity term, an angle-consistency term, and a circularity term:
  • dot denotes the dot product, D_t(p) represents the displacement of point p of the point cloud at the t-th frame, and d_gt is the direction of the correct motion axis; the perpendicularity term describes whether the predicted displacement is perpendicular to the real motion axis;
  • in the angle-consistency term, which penalizes deviations of the per-point rotation angles so that all points rotate by the same angle, σ is a constant and proj(p) represents the distance between the point p and its projection onto the correct motion axis;
  • the circularity term requires each point to keep the same distance to the true rotation axis before and after the rotation.
  • for translational motion, the loss function combines a term that describes whether the predicted displacement is parallel to the real motion axis and a term that requires every point to move the same distance, i.e., the variance of the displacement magnitudes is zero.
  • the motion information loss function penalizes the error between the predicted and correct axis direction, axis position, and motion type:
  • d, x and t are the direction of the motion axis, the position of the motion axis and the type of motion, respectively; d_gt is the correct motion axis direction, x_gt is the correct motion axis position, t_gt is the correct motion type, and H is the cross entropy.
  • the number of points in the three-dimensional point cloud is 1024.
  • An intelligent terminal, wherein the intelligent terminal includes the above-mentioned deep neural network-based motion prediction system and further includes: a memory, a processor, and a deep neural network-based motion prediction program stored in the memory and executable on the processor;
  • the motion prediction program based on the deep neural network implements the steps of the above-mentioned deep neural network-based motion prediction method when executed by the processor.
  • A storage medium, wherein the storage medium stores a motion prediction program based on a deep neural network, and when the motion prediction program based on the deep neural network is executed by a processor, it realizes the steps of the above-mentioned motion prediction method based on the deep neural network.
  • the present invention uses a data set to train a deep neural network; inputs a three-dimensional point cloud to the deep neural network; the deep neural network outputs the first part and the second part of the three-dimensional point cloud, and uses the first part as a motion subunit
  • the second part is used as the reference part of the motion unit; the network prediction is completed according to the output of the three-dimensional point cloud, and the motion information is output.
  • the motion information includes the motion segmentation, the motion axis, and the motion type.
  • the present invention realizes simultaneous prediction of the motion and the parts of various articulated (hinged) objects that are unstructured, possibly only partially scanned, and given in a static state, and can predict the movement of object parts very accurately.
  • Fig. 1 is a flowchart of a preferred embodiment of a motion prediction method based on a deep neural network of the present invention
  • FIG. 2 is a schematic diagram of the deep neural network learning a deep prediction model from a training set in a preferred embodiment of the motion prediction method based on a deep neural network of the present invention, and the training set covers various motions of different objects;
  • FIG. 3 is a schematic diagram of the structure of the long short-term memory network in a preferred embodiment of a motion prediction method based on a deep neural network of the present invention
  • FIG. 4 is a schematic diagram of a rotational movement type in the preferred embodiment of the motion prediction method based on the deep neural network of the present invention;
  • FIG. 5 is a schematic diagram of a translational movement type in the preferred embodiment of the motion prediction method based on the deep neural network of the present invention;
  • FIG. 6 is a schematic diagram of a result set of motion and component prediction in different motions of various shapes of complete and partial scans in the preferred embodiment of the motion prediction method based on the deep neural network of the present invention
  • FIG. 7 is a schematic diagram of predicting the parallel movement of the desk in the preferred embodiment of the motion prediction method based on the deep neural network of the present invention.
  • FIG. 8 is a schematic diagram of the architecture of the baseline prediction network "BaseNet" in the preferred embodiment of the motion prediction method based on the deep neural network of the present invention
  • FIG. 9 is a schematic diagram of the visual comparison between MAPP-NET and BaseNet in the preferred embodiment of the motion prediction method based on the deep neural network of the present invention.
  • FIG. 10 is a schematic diagram of the visualization comparison of the prediction obtained without the reconstruction loss item L rec or the displacement loss item L disp in the preferred embodiment of the motion prediction method based on the deep neural network of the present invention
  • FIG. 11 is a schematic diagram of a visual comparison with results whose motion parameters and segmentation are not obtained through network prediction, in a preferred embodiment of a motion prediction method based on a deep neural network of the present invention
  • FIG. 12 is a schematic diagram of the operating environment of the preferred embodiment of the smart terminal of the present invention.
  • the motion prediction method based on a deep neural network includes the following steps:
  • Step S10: use the data set to train the deep neural network;
  • Step S20: input the three-dimensional point cloud to the deep neural network;
  • Step S30: the deep neural network outputs the first part and the second part of the three-dimensional point cloud, using the first part as a motion subunit and the second part as a reference part of the motion unit;
  • Step S40: complete the network prediction according to the output of the three-dimensional point cloud, and output the motion information, where the motion information includes the motion segmentation, the motion axis, and the motion type.
  • the present invention introduces a learning-based method that, given a single unsegmented point cloud, possibly a partial scan of a three-dimensional object, simultaneously predicts the movable parts of the object and their movement.
  • the deep neural network of the present invention regards the input three-dimensional object as a motion unit, and outputs two parts of the point cloud. One part is used as the motion sub-unit and the other part is used as the reference part of the motion unit. This is applied iteratively.
  • applying this iteratively to the parts obtained by the network proposed by the invention predicts finer component movements, thereby obtaining a prediction of hierarchical movement and a motion-based object segmentation, as shown in FIG. 2.
  • MAPP-NET Deep Neural Network
  • the learning-based method of the present invention can gather rich cues, such as the geometry of parts and their contexts, from training data, and can thus generalize to three-dimensional objects that have not been seen.
  • the core of point cloud motion prediction can be regarded as predicting, per point, a displacement field that changes over time; this allows the network to process unstructured low-level input and take advantage of the transient characteristics of motion.
  • the MAPP-NET of the present invention is implemented as a recurrent neural network: its input is a point cloud, the displacement of each point is predicted for the subsequent frames, and the input point cloud serves as the reference for each subsequent frame.
  • the architecture of the network is composed of an encoder-decoder pair interleaved with a Long Short-Term Memory (LSTM) unit, which predicts the displacement field of the input point cloud; the present invention also adds additional layers to the network that infer the motion parameters from the motion segmentation and the predicted displacement field. Therefore, given a point cloud, MAPP-NET not only infers the motion type and motion parameters (such as the rotation axis) of the geometric transformation of the points, but also predicts the segmentation of the movable part based on the predicted motion states.
  • LSTM Long Short-Term Memory
  • the purpose of the present invention is to segment the movable part of a given three-dimensional object, determine the type of object motion, and generate a motion sequence of the next few frames of the object.
  • the object is represented by a single, unsegmented point cloud.
  • the present invention uses a deep neural network to pre-train on a data set to achieve the above goals. Therefore, the main technical problem of the present invention is how to design the network structure and loss function to accomplish the above tasks.
  • the input of the present invention is a three-dimensional point cloud with 1024 points. It is assumed that the point cloud has only one motion unit, that is, the points of the point cloud are either fixed or belong to the same motion.
  • the output is a point cloud sequence. Each point cloud in the sequence has 1024 points and corresponds to the points in the input point cloud one by one.
  • the network also predicts and outputs the motion segmentation S, the motion axis (d, x), and the motion type t.
  • the core of the network is to use a recurrent neural network to predict the displacement of the points in the point cloud, the displacement being the representation of the movement.
  • A recurrent neural network is used because this kind of network performs well on sequence data.
  • the present invention uses a long short-term memory network, together with the set abstraction (SA) layers and feature propagation (FP) layers of the PointNet++ architecture.
  • Figure 3 illustrates the structure of the network in detail.
  • the input point cloud P_0 enters the recurrent neural network after passing through a set abstraction layer; the recurrent network contains several sub-networks.
  • the sub-networks are composed of a feature propagation layer and a fully connected layer.
  • each sub-network outputs the motion prediction of one frame, that is, the displacement D.
  • by applying the displacements, the point clouds P of the several frames after the movement are obtained.
  • the segmentation and the motion information are then obtained by passing these outputs through further layers: passing the several frames of displacement information into a fully connected layer yields the segmentation of the point cloud.
  • the motion information is obtained separately by a similar method, but the input is the several frames of moved point clouds rather than the displacements, and, since the whole shape must be considered, a set abstraction layer is added before the fully connected layer. Point clouds are used instead of displacements because experiments showed that the former yield higher accuracy.
  • the specific structure can be seen in Figure 3;
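To make the data flow above concrete, here is a minimal sketch, in PyTorch, of how a set-abstraction encoder, an LSTM, and per-frame displacement heads could be wired together. All module choices, layer sizes, and the simplified stand-ins for the PointNet++ SA and FP layers are illustrative assumptions, not the patented implementation.

```python
# A hedged sketch of the recurrent displacement-prediction pipeline.
# The "encoder" and "disp_head" below are simplified stand-ins for the
# PointNet++ set abstraction (SA) and feature propagation (FP) layers.
import torch
import torch.nn as nn

class RecurrentMotionPredictor(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=256, num_frames=5):
        super().__init__()
        self.num_frames = num_frames
        # Per-point MLP + max-pool: a crude stand-in for an SA encoder.
        self.encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU())
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Maps the frame state plus each point back to a 3D displacement:
        # a crude stand-in for the FP + fully connected decoder.
        self.disp_head = nn.Sequential(
            nn.Linear(hidden_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 3))

    def forward(self, p0):                         # p0: (B, 1024, 3)
        feat = self.encoder(p0).max(dim=1).values  # global feature (B, F)
        seq = feat.unsqueeze(1).repeat(1, self.num_frames, 1)
        states, _ = self.lstm(seq)                 # (B, T, H)
        p, frames, disps = p0, [], []
        for t in range(self.num_frames):
            h = states[:, t:t + 1, :].expand(-1, p0.shape[1], -1)
            d = self.disp_head(torch.cat([p, h], dim=-1))  # D_t: (B, 1024, 3)
            p = p + d                              # next frame P_t
            disps.append(d)
            frames.append(p)
        return torch.stack(frames, 1), torch.stack(disps, 1)
```

In this sketch each frame's point cloud P_t is obtained by adding the predicted displacement D_t to the previous frame, matching the iterative structure described above; the segmentation and motion-information heads described in the text would consume the stacked displacements and frames, respectively.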
  • the present invention designs the following loss function:
  • L(D_t, S, M) = L_rec + L_disp + L_seg + L_mob
  • where D_t represents the displacement map, S represents the segmentation, M represents the fitted motion parameters, L_rec is the reconstruction error, L_disp is the displacement error, L_seg is the segmentation error, and L_mob is the regression error of the motion parameters;
  • the reconstruction error represents the degree of distortion of the shape, the displacement error represents the accuracy of the moving part, and the segmentation error and regression error describe the correctness of the motion information, including the division into moving and static parts, the position and direction of the motion axis, and the motion type.
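Reading the bullets above literally, the four error terms are combined into one training objective; a minimal sketch, assuming an unweighted sum (no per-term weights are given in the text):

```python
# Minimal sketch: L(D_t, S, M) = L_rec + L_disp + L_seg + L_mob.
# Whether the terms are weighted is not stated; an unweighted sum is
# assumed here.
def total_loss(l_rec, l_disp, l_seg, l_mob):
    return l_rec + l_disp + l_seg + l_mob
```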
  • L_rec describes the geometric error between the predicted point cloud after motion and the real point cloud after motion; the point cloud P_0 is divided into a reference part and a moving part, and after the motion the reference part remains static while the moving part moves rigidly, so L_rec is the sum of a reference-part error (the sum of squared per-point error distances) and a moving-part error, whose composition is:
  • L_rec^mov = L_shape + L_density
  • L_shape is used to penalize points that do not match the target shape
  • L_density compares the local point density of the predicted point cloud and the target point cloud
  • gt is the abbreviation of ground truth, i.e., the correct value.
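As an illustration of the two moving-part terms, the sketch below renders L_shape as a symmetric Chamfer-style distance and L_density as a comparison of mean k-nearest-neighbor distances; both concrete forms, and the neighborhood size k, are assumptions, since the text only states what each term penalizes.

```python
# Hedged sketch of the moving-part reconstruction terms. Assumes the
# predicted and ground-truth moving parts have the same number of points
# (the generated frames correspond one-to-one with the input points).
import torch

def shape_loss(pred_mov, gt_mov):
    # Chamfer-style: penalize predicted points far from the target shape
    # and target points not covered by the prediction.
    d = torch.cdist(pred_mov, gt_mov)             # (N, N) distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def density_loss(pred_mov, gt_mov, k=8):
    # Local density proxy: mean distance to the k nearest neighbors.
    def knn_mean(x):
        d = torch.cdist(x, x)
        return d.topk(k + 1, largest=False).values[:, 1:].mean(dim=1)
    return (knn_mean(pred_mov) - knn_mean(gt_mov)).abs().mean()

def rec_mov_loss(pred_mov, gt_mov):
    return shape_loss(pred_mov, gt_mov) + density_loss(pred_mov, gt_mov)
```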
  • the difference between the predicted motion information and the target motion information is measured by an error loss function; the motion types include rotational motion and translational motion.
  • this displacement loss function measures the difference between the predicted motion and the target motion; as mentioned earlier, it is defined on the moving part of the point cloud, and it takes different forms for the different types of motion. The present invention considers only two types of motion, rotation and translation.
  • for rotation, dot denotes the dot product, D_t(p) represents the displacement of point p of the point cloud at the t-th frame, and d_gt is the direction of the correct motion axis; one term describes whether the predicted displacement is perpendicular to the real motion axis, and another is the deviation of the per-point rotation angles, since the rotation angle of all points must be the same.
  • in the specific calculation formula, σ is a constant and proj(p) represents the distance between the point p and its projection onto the correct motion axis; a further circularity term requires each point to keep the same distance to the true rotation axis before and after the rotation.
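The rotation terms that the text does pin down can be sketched as follows: a perpendicularity penalty on the displacements, and the circularity constraint that each point keeps its distance to the axis. The angle-consistency term involving σ and proj(p) is omitted because its exact form is not recoverable from the text; everything here is an illustration, not the patented formula.

```python
# Hedged sketch of the rotation displacement terms described above.
import torch

def dist_to_axis(p, x_gt, d_gt):
    # proj(p) in the text's notation: distance from each point in p (N, 3)
    # to the line through x_gt (3,) with unit direction d_gt (3,).
    v = p - x_gt
    return torch.linalg.norm(v - (v @ d_gt)[:, None] * d_gt, dim=1)

def rotation_disp_loss(p, disp, x_gt, d_gt):
    perp = ((disp @ d_gt) ** 2).mean()         # displacement ⟂ axis direction
    r_before = dist_to_axis(p, x_gt, d_gt)
    r_after = dist_to_axis(p + disp, x_gt, d_gt)
    circ = ((r_after - r_before) ** 2).mean()  # radius preserved (circularity)
    return perp + circ
```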
  • The segmentation loss function L_seg is the multinomial logistic regression (softmax) cross entropy between the predicted segmentation and the true segmentation.
  • the motion information loss function penalizes the error between the predicted and correct axis direction, axis position, and motion type:
  • d, x, and t are the direction of the motion axis, the position of the motion axis, and the type of motion, respectively; d_gt is the correct motion axis direction, x_gt is the correct motion axis position, t_gt is the correct motion type, and H is the cross entropy.
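A sketch of the remaining two objectives follows: softmax cross entropy for the per-point segmentation, and a regression-plus-cross-entropy mobility term. The squared-error form for the axis direction d and position x is an assumption; the text only names the quantities and says H is the cross entropy.

```python
# Hedged sketch of L_seg and L_mob.
import torch
import torch.nn.functional as F

def segmentation_loss(seg_logits, seg_gt):
    # seg_logits: (N, 2) per-point logits; seg_gt: (N,) labels in {0, 1}.
    return F.cross_entropy(seg_logits, seg_gt)

def mobility_loss(d, x, type_logits, d_gt, x_gt, t_gt):
    # d, x: predicted axis direction/position (3,); type_logits: (num_types,);
    # t_gt: 0-dim long tensor holding the correct motion type index.
    return (((d - d_gt) ** 2).sum()
            + ((x - x_gt) ** 2).sum()
            + F.cross_entropy(type_logits[None, :], t_gt[None]))  # H(t, t_gt)
```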
  • the present invention completes the prediction of the future motion of the object, including the point cloud states at several future moments, the segmentation of the moving part, the motion type, and the motion parameters, by introducing a new recurrent neural network structure and several novel loss functions.
  • the present invention demonstrates the use of MAPP-NET to obtain mobility predictions, and evaluates the different components of the method.
  • the present invention trains the network using the loss function defined by formula (1) above and the Adam stochastic optimizer.
  • the motion unit data set is used.
  • the present invention samples the visible surface of the unit to create a point cloud, which is called a "full scan".
  • the present invention divides the data set into training/testing units according to the division ratio of 90/10.
  • the present invention also obtains a set of partial scans from the test set for additional evaluation.
  • Figure 6 shows an example of motion prediction on the test unit for complete and partial scans.
  • the first 5 predicted frames for each input point cloud are shown, and the predicted transformation axis, reference part, and moving part are drawn.
  • MAPP-NET predicts the correct part movement and generates the corresponding movement sequence for different objects with different types of motion.
  • the method of the present invention accurately predicts the rotational motion of shapes for different axis directions and positions, including horizontal and vertical axes, such as the flip phone shown in the first row (left) and the rotating flash drive (USB stick) shown in the second row (left).
  • the method of the present invention also accurately predicts the position of the axis, as for the luggage case shown in the fourth row (left) and the stacker example shown in the second row (right).
  • for the drawer, MAPP-NET can predict the correct direction of its translational opening, although the data only shows the front surface of the drawer without the internal structure, and although the reference part enclosing the object is much larger than the moving part.
  • a similar result was found for the handle of the drawer in the third row (left), but a different type of movement was predicted.
  • the method of the present invention has learned to stop generating further movement after the motion reaches its stop state, which shows that the method can predict the range of motion.
  • MAPP-NET can also predict the movement of multiple parts of the same object.
  • the method of the present invention can either predict multiple motions iteratively, as shown in Fig. 2; or predict the motions of different components at the same time, especially components of different motion types. This is feasible because the present invention trains a single network to predict all the different types of motion, such as translation and rotation.
  • the present invention shows all 5 consecutive frames of the predicted segmentation. The moving parts of the generated frame (red) are shown in a lighter color when they are closer to the input frame.
  • the first measurement method calculates the error of the axis direction: E_angle = arccos(|d · d_gt|)
  • the second measurement method, E_dist, calculates the error of the axis position:
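The two axis-error measurements can be sketched as below; the angle error follows the arccos form given above, while rendering the position error as the minimum distance between the two axis lines is an assumption about a quantity the text only names.

```python
# Hedged sketch of the axis evaluation metrics E_angle and E_dist.
import numpy as np

def angle_error(d, d_gt):
    d = d / np.linalg.norm(d)
    d_gt = d_gt / np.linalg.norm(d_gt)
    return np.arccos(np.clip(abs(d @ d_gt), 0.0, 1.0))  # radians

def axis_position_error(x, d, x_gt, d_gt):
    # One plausible E_dist: minimum distance between the two axis lines.
    n = np.cross(d, d_gt)
    if np.linalg.norm(n) < 1e-8:                # parallel axes
        v = x_gt - x
        return np.linalg.norm(v - (v @ d) * d / (d @ d))
    return abs((x_gt - x) @ n) / np.linalg.norm(n)
```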
  • Table 1 Motion prediction error of the method of the present invention and BaseNet
  • BaseNet takes the point cloud P 0 as input, and uses a standard network architecture to directly estimate the segmentation S and the motion parameter M.
  • the network consists of an encoder/decoder pair and a fully connected layer, as shown in Figure 8.
  • since BaseNet only estimates the segmentation S and the motion parameters M, its loss function consists of the segmentation error L_seg and the motion parameter regression error L_mob.
  • Table 1 shows the comparison between MAPP-NET and BaseNet on complete and partial scans. It can be seen that the segmentation error E_seg and motion type error E_type of BaseNet are comparable to those of the method of the present invention, but its axis direction error E_angle and axis position error E_dist are at least 5% higher than those of the present invention.
  • the main reason for the difference in results may be that segmentation and classification are simpler tasks than motion prediction. Network architectures like PointNet++ have shown that good results can be achieved on those two tasks, but for motion prediction a single input frame may leave the inference ambiguous.
  • the present invention uses a recurrent neural network to generate a sequence of multiple frames describing the motion, which constrains the inference more strongly; as a result, the prediction of the motion parameters is more accurate.
  • Figure 9 shows a visual comparison between the method of the present invention and BaseNet on some examples. Because BaseNet does not generate motion frames, its predicted segmentation and axis are shown on the input point cloud, whereas for the method of the present invention the predicted segmentation and axis are shown together over 5 consecutive frames. The moving parts of the generated frames are drawn in lighter colors the closer they are to the input frame. For both translation and rotation, on complete and partial scans, BaseNet is more likely to predict the wrong type of motion, resulting in prediction errors on complex shapes; for example, for the keyboard drawer under the desk, the direction of the sliding motion is incorrectly predicted.
  • L_rec and L_disp are the loss terms that compare the predicted displacement maps D_t and the predicted point clouds P_t with the ground truth.
  • the results of the method of the present invention are compared with the results obtained without either of these two terms.
  • the second and third rows of Table 2 show the error values obtained in this experiment, compared with the sixth row, which uses the complete loss function of the present invention. Removing either L_rec or L_disp increases the error relative to the complete version of the loss function, and more importantly, as shown in Figure 10, the intermediate predicted sequence is of worse quality than the result obtained with the complete loss function.
  • the complete method of the present invention can predict an accurate and smooth movement of the moving part and can also keep the reference part unchanged.
  • the network of the present invention generates a point cloud motion sequence P_t from the displacement maps D_t, which can be directly used to fit the motion parameters M; for the segmentation S, the present invention filters the points according to whether they move by more than an appropriate threshold τ in the displacement maps, thereby dividing the points into moving and stationary (reference) points.
  • the optimal rigid transformation matrix, the one with the smallest mean squared error for transforming one frame to the next, is calculated, and from it the axis direction of a translation, or the axis direction and position of a rotation, are extracted.
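The thresholding and fitting baseline just described can be sketched as follows; the threshold value and the use of the Kabsch algorithm for the least-squares rigid fit are assumptions about details the text leaves open.

```python
# Hedged sketch of the baseline: threshold displacements to split moving
# from reference points, fit a rigid transform between consecutive frames,
# and read the rotation axis off the fitted rotation.
import numpy as np

def split_moving(p0, p1, tau=0.01):
    # Points whose displacement magnitude exceeds tau are "moving".
    return np.linalg.norm(p1 - p0, axis=1) > tau

def fit_rigid(src, dst):
    # Kabsch: least-squares R, t with dst ≈ src @ R.T + t.
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cd - R @ cs

def rotation_axis(R):
    # The rotation axis is the eigenvector of R with eigenvalue 1.
    w, v = np.linalg.eig(R)
    return np.real(v[:, np.argmin(np.abs(w - 1.0))])
```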
  • the fourth and fifth rows of Table 2 show the error values of this experiment.
  • This motion fitting method is very sensitive to noise and causes large errors; however, the prediction obtained by using the complete network of the present invention is more stable and provides better results.
  • the comparison between the results of direct motion parameter fitting and the results of the present invention is shown in FIG. 11; it can be seen that, without the motion loss term L_mob and the segmentation loss term L_seg, a few outliers cause large errors in the axis fitting.
  • L mob the motion loss term
  • L seg segmentation loss term
  • the noise of the displacement of different points can also cause large errors in the axis fitting.
  • most of the points do not move except for the lower part of the object, which causes the position of the fitted axis to deviate from the center of the wheel.
  • the error defined on D_t affects the generation of P_t and also affects D_t+1. If the reconstruction loss term L_rec were measured independently on each D_t, the accumulated error could not be accurately taken into account during learning. Conversely, P_t is obtained by applying all previous displacement maps to the input point cloud P_0; therefore, by defining the reconstruction loss term L_rec on each P_t, the loss term provides a more global constraint on the error of the generated sequence. A similar argument applies to the definition of the motion loss term L_mob.
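In symbols, the accumulation described in this paragraph is

$$P_t = P_0 + \sum_{i=1}^{t} D_i$$

so a loss defined on each P_t penalizes the accumulated error of all displacement maps up to frame t, whereas a loss defined on each D_t in isolation sees only the current frame's error.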
  • Table 3: comparison of defining the reconstruction loss term L_rec and the motion loss term L_mob on D_t instead of P_t.
  • the last row corresponds to the method of the present invention, which defines both loss terms on P_t and obtains the lowest errors.
  • the method of the present invention exhibits high accuracy in predicting the motion of an object with a single moving part. Therefore, the method of the present invention can be regarded as a good basic module for predicting the motion of an object in more situations.
  • FIG. 2 and FIG. 7 show the potential of the method of the present invention for detecting multiple moving parts in an object, including movement occurring in a parallel manner or movement in a hierarchical sequence.
  • further experiments are needed to quantitatively evaluate the method of the present invention, and it may be necessary to construct a data set of objects with multiple movable parts and their known motion parameters and segmentation.
  • the current data set of the present invention assumes that the shapes are given in a meaningful orientation, and the data set is relatively small, consisting of 276 motion units.
  • Another direct improvement, which would extend the method to more complex scenes, is to augment the data set of the present invention by applying random transformations to the motion units, so that the network of the present invention can operate in a pose-invariant manner, or to train the network with partial scans to improve its robustness to interference.
  • Another direction of future work is to use the motion predicted by the method of the present invention to synthesize the motion of the input shape.
  • an interesting sub-problem is learning how to complete the geometry of an object.
  • part of the geometry of the object may be missing when motion occurs; for example, a drawer pulled out of a cabinet should reveal its interior, but if the shape was scanned, or the interior was not modeled, the interior geometry is missing.
  • One possible method is to learn how to synthesize the missing geometry from the predicted motion and the existing part geometry. At a minimum, this approach requires building a training set of pre-segmented objects with all internal details modeled.
  • the present invention introduces a loss function composed of a reconstruction loss function and a displacement loss function, which ensures that the shape of the object is maintained and the motion is accurately predicted.
  • the reconstruction loss measures the extent to which the shape of the object is maintained during the movement
  • the displacement loss measures the extent to which the displacement field describes the movement. Experiments show that, compared with alternative methods, this loss function yields the most accurate predictions.
  • RNN Recurrent Neural Network
  • the present invention shows that MAPP-NET can predict the movement of object parts very accurately for a variety of objects with different types of motion (including rotation and translation), given either a complete point cloud of a 3D object or a partial scan. In addition, the rationality of the method of the present invention was verified and compared with the baseline method. Finally, the present invention shows preliminary results indicating that the proposed network has the potential to segment objects composed of multiple moving parts in a hierarchical manner and to predict the movement of multiple components at the same time.
  • the present invention casts the affordance (functional visibility) analysis problem as segmenting the input geometry and labeling each segment with its motion type and parameters; accordingly, the deep neural network proposed by the present invention learns from pre-segmented three-dimensional shapes with known motions, and then performs segmentation and prediction.
  • the deep neural network MAPP-NET of the present invention predicts the movement of a part from a three-dimensional point cloud shape without requiring the shape to be segmented; the present invention achieves this by training a deep learning model that simultaneously segments the input shape and predicts the movement of its parts.
  • the network of the present invention is trained on the motion unit data set, annotated with reference segmentations and motion parameters; once training is completed, it can be used to predict the motion from a single unsegmented point cloud representing a static state of the object.
  • the present invention introduces a loss function composed of a reconstruction loss function and a displacement loss function, which ensures that the shape of the object is maintained while also accurately predicting the movement; the reconstruction loss measures the degree to which the shape of the object is maintained during the movement.
  • the displacement loss measures the extent to which the displacement field describes the motion; experiments show that, compared with alternative methods, this loss function yields the most accurate predictions.
  • RNN Recurrent Neural Network
  • MAPP-NET can predict the movement of object parts very accurately.
  • These results hold for a variety of objects with different motion types (including rotation and translation), given either a complete point cloud of a 3D object or a partial scan.
  • the present invention shows preliminary results.
  • the network proposed by the present invention has the potential to segment objects composed of multiple moving parts in a hierarchical manner, and predict the movement of multiple components at the same time.
  • the present invention also provides an intelligent terminal correspondingly.
  • the intelligent terminal includes a processor 10, a memory 20 and a display 30.
  • FIG. 12 only shows some of the components of the smart terminal, but it should be understood that implementing all of the illustrated components is not required, and more or fewer components may be implemented instead.
  • the memory 20 may be an internal storage unit of the smart terminal, such as a hard disk or a memory of the smart terminal.
  • the memory 20 may also be an external storage device of the smart terminal, such as a plug-in hard disk equipped on the smart terminal, a smart media card (SMC), a Secure Digital (SD) card, a flash card, etc.
  • the memory 20 may also include both an internal storage unit of the smart terminal and an external storage device.
  • the memory 20 is used to store application software and various types of data installed on the smart terminal, such as the program code of the installed smart terminal.
  • the memory 20 can also be used to temporarily store data that has been output or will be output.
  • a motion prediction program 40 based on a deep neural network is stored in the memory 20, and the motion prediction program 40 based on a deep neural network can be executed by the processor 10, so as to implement the motion prediction method based on the deep neural network of this application.
  • the processor 10 may, in some embodiments, be a central processing unit (CPU), microprocessor, or other data processing chip, and is used to run the program code stored in the memory 20 or to process data, for example to perform the motion prediction method based on the deep neural network.
  • CPU central processing unit
  • the display 30 may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, and the like.
  • the display 30 is used for displaying information on the smart terminal and for displaying a visualized user interface.
  • the components 10-30 of the smart terminal communicate with each other via a system bus.
  • the deep neural network outputs the first part and the second part of the three-dimensional point cloud, using the first part as a motion subunit, and the second part as a reference part of the motion unit;
  • the network prediction is completed according to the output of the three-dimensional point cloud, and the motion information is output.
  • the motion information includes the motion segmentation, the motion axis, and the motion type.
  • the present invention also provides a storage medium, wherein the storage medium stores a motion prediction program based on a deep neural network, and the motion prediction program based on the deep neural network, when executed by a processor, realizes the steps of the motion prediction method based on the deep neural network; the details are as described above.
  • the present invention provides a motion prediction method based on a deep neural network and an intelligent terminal. The method includes: training a deep neural network using a data set; inputting a three-dimensional point cloud to the deep neural network; outputting, by the deep neural network, the first part and the second part of the 3D point cloud, with the first part as the motion subunit and the second part as the reference part of the motion unit; and completing the network prediction based on the output of the 3D point cloud and outputting the motion information, where the motion information includes the motion segmentation, the motion axis, and the motion type.
  • the present invention realizes simultaneous prediction of the motion and the parts of various articulated objects that are unstructured, possibly only partially scanned, and given in a static state, and can predict the movement of object parts very accurately.
  • the processes in the methods of the above-mentioned embodiments can be implemented by instructing the relevant hardware (such as a processor, a controller, etc.) through a computer program, and the program can be stored in a computer-readable storage medium.
  • the program may include the processes of the foregoing method embodiments when executed.
  • the storage medium mentioned may be a memory, a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed are a motion prediction method based on a deep neural network, and an intelligent terminal. The method comprises: training a deep neural network using a data set; inputting a three-dimensional point cloud into the deep neural network; outputting a first portion and a second portion of the three-dimensional point cloud by means of the deep neural network, wherein the first portion is used as a motion sub-unit and the second portion is used as a reference portion of a motion unit; and completing network prediction according to the output of the three-dimensional point cloud, and outputting motion information, wherein the motion information comprises motion segmentation, a motion axis and a motion type. According to the present invention, the motion and the parts of various unstructured, possibly partially scanned articulated objects given in a static state are predicted simultaneously, and the motion of object components can be predicted very accurately.

Description

A motion prediction method and intelligent terminal based on a deep neural network

Technical Field

The invention relates to the technical field of deep learning, and in particular to a motion prediction method based on a deep neural network, an intelligent terminal, and a storage medium.

Background Art

In recent years, computer graphics and related fields such as computer vision and robotics have focused on inferring the possible motions of three-dimensional objects and their parts, because this problem is closely related to understanding object affordances and functionality. The harder question is whether and how a machine can learn to predict part motion, or part mobility, when given only a few static states of a three-dimensional object.

Existing methods have proposed to acquire and reconstruct object motion, to represent and understand it, and even to predict part motion from static objects. The motivation behind these works is that a more comprehensive understanding of object motion benefits graphics applications such as animation, object pose correction and reconstruction, as well as robotics applications such as modeling human-scene interaction in 3D scenes.

In the field of robotics, a large body of work focuses on the problem of affordance (functional visibility) prediction, with the goal of identifying regions of an object that support specific interactions, for example grasping or pushing. Recently, deep neural networks have been applied to label images with affordance labels, or physical simulation has been used to obtain human utilities closely related to affordances. A more general approach to affordance analysis is based on the idea of human pose hypotheses: predicting the best human pose that fits a given scene context to assist in understanding the scene. Based on human-object interactions, human pose hypotheses can also be used to predict the functional category of an object. Closely related to affordance and human pose analysis is activity recognition; one example is detecting, in an input scene, the active regions that support specific categories of human activity, such as eating or watching TV. Although affordance detection identifies regions that support a specific motion type, such as turning or sliding, the predicted motion is only described by a label and is limited to human interaction; such labels therefore cannot represent the general motion of an object. More general affordance analysis methods focus on a high-level understanding of the actions associated with specific objects, or of the actions possible in a given scene, but they cannot detect or model the specific motions or part motions associated with these actions.

In computer vision, methods have been proposed to infer the future state of objects from a description of their current state; these methods implicitly predict the ongoing and future motion of the objects in an image. A common solution is to train generative adversarial networks (GANs) on video data to generate subsequent frames of an input image. Another approach decomposes a video into content and motion components and then creates subsequent frames of the video from selected content and motion.

Work in computer graphics has also addressed motion inference for three-dimensional objects. The movement of a mechanical assembly is explained by predicting the possible motion of its components and of the entire assembly from the geometric arrangement of the parts, for example to create diagram animations from concept sketches. For more general shapes, interaction landscapes were introduced: motion representations of objects being used in a certain way, for example a cup being used to drink water. This representation can then be used to classify motions into different types of interactions and also to predict the interactions supported by an object within a few seconds of its motion. Another method uses a structure called a motion tree to obtain the relative motion of objects in a scene; the tree is inferred from different instances of objects found in different geometric configurations. Given a three-dimensional object segmented into parts, the possible motions and motion parameters of the object's parts are predicted from a model learned on a data set containing a few static motion states of each object; this model effectively relates the geometry of an object to its possible movement. From two unsegmented, functionally similar instances, or from objects with the same motion in different motion states, the possible motion of the object's parts can be predicted. Although this can infer the motion of objects in a scene, it is limited by the assumption that multiple instances of the object appear in the scene. A disadvantage of data-driven methods is that the object needs to be well segmented; another shortcoming is that the designed network requires as input a pair of objects in the same motion state but at different rotation angles. When functional predictions must be obtained directly in a three-dimensional scene, for example in robot navigation, it is unrealistic to expect either pre-segmented objects or rotated object pairs.

Therefore, the prior art still needs improvement and development.
Summary of the Invention

In view of the above-mentioned defects of the prior art, the present invention provides a motion prediction method based on a deep neural network, an intelligent terminal, and a storage medium.

The technical solutions adopted by the present invention to solve the technical problem are as follows:

A motion prediction method based on a deep neural network, wherein the method includes:

using a data set to train a deep neural network;

inputting a three-dimensional point cloud to the deep neural network;

the deep neural network outputting a first part and a second part of the three-dimensional point cloud, with the first part used as a motion subunit and the second part as a reference part of the motion unit;

completing the network prediction according to the output of the three-dimensional point cloud, and outputting motion information, where the motion information includes the motion segmentation, the motion axis, and the motion type.

In the motion prediction method based on the deep neural network, the loss function used when training the deep neural network is:
$$L(D_t, S, M) = L_{rec} + L_{disp} + L_{seg} + L_{mob} \qquad (1)$$

where $D_t$ represents the displacement map, $S$ represents the segmentation, $M$ represents the fitted motion parameters, $L_{rec}$ is the reconstruction error, $L_{disp}$ is the displacement error, $L_{seg}$ is the segmentation error, and $L_{mob}$ is the regression error of the motion parameters;
The reconstruction error represents the degree of distortion of the shape; the displacement error represents the accuracy of the moving part; the segmentation error and regression error describe the correctness of the motion information, including the division into moving and static parts, the position and direction of the motion axis, and the motion type.
In the motion prediction method based on the deep neural network, $L_{rec}$ describes the geometric error between the predicted moved point cloud and the true moved point cloud;
The point cloud $P_0$ is divided into a reference part and a moving part. After undergoing the motion, the reference part remains static and the moving part undergoes a rigid motion, where $P_{t-1}$ and $P_t$ denote two adjacent point cloud frames; therefore $L_{rec}$ is divided into two parts:

$$L_{rec} = L_{rec}^{ref} + L_{rec}^{mov}$$

where $L_{rec}^{ref}$ is the error of the reference part and $L_{rec}^{mov}$ is the error of the moving part. $L_{rec}^{ref}$ is the sum of the squared error distances of each point:

$$L_{rec}^{ref} = \sum_{t}\sum_{p \in P_t^{ref}} \left\| p - p^{gt} \right\|^2$$

where $p^{gt}$ is the true position of point $p$. The composition of $L_{rec}^{mov}$ is:

$$L_{rec}^{mov} = L_{shape} + L_{density}$$

where $L_{shape}$ is used to penalize points that do not match the target shape, $L_{density}$ compares the local point density of the predicted point cloud and the target point cloud, $P_t^{mov}$ denotes the moving part of the point cloud of the t-th frame generated by the deep neural network, and $P_t^{mov,gt}$ denotes the moving part of the correct point cloud of the t-th frame; gt is the abbreviation of ground truth.
In the motion prediction method based on the deep neural network, the difference between the predicted motion information and the target motion information is measured by an error loss function; the motion types include rotational motion and translational motion.
In the motion prediction method based on the deep neural network, for rotational motion, the loss function combines three terms: a perpendicularity term, an angle-consistency term, and a circularity term.

In the perpendicularity term, dot denotes the dot product, $D_t(p)$ represents the displacement of point $p$ of the point cloud at the t-th frame, and $d^{gt}$ is the direction of the correct motion axis; the term describes whether the predicted displacement is perpendicular to the real motion axis, and the specific calculation is:

$$L_{\perp} = \sum_{t}\sum_{p} \mathrm{dot}\left(D_t(p), d^{gt}\right)^2$$

The angle-consistency term is the deviation of the per-point rotation angles, requiring the rotation angle of all points to be the same; in its calculation, $\sigma$ is a constant and $\mathrm{proj}(p)$ represents the distance between the point $p$ and its projection onto the correct motion axis.

The circularity term requires each point to be at the same distance from the true rotation axis before and after the rotation, constraining the circularity of its motion; the specific calculation is:

$$L_{circ} = \sum_{t}\sum_{p} \left( \mathrm{proj}\left(p + D_t(p)\right) - \mathrm{proj}(p) \right)^2$$
In the motion prediction method based on the deep neural network, for translational motion, the loss function combines a parallelism term and a variance term.

The parallelism term describes whether the predicted displacement is parallel to the real motion axis; the specific calculation is:

$$L_{\parallel} = \sum_{t}\sum_{p} \left( 1 - \mathrm{dot}\left( \frac{D_t(p)}{\left\| D_t(p) \right\|},\; d^{gt} \right)^2 \right)$$

The variance term requires each point to move by the same distance, so that the variance is 0; the specific calculation is:

$$L_{var} = \sum_{t} \mathrm{Var}_{p}\left( \left\| D_t(p) \right\| \right)$$
In the motion prediction method based on the deep neural network, the motion information loss function penalizes the errors of the predicted axis direction, axis position, and motion type:

$$L_{mob} = \left\| d - d^{gt} \right\|^2 + \left\| x - x^{gt} \right\|^2 + H\left(t, t^{gt}\right)$$

where $d$, $x$ and $t$ are the direction of the motion axis, the position of the motion axis and the type of motion, respectively; $d^{gt}$ is the correct motion axis direction, $x^{gt}$ is the correct motion axis position, $t^{gt}$ is the correct motion type, and $H$ is the cross entropy.
In the motion prediction method based on the deep neural network, the number of points in the three-dimensional point cloud is 1024.

An intelligent terminal, wherein the intelligent terminal includes the above-mentioned deep neural network-based motion prediction system, and further includes: a memory, a processor, and a deep neural network-based motion prediction program stored in the memory and executable on the processor; the motion prediction program based on the deep neural network implements the steps of the above-mentioned deep neural network-based motion prediction method when executed by the processor.

A storage medium, wherein the storage medium stores a motion prediction program based on a deep neural network, and when the motion prediction program based on the deep neural network is executed by a processor, the steps of the above-mentioned motion prediction method based on the deep neural network are realized.

The present invention uses a data set to train a deep neural network; inputs a three-dimensional point cloud to the deep neural network; the deep neural network outputs the first part and the second part of the three-dimensional point cloud, with the first part used as a motion subunit and the second part as a reference part of the motion unit; the network prediction is completed according to the output of the three-dimensional point cloud, and the motion information is output, including the motion segmentation, the motion axis, and the motion type. The present invention realizes simultaneous prediction of the motion and the parts of various articulated objects that are unstructured, possibly only partially scanned, and given in a static state, and can predict the movement of object parts very accurately.
Description of the Drawings

Fig. 1 is a flowchart of a preferred embodiment of the motion prediction method based on a deep neural network of the present invention;

Fig. 2 is a schematic diagram, in a preferred embodiment of the method, of the deep neural network learning a deep prediction model from a training set that covers various motions of different objects;

Fig. 3 is a schematic diagram of the structure of the long short-term memory network in a preferred embodiment of the method;

Fig. 4 is a schematic diagram of rotational motion as the motion type in a preferred embodiment of the method;

Fig. 5 is a schematic diagram of translational motion as the motion type in a preferred embodiment of the method;

Fig. 6 is a schematic diagram of a set of motion and part prediction results for different motions of various shapes, from both complete and partial scans, in a preferred embodiment of the method;

Fig. 7 is a schematic diagram of predicting the parallel motions of a desk in a preferred embodiment of the method;

Fig. 8 is a schematic diagram of the architecture of the baseline prediction network "BaseNet" in a preferred embodiment of the method;

Fig. 9 is a schematic diagram of a visual comparison between MAPP-NET and BaseNet in a preferred embodiment of the method;

Fig. 10 is a schematic diagram of a visual comparison with predictions obtained without the reconstruction loss term L_rec or the displacement loss term L_disp in a preferred embodiment of the method;

Fig. 11 is a schematic diagram of a visual comparison with results in which the motion parameters and segmentation are not obtained through network prediction, in a preferred embodiment of the method;

Fig. 12 is a schematic diagram of the operating environment of a preferred embodiment of the intelligent terminal of the present invention.
Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the protection scope of the present invention.
As shown in Fig. 1, the motion prediction method based on a deep neural network according to a preferred embodiment of the present invention includes the following steps:

Step S10: train a deep neural network with a data set;

Step S20: input a three-dimensional point cloud into the deep neural network;

Step S30: the deep neural network outputs a first part and a second part of the three-dimensional point cloud, the first part serving as the moving sub-unit and the second part as the reference part of the motion unit;

Step S40: complete the network prediction from the output of the three-dimensional point cloud and output the motion information, which includes the mobility-based segmentation, the motion axis and the motion type.
The present invention introduces a learning-based method that simultaneously predicts, from a single unsegmented and possibly partially scanned point cloud of a three-dimensional object, its movable parts and their motions. The deep neural network of the present invention treats the input three-dimensional object as one motion unit and outputs two parts of the point cloud, one serving as the moving sub-unit and the other as the reference part of the motion unit. Applying the proposed network iteratively to the resulting parts predicts finer-grained part motions, yielding a hierarchical motion prediction together with a motion-based object segmentation, as shown in Fig. 2. MAPP-NET (a deep neural network) learns a deep prediction model from a training set that covers various motions of different objects. Although the problem of predicting mobility and segmentation from a single configuration is inherently ill-posed, the learning-based method of the present invention aggregates rich cues from the training data, such as part geometry and its contextual scene, and can thus generalize to unseen three-dimensional objects.
The core of mobility prediction on point clouds can be viewed as predicting point correspondences and a displacement field that evolves over time; this allows the network to process unstructured low-level input and exploit the transient nature of motion. Specifically, MAPP-NET is implemented with a recurrent neural network: its input is a point cloud, and it predicts the displacement of each point in the subsequent frames, with the input point cloud serving as the reference for each subsequent frame. The network architecture consists of encoder-decoder pairs interleaved with a long short-term memory network (LSTM), which predicts the displacement field of the input point cloud; additional layers are added to the network to infer the mobility-based segmentation and the motion parameters of the predicted displacement field. Therefore, given a point cloud, MAPP-NET infers both the motion type and the motion parameters (such as the rotation axis) of the geometric transformation of the points, and predicts the segmentation of the movable part according to the predicted motion.
The purpose of the present invention is to segment the movable part of a given three-dimensional object, determine the motion type of the object, and generate the motion sequence of the next few frames of the object, where the object is represented by a single unsegmented point cloud. The present invention pre-trains a deep neural network on a data set to achieve these goals. The main technical problem of the present invention is therefore how to design the network structure and the loss function to accomplish these tasks.
The input of the present invention is a three-dimensional point cloud with 1024 points, under the assumption that the point cloud contains only one motion unit, i.e., every point either belongs to the static reference or participates in the same motion. The output is a point cloud sequence; each point cloud in the sequence has 1024 points in one-to-one correspondence with the points of the input point cloud. The network additionally predicts the mobility-based segmentation S, the motion axis (d, x) and the motion type t. The motion axis information comprises the axis direction d and the position x of a point on the axis; together these are referred to as the motion information M = (t, d, x).
The core of the network is a recurrent neural network that predicts the displacements of the points of the point cloud, the displacements being the representation of the motion. A recurrent network is used because this kind of network performs well on sequence data. More specifically, the present invention uses a long short-term memory network together with the set abstraction (SA) layers and feature propagation (FP) layers of PointNet++. Fig. 3 details the structure of the network. The input point cloud P_0 enters the recurrent neural network after passing through a set abstraction layer. The recurrent network contains several sub-networks, each composed of a feature propagation layer and fully connected layers; each sub-network outputs the motion prediction for one frame, i.e., the displacement D. Adding the displacement to the input point cloud yields the point cloud P of the frames after the motion. From these point clouds and displacements, the segmentation and the motion information are derived by additional layers: passing the displacement information of several frames into a fully connected layer yields the segmentation of the point cloud. The motion information is obtained separately in a similar way, except that the input is the point cloud information of several frames after the motion rather than the displacements and, since it must be considered globally, a set abstraction layer is added before the fully connected layer. Point clouds are used here instead of displacements because experiments showed that the former give higher accuracy; the specific structure is shown in Fig. 3.
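To make this recurrent structure concrete, the following is a minimal PyTorch sketch of the displacement-prediction loop, not the actual MAPP-NET implementation: the PointNet++ set abstraction (SA) and feature propagation (FP) layers are stood in for by plain per-point MLPs, and all layer sizes, the frame count and the module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentDisplacementNet(nn.Module):
    """Minimal sketch of the recurrent displacement-prediction idea:
    encode the input cloud once, unroll an LSTM for T frames, and
    decode a per-point displacement D_t at every step. Real SA/FP
    layers from PointNet++ are replaced by plain MLPs here."""

    def __init__(self, feat_dim=128, n_frames=5):
        super().__init__()
        self.n_frames = n_frames
        # Stand-in for the set abstraction (SA) encoder.
        self.encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        # Stand-in for the feature propagation (FP) + FC decoder:
        # maps per-point feature + recurrent state to a 3D displacement.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim * 2, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, p0):                      # p0: (B, N, 3)
        feat = self.encoder(p0)                 # per-point features (B, N, F)
        glob = feat.max(dim=1).values           # global feature (B, F)
        seq = glob.unsqueeze(1).repeat(1, self.n_frames, 1)
        states, _ = self.lstm(seq)              # (B, T, F): one state per frame
        clouds, disps, p = [], [], p0
        for t in range(self.n_frames):
            s = states[:, t:t + 1, :].expand(-1, p0.shape[1], -1)
            d_t = self.decoder(torch.cat([feat, s], dim=-1))  # (B, N, 3)
            p = p + d_t                         # P_t = P_{t-1} + D_t
            disps.append(d_t)
            clouds.append(p)
        return torch.stack(disps, 1), torch.stack(clouds, 1)

net = RecurrentDisplacementNet()
D, P = net(torch.rand(2, 1024, 3))   # displacement maps and predicted frames
print(D.shape, P.shape)              # (2, 5, 1024, 3) each
```

In the full network, additional branches take the predicted displacements and frames as input to produce the segmentation S and the motion information M, as described below.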
Network training and loss function.

To train the above multi-output network, the present invention designs the following loss function:

L(D_t, S, M) = Σ_{t=1..T} ( L_rec(P_t) + L_disp(D_t) ) + L_seg(S) + L_mob(M)    (1)

where D_t denotes the displacement map, S the segmentation and M the fitted motion parameters; L_rec is the reconstruction error, L_disp is the displacement error, L_seg is the segmentation error, and L_mob is the regression error of the motion parameters.

The reconstruction error represents the degree of distortion of the shape, and the displacement error represents the accuracy of the moving part; the segmentation error and the regression error characterize the correctness of the motion information, including the division into moving and static points and the position, direction and motion type of the motion axis.
Reconstruction loss function: L_rec characterizes the geometric error between the predicted post-motion point cloud and the true post-motion point cloud.

The point cloud P_0 is divided into a reference part and a moving part. After undergoing the motion that carries P_{t-1} to P_t, where P_{t-1} and P_t denote two adjacent point cloud frames, the reference part remains static and the moving part moves rigidly, so L_rec is divided into two parts:

L_rec^t = L_ref^t + L_mov^t

where L_ref^t is the error of the reference part and L_mov^t is the error of the moving part.

L_ref^t is the sum of the squared error distances of the points:

L_ref^t = Σ_{p ∈ P_ref^t} ||p − p_gt^t||_2^2

where p_gt^t is the true position of point p.

L_mov^t is composed as:

L_mov^t = L_shape^t + L_density^t

where L_shape is used to penalize points that do not match the target shape, L_density compares the local point density of the predicted point cloud with that of the target point cloud, P_mov^t denotes the moving part of the frame-t point cloud generated by the deep neural network, and P_mov^{t,gt} denotes the moving part of the correct frame-t point cloud; gt is the abbreviation of ground truth and indicates the correct value.
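The following NumPy sketch illustrates how such a reconstruction term can be assembled. The exact forms of L_shape and L_density are not spelled out above, so a symmetric Chamfer distance and a comparison of mean k-nearest-neighbor distances are assumed here purely for illustration; all function names are hypothetical.

```python
import numpy as np

def l_ref(pred_ref, gt_ref):
    """Reference-part term: sum of squared error distances between
    each predicted point and its true (static) position."""
    return float(np.sum(np.sum((pred_ref - gt_ref) ** 2, axis=1)))

def chamfer(a, b):
    """Assumed stand-in for L_shape: penalize predicted moving points
    that do not lie on the target shape, and vice versa."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def l_density(a, b, k=8):
    """Assumed stand-in for L_density: compare local point density via
    the mean distance to the k nearest neighbors within each cloud."""
    def knn_radius(x):
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
        d.sort(axis=1)
        return d[:, 1:k + 1].mean(axis=1)   # skip the zero self-distance
    return float((knn_radius(a).mean() - knn_radius(b).mean()) ** 2)

def l_rec(pred_ref, gt_ref, pred_mov, gt_mov):
    """L_rec^t = L_ref^t + L_mov^t, with L_mov^t = L_shape + L_density."""
    l_mov = chamfer(pred_mov, gt_mov) + l_density(pred_mov, gt_mov)
    return l_ref(pred_ref, gt_ref) + l_mov
```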
The difference between the predicted motion information and the target motion information is measured by the displacement loss function (error loss function); as stated above, it is defined on the moving part of the point cloud. Since there are different motion types, it takes different forms; the present invention only considers two types of motion, rotation and translation.
For rotational motion, see Fig. 4; the loss function is as follows:

L_disp^t = L_perp^t + L_angle^t + L_radius^t

L_perp^t characterizes whether the predicted displacement is perpendicular to the true motion axis; the specific calculation formula is:

L_perp^t = (1/|P_mov^t|) Σ_{p ∈ P_mov^t} dot(D_t(p)/||D_t(p)||_2, d_gt/||d_gt||_2)^2

where dot denotes the dot product, D_t(p) denotes the displacement of point p in the frame-t displacement map, and d_gt is the direction of the correct motion axis.

L_angle^t is the deviation of the rotation angles of the individual points; all points should rotate by the same angle. The specific calculation formula is:

L_angle^t = Var_{p ∈ P_mov^t}( ||D_t(p)||_2 / (proj(p) + σ) )

where σ is a constant and proj(p) denotes the distance between point p and the projection of point p onto the correct motion axis.

L_radius^t requires each point to be at the same distance from the true rotation axis before and after the rotation, constraining the motion to be circular. The specific calculation formula is:

L_radius^t = (1/|P_mov^t|) Σ_{p ∈ P_mov^t} ( proj(p^t) − proj(p^{t-1}) )^2
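A minimal NumPy sketch of these three rotational constraints follows; the decomposition into perpendicularity, equal-angle and constant-radius terms is as described above, while the exact aggregation and weighting are assumptions of this sketch.

```python
import numpy as np

def rotation_disp_loss(points, disp, axis_d, axis_x, sigma=1e-2):
    """Sketch of the rotational displacement loss for the moving points
    of one frame. points: (N, 3); disp: (N, 3) displacements D_t;
    (axis_d, axis_x): true axis direction and a point on the axis."""
    d_gt = axis_d / np.linalg.norm(axis_d)
    # L_perp: displacements should be perpendicular to the true axis,
    # i.e. their dot product with the axis direction should vanish.
    unit = disp / (np.linalg.norm(disp, axis=1, keepdims=True) + 1e-12)
    l_perp = np.mean(np.dot(unit, d_gt) ** 2)
    # proj(p): distance from p to its projection onto the true axis.
    rel = points - axis_x
    foot = axis_x + np.outer(np.dot(rel, d_gt), d_gt)
    radius = np.linalg.norm(points - foot, axis=1)
    # L_angle: every point should turn by the same angle; estimate the
    # per-point angle as chord length over radius (sigma avoids /0).
    theta = np.linalg.norm(disp, axis=1) / (radius + sigma)
    l_angle = np.var(theta)
    # L_radius: the distance to the axis must be unchanged after moving.
    moved = points + disp
    foot2 = axis_x + np.outer(np.dot(moved - axis_x, d_gt), d_gt)
    l_radius = np.mean((np.linalg.norm(moved - foot2, axis=1) - radius) ** 2)
    return l_perp + l_angle + l_radius
```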
For translational motion, see Fig. 5; the loss function is as follows:

L_disp^t = L_para^t + L_var^t

L_para^t characterizes whether the predicted displacement is parallel to the true motion axis; the specific calculation formula is:

L_para^t = (1/|P_mov^t|) Σ_{p ∈ P_mov^t} (1 − |dot(D_t(p)/||D_t(p)||_2, d_gt/||d_gt||_2)|)^2

L_var^t requires each point to move the same distance, with variance 0; the specific calculation formula is:

L_var^t = Var_{p ∈ P_mov^t}( ||D_t(p)||_2 )
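Analogously, a minimal NumPy sketch of the two translational constraints, again with the aggregation assumed:

```python
import numpy as np

def translation_disp_loss(disp, axis_d):
    """Sketch of the translational displacement loss. disp: (N, 3)
    per-point displacements D_t; axis_d: true axis direction."""
    d_gt = axis_d / np.linalg.norm(axis_d)
    mag = np.linalg.norm(disp, axis=1)
    unit = disp / (mag[:, None] + 1e-12)
    # L_para: displacements should be parallel to the true axis,
    # i.e. |cos| between displacement and axis should be 1.
    l_para = np.mean((1.0 - np.abs(np.dot(unit, d_gt))) ** 2)
    # L_var: all points must move the same distance (zero variance).
    l_var = np.var(mag)
    return l_para + l_var
```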
Segmentation loss function: L_seg(S) is the softmax cross entropy between the predicted segmentation and the true segmentation.
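For reference, a minimal NumPy version of this softmax cross entropy over per-point segmentation logits (the two classes here being moving and reference):

```python
import numpy as np

def l_seg(logits, labels):
    """Softmax cross entropy. logits: (N, C) raw per-point scores
    (here C=2: moving vs. reference); labels: (N,) true part indices."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_softmax[np.arange(len(labels)), labels].mean())
```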
The motion information loss function is:

L_mob(M) = (1 − |dot(d/||d||_2, d_gt/||d_gt||_2)|) + ||x − π_gt(x)||_2^2 + H(t, t_gt)

where d, x and t are the motion axis direction, the motion axis position and the motion type, respectively; d_gt is the correct motion axis direction, x_gt is the correct motion axis position, t_gt is the correct motion type, π_gt(x) denotes the projection of x onto the axis determined by (d_gt, x_gt), and H is the cross entropy.
The present invention accomplishes the prediction of the future motion of an object by introducing a new recurrent neural network structure and several novel loss functions; the prediction includes the point cloud states at several future time steps, the segmentation of the moving part, the motion type and the motion parameters.
Further, the present invention demonstrates the mobility predictions obtained with MAPP-NET and evaluates the different components of the method. The network is trained with the loss function defined in formula (1) above and the Adam stochastic optimizer. The experiments use a motion unit data set; the visible surfaces of the units are sampled to create point clouds, referred to as "complete scans". The data set is divided into training/test units at a 90/10 ratio, and a set of partial scans is also derived from the test set for additional evaluation.
Fig. 6 shows examples of motion prediction on test units, for both complete and partial scans. For each example, the first 5 predicted frames for each input point cloud are shown, with the predicted transformation axis, the reference part and the moving part drawn. One can observe how MAPP-NET predicts the correct part motion and generates the corresponding motion sequence for different objects with different motion types. For example, the method accurately predicts rotational motions of shapes with different axis directions and positions, including horizontal and vertical axes, such as the flip phone shown in the first row (left) and the rotating flash drive (USB stick) shown in the second row (left). The method also accurately predicts the axis position, as in the examples of the suitcase shown in the fourth row (left) and the stacker shown in the second row (right).
It can also be seen that for translational motion, such as the motion of the drawer in the fifth row (right), MAPP-NET predicts the correct opening direction by translation, even though the data only shows the front surface of the drawer without its internal structure, because the reference part enclosing this object is too large. A similar result is found for the handle of the drawer in the third row (left), but with a different predicted motion type. Furthermore, the examples shown in the fifth row (left) and the last row (right) show that, for input point clouds that are already close to the end frame, the method learns to stop generating new frames once it finds the stopping state of the motion, which indicates that the method can infer the range of the motion.
In addition, MAPP-NET can predict the motion of multiple parts of the same object. Given an object with more than one moving part, the method can either predict the multiple motions iteratively, as shown in Fig. 2, or predict the motions of different parts simultaneously, in particular parts with different motion types. This is feasible because a single network is trained to predict all the different motion types, such as translation and rotation. For the simultaneous-motion example in Fig. 7, all 5 consecutive frames of the predicted segmentation are shown; the moving parts of the generated frames (red) are drawn in lighter colors the closer they are to the input frame.
The mobility predicted by MAPP-NET on the test set is evaluated quantitatively by measuring the errors of the motion parameters and the segmentation, since ground truth is available. Specifically, for each test unit, two metrics are used to compute the error of the predicted transformation axis M = (d, x) compared with the ground-truth axis M_gt = (d_gt, x_gt). The first metric captures the error of the predicted axis direction:

E_angle = arccos( |dot(d/||d||_2, d_gt/||d_gt||_2)| )

which is simply the angle of deviation between the predicted and ground-truth axes, in the range [0, π/2]. The second metric computes the error of the axis position:

E_dist = min( ||x − π(x)||_2, 1 )

where π(x) projects the point x onto the ground-truth motion axis determined by M_gt = (d_gt, x_gt). Because all shapes are normalized into a unit volume, the maximum distance is truncated to 1. Note that translations do not have a well-defined axis position, so for translations only the axis direction error is computed. The motion type error E_type is set to 1 when the type is misclassified, and to 0 otherwise. The segmentation error E_seg measures the percentage of points assigned the wrong label.
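Both metrics follow directly from the formulas above; a NumPy sketch:

```python
import numpy as np

def e_angle(d, d_gt):
    """Angle between predicted and ground-truth axis directions,
    in [0, pi/2] (the absolute value makes orientation irrelevant)."""
    c = abs(np.dot(d / np.linalg.norm(d), d_gt / np.linalg.norm(d_gt)))
    return float(np.arccos(np.clip(c, 0.0, 1.0)))  # clip guards rounding

def e_dist(x, d_gt, x_gt):
    """Distance from the predicted axis point x to the ground-truth
    axis, truncated to 1 since shapes are normalized to a unit volume."""
    d = d_gt / np.linalg.norm(d_gt)
    proj = x_gt + np.dot(x - x_gt, d) * d     # pi(x): projection onto axis
    return min(float(np.linalg.norm(x - proj)), 1.0)
```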
The mean of each error is then computed over the two data sets: complete and partial scans. The errors of the method can be seen in Table 1: all errors are relatively low, indicating that the predicted motions are highly accurate; moreover, the method achieves comparable results on complete and partial scans, which shows its robustness.

Table 1: Motion prediction errors of the method of the present invention and BaseNet.
Comparison with BaseNet. To demonstrate the advantage of MAPP-NET's generation of displacement maps before predicting all motion-related parameters, the present invention is compared with a baseline referred to as "BaseNet". BaseNet takes the point cloud P_0 as input and directly estimates the segmentation S and the motion parameters M with a standard network architecture consisting of an encoder/decoder pair and fully connected layers, as shown in Fig. 8. The loss function of BaseNet is:

L(S, M) = L_seg(S) + L_mob(M)

which uses the two loss terms defined in formula (1).
Table 1 shows the comparison between MAPP-NET and BaseNet on complete and partial scans. The segmentation error E_seg and motion type error E_type of BaseNet are comparable to those of the present method, but its axis direction error E_angle and axis position error E_dist are at least 5% higher. The main reason for this difference is likely that the segmentation and classification tasks are simpler than motion prediction: network architectures such as PointNet++ have been shown to achieve good results on those two tasks, whereas for motion prediction a single input frame may lead to ambiguity in the inference.
In the deep learning framework of the present invention, a recurrent neural network generates a sequence of multiple frames describing the motion, which constrains the inference further. As a result, the prediction of the motion parameters is more accurate.
Fig. 9 shows a visual comparison between the present method and BaseNet on several examples. Because BaseNet does not generate motion frames, its predicted segmentation and axis are shown on the input point cloud, whereas for the present method the predicted segmentation and axis are shown over 5 consecutive frames; the moving parts of the generated frames are drawn in lighter colors the closer they are to the input frame. For translations and rotations on both complete and partial scans, BaseNet is more likely to predict the wrong motion type, leading to prediction errors on complex shapes; for example, for the keyboard drawer under the desk, the direction of the sliding motion is predicted incorrectly.
To further validate the loss function of the present invention, three ablation studies are performed on the complete scans.
Importance of L_rec and L_disp. To show the importance of L_rec and L_disp, the loss terms that compare the predicted displacement maps D_t or point clouds P_t with the ground truth, the results of the full method are compared with those obtained without either of these two terms. The second and third rows of Table 2 show the error values of this experiment, compared with the last row, which uses the complete loss function of the present invention. Removing either L_rec or L_disp increases the errors and, more importantly, as shown in Fig. 10, the intermediate predicted sequences are of worse quality than those obtained with the complete loss function.
Method              E_angle  E_dist  E_seg  E_type
w/o L_rec           0.329    0.019   0.048  0.095
w/o L_disp          0.218    0.166   0.055  0.071
w/o L_mob           0.513    0.565   0.108  0.068
w/o L_mob and L_seg 0.623    0.499   0.213  0.065
Result (full)       0.209    0.153   0.038  0.047
Table 2: Ablation experiments comparing the complete MAPP-NET with versions that remove a given loss term; note that keeping all terms of the loss function yields the lowest errors (last row).
Without the reconstruction loss term L_rec, although the motion of the moving part looks plausible thanks to the displacement loss term L_disp, points (especially those on the reference part) are more likely to drift to unexpected positions.
On the other hand, when the displacement loss term L_disp is removed, the motion of the points of the moving part becomes inconsistent, which distorts the moving part. In contrast, the complete method predicts an accurate and smooth motion for the moving part while keeping the reference part unchanged.
Importance of L_mob and L_seg. The second ablation experiment validates the use of the motion loss term L_mob and the segmentation loss term L_seg, by comparing the complete network with a variant that infers the motion parameters M and the segmentation S from the predicted displacement maps rather than predicting them with additional layers of the network. Specifically, the network generates a point cloud motion sequence P_t from the displacement maps D_t, which can be used directly to fit the motion parameters M; for the segmentation S, points are filtered according to whether their displacement exceeds a suitable threshold θ, dividing them into moving and static (reference) points.
In the experiment, a threshold θ = 0.01 is used to determine the segmentation. To fit the motion axis for each pair of adjacent frames, the optimal rigid transformation matrix is computed, i.e., the one with the smallest mean squared error in transforming one frame to the next, and from it are extracted the axis direction for translations and the axis direction and position for rotations. For the evaluation, the axis direction error E_angle for translations and the axis direction error E_angle and axis position error E_dist for rotations are computed. Finally, the mean error over all adjacent frames of all test sequences is computed. The fourth and fifth rows of Table 2 show the error values of this experiment.
This motion fitting approach is sensitive to noise, which leads to large errors, whereas the predictions obtained with the complete network are more stable and provide better results. The comparison between the fitted motion parameters and the results of the present invention is shown in Fig. 11. Without the motion loss term L_mob and the segmentation loss term L_seg, a few outliers can cause large axis-fitting errors. Even in the variant missing only the motion loss term L_mob, although the segmentation looks correct, noise in the displacements of individual points can still cause large axis-fitting errors. For example, for the wheel shown in the second row, most points do not move except for those on the lower part of the object, causing the position of the fitted axis to deviate from the center of the wheel.
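The baseline just described, thresholding the displacement magnitudes and fitting a rigid transformation between adjacent frames, can be sketched as follows; the Kabsch-style least-squares fit and the axis extraction are standard techniques assumed here, not the authors' exact procedure.

```python
import numpy as np

def segment_by_threshold(disp, theta=0.01):
    """Label points as moving if their displacement magnitude exceeds
    the threshold theta, otherwise as static (reference)."""
    return np.linalg.norm(disp, axis=1) > theta

def fit_rigid_motion(p_prev, p_next):
    """Least-squares rigid fit between corresponding moving points of
    two adjacent frames, then extraction of the motion parameters."""
    ca, cb = p_prev.mean(0), p_next.mean(0)
    H = (p_prev - ca).T @ (p_next - cb)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                  # best rotation (Kabsch)
    t = cb - R @ ca                     # best translation
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if angle < 1e-3:                    # near-identity: pure translation
        return "translation", t / (np.linalg.norm(t) + 1e-12), None
    # Rotation axis direction: eigenvector of R for eigenvalue 1.
    w, v = np.linalg.eig(R)
    d = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    # A point x on the axis is a fixed point: (I - R) x = t,
    # solved here in the least-squares (minimum-norm) sense.
    x = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)[0]
    return "rotation", d / np.linalg.norm(d), x
```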
Importance of defining L_rec and L_mob on P_t. Because the network provides both the displacement maps D_t and the point clouds P_t as intermediate outputs, all loss terms except the displacement loss L_disp can be defined on either D_t or P_t. A third ablation experiment therefore examines defining the reconstruction term L_rec and the motion loss term L_mob on P_t, as the present method does; this definition is better than defining them on D_t, as Table 3 demonstrates. The main reason for this result is that the displacement map D_t is defined between two adjacent point cloud frames P_{t-1} and P_t; an error defined on D_t thus affects the generation of P_t and also affects D_{t+1}. If the reconstruction loss term L_rec were measured independently on each D_t, the accumulated error could not be properly taken into account during learning. In contrast, P_t is obtained by applying all the previous displacement maps D_1, ..., D_t to the input point cloud P_0. Therefore, by defining the reconstruction loss term L_rec on each P_t, the loss term provides a more global constraint on the error of the generated sequence. The same argument applies to the definition of the motion loss term L_mob.
Method          E_angle  E_dist  E_seg  E_type
L_mob on D_t    0.441    0.214   0.170  0.073
L_rec on D_t    0.350    0.203   0.145  0.074
Both on D_t     0.569    0.331   0.308  0.095
Result (ours)   0.209    0.153   0.038  0.047
Table 3: Comparison of defining the reconstruction loss term L_rec and the motion loss term L_mob on D_t instead of P_t. The last row corresponds to the present method, which defines both loss terms on P_t and obtains the lowest errors.
As the experiments emphasize, the method of the present invention predicts the motion of objects with a single moving part with high accuracy, and is therefore a good building block for predicting object motion in broader settings. For example, Figs. 2 and 7 show the potential of the method for detecting multiple moving parts in one object, including motions occurring in parallel or in a hierarchical order. However, for this more complex task, further experiments are needed to evaluate the method quantitatively, which may require constructing a data set of objects with multiple movable parts together with their known motion parameters and segmentations. In addition, the current data set assumes that shapes are in a meaningful orientation and is relatively small, consisting of 276 motion units. Another more direct improvement toward more complex scenes is to augment the data set by applying random transformations to the motion units, so that the network operates in a pose-invariant manner, or to train the network with partial scans to improve its robustness.
Another direction for future work is to use the mobility predicted by the method to synthesize the motion of the input shape. As part of this larger motion synthesis problem, an interesting sub-problem is learning how to complete the geometry of an object, which may be missing when the motion takes place; for example, a drawer pulled out of a cabinet should reveal its interior, which is missing if the shape was only scanned or its interior was not modeled. One possible approach is to learn how to synthesize the missing geometry from the predicted motion and the existing part geometry. Such a method would at least require building a training set of pre-segmented objects with all their internal details modeled.
The present invention introduces a loss function composed of a reconstruction loss and a displacement loss, which ensures that the shape of the object is preserved while the motion is predicted accurately. The reconstruction loss measures how well the object's shape is maintained during the motion, while the displacement loss measures how well the displacement field characterizes the motion. Compared with the alternative formulations, this loss function yields the most accurate predictions. The use of a recurrent neural network (RNN) architecture allows the present invention not only to predict the subsequent frames of the motion, but also to determine when the motion stops and thus, in addition to the motion parameters, to infer the range of the predicted motion, for example how far a door can open.
The present invention shows that MAPP-NET predicts the motion of object parts very accurately for a variety of objects with different motion types (including rotational and translational transformations), whether from a complete point cloud of a 3D object or from a partial scan. The rationality of the method is also verified and compared with the baseline method. Finally, preliminary results show that the proposed network has the potential to segment objects composed of multiple moving parts in a hierarchical manner while predicting the motions of multiple parts simultaneously.
Technical effects:

(1) The present invention formulates the affordance analysis problem as segmenting the input geometry and labeling each segment with its motion type and parameters; the proposed deep neural network learns from pre-segmented three-dimensional shapes with known motions and then performs segmentation and prediction.

(2) The deep neural network MAPP-NET of the present invention predicts part motion from a three-dimensional point cloud shape without requiring a segmentation of the shape; this is achieved by training a deep learning model that simultaneously segments the input shape and predicts the motion of its parts.

(3) The network of the present invention is trained on a motion unit data set with ground-truth segmentations and motion parameters; once trained, it can be used to predict the motion of a single unsegmented point cloud representing one static state of an object.

(4) The present invention introduces a loss function composed of a reconstruction loss and a displacement loss, which ensures that the shape of the object is preserved while the motion is predicted accurately; the reconstruction loss measures how well the object's shape is maintained during the motion, and the displacement loss measures how well the displacement field characterizes the motion. Compared with the alternative formulations, this loss function yields the most accurate predictions.

(5) The use of a recurrent neural network (RNN) architecture allows the present invention not only to predict the subsequent frames of the motion, but also to determine when the motion stops and thus, in addition to the motion parameters, to infer the range of the predicted motion, for example how far a door can open.

(6) The present invention shows that MAPP-NET predicts the motion of object parts very accurately for a variety of objects with different motion types (including rotational and translational transformations), whether from a complete point cloud of a 3D object or from a partial scan.

(7) The present invention shows preliminary results indicating that the proposed network has the potential to segment objects composed of multiple moving parts in a hierarchical manner while predicting the motions of multiple parts simultaneously.
Further, as shown in Fig. 12, based on the above motion prediction method based on a deep neural network, the present invention correspondingly provides an intelligent terminal, which includes a processor 10, a memory 20 and a display 30. Fig. 12 shows only some of the components of the intelligent terminal; it should be understood that implementing all the shown components is not required, and more or fewer components may be implemented instead.
In some embodiments, the memory 20 may be an internal storage unit of the intelligent terminal, such as a hard disk or internal memory of the intelligent terminal. In other embodiments, the memory 20 may also be an external storage device of the intelligent terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the intelligent terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the intelligent terminal. The memory 20 is used to store the application software installed on the intelligent terminal and various kinds of data, such as the program code installed on the intelligent terminal, and may also be used to temporarily store data that has been output or is about to be output. In one embodiment, the memory 20 stores a deep neural network-based motion prediction program 40, which can be executed by the processor 10 to implement the deep neural network-based motion prediction method of the present application.
In some embodiments, the processor 10 may be a central processing unit (CPU), a microprocessor or another data processing chip, used to run the program code or process the data stored in the memory 20, for example to execute the deep neural network-based motion prediction method.
In some embodiments, the display 30 may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display 30 is used to display information on the intelligent terminal and to display a visual user interface. The components 10-30 of the intelligent terminal communicate with each other through a system bus.
In one embodiment, when the processor 10 executes the deep neural network-based motion prediction program 40 in the memory 20, the following steps are implemented:

training a deep neural network with a data set;

inputting a three-dimensional point cloud into the deep neural network;

the deep neural network outputting a first part and a second part of the three-dimensional point cloud, the first part serving as the moving sub-unit and the second part as the reference part of the motion unit;

completing the network prediction from the output of the three-dimensional point cloud and outputting the motion information, which includes the mobility-based segmentation, the motion axis and the motion type.
The present invention also provides a storage medium, wherein the storage medium stores a deep neural network-based motion prediction program; when the program is executed by a processor, the steps of the deep neural network-based motion prediction method are implemented, as described above.
In summary, the present invention provides a motion prediction method based on a deep neural network and an intelligent terminal. The method includes: training a deep neural network with a data set; inputting a three-dimensional point cloud into the deep neural network; the deep neural network outputting a first part and a second part of the three-dimensional point cloud, the first part serving as the moving sub-unit and the second part as the reference part of the motion unit; and completing the network prediction from the output of the three-dimensional point cloud and outputting the motion information, which includes the mobility-based segmentation, the motion axis and the motion type. The present invention achieves simultaneous prediction of motion and parts for a variety of articulated objects in a static state, from unstructured and possibly partial scans, and can predict the motion of object parts very accurately.
Of course, those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware (such as a processor or controller) through a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disc, or the like.
It should be understood that the application of the present invention is not limited to the above examples; those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications fall within the protection scope of the appended claims of the present invention.

Claims (10)

  1. A motion prediction method based on a deep neural network, characterized in that the motion prediction method based on a deep neural network comprises:

    training a deep neural network with a data set;

    inputting a three-dimensional point cloud into the deep neural network;

    the deep neural network outputting a first part and a second part of the three-dimensional point cloud, the first part serving as the moving sub-unit and the second part as the reference part of the motion unit;

    completing the network prediction according to the output of the three-dimensional point cloud, and outputting the motion information, which includes the mobility-based segmentation, the motion axis and the motion type.
  2. The motion prediction method based on a deep neural network according to claim 1, characterized in that the loss function used when training the deep neural network is:

    L(D_t, S, M) = Σ_{t=1..T} ( L_rec(P_t) + L_disp(D_t) ) + L_seg(S) + L_mob(M)

    wherein D_t denotes the displacement map, S the segmentation and M the fitted motion parameters; L_rec is the reconstruction error, L_disp is the displacement error, L_seg is the segmentation error, and L_mob is the regression error of the motion parameters;

    the reconstruction error represents the degree of distortion of the shape, and the displacement error represents the accuracy of the moving part; the segmentation error and the regression error characterize the correctness of the motion information, including the division into moving and static points and the position, direction and motion type of the motion axis.
  3. The motion prediction method based on a deep neural network according to claim 2, characterized in that L_rec characterizes the geometric error between the predicted post-motion point cloud and the true post-motion point cloud;

    the point cloud P_0 is divided into a reference part and a moving part; after undergoing the motion that carries P_{t-1} to P_t, where P_{t-1} and P_t denote two adjacent point cloud frames, the reference part remains static and the moving part moves rigidly, so L_rec is divided into two parts:

    L_rec^t = L_ref^t + L_mov^t

    wherein L_ref^t is the error of the reference part and L_mov^t is the error of the moving part;

    L_ref^t is the sum of the squared error distances of the points:

    L_ref^t = Σ_{p ∈ P_ref^t} ||p − p_gt^t||_2^2

    wherein p_gt^t is the true position of point p;

    L_mov^t is composed as:

    L_mov^t = L_shape^t + L_density^t

    wherein L_shape is used to penalize points that do not match the target shape, L_density compares the local point density of the predicted point cloud with that of the target point cloud, P_mov^t denotes the moving part of the frame-t point cloud generated by the deep neural network, and P_mov^{t,gt} denotes the moving part of the correct frame-t point cloud, gt being the abbreviation of ground truth and indicating the correct value.
  4. The motion prediction method based on a deep neural network according to claim 3, characterized in that the difference between the predicted motion information and the target motion information is measured by an error loss function; the motion types include rotational motion and translational motion.
  5. The motion prediction method based on a deep neural network according to claim 4, wherein for a rotational motion the loss function is:

    L_rot = L_perp + L_angle + L_radius

    L_perp describes whether the predicted displacement is perpendicular to the true motion axis, and is computed as:

    L_perp = Σ_{p ∈ M_t} dot(u_p^t, d^gt)²

    where dot denotes the dot product, u_p^t denotes the displacement map of point p in the t-th point-cloud frame, and d^gt is the direction of the correct motion axis; L_angle is the deviation of the per-point rotation angles, requiring all points to rotate by the same angle, and is computed as:

    L_angle = Var_{p ∈ M_t} ( ||u_p^t|| / (proj(p) + σ) )

    where σ is a constant and proj(p) denotes the distance between point p and its projection onto the correct motion axis; L_radius requires each point to be at the same distance from the true rotation axis before and after the rotation, constraining the circularity of its motion, and is computed as:

    L_radius = Σ_{p ∈ M_t} ( proj(p + u_p^t) − proj(p) )²
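    For concreteness, a minimal sketch of the three rotation terms follows. The displacement map is taken to be a per-point 3-vector field u; the variance-of-ratios form of L_angle and the role of sigma as a stabilizer for points near the axis are assumptions made here, since the exact closed form of the original equation images is not recoverable from the published text:

    import torch

    def rotation_loss(u, points, axis_dir, axis_pos, sigma=1e-3):
        # u:        (N, 3) predicted per-point displacements (displacement map)
        # points:   (N, 3) moving-part points before the motion
        # axis_dir: (3,)   unit direction d_gt of the true rotation axis
        # axis_pos: (3,)   any point x_gt lying on the true rotation axis

        def radius(p):
            # proj(p): distance from p to its projection onto the true axis.
            v = p - axis_pos
            along = (v @ axis_dir).unsqueeze(-1) * axis_dir
            return (v - along).norm(dim=-1)

        # L_perp: displacements should be perpendicular to the axis direction.
        l_perp = ((u @ axis_dir) ** 2).sum()

        # L_angle: all points should rotate by the same angle; sigma keeps the
        # displacement-to-radius ratio stable near the axis (assumed role).
        ratio = u.norm(dim=-1) / (radius(points) + sigma)
        l_angle = ratio.var()

        # L_radius: the distance to the axis must be preserved by the motion,
        # which constrains each trajectory to a circle around the axis.
        l_radius = ((radius(points + u) - radius(points)) ** 2).sum()

        return l_perp + l_angle + l_radius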
  6. The motion prediction method based on a deep neural network according to claim 5, wherein for a translational motion the loss function is:

    L_trans = L_para + L_var

    L_para describes whether the predicted displacement is parallel to the true motion axis, and is computed as:

    L_para = Σ_{p ∈ M_t} || u_p^t − dot(u_p^t, d^gt) d^gt ||²

    which penalizes the component of each displacement orthogonal to the axis; L_var requires every point to move the same distance, i.e., the variance of the per-point displacement magnitudes to be zero, and is computed as:

    L_var = Var_{p ∈ M_t} ( ||u_p^t|| )
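    A corresponding sketch for the translation terms, under the same assumed tensor conventions as above; the orthogonal-residual form of L_para is one reasonable reading of "parallel to the true axis", not necessarily the patent's exact formula:

    import torch

    def translation_loss(u, axis_dir):
        # u: (N, 3) predicted per-point displacements; axis_dir: (3,) unit d_gt.

        # L_para: penalize the component of each displacement that is
        # orthogonal to the true translation axis.
        along = (u @ axis_dir).unsqueeze(-1) * axis_dir
        l_para = ((u - along) ** 2).sum()

        # L_var: every point must translate by the same distance, so the
        # variance of the displacement magnitudes should vanish.
        l_var = u.norm(dim=-1).var()

        return l_para + l_var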
  7. The motion prediction method based on a deep neural network according to claim 6, wherein the motion information loss function is:

    L_mot = ||d − d^gt||² + ||x − x^gt||² + H(t, t^gt)

    where d, x and t are the predicted motion-axis direction, motion-axis position and motion type, respectively; d^gt is the correct motion-axis direction, x^gt is the correct motion-axis position, t^gt is the correct motion type, and H is the cross entropy.
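    This combined loss maps naturally onto a regression-plus-classification objective. A minimal sketch, assuming unweighted terms and that the motion type is predicted as classifier logits (both assumptions, not statements about the patented network):

    import torch
    import torch.nn.functional as F

    def motion_info_loss(d, x, t_logits, d_gt, x_gt, t_gt):
        # d, x:     (3,) predicted axis direction and axis position
        # t_logits: (num_types,) unnormalized scores over motion types
        # t_gt:     scalar long tensor holding the correct type index
        l_dir = ((d - d_gt) ** 2).sum()    # axis-direction regression
        l_pos = ((x - x_gt) ** 2).sum()    # axis-position regression
        l_type = F.cross_entropy(t_logits.unsqueeze(0), t_gt.unsqueeze(0))
        return l_dir + l_pos + l_type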
  8. The motion prediction method based on a deep neural network according to claim 1, wherein the number of points in the three-dimensional point cloud is 1024.
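    One common way to produce such a fixed-size 1024-point input is farthest point sampling. The routine below is an assumed preprocessing step shown for illustration only; the patent does not specify how the cloud is sampled:

    import torch

    def farthest_point_sample(points, n=1024):
        # points: (N, 3) with N >= n; returns an (n, 3) well-spread subset.
        N = points.shape[0]
        idx = torch.zeros(n, dtype=torch.long)
        dist = torch.full((N,), float("inf"))
        idx[0] = torch.randint(N, (1,))
        for i in range(1, n):
            # Track each point's distance to the nearest already-chosen point,
            # then pick the point farthest from the current sample set.
            dist = torch.minimum(dist, (points - points[idx[i - 1]]).norm(dim=-1))
            idx[i] = dist.argmax()
        return points[idx]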
  9. An intelligent terminal, characterized in that the intelligent terminal comprises: a memory, a processor, and a deep-neural-network-based motion prediction program stored in the memory and executable on the processor, wherein the motion prediction program, when executed by the processor, implements the steps of the motion prediction method based on a deep neural network according to any one of claims 1-8.
  10. A storage medium, characterized in that the storage medium stores a deep-neural-network-based motion prediction program, wherein the motion prediction program, when executed by a processor, implements the steps of the motion prediction method based on a deep neural network according to any one of claims 1-8.
PCT/CN2020/080091 2019-12-27 2020-03-19 Motion prediction method based on deep neural network, and intelligent terminal WO2021128611A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911378607.2 2019-12-27
CN201911378607.2A CN111080671B (en) 2019-12-27 2019-12-27 Motion prediction method based on deep neural network and intelligent terminal

Publications (1)

Publication Number Publication Date
WO2021128611A1 true WO2021128611A1 (en) 2021-07-01

Family ID=70318616

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080091 WO2021128611A1 (en) 2019-12-27 2020-03-19 Motion prediction method based on deep neural network, and intelligent terminal

Country Status (2)

Country Link
CN (1) CN111080671B (en)
WO (1) WO2021128611A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914946B (en) * 2020-08-19 2021-07-06 中国科学院自动化研究所 Countermeasure sample generation method, system and device for outlier removal method
CN112268564B (en) * 2020-12-25 2021-03-02 中国人民解放军国防科技大学 Unmanned aerial vehicle landing space position and attitude end-to-end estimation method
CN113313835B (en) * 2021-07-29 2021-11-09 深圳市数字城市工程研究中心 Building roof automatic modeling method based on airborne LiDAR point cloud

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109480838A (en) * 2018-10-18 2019-03-19 北京理工大学 A kind of continuous compound movement Intention Anticipation method of human body based on surface layer electromyography signal
WO2019099684A1 (en) * 2017-11-15 2019-05-23 Google Llc Unsupervised learning of image depth and ego-motion prediction neural networks
CN109948475A (en) * 2019-03-06 2019-06-28 武汉大学 A kind of human motion recognition method based on framework characteristic and deep learning
CN110293552A (en) * 2018-03-21 2019-10-01 北京猎户星空科技有限公司 Mechanical arm control method, device, control equipment and storage medium
US20190323852A1 (en) * 2018-03-15 2019-10-24 Blue Vision Labs UK Limited Enhanced vehicle tracking

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3579196A1 (en) * 2018-06-05 2019-12-11 Cristian Sminchisescu Human clothing transfer method, system and device
CN110473284B (en) * 2019-07-29 2021-02-12 电子科技大学 Moving object three-dimensional model reconstruction method based on deep learning

Also Published As

Publication number Publication date
CN111080671A (en) 2020-04-28
CN111080671B (en) 2023-06-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20906180

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20906180

Country of ref document: EP

Kind code of ref document: A1