CN108717732B - Expression tracking method based on MobileNet model - Google Patents

Expression tracking method based on MobileNet model

Info

Publication number
CN108717732B
CN108717732B
Authority
CN
China
Prior art keywords
model
layer
face
neural network
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810486472.0A
Other languages
Chinese (zh)
Other versions
CN108717732A (en)
Inventor
饶云波
宋佳丽
吉普照
范柏江
苟苗
杨攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201810486472.0A
Publication of CN108717732A
Application granted
Publication of CN108717732B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tesselation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The invention belongs to the field of expression tracking and in particular relates to an expression tracking method based on the MobileNet model. The method mainly comprises the following steps: generating a training data set through preprocessing, where the preprocessing gives the face in each picture of the data set three-dimensional feature coordinates; constructing a neural network MobileNet model from a standard convolution layer, 12 separable convolution layers, one average pooling layer, a fully connected layer and Softmax, the 12 separable convolution layers being 6 depthwise convolutions and 6 pointwise convolutions; training the constructed neural network MobileNet model with the obtained training data set; obtaining the coordinates of the three-dimensional feature points of the face of an input image with the trained model; and performing mesh reconstruction on the extracted three-dimensional feature point coordinates to generate deformation coefficients that control a 3D face model and realize expression tracking. Because the method balances model size and running speed, it is suitable for mobile devices and highly practical.

Description

Expression tracking method based on MobileNet model
Technical Field
The invention belongs to the field of expression tracking technology and particularly relates to an expression tracking method based on the MobileNet model.
Background
With the improvement of hardware, facial expression tracking technology has gradually been applied to fields such as film production, VR social applications and game development. Films such as Star Wars and Avatar make full use of tracking technology when producing the expressions and actions of their characters, achieving facial expressiveness and motion close to that of real actors. In addition, according to the findings of a well-known American psychologist, the emotion conveyed in social interaction is 7% words, 38% tone of voice and 55% facial expression. Given how widespread the Internet has become, research on facial expression tracking has very practical significance for improving the effectiveness of communication and the enjoyment of leisure and entertainment while respecting user privacy.
Locating the facial feature points is a key link in the facial expression tracking pipeline, and whether the feature points are extracted accurately directly affects the realism of the subsequent expression mapping stage. The main procedure is as follows: after a face is input from a device or a source path, the positions of the facial features and facial contour are located, the coordinate values of these landmark points are extracted, and the coordinates are used to build the triangular mesh in the expression mapping stage. The accuracy of an algorithm is evaluated by the difference between the extracted feature points and the actual feature point coordinates of the face, and different feature localization algorithms are mainly compared in terms of accuracy, speed and robustness.
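For instance, the accuracy criterion just described, the difference between extracted and ground-truth landmark coordinates, can be computed as a mean Euclidean error. A minimal sketch (NumPy; the array shapes and the toy data are illustrative assumptions, not part of the patent):

```python
import numpy as np

def mean_landmark_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth landmarks.

    pred, gt: arrays of shape (num_points, 3) holding (x, y, z) coordinates.
    """
    assert pred.shape == gt.shape
    return float(np.linalg.norm(pred - gt, axis=1).mean())

# toy example with 68 landmarks
rng = np.random.default_rng(0)
gt = rng.random((68, 3))
pred = gt + 0.01 * rng.standard_normal((68, 3))
print(mean_landmark_error(pred, gt))
```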
The traditional cascaded regression approach first constructs an initial face shape and then gradually approaches the real face shape by training a number of weak regressors. However, once the initial face shape is too far from the actual one, the subsequent regression optimization also shows a large deviation. Some researchers are working on improving the quality of the initial face, which helps to some extent, but the errors introduced by the initial face cannot be avoided entirely. In addition, many researchers now perform feature extraction by training neural network models; this approach mainly depends on preprocessing of the training data and on the construction and optimization of the network, and although a large number of network structures have achieved certain results in speed and accuracy, the results still leave room for improvement.
Disclosure of Invention
To address these problems, the invention extracts facial feature points by building and training a deep learning model, MobileNet, and realizes the tracking and transfer of human facial expressions to an animated model by generating deformation coefficients from the feature points. With the spread of networks and the emergence of various intelligent applications, plain text and voice messages can hardly satisfy users' demand for engaging daily social interaction and game entertainment, while personal safety and privacy leakage remain potential risks. Expression tracking effectively addresses this problem, mainly by tracking the user's expressions and transferring them to the face of a virtual model.
The technical scheme of the invention is as follows:
An expression tracking method based on a MobileNet model, characterized by comprising the following steps:
S1, generating a training data set through preprocessing, where the preprocessing gives the face in each picture of the data set three-dimensional feature coordinates;
S2, constructing a neural network MobileNet model from a standard convolution layer, 12 separable convolution layers, 1 average pooling layer, a fully connected layer and Softmax, the 12 separable convolution layers being 6 depthwise convolutions and 6 pointwise convolutions;
training the constructed neural network MobileNet model with the training data set obtained in step S1;
S3, obtaining the coordinates of the three-dimensional feature points of the face of the input image with the trained neural network MobileNet model;
S4, performing mesh reconstruction on the coordinates of the three-dimensional feature points of the face extracted by the model to generate deformation coefficients, and controlling the 3D face model to realize expression tracking.
Further, the three-dimensional feature coordinates in step S1 are a plurality of three-dimensional feature coordinates including facial features and external contours.
Further, the specific method for training the constructed neural network MobileNet model in step S2 is as follows:
with 68 three-dimensional feature coordinates in total, the first standard convolution layer of the neural network MobileNet model contains 64 convolution kernels; if the height and width of a training-set picture are h and w respectively, then:
after the first standard convolution layer, the input picture is convolved with a stride of 2 into a feature map of size (h/2) × (w/2) × 64;
from the second layer, the 12 layers are iterated in sequence with strides of 1 or 2, processing the feature map into a size of (h/32) × (w/32) × 1024;
the average pooling layer normalizes the feature map to a size of 1 × 1 × 1024 with stride m;
finally, the fully connected layer maps the features to 3 × 68 three-dimensional coordinate points, realizing feature extraction of the training-set picture.
The advantage of the invention is that it uses the lightweight network MobileNet, which balances model size and running speed, so the method is suitable for mobile devices and highly practical.
Drawings
FIG. 1 is a diagram of the depthwise separable convolution;
FIG. 2 is a graph of the training results;
FIG. 3 is a diagram illustrating a single-picture test result;
FIG. 4 is a diagram illustrating the expression mapping result.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and examples.
In the invention, the user's facial feature points are extracted first. The original MobileNet model is modified so that it is trained on, and outputs, three-dimensional feature point data consisting of 68 feature points covering the facial features and the external contour, and the data set contains face images of different ages and nationalities. The trained model effectively extracts the three-dimensional features of the face; the extracted feature information undergoes triangular mesh reconstruction to generate deformation coefficients, and the expression of the animated model changes as the deformation coefficients change.
The preparation and processing of the data set are the basis of model training. For example, 100 videos containing human faces are downloaded, briefly processed, and the three-dimensional feature coordinates of the faces in the videos are annotated. The videos are then cut into single pictures frame by frame and the three-dimensional coordinates are split per frame, finally yielding tens of thousands of pictures and their corresponding labels. When choosing the videos, people of different age groups, countries and scenes should be selected, and the facial expressions in the videos should be rich enough, so that the trained model is more robust.
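A minimal sketch of the frame-splitting step with OpenCV; the file layout, the labels.npy array holding one 68 x 3 coordinate set per frame, and the output naming are assumptions made for illustration:

```python
import os
import cv2          # pip install opencv-python
import numpy as np

def split_video(video_path, labels_path, out_dir):
    """Cut an annotated face video into single frames plus per-frame label files."""
    os.makedirs(out_dir, exist_ok=True)
    labels = np.load(labels_path)            # assumed shape: (num_frames, 68, 3)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok or idx >= len(labels):
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
        np.save(os.path.join(out_dir, f"{idx:06d}.npy"), labels[idx])
        idx += 1
    cap.release()
    return idx                               # number of frame/label pairs written

# split_video("face_video.mp4", "labels.npy", "dataset/train")
```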
To meet the requirements of simple image-acquisition equipment and fast feature extraction, the model is improved on the basis of the lightweight deep neural network MobileNet. Conventional convolutional neural networks work well for image processing and target detection, and deeper network structures are used to train models of higher accuracy; however, such networks are hard to speed up and their oversized models cannot be embedded in mobile devices. Compared with a traditional convolutional neural network, the MobileNet model builds a lightweight network on the basis of depthwise separable convolutions, taking running speed and model size into account while maintaining accuracy.
The depthwise separable convolution structure is shown in FIG. 1: it decomposes a standard convolution into a depthwise convolution that performs the filtering and a 1 × 1 pointwise convolution that combines the outputs of the depthwise convolution. Let M be the number of input channels, N the number of output channels, D_F the spatial width and height of the square input feature map, and D_K the side length of the convolution kernel. The computational cost of the standard convolution is D_K · D_K · M · N · D_F · D_F, while the cost of the depthwise separable convolution used in the invention is D_K · D_K · M · D_F · D_F + M · N · D_F · D_F. Comparing the two gives:

(D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_K · D_K · M · N · D_F · D_F) = 1/N + 1/(D_K · D_K)
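A quick numerical check of this ratio (the channel counts and feature-map size below are arbitrary illustrative values, not taken from the patent):

```python
# cost of a standard convolution vs. a depthwise separable convolution
D_K, M, N, D_F = 3, 64, 128, 56      # kernel size, input/output channels, feature-map side

standard  = D_K * D_K * M * N * D_F * D_F
separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F

print(separable / standard)          # ~0.119, i.e. roughly an 8x reduction
print(1 / N + 1 / D_K ** 2)          # the same value, 1/N + 1/D_K^2
```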
therefore, the model effectively reduces the complexity of the calculation time and the size of the model. In the invention, the MobileNet model is applied to human face feature point extraction for the first time, the structure treats deep convolution and point convolution as two independent modules, each convolution operation is followed by a Batchnorm and a ReLU, and downsampling is processed in the deep convolution and the first layer of standard convolution. The model structure is improved as the subsequent work needs triangular mesh reconstruction of the extracted feature points, and the improved structure is shown in table 1, namely a standard convolution layer, 12 separated convolution layers (6 deep convolutions +6 point convolutions), 1 mean pooling layer, a full connection layer and Softmax. Assuming that a picture I with a height h and a width w is input, the first layer is a standard convolution containing 64 convolution kernels, and the picture I is convolved with a step size of 2 to a characteristic size of (h/2) × (w/2) × 64. The second layer starts with a combination of split convolution and point convolution, and iterates through the 12 layers in sequence with a step size of 1 or 2, to step the feature map to the exact feature size of (h/32) × (w/32) × 1024. And then, a mean pooling layer normalizes the feature graph into a size of 1 × 1 × 1024 by step length m, and finally classifies the features into 3 × 68 three-dimensional coordinate points through a full connection layer to realize feature extraction of the picture I.
TABLE 1 MobileNet network structure
[Table 1 appears as an image in the original document; it lists the layer sequence described above: one standard convolution layer, 12 separable convolution layers (6 depthwise + 6 pointwise), one average pooling layer, a fully connected layer and Softmax.]
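For illustration, the structure described above can be sketched in PyTorch as follows. Only the layer sequence (one standard convolution, 6 depthwise plus 6 pointwise convolutions, average pooling, a fully connected layer producing 3 × 68 outputs) is taken from the description; the intermediate channel counts and the placement of the stride-2 layers are plausible assumptions, since Table 1 itself is not reproduced here, and the Softmax stage listed in the text is omitted because this sketch regresses coordinates directly.

```python
import torch
import torch.nn as nn

def conv_bn(c_in, c_out, stride):
    """Standard 3x3 convolution followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

def conv_dw(c_in, c_out, stride):
    """Depthwise 3x3 convolution + pointwise 1x1 convolution, each with BN and ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False),
        nn.BatchNorm2d(c_in), nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, 1, 0, bias=False),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class LandmarkMobileNet(nn.Module):
    """Assumed layer widths; 6 conv_dw blocks = 6 depthwise + 6 pointwise layers."""
    def __init__(self, num_points=68):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn(3, 64, 2),        # (h/2)  x (w/2)  x 64
            conv_dw(64, 128, 2),
            conv_dw(128, 256, 2),
            conv_dw(256, 512, 2),
            conv_dw(512, 512, 1),
            conv_dw(512, 1024, 2),    # (h/32) x (w/32) x 1024
            conv_dw(1024, 1024, 1),
            nn.AdaptiveAvgPool2d(1))  # 1 x 1 x 1024
        self.fc = nn.Linear(1024, 3 * num_points)   # 3 x 68 coordinate values

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.fc(x)

print(LandmarkMobileNet()(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 204])
```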
Taking input pictures uniformly resized to 224 × 224 as an example, the method specifically comprises:
the method comprises the following steps: assuming that the data set comprises N face images and corresponding label files (the (x, y, z) coordinates of 68 feature points), the face images and the corresponding label files are divided into a training set and a verification set according to the proportion of 7:3, and a test set picture is prepared independently, and M pictures obtained by dividing a complete video into frames and corresponding labels are generally used as a test set. The invention uses twenty thousand pictures for training, the video time length used for testing is about 20 seconds, and the video time length is divided into about 600 pictures according to frames.
Step two: the model is trained under a Pythrch framework, which provides a Tensor supporting a CPU and a GPU, and can greatly accelerate the calculation. To reduce the delay of picture reading and processing, the resize of the input picture (height h, width w) is unified to 224 × 224, and the corresponding x, y coordinates are scaled down according to the same proportion, and the z coordinate is unchanged, as follows:
h_r = 224/h, w_r = 224/w (1)
new_x = x × h_r, new_y = y × w_r (2)
where h_r is the compression ratio of the picture height, w_r the compression ratio of the picture width, new_x the compressed x coordinate and new_y the compressed y coordinate. Because of the large data size, the training samples are divided into batches of size 128, the number of epochs is set to 20, and the constructed model is saved every 5 epochs; in addition, since the separable convolutions have few parameters, the weight decay is set to a small value, 1e-4. To evaluate the error between the model's output coordinates and the true coordinates, the SmoothL1Loss function in PyTorch is used as the loss function of the model, as follows:
smooth_L1(x_i) = 0.5 · x_i², if |x_i| < 1; |x_i| - 0.5, otherwise
the functional error is a squared loss at (-1,1), otherwise the L1 loss. Where the subscript i refers to the ith element of x. The data set was iteratively trained on the NVIDIA GeForce GPU, with the results of each training (20epoch) shown in fig. 2.
After training, because the model is saved every 5 epochs, several models are produced. The generated models are used to process the validation data with val_size set to 4, and the parameters are tuned continuously to obtain the optimal model, with weight_decay set to 1e-4, lr to 1e-3 and lr_decay to 0.95.
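A condensed sketch of the training described in step two, using the hyper-parameters listed above (batch size 128, 20 epochs, a checkpoint every 5 epochs, weight decay 1e-4, learning rate 1e-3 with decay 0.95, SmoothL1 loss). It reuses the LandmarkMobileNet sketch from above; the random tensors stand in for the real frame/label data set, and the choice of the Adam optimiser is an assumption, as the patent does not name one:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# toy stand-in for the real picture/label pairs described above
images  = torch.randn(256, 3, 224, 224)
targets = torch.randn(256, 3 * 68)
train_loader = DataLoader(TensorDataset(images, targets), batch_size=128, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = LandmarkMobileNet().to(device)        # network sketched after Table 1

criterion = nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)   # lr_decay = 0.95

for epoch in range(1, 21):                    # 20 epochs
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
    if epoch % 5 == 0:                        # save the constructed model every 5 epochs
        torch.save(model.state_dict(), f"mobilenet_epoch{epoch}.pth")
```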
Step three: after the optimal model is obtained, the model is predicted by using the test data, and the batch _ size is set to 1. Assuming that a single picture m is input for testing, as shown in fig. 3, the test result is an array structure. In the test process, resize is carried out on the picture m, so that the final output result is correspondingly reduced in equal proportioni/h_r,Y=YiAnd amplifying in a/w _ r mode for subsequent grid reconstruction.
Step four: through the above steps, the trained convolutional neural network MobileNet model can be used to obtain the three-dimensional feature point coordinates of the face in the video, recorded as S = (X_1, Y_1, Z_1, X_2, ..., Y_68, Z_68)^T ∈ R^(3n). The expression mapping from the feature points to the animated model is realized with a two-layer feedforward neural network with Sigmoid activation and linear output, whose input data are the distances between the vertices of the triangular mesh constructed from the set S. The triangular mesh is constructed as follows: first, a super-triangle containing all the scattered points S = {s_1, s_2, ..., s_n}, 1 ≤ n ≤ 68, is constructed and placed into a triangle linked list k. Then the point s_1 is inserted: the set T = {t_1, t_2, ..., t_n} of triangles in the linked list whose circumcircles contain s_1 is found, the common edges of T are deleted, and s_1 is connected to all vertices of the triangles in T, completing the insertion of s_1 into k. Finally, the remaining scattered points in S are inserted in turn and the triangles are optimized to build the mesh. The Euclidean distances between the mesh vertices are computed and input to the model to obtain deformation coefficients that the animated model can recognize, and the generated deformation coefficients are placed into a Unity3D project to realize the expression mapping, as shown in FIG. 4.
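A sketch of the mesh-reconstruction step. The two-layer feedforward network with a Sigmoid hidden layer and linear output follows the description above; the SciPy Delaunay routine is used here in place of the incremental super-triangle construction, and the x-y projection, the hidden size and the number of output coefficients are assumptions for illustration (the summary below states that 178 distance values feed the mapping network):

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.spatial import Delaunay

def edge_distances(points3d):
    """Triangulate the 68 landmarks on their x-y projection and return the
    Euclidean lengths of all mesh edges, sorted for a stable ordering."""
    tri = Delaunay(points3d[:, :2])
    edges = set()
    for a, b, c in tri.simplices:
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((min(i, j), max(i, j)))
    return np.asarray([np.linalg.norm(points3d[i] - points3d[j])
                       for i, j in sorted(edges)], dtype=np.float32)

class MappingNet(nn.Module):
    """Two-layer feedforward network: Sigmoid hidden layer, linear output."""
    def __init__(self, n_in, n_hidden=64, n_coeffs=51):   # hidden/output sizes assumed
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid(),
                                 nn.Linear(n_hidden, n_coeffs))

    def forward(self, x):
        return self.net(x)

# landmarks = predict_landmarks(...)            # 68 x 3 array from the previous step
# d = torch.from_numpy(edge_distances(landmarks)).unsqueeze(0)
# coeffs = MappingNet(d.shape[1])(d)            # deformation coefficients for Unity3D
```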
In summary, the key point of the invention is a method that extracts the coordinates of the three-dimensional feature points of the face with the lightweight convolutional neural network MobileNet and converts the feature points into deformation coefficients recognizable by an animated model, thereby realizing expression mapping.
Extraction of the three-dimensional coordinates of the facial feature points: the invention exploits the efficiency of the lightweight convolutional neural network MobileNet, modifies it and uses it to extract facial feature points. Twenty thousand face images and corresponding label files are prepared to train the network, and the optimal network, obtained by continuously tuning the parameters, is used for feature point extraction. Because the image and its label are compressed proportionally when input to the network, the extracted data are rescaled, and finally the 3D coordinates of the facial feature points are obtained from the MobileNet model.
Expression mapping from the face to the animated model using the 3D feature point coordinates: after the 3D coordinates of the facial feature points are obtained, a triangular mesh is reconstructed from the feature points, the Euclidean distances between the vertices are computed, and the 178 distance values are used as the input of the mapping model to obtain deformation coefficients recognizable by the animated model; finally the deformation coefficients are fed into Unity3D to realize the expression mapping.
Considering that the current trend in deep learning research of improving accuracy with ever deeper and more complex networks leads to larger models and lower running speed, the invention uses the lightweight network MobileNet, which balances model size and running speed, so the method is suitable for mobile devices and highly practical. With the spread of mobile electronic products, the time users spend in the online world keeps growing, and so does the demand. To protect users' personal privacy while keeping leisure and entertainment enjoyable, expression-transfer technology has gradually developed. It mainly manifests as follows: in online social interaction or work, a user can communicate with strangers on the network through expression transfer and see the other party's expression changes without exposing their own appearance, which is more efficient than plain text or voice communication; in games, the player's expression can control the expression of a game character, improving immersion, and various engaging games can be designed around this technology; in the production of films with spectacular scenes, such as fantasy or animated films, actors can control the expressions and limbs of character models during recording, which protects the actors and saves a great deal of time and money. Expression-transfer technology therefore has great research significance. The invention uses the lightweight network MobileNet to extract facial features, processes the feature points to obtain deformation coefficients recognizable by an animated model, and finally realizes expression mapping in Unity3D. This shows that the invention can be applied in many fields and has strong commercial value.

Claims (1)

1. An expression tracking method based on a MobileNet model, characterized by comprising the following steps:
S1, generating a training data set through preprocessing, where the preprocessing gives the face in each picture of the data set a plurality of three-dimensional feature coordinates covering the facial features and the external contour;
S2, constructing a neural network MobileNet model from a standard convolution layer, 12 separable convolution layers, 1 average pooling layer, a fully connected layer and Softmax, the 12 separable convolution layers being 6 depthwise convolutions and 6 pointwise convolutions;
training the constructed neural network MobileNet model with the training data set obtained in step S1, specifically as follows:
with 68 three-dimensional feature coordinates in total, the first standard convolution layer of the neural network MobileNet model contains 64 convolution kernels; if the height of a training-set picture is h and its width is w, then:
after the first standard convolution layer, the input picture is convolved with a stride of 2 into a feature map of size (h/2) × (w/2) × 64;
from the second layer, the 12 layers are iterated in sequence with strides of 1 or 2, processing the feature map into a size of (h/32) × (w/32) × 1024;
the average pooling layer normalizes the feature map to a size of 1 × 1 × 1024 with stride m;
finally, the fully connected layer maps the features to 3 × 68 three-dimensional coordinate points, realizing feature extraction of the training-set picture;
S3, obtaining the coordinates of the three-dimensional feature points of the face of the input image with the trained neural network MobileNet model;
S4, performing mesh reconstruction on the coordinates of the three-dimensional feature points of the face extracted by the model to generate deformation coefficients, and controlling the 3D face model to realize expression tracking.
CN201810486472.0A 2018-05-21 2018-05-21 Expression tracking method based on MobileNet model Active CN108717732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810486472.0A CN108717732B (en) 2018-05-21 2018-05-21 Expression tracking method based on MobileNet model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810486472.0A CN108717732B (en) 2018-05-21 2018-05-21 Expression tracking method based on MobileNet model

Publications (2)

Publication Number Publication Date
CN108717732A CN108717732A (en) 2018-10-30
CN108717732B (en) 2022-05-17

Family

ID=63900143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810486472.0A Active CN108717732B (en) 2018-05-21 2018-05-21 Expression tracking method based on MobileNet model

Country Status (1)

Country Link
CN (1) CN108717732B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network
CN109753996B (en) * 2018-12-17 2022-05-10 西北工业大学 Hyperspectral image classification method based on three-dimensional lightweight depth network
CN111332305A (en) * 2018-12-18 2020-06-26 朱向雷 Active early warning type traffic road perception auxiliary driving early warning system
CN110009015A (en) * 2019-03-25 2019-07-12 西北工业大学 EO-1 hyperion small sample classification method based on lightweight network and semi-supervised clustering
CN110415323B (en) * 2019-07-30 2023-05-26 成都数字天空科技有限公司 Fusion deformation coefficient obtaining method, fusion deformation coefficient obtaining device and storage medium
CN110782396B (en) * 2019-11-25 2023-03-28 武汉大学 Light-weight image super-resolution reconstruction network and reconstruction method
CN111191620B (en) * 2020-01-03 2022-03-22 西安电子科技大学 Method for constructing human-object interaction detection data set
CN113554734A (en) * 2021-07-19 2021-10-26 深圳东辉盛扬科技有限公司 Animation model generation method and device based on neural network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778657A (en) * 2016-12-28 2017-05-31 南京邮电大学 Neonatal pain expression classification method based on convolutional neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10902243B2 (en) * 2016-10-25 2021-01-26 Deep North, Inc. Vision based target tracking that distinguishes facial feature targets
CN106919903B (en) * 2017-01-19 2019-12-17 中国科学院软件研究所 robust continuous emotion tracking method based on deep learning
CN107273933A (en) * 2017-06-27 2017-10-20 北京飞搜科技有限公司 The construction method of picture charge pattern grader a kind of and apply its face tracking methods

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778657A (en) * 2016-12-28 2017-05-31 南京邮电大学 Neonatal pain expression classification method based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Innovative breakthroughs of artificial intelligence deep neural network algorithms in the security field; 吴参毅; China Security & Protection (《中国安防》); 2017-11-01 (No. 11); pp. 67-71 *

Also Published As

Publication number Publication date
CN108717732A (en) 2018-10-30

Similar Documents

Publication Publication Date Title
CN108717732B (en) Expression tracking method based on MobileNet model
CN111354079B (en) Three-dimensional face reconstruction network training and virtual face image generation method and device
CN109508669B (en) Facial expression recognition method based on generative confrontation network
CN111325851B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111242841B (en) Image background style migration method based on semantic segmentation and deep learning
CN110033505A (en) A kind of human action capture based on deep learning and virtual animation producing method
CN108334816A (en) The Pose-varied face recognition method of network is fought based on profile symmetry constraint production
CN108961369A (en) The method and apparatus for generating 3D animation
CN105931283B (en) A kind of 3-dimensional digital content intelligence production cloud platform based on motion capture big data
Po et al. State of the art on diffusion models for visual computing
CN112085835B (en) Three-dimensional cartoon face generation method and device, electronic equipment and storage medium
CN112102480B (en) Image data processing method, apparatus, device and medium
CN113780249B (en) Expression recognition model processing method, device, equipment, medium and program product
Huang et al. Real-world automatic makeup via identity preservation makeup net
CN110176079A (en) A kind of three-dimensional model deformation algorithm based on quasi- Conformal
Marques et al. Deep spherical harmonics light probe estimator for mixed reality games
Wu et al. 3D film animation image acquisition and feature processing based on the latest virtual reconstruction technology
CN114202615A (en) Facial expression reconstruction method, device, equipment and storage medium
CN113222808A (en) Face mask removing method based on generative confrontation network
Chen et al. Double encoder conditional GAN for facial expression synthesis
Zhou et al. Deeptree: Modeling trees with situated latents
CN111368734A (en) Micro expression recognition method based on normal expression assistance
CN114419182A (en) Image processing method and device
Wang et al. Generative model with coordinate metric learning for object recognition based on 3D models
CN114529649A (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant