CN112115954A - Feature extraction method and device, machine readable medium and equipment - Google Patents

Feature extraction method and device, machine readable medium and equipment

Info

Publication number
CN112115954A
CN112115954A
Authority
CN
China
Prior art keywords
point cloud
voxel
layer
feature
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011055038.0A
Other languages
Chinese (zh)
Other versions
CN112115954B
Inventor
姚志强
周曦
曹睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cloudwalk Artificial Intelligence Technology Co ltd
Original Assignee
Guangzhou Cloudwalk Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cloudwalk Artificial Intelligence Technology Co., Ltd.
Priority to CN202011055038.0A
Publication of CN112115954A
Application granted
Publication of CN112115954B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a feature extraction method, which comprises the following steps: acquiring original point cloud data of an image to be processed; sequentially superimposing voxel grids of multiple sizes on the original point cloud data to perform spatial division, taking the point cloud layer formed by the last spatial division as the point cloud layer to be divided next, and constructing a topological structure in which multiple point cloud layers are superimposed layer by layer; and performing feature extraction on the original point cloud data based on a first point cloud deep learning network and the topological structure to obtain target features. By processing the original point cloud, the method obtains a topological structure of superimposed point cloud layers whose computation and storage complexity is O(n), avoiding huge storage occupation and computation requirements. At the same time, the point cloud data is processed directly, so the completeness and invariance of the 3-dimensional data are preserved and its advantages are fully exploited, and 3D biometric recognition based on this method achieves good computational efficiency and accuracy.

Description

Feature extraction method and device, machine readable medium and equipment
Technical Field
The invention relates to the field of feature processing, in particular to a feature extraction method, a feature extraction device, a machine readable medium and equipment.
Background
With the rise of artificial intelligence, 2D face recognition technology based on two-dimensional images has developed rapidly, matured, and been applied at large scale. However, in many situations (such as uneven or insufficient illumination, large face angles, and makeup), the texture of a two-dimensional face image changes greatly, which causes accuracy to drop sharply. Three-dimensional information is not affected by such texture changes, so performing face recognition on the basis of three-dimensional information can solve this problem well. In addition, a face recognition system that uses 3-dimensional information cannot be attacked by two-dimensional image or video spoofing, which is a major weakness of current 2D face recognition technology.
However, current 3D face recognition technology based on 3-dimensional information is still some distance from large-scale deployment. According to the way face features are extracted, current three-dimensional face recognition technology mainly comprises the following methods:
(1) Methods based on artificial features: the face data is processed into a three-dimensional point cloud or triangular mesh through information acquisition and preprocessing; artificial features designed by human experts are then extracted from the processed three-dimensional face data, for example by computing the curvature and normals of the three-dimensional point cloud or by designing SIFT, MMH and other features, and matching is performed according to the extracted features to determine whether different face data belong to the same person. The disadvantage of this kind of algorithm is that the artificial features are mainly based on the three-dimensional model, which changes considerably when the facial expression changes, so they are sensitive to facial expression. Some global artificial features, such as methods that compute the parameters of a three-dimensional morphable model, can handle expression changes, but require a large amount of extra computation. Meanwhile, as in 2-dimensional face recognition, artificial features do not perform well on large-scale three-dimensional face test sets.
(2) Methods based on deep learning: because of the success of deep learning in two-dimensional face recognition, academia has also focused on studying three-dimensional face recognition with deep learning methods. Deep-learning-based approaches still face great challenges, however, so there is no large-scale application at present. These methods mainly rely on the layer-by-layer stacking of convolutional neural networks to continuously abstract low-level features, thereby automatically learning high-level face features. At present, there are several main directions for applying neural networks to 3-dimensional data:
a) Imitating the processing of 2-dimensional data: the 3-dimensional data is expressed as a 3-dimensional spatial grid, 2-dimensional convolution is extended to 3-dimensional convolution, and a deep learning network built from 3-dimensional convolution modules processes the 3-dimensional information. However, adding one dimension causes an exponential increase in memory occupation and computation; the complexity is O(n³). As a result, the data resolution that limited resources can handle drops sharply and the computation time increases greatly, so this method is currently difficult to use in engineering practice.
b) Another idea is to feed the depth pictures taken by cameras such as structured light and TOF directly to a conventional 2-dimensional neural network, but doing so loses much important 3-dimensional information: first, the 2-dimensional depth map is only the observation from one viewing angle, so, as in 2-dimensional face recognition, a large amount of invisible 3-dimensional face information is lost; second, using the depth map directly introduces the distortion caused by camera imaging, as in 2-dimensional face recognition, so the spatial invariance of the 3-dimensional data is lost and its advantages cannot be exploited.
c) Point cloud processing networks, such as the typical PointNet series (point cloud networks) and DGCNN (dynamic graph convolutional network), are designed. However, these networks need to compute the distances between the points inside the point cloud in real time in order to establish the topological relationships inside the point cloud, so that the neural network can extract features on this topology layer by layer. This distance calculation consumes a great deal of memory and CPU and has high complexity.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention provides a feature extraction method, device, machine-readable medium and apparatus, which are used to solve the problems of the prior art.
To achieve the above and other related objects, the present invention provides a feature extraction method, including:
acquiring original point cloud data of an image to be processed;
sequentially superimposing voxel grids of multiple sizes on the original point cloud data to perform spatial division, taking the point cloud layer formed by the last spatial division as the point cloud layer to be divided next, and constructing a topological structure in which multiple point cloud layers are superimposed layer by layer;
and extracting the characteristics of the original point cloud data based on a first point cloud deep learning network and the topological structure to obtain target characteristics.
Optionally, the number of point cloud layers of the topology structure is the same as the number of layers of the first point cloud deep learning network.
Optionally, the sequentially superimposing the original point cloud data by using voxel grids of various sizes to perform spatial division, and taking a point cloud layer formed by the last spatial division as a point cloud layer to be divided next time includes:
performing spatial division on the original point cloud data by using a voxel grid so that each point in the original point cloud data corresponds to a voxel space, wherein the original point cloud data is an original point cloud layer;
determining the center of each voxel space as a point in a new point cloud layer to generate a new point cloud layer;
the new point cloud layer is spatially divided by another voxel grid, so that each point in the new point cloud layer corresponds to a voxel space, the center of each voxel space is determined as a point in the newest point cloud layer, and so on,
and each point in each layer of point cloud layer corresponds to the voxel grid number of the corresponding voxel space.
Optionally, the generating process of the voxel grid number includes:
converting each numbered point p_i in the point cloud layer into voxel coordinates p̂_i, the conversion being:
p̂_i = (p_i - p_min) / v, i = 0, 1, ..., n,
where n is the total number of points in the point cloud, p_min is the minimum value of the point cloud coordinates x, y, z, and v is the size of the voxel grid;
rounding the voxel coordinates p̂_i to obtain the voxel grid number;
and traversing the valid voxel coordinates p̂_i to obtain the valid voxels.
Optionally, the first point cloud deep learning network includes:
the characteristic processing modules are connected in sequence and used for carrying out characteristic processing on the original point cloud data for multiple times; the characteristic processing module corresponds to each layer of the first point cloud deep learning network one by one;
the input of the latter characteristic processing module is the output of the former characteristic processing module; the quantity of the features output by the latter feature processing module is smaller than that of the features output by the former feature processing module, and the dimensionality of the features output by the latter feature processing module is larger than that of the features output by the former feature processing module.
Optionally, each of the feature processing modules includes:
mlp submodule for performing feature calculation and dimension expansion on each point in the input point cloud;
the characteristic division submodule is used for dividing the characteristics corresponding to the point clouds with the same voxel grid number into a group to obtain a characteristic group;
the characteristic pooling submodule is used for performing pooling operation on the characteristic group to obtain pooling characteristics; the pooled feature is an output of the feature processing module.
Optionally, the mlp submodule includes a fully connected linear unit, a batch normalization unit, and an activation unit.
Optionally, the feature processing module further includes a residual sub-module and a feature splicing sub-module.
Optionally, if the object to be processed is a human face, the method further includes:
aligning the face point cloud data based on the target features;
performing feature extraction on the aligned face point cloud data by using a second point cloud deep learning model; the depth of the second point cloud deep learning model is larger than that of the first point cloud deep learning model.
To achieve the above and other related objects, the present invention provides a feature extraction device, comprising:
the point cloud data acquisition module is used for acquiring original point cloud data of an image to be processed;
the topological structure building module is used for sequentially superposing the original point cloud data by utilizing voxel grids with various sizes to perform space division, a point cloud layer formed by the last time of space division is used as a next divided point cloud layer, and a topological structure superposed layer by a plurality of point cloud layers is built;
and the characteristic extraction module is used for extracting the characteristics of the original point cloud data based on the first point cloud deep learning network and the topological structure to obtain target characteristics.
Optionally, the topology building module includes:
an initial point cloud layer construction submodule, configured to perform spatial division on the original point cloud data by using a voxel grid, so that each point in the original point cloud data corresponds to a voxel space, where the original point cloud data is an initial point cloud layer;
the first point cloud layer generating module is used for determining the center of each voxel space as a point in a new point cloud layer so as to generate a new point cloud layer;
the first point cloud layer generation module is used for carrying out space division on the new point cloud layer by using another voxel grid, enabling each point in the new point cloud layer to correspond to a voxel space, determining the center of each voxel space as the latest point in the point cloud layer, and so on;
and each point in each layer of point cloud layer corresponds to the voxel grid number of the corresponding voxel space.
Optionally, the generating process of the voxel grid number includes:
converting each numbered point p_i in the point cloud layer into voxel coordinates p̂_i, the conversion being:
p̂_i = (p_i - p_min) / v, i = 0, 1, ..., n,
where n is the total number of points in the point cloud, p_min is the minimum value of the point cloud coordinates x, y, z, and v is the size of the voxel grid;
rounding the voxel coordinates p̂_i to obtain the voxel grid number;
and traversing the valid voxel coordinates p̂_i to obtain the valid voxels.
Optionally, the first point cloud deep learning network includes:
the characteristic processing modules are connected in sequence and used for carrying out characteristic processing on the original point cloud data for multiple times; the characteristic processing module corresponds to each layer of the first point cloud deep learning network one by one;
the input of the latter characteristic processing module is the output of the former characteristic processing module; the quantity of the features output by the latter feature processing module is smaller than that of the features output by the former feature processing module, and the dimensionality of the features output by the latter feature processing module is larger than that of the features output by the former feature processing module.
Optionally, the feature processing module includes:
mlp submodule for performing dimension expansion on input data;
the characteristic division submodule is used for dividing the characteristics corresponding to the point clouds with the same voxel grid number into a group to obtain a characteristic group;
the characteristic pooling submodule is used for performing pooling operation on the characteristic group to obtain pooling characteristics; the pooled feature is an output of the feature processing module.
To achieve the above and other related objects, the present invention also provides an apparatus comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform one or more of the methods described previously.
To achieve the above objects and other related objects, the present invention also provides one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform one or more of the methods described above.
As described above, the feature extraction method, apparatus, machine-readable medium and device provided by the present invention have the following beneficial effects:
the invention relates to a feature extraction method, which comprises the following steps: acquiring original point cloud data of an image to be processed; sequentially overlapping the original point cloud data by using voxel grids of various sizes to perform space division, taking a point cloud layer formed by the last space division as a point cloud layer to be divided next time, and constructing a topological structure overlapped by multiple point cloud layers layer by layer; and extracting the characteristics of the original point cloud data based on a first point cloud deep learning network and the topological structure to obtain target characteristics. According to the method, a topological structure formed by overlapping a plurality of layers of point clouds is obtained by processing the original point clouds, the calculation and storage complexity of the topological structure is O (n), huge storage space occupation and calculation quantity requirements are avoided, meanwhile, the point cloud data are directly processed, so that the integrity and invariance of the 3-dimensional data are kept, the advantages of the 3-dimensional data are fully exerted, and the 3D biological identification technology based on the method has good effects on calculation efficiency and precision.
Drawings
FIG. 1 is a flow chart of a feature extraction method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for sequentially overlaying the original point cloud data with voxel grids of various sizes to perform spatial division according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a first point cloud deep learning network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a feature extraction apparatus according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
As shown in fig. 1, the present invention provides a feature extraction method, including:
s11, acquiring original point cloud data of the image to be processed;
s12, sequentially superimposing voxel grids of multiple sizes on the original point cloud data to perform spatial division, taking the point cloud layer formed by the last spatial division as the point cloud layer to be divided next, and constructing a topological structure in which multiple point cloud layers are superimposed layer by layer;
s13, extracting the features of the original point cloud data based on the first point cloud deep learning network and the topological structure to obtain target features. And the number of the point cloud layers of the topological structure is the same as that of the first point cloud deep learning network.
According to the invention, a topological structure formed by overlapping a plurality of layers of point clouds is obtained by processing the original point clouds, the calculation and storage complexity of the topological structure is O (n), huge storage space occupation and calculation quantity requirements are avoided, and meanwhile, the point cloud data is directly processed, so that the completeness and invariance of the 3-dimensional data are kept, and the advantages of the 3-dimensional data are fully exerted.
When the invention performs hierarchical feature processing on the point cloud, the computational space complexity of the topological structure is O(n), which is far superior to the point cloud processing methods commonly used at present: the complexity of 3-dimensional convolution is O(n³), and the complexity of the dynamic graph convolutional network DGCNN is O(n²). Compared with 2D methods that process depth maps, the method directly processes the point cloud data, retains more 3-dimensional information, avoids the distortion introduced by the camera, and can easily perform multi-frame data fusion and face rotation alignment, which are difficult to achieve with 2D methods. The 3D face recognition technology based on this method therefore achieves good computational efficiency and accuracy.
In an embodiment, the original point cloud of the object to be processed may be obtained by acquiring data with devices such as structured light, a TOF camera, a light field camera or a lidar and converting the data into a point cloud. After the point cloud is collected, each point in the point cloud can be numbered p_i, i = 0, 1, 2, ..., n, where n is the total number of points in the point cloud.
The point cloud data here refers to 3D point cloud data. 3D point cloud data is data recorded in the form of data points, each data point including three-dimensional coordinates, for example coordinate values on the x, y and z axes. Of course, each data point may also include other information such as gray scale and color, which is not limited in this embodiment.
If the object to be processed is a person, human body point cloud data is acquired; if face recognition is needed, only the point cloud data of the face part is required, so the face point cloud data can be extracted from the human body point cloud data; of course, the face point cloud data can also be collected directly.
In an embodiment, as shown in fig. 2, the sequentially superimposing the voxel grids of multiple sizes on the original point cloud data to perform spatial division, where a point cloud layer formed by a last spatial division is used as a next divided point cloud layer, and the method includes:
s21 spatially partitioning the original point cloud data using a voxel grid such that each point in the original point cloud data corresponds to a voxel space, wherein the original point cloud data is an original point cloud layer;
a voxel grid may be considered as a collection of tiny spatial three-dimensional cubes. And a voxel, or voxel, is an abbreviation of a volume pixel (voxel). Conceptually similar to the smallest unit of a two-dimensional space, a pixel, which is used on video data of a two-dimensional computer image. A volume pixel is the smallest unit of digital data on a three-dimensional partition, which can be understood as a three-dimensional cube in a voxel grid.
The voxel grid may be created from the acquired point cloud, or a voxel grid may be given directly, with its size defined; the size refers to the size of the voxel grid, i.e. the size of one three-dimensional cube. In this embodiment, given the size v0 of the voxel grid, the acquired point cloud is uniformly divided in space according to v0: each point p_i is assigned to a corresponding voxel space, and the number of that voxel space is attached to p_i. Note that each point p_i is assigned to exactly one voxel space, while each voxel space may correspond to several points.
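As a minimal illustrative sketch (not part of the patent text), the division described above might be implemented as follows; the function name, the use of numpy, and the (n, 3) array layout are assumptions made for illustration:

```python
import numpy as np

def voxelize(points: np.ndarray, v: float):
    """Assign every point of a point cloud layer to a voxel of edge length v.

    points: (n, 3) array of x, y, z coordinates.
    Returns the rounded voxel coordinates of each point, the voxel grid
    number attached to each point, and the list of valid (occupied) voxels.
    """
    p_min = points.min(axis=0)                                  # minimum of x, y, z
    voxel_coords = np.floor((points - p_min) / v).astype(np.int64)
    # Only the occupied voxels are traversed and stored, so the cost stays O(n).
    valid_voxels, voxel_number = np.unique(voxel_coords, axis=0, return_inverse=True)
    return voxel_coords, voxel_number, valid_voxels
```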
S22 determining a center of each voxel space as a point in a new point cloud layer to generate a new point cloud layer;
s23, using another voxel grid to perform space division to the new point cloud layer, making each point in the new point cloud layer correspond to a voxel space, determining the center of each voxel space as the point in the latest point cloud layer, and so on,
and each point in each layer of point cloud layer corresponds to the voxel grid number of the corresponding voxel space.
The new points can be denoted p_i^(1), and their number is n1. Given a voxel size v1 with v1 ≥ v0, steps S21 and S23 are executed again to obtain a further layer of points p_i^(2), whose number is n2. Repeating this m times yields multiple point cloud layers, and superimposing these layers yields a layer-by-layer topological structure. The computational space complexity of such a topology is O(n), which is negligible with respect to the overall system.
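A hedged sketch of this layer-by-layer construction, reusing the voxelize helper above (the loop structure and variable names are illustrative assumptions, not the patent's notation):

```python
def build_topology(points: np.ndarray, voxel_sizes):
    """Build the multi-layer topology by superimposing voxel grids of
    non-decreasing size (v0 <= v1 <= ... <= vm).

    Returns one (layer_points, voxel_number) pair per point cloud layer;
    the centers of the occupied voxels of one layer form the next layer.
    """
    layers = []
    current = points
    for v in voxel_sizes:
        _, voxel_number, valid_voxels = voxelize(current, v)
        layers.append((current, voxel_number))
        # The center of each occupied voxel becomes a point of the next layer.
        p_min = current.min(axis=0)
        current = p_min + (valid_voxels + 0.5) * v
    return layers
```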
In an embodiment, the generating of the voxel grid number includes:
converting each numbered point p_i in the point cloud layer into voxel coordinates p̂_i, the conversion being:
p̂_i = (p_i - p_min) / v, i = 0, 1, ..., n,
where n is the total number of points in the point cloud, p_min is the minimum value of the point cloud coordinates x, y, z, and v is the size of the voxel grid;
rounding the voxel coordinates p̂_i to obtain the voxel grid number;
and traversing the valid voxel coordinates p̂_i to obtain the valid voxels. Unlike a dense voxel representation (with complexity O(n³)), only the valid voxels are processed and stored here, so the storage and computation do not grow exponentially and the complexity is only O(n).
In an embodiment, as illustrated in fig. 3, the first point cloud deep learning network includes:
the feature processing modules 31 are connected in sequence and used for performing feature processing on the original point cloud data for multiple times; the number of the characteristic processing modules is the same as the number of layers of the point cloud layers in the topological structure; the characteristic processing module corresponds to each layer of the first point cloud deep learning network one by one;
the input of the latter characteristic processing module is the output of the former characteristic processing module; the quantity of the features output by the latter feature processing module is smaller than that of the features output by the former feature processing module, and the dimensionality of the features output by the latter feature processing module is larger than that of the features output by the former feature processing module.
As shown in fig. 3, the feature processing module includes:
mlp submodule 311, configured to perform feature calculation on each point in the input point cloud and perform dimension expansion;
a feature division submodule 312, configured to divide features corresponding to point clouds with the same voxel grid number into a group, so as to obtain a feature group;
the characteristic pooling submodule 313 is used for performing pooling operation on the characteristic group to obtain pooling characteristics; the pooled feature is an output of the feature processing module.
For example, the input is the original point cloud data, which can be regarded as the first point cloud layer, with n points of dimension d0. If the point cloud is pure three-dimensional data, d0 = 3; if color information is also attached to the point cloud, the color can be concatenated as additional dimensions after the three-dimensional coordinate dimensions, in which case d0 = 3 + c, where c is the dimension of the color. The mlp submodule (comprising a fully connected linear unit, a batch normalization unit and an activation unit) performs feature calculation on each point in the first point cloud layer to obtain feature points, characterized by a feature count and a feature dimension; after processing by the mlp submodule, the feature dimension is expanded to d1, with d1 > d0, the number of features is n, and the number of each feature corresponds to a point in the point cloud.
After the feature dimension has been expanded, the feature division submodule uses the relation index of the topological structure to divide the feature points corresponding to the same voxel grid number into a group, obtaining feature groups; finally, the feature pooling submodule performs a pooling operation on the features in each feature group, so that n1 new features are obtained (n1 < n), which can be regarded as the pooled features. The mlp submodule is then used again to perform feature extraction on the second point cloud layer, expanding the feature dimension to d2; at this point the feature division submodule again uses the relation index of the topological structure to divide the feature points corresponding to the same voxel grid number into feature groups, and the pooling operation is performed on these feature groups. Repeating this yields a high-dimensional vector of dimension dm. The number of layers of the first point cloud deep learning model is adjustable; the deeper it is, the larger the amount of computation, but the more abstract and effective the extracted features.
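To make the per-layer processing concrete, here is a minimal sketch of one feature processing module, assuming PyTorch; the class name, the choice of max pooling, and the use of index_reduce are illustrative assumptions rather than the patent's implementation:

```python
import torch
import torch.nn as nn

class FeatureProcessingModule(nn.Module):
    """mlp submodule (fully connected + batch normalization + activation),
    followed by grouping per voxel grid number and pooling within each group."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim),   # fully connected linear unit
            nn.BatchNorm1d(out_dim),      # batch normalization unit
            nn.ReLU(),                    # activation unit
        )

    def forward(self, feats: torch.Tensor, voxel_number: torch.Tensor):
        # feats: (n, in_dim) features, one row per point of the current layer
        # voxel_number: (n,) group index taken from the precomputed topology
        x = self.mlp(feats)                                    # dimension expansion in_dim -> out_dim
        n_groups = int(voxel_number.max()) + 1
        pooled = x.new_full((n_groups, x.shape[1]), float("-inf"))
        # Max pooling over all features that share the same voxel grid number.
        pooled = pooled.index_reduce(0, voxel_number, x, reduce="amax")
        return pooled                                          # (n1, out_dim) with n1 < n
```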
As an illustration: the original point cloud data is the six points A, B, C, D, E, F. In the first voxel grid division, points A and B lie in the same voxel space, whose voxel grid number is 1, and the center of that voxel space becomes a point of the second point cloud layer; points C and D lie in another voxel space, whose voxel grid number is 2, and its center becomes a point of the second point cloud layer; points E and F lie in a third voxel space, whose number is 3, and its center becomes a point of the second point cloud layer. In the second voxel grid division, the second-layer points corresponding to ABCD (two voxel centers) are divided into one voxel space with voxel grid number f1, and the second-layer point corresponding to EF is divided into another voxel space with voxel grid number f2; continuing in this way yields the topological structure.
The original point cloud data ABCDEF is input into the first point cloud deep learning network, and feature extraction through the first mlp module yields A1, B1, C1, D1, E1, F1. The extracted features are then pooled: according to the numbering relationship of the topological structure above, since A and B have the same number, C and D have the same number, and E and F have the same number, A1 and B1 are divided into one group, C1 and D1 into a second group, and E1 and F1 into a third group. Feature synthesis and pooling within each group yields one feature vector per group, giving 3 new feature vectors (T1, T2 and T3). These 3 feature vectors are input to the next mlp module for feature extraction, yielding new feature vectors T11, T22 and T33. Since, at the second voxel grid division, the second-layer points corresponding to ABCD are divided into the voxel space numbered f1 and the second-layer point corresponding to EF is divided into the voxel space numbered f2, the feature vectors T11 and T22 are divided into one group and T33 into another group; the pooling operation is performed on these two groups of features, outputting the new feature vectors S1 and S2 respectively, and so on up to the topmost layer of the topological structure, where finally a single feature vector is output whose dimension has been expanded to dm.
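Continuing the ABCDEF example, a possible usage of the sketch above (the 0-based voxel numbers and the feature dimensions 16 and 32 are arbitrary choices for illustration):

```python
points = torch.randn(6, 3)                         # toy coordinates for A, B, C, D, E, F
voxel_number_1 = torch.tensor([0, 0, 1, 1, 2, 2])  # AB, CD and EF share voxel numbers

layer1 = FeatureProcessingModule(in_dim=3, out_dim=16)   # d0 = 3 -> d1 = 16
t = layer1(points, voxel_number_1)                 # 3 pooled features: T1, T2, T3

voxel_number_2 = torch.tensor([0, 0, 1])           # ABCD -> f1, EF -> f2 at the second division
layer2 = FeatureProcessingModule(in_dim=16, out_dim=32)  # d1 = 16 -> d2 = 32
s = layer2(t, voxel_number_2)                      # 2 pooled features: S1, S2
```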
In an embodiment, the feature processing module further includes a residual sub-module and a feature splicing sub-module. By means of these two sub-modules, the corresponding performance can be enhanced.
In an embodiment, if the object to be processed is a human face, the method further includes:
aligning the face point cloud data based on the target features;
specifically, the pose (including the rotation angle and the offset) of the face is predicted based on the target features, and the face point cloud data is rotated and translated to a uniform direction based on the pose of the face. The first point cloud deep learning model has a smaller number of layers, since the pose prediction does not require particularly abstract features.
After the face point cloud data is aligned, it is easier for the second point cloud deep learning model to extract features from the aligned face point cloud data.
The structure of the second point cloud deep learning model is similar to that of the first point cloud deep learning model, but the depth of the second point cloud deep learning model is larger than that of the first point cloud deep learning model.
The method for extracting the features of the aligned face point cloud data by using the second point cloud deep learning model is the same as the method for extracting the features of the original point cloud data by using the first point cloud deep learning model, except that the first point cloud deep learning model extracts the face features of the original face point cloud data, and the second point cloud deep learning model extracts the face features of the aligned face point cloud data. The method and the model used between the two can be mutually referred.
As shown in fig. 4, an embodiment of the present invention further provides a feature extraction device, including:
a point cloud data obtaining module 41, configured to obtain original point cloud data of an image to be processed;
a topological structure constructing module 42, configured to sequentially superimpose voxel grids of multiple sizes on the original point cloud data to perform spatial division, take the point cloud layer formed by the last spatial division as the point cloud layer to be divided next, and construct a topological structure in which multiple point cloud layers are superimposed layer by layer;
and a feature extraction module 43, configured to perform feature extraction on the original point cloud data based on the first point cloud deep learning network and the topology structure, so as to obtain a target feature. And the number of the point cloud layers of the topological structure is the same as that of the first point cloud deep learning network.
According to the invention, a topological structure formed by overlapping a plurality of layers of point clouds is obtained by processing the original point clouds, the calculation and storage complexity of the topological structure is O (n), huge storage space occupation and calculation quantity requirements are avoided, and meanwhile, the point cloud data is directly processed, so that the completeness and invariance of the 3-dimensional data are kept, and the advantages of the 3-dimensional data are fully exerted.
When the invention performs hierarchical feature processing on the point cloud, the computational space complexity of the topological structure is O(n), which is far superior to the point cloud processing methods commonly used at present: the complexity of 3-dimensional convolution is O(n³), and the complexity of the dynamic graph convolutional network DGCNN is O(n²). Compared with 2D methods that process depth maps, the method directly processes the point cloud data, retains more 3-dimensional information, avoids the distortion introduced by the camera, and can easily perform multi-frame data fusion and face rotation alignment, which are difficult to achieve with 2D methods. The 3D face recognition technology based on this method achieves good computational efficiency and accuracy.
In one embodiment, the topology construction module includes:
an initial point cloud layer construction submodule, configured to perform spatial division on the original point cloud data by using a voxel grid, so that each point in the original point cloud data corresponds to a voxel space, where the original point cloud data is an initial point cloud layer;
the first point cloud layer generating module is used for determining the center of each voxel space as a point in a new point cloud layer so as to generate a new point cloud layer;
the first point cloud layer generation module is used for carrying out space division on the new point cloud layer by using another voxel grid, enabling each point in the new point cloud layer to correspond to a voxel space, determining the center of each voxel space as the latest point in the point cloud layer, and so on;
and each point in each layer of point cloud layer corresponds to the voxel grid number of the corresponding voxel space.
In an embodiment, the generating of the voxel grid number includes:
converting each numbered point p_i in the point cloud layer into voxel coordinates p̂_i, the conversion being:
p̂_i = (p_i - p_min) / v, i = 0, 1, ..., n,
where n is the total number of points in the point cloud, p_min is the minimum value of the point cloud coordinates x, y, z, and v is the size of the voxel grid;
rounding the voxel coordinates p̂_i to obtain the voxel grid number;
and traversing the valid voxel coordinates p̂_i to obtain the valid voxels.
In one embodiment, the first point cloud deep learning network comprises:
the characteristic processing modules are connected in sequence and used for carrying out characteristic processing on the original point cloud data for multiple times; the characteristic processing module corresponds to each layer of the first point cloud deep learning network one by one;
the input of the latter characteristic processing module is the output of the former characteristic processing module; the quantity of the features output by the latter feature processing module is smaller than that of the features output by the former feature processing module, and the dimensionality of the features output by the latter feature processing module is larger than that of the features output by the former feature processing module.
In one embodiment, the feature processing module comprises:
mlp submodule for performing dimension expansion on input data;
the characteristic division submodule is used for dividing the characteristics corresponding to the point clouds with the same voxel grid number into a group to obtain a characteristic group;
the characteristic pooling submodule is used for performing pooling operation on the characteristic group to obtain pooling characteristics; the pooled feature is an output of the feature processing module.
Since the device embodiment corresponds to the method embodiment, the method embodiment may be referred to in specific embodiments of the device embodiment, and details are not described here.
An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the device may be used as a terminal device, and may also be used as a server, where examples of the terminal device may include: the mobile terminal includes a smart phone, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, an intelligent television, a wearable device, and the like.
The present application further provides a non-transitory readable storage medium, where one or more modules (programs) are stored in the storage medium, and when the one or more modules are applied to a device, the device may be caused to execute instructions (instructions) of steps included in the method in fig. 1 according to the present application.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes a module for executing functions of each module in each device, and specific functions and technical effects may refer to the foregoing embodiments, which are not described herein again.
Fig. 6 is a schematic hardware structure diagram of a terminal device according to an embodiment of the present application. FIG. 6 is a specific embodiment of the implementation of FIG. 5. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: communication component 1203, power component 1204, multimedia component 1205, speech component 1206, input/output interfaces 1207, and/or sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the data processing method described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 referred to in the embodiment of fig. 6 can be implemented as the input device in the embodiment of fig. 5.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (16)

1. A method of feature extraction, comprising:
acquiring original point cloud data of an image to be processed;
sequentially superimposing voxel grids of multiple sizes on the original point cloud data to perform spatial division, taking the point cloud layer formed by the last spatial division as the point cloud layer to be divided next, and constructing a topological structure in which multiple point cloud layers are superimposed layer by layer;
and extracting the characteristics of the original point cloud data based on a first point cloud deep learning network and the topological structure to obtain target characteristics.
2. The feature extraction method according to claim 1, wherein the number of point cloud layers of the topology is the same as the number of layers of the first point cloud deep learning network.
3. The feature extraction method according to claim 1, wherein the sequentially superimposing the original point cloud data by using the voxel grids of various sizes to perform spatial division, and taking a point cloud layer formed by a last spatial division as a point cloud layer to be divided next time comprises:
performing spatial division on the original point cloud data by using a voxel grid so that each point in the original point cloud data corresponds to a voxel space, wherein the original point cloud data is an original point cloud layer;
determining the center of each voxel space as a point in a new point cloud layer to generate a new point cloud layer;
the new point cloud layer is spatially divided by another voxel grid, so that each point in the new point cloud layer corresponds to a voxel space, the center of each voxel space is determined as a point in the newest point cloud layer, and so on,
and each point in each layer of point cloud layer corresponds to the voxel grid number of the corresponding voxel space.
4. The feature extraction method according to claim 3, wherein the generation process of the voxel grid number includes:
converting each numbered point p_i in the point cloud layer into voxel coordinates p̂_i, the conversion being:
p̂_i = (p_i - p_min) / v, i = 0, 1, ..., n,
where n is the total number of points in the point cloud, p_min is the minimum value of the point cloud coordinates x, y, z, and v is the size of the voxel grid;
rounding the voxel coordinates p̂_i to obtain the voxel grid number;
and traversing the valid voxel coordinates p̂_i to obtain the valid voxels.
5. The feature extraction method according to claim 1, wherein the first point cloud deep learning network includes:
the characteristic processing modules are connected in sequence and used for carrying out characteristic processing on the original point cloud data for multiple times; the plurality of feature processing modules correspond to the point cloud layers of the topological structure one by one;
the input of the latter characteristic processing module is the output of the former characteristic processing module; the quantity of the features output by the latter feature processing module is smaller than that of the features output by the former feature processing module, and the dimensionality of the features output by the latter feature processing module is larger than that of the features output by the former feature processing module.
6. The feature extraction method according to claim 5, wherein each of the feature processing modules comprises:
an mlp submodule, configured to perform feature calculation and dimension expansion on each point in the input point cloud;
a feature division submodule, configured to group the features corresponding to points sharing the same voxel grid number, to obtain feature groups;
a feature pooling submodule, configured to perform a pooling operation on each feature group to obtain pooled features, the pooled features being the output of the feature processing module.
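For illustration, a minimal sketch of one such feature processing module (mlp, then grouping by voxel grid number, then pooling), assuming per-point features as an (n, d) numpy array, the per-point voxel indices from the topology construction, and max pooling as the pooling operation; none of these choices is fixed by the claims:

```python
import numpy as np

def feature_processing_module(features, voxel_index, mlp):
    """mlp -> group by voxel grid number -> max-pool each group."""
    expanded = mlp(features)                               # feature calculation and dimension expansion
    n_voxels = int(voxel_index.max()) + 1
    pooled = np.full((n_voxels, expanded.shape[1]), -np.inf)
    for i, vid in enumerate(voxel_index):                  # points sharing a voxel grid number form one group
        pooled[vid] = np.maximum(pooled[vid], expanded[i])
    return pooled                                          # one pooled feature per occupied voxel
```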
7. The feature extraction method of claim 6, wherein the mlp submodule comprises a fully connected linear unit, a batch normalization unit, and an activation unit.
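One common way to realise such an mlp submodule, shown here with PyTorch and with illustrative layer widths (the patent does not fix them):

```python
import torch.nn as nn

def make_mlp(in_dim, out_dim):
    return nn.Sequential(
        nn.Linear(in_dim, out_dim),   # fully connected linear unit
        nn.BatchNorm1d(out_dim),      # batch normalization unit
        nn.ReLU(inplace=True),        # activation unit
    )

mlp = make_mlp(3, 64)   # e.g. expand raw xyz points to 64-dimensional features
```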
8. The feature extraction method of claim 7, wherein the feature processing module further comprises a residual sub-module and a feature splicing sub-module.
9. The feature extraction method according to claim 1, wherein, when the object to be processed is a human face, the method further comprises:
aligning the face point cloud data based on the target features;
performing feature extraction on the aligned face point cloud data by using a second point cloud deep learning model, wherein the depth of the second point cloud deep learning model is greater than the depth of the first point cloud deep learning model.
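A high-level sketch of this two-stage face pipeline follows; align_face, first_net and second_net are placeholders introduced for illustration and are not names defined by the patent:

```python
def extract_face_features(face_points, first_net, second_net, align_face):
    target_features = first_net(face_points)              # shallower first network yields alignment cues
    aligned = align_face(face_points, target_features)    # align the face point cloud using those features
    return second_net(aligned)                            # deeper second network extracts the final features
```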
10. A feature extraction device characterized by comprising:
a point cloud data acquisition module, configured to acquire original point cloud data of an image to be processed;
a topology construction module, configured to sequentially perform spatial division on the original point cloud data by superimposing voxel grids of different sizes, take the point cloud layer formed by each spatial division as the point cloud layer to be divided next, and construct a topological structure in which multiple point cloud layers are stacked layer by layer;
and a feature extraction module, configured to perform feature extraction on the original point cloud data based on a first point cloud deep learning network and the topological structure, to obtain target features.
11. The feature extraction apparatus according to claim 10, wherein the topology construction module includes:
an initial point cloud layer construction submodule, configured to perform spatial division on the original point cloud data by using a voxel grid so that each point in the original point cloud data corresponds to a voxel space, the original point cloud data constituting the initial point cloud layer;
a first point cloud layer generation submodule, configured to determine the center of each voxel space as a point in a new point cloud layer, so as to generate the new point cloud layer;
a second point cloud layer generation submodule, configured to perform spatial division on the new point cloud layer by using another voxel grid so that each point in the new point cloud layer corresponds to a voxel space, determine the center of each voxel space as a point in the newest point cloud layer, and so on;
wherein each point in each point cloud layer corresponds to the voxel grid number of its voxel space.
12. The feature extraction apparatus according to claim 11, wherein the generation process of the voxel grid number includes:
converting each point p_i in the point cloud layer into a voxel coordinate p̂_i, the conversion being:

p̂_i = (p_i − p_min) / v,  i = 1, …, n,

where n is the total number of points in the point cloud, p_min is the minimum value of the point cloud coordinates x, y, z, and v is the size of the voxel grid;
rounding the voxel coordinate p̂_i to obtain the voxel grid number;
and traversing the valid voxel coordinates p̂_i to obtain the valid voxels.
13. The feature extraction apparatus according to claim 10, wherein the first point cloud deep learning network includes:
a plurality of feature processing modules connected in sequence and configured to perform feature processing on the original point cloud data multiple times, the feature processing modules corresponding one-to-one to the layers of the first point cloud deep learning network;
wherein the input of each subsequent feature processing module is the output of the preceding feature processing module, the number of features output by a subsequent feature processing module is smaller than the number of features output by the preceding feature processing module, and the dimensionality of the features output by a subsequent feature processing module is larger than the dimensionality of the features output by the preceding feature processing module.
14. The feature extraction device according to claim 13, wherein the feature processing module includes:
an mlp submodule, configured to perform dimension expansion on the input data;
a feature division submodule, configured to group the features corresponding to points sharing the same voxel grid number, to obtain feature groups;
a feature pooling submodule, configured to perform a pooling operation on each feature group to obtain pooled features, the pooled features being the output of the feature processing module.
15. An apparatus, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method according to any one of claims 1 to 9.
16. One or more machine-readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method according to any one of claims 1 to 9.
CN202011055038.0A 2020-09-30 2020-09-30 Feature extraction method and device, machine readable medium and equipment Active CN112115954B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011055038.0A CN112115954B (en) 2020-09-30 2020-09-30 Feature extraction method and device, machine readable medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011055038.0A CN112115954B (en) 2020-09-30 2020-09-30 Feature extraction method and device, machine readable medium and equipment

Publications (2)

Publication Number Publication Date
CN112115954A true CN112115954A (en) 2020-12-22
CN112115954B CN112115954B (en) 2022-03-29

Family

ID=73798419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011055038.0A Active CN112115954B (en) 2020-09-30 2020-09-30 Feature extraction method and device, machine readable medium and equipment

Country Status (1)

Country Link
CN (1) CN112115954B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050238244A1 (en) * 2004-04-26 2005-10-27 Canon Kabushiki Kaisha Function approximation processing method and image processing method
CN109345620A (en) * 2018-08-13 2019-02-15 浙江大学 Merge the improvement ICP object under test point cloud method of quick point feature histogram
CN110223387A (en) * 2019-05-17 2019-09-10 武汉奥贝赛维数码科技有限公司 A kind of reconstructing three-dimensional model technology based on deep learning
CN110197223A (en) * 2019-05-29 2019-09-03 北方民族大学 Point cloud data classification method based on deep learning
CN110674829A (en) * 2019-09-26 2020-01-10 哈尔滨工程大学 Three-dimensional target detection method based on graph convolution attention network
CN111723691A (en) * 2020-06-03 2020-09-29 北京的卢深视科技有限公司 Three-dimensional face recognition method and device, electronic equipment and storage medium
CN111617480A (en) * 2020-06-04 2020-09-04 珠海金山网络游戏科技有限公司 Point cloud rendering method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801155A (en) * 2021-01-20 2021-05-14 廖彩红 Business big data analysis method based on artificial intelligence and server
CN116206066A (en) * 2023-04-25 2023-06-02 阿里巴巴达摩院(杭州)科技有限公司 Method, storage medium and system for generating video based on scene reconstruction
CN116206066B (en) * 2023-04-25 2023-09-12 阿里巴巴达摩院(杭州)科技有限公司 Method, storage medium and system for generating video based on scene reconstruction

Also Published As

Publication number Publication date
CN112115954B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN111028330B (en) Three-dimensional expression base generation method, device, equipment and storage medium
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
Chen et al. Visibility-aware point-based multi-view stereo network
CN109683699B (en) Method and device for realizing augmented reality based on deep learning and mobile terminal
CN111583399B (en) Image processing method, device, equipment, medium and electronic equipment
CN113822977A (en) Image rendering method, device, equipment and storage medium
CN112115954B (en) Feature extraction method and device, machine readable medium and equipment
JP2022550948A (en) 3D face model generation method, device, computer device and computer program
CN112052792B (en) Cross-model face recognition method, device, equipment and medium
CN112381707B (en) Image generation method, device, equipment and storage medium
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
CN112101252B (en) Image processing method, system, device and medium based on deep learning
CN112241789A (en) Structured pruning method, device, medium and equipment for lightweight neural network
CN114677350B (en) Connection point extraction method, device, computer equipment and storage medium
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
Zhang et al. Detailed 3D human body reconstruction from a single image based on mesh deformation
CN115272608A (en) Human hand reconstruction method and equipment
CN117409144A (en) Reconstruction method, device, equipment, medium and program product of three-dimensional entity model
CN112070877A (en) Point cloud processing method, device, equipment and computer readable storage medium
CN116029912A (en) Training of image processing model, image processing method, device, equipment and medium
CN113694525A (en) Method, device, equipment and storage medium for acquiring virtual image
CN112258392A (en) Super-resolution image training method, device, medium and equipment
CN112232143B (en) Face point cloud optimization method and device, machine readable medium and equipment
CN112686230B (en) Object recognition method, device, equipment and storage medium
CN116740300B (en) Multi-mode-based prime body and texture fusion furniture model reconstruction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant