CN113590770B

CN113590770B - Response method, device, equipment and storage medium based on point cloud data

Info

Publication number: CN113590770B
Application number: CN202010367528.8A
Authority: CN
Inventors: Li Yanli (李艳丽); He Guiwang (赫桂望); Cai Jinhua (蔡金华)
Original assignee: Beijing Jingdong Qianshi Technology Co Ltd
Current assignee: Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2024-03-08
Anticipated expiration: 2040-04-30
Also published as: CN113590770A

Abstract

The embodiments of the invention disclose a response method, device, equipment and storage medium based on point cloud data. The method comprises the following steps: acquiring information to be responded to and point cloud data corresponding to that information; inputting the information to be responded to and the corresponding point cloud data into a trained question-answering model to obtain an output result of the question-answering model, wherein the question-answering model comprises a point cloud feature extraction module and a response information generation module; and determining response information according to the output result and outputting the response information. With this method, the information contained in the point cloud data is mined by the trained question-answering model, so that accurate responses based on point cloud data are achieved.

Description

Response method, device, equipment and storage medium based on point cloud data

Technical Field

The embodiments of the invention relate to the field of computer technology, and in particular to a response method, device, equipment and storage medium based on point cloud data.

Background

With the advancement of autonomous driving, robotics, city simulation and three-dimensional printing, intelligent understanding of point cloud data is becoming increasingly important. In the process of implementing the present invention, the inventors found at least the following technical problem in the prior art: how to accurately understand point cloud data and respond to questions about it. For example, a driver may ask a question about a street-view point cloud scanned by the vehicle-mounted system, such as "What is the pedestrian wearing red clothes under the traffic light doing?". Answering such a question accurately is a technical problem that urgently needs to be solved.

Disclosure of Invention

The embodiment of the invention provides a response method, a device, equipment and a storage medium based on point cloud data, so as to realize accurate response based on the point cloud data.

In a first aspect, an embodiment of the present invention provides a response method based on point cloud data, including:

acquiring information to be responded to and point cloud data corresponding to the information to be responded to;

inputting the information to be responded to and the corresponding point cloud data into a trained question-answering model to obtain an output result of the question-answering model, wherein the question-answering model comprises a point cloud feature extraction module and a response information generation module;

and determining response information according to the output result and outputting the response information.

In a second aspect, an embodiment of the present invention further provides a response apparatus based on point cloud data, including:

a to-be-responded information acquisition module, configured to acquire information to be responded to and point cloud data corresponding to the information to be responded to;

an output result acquisition module, configured to input the information to be responded to and the corresponding point cloud data into the trained question-answering model to obtain the output result of the question-answering model, wherein the trained question-answering model comprises a point cloud feature extraction module and a response information generation module;

and a response information output module, configured to determine and output response information according to the output result.

In a third aspect, an embodiment of the present invention further provides a computer apparatus, the apparatus including:

one or more processors;

a storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the response method based on point cloud data provided by any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements a response method based on point cloud data as provided in any embodiment of the present invention.

According to the embodiments of the invention, the information to be responded to and the corresponding point cloud data are acquired. Both are input into a trained question-answering model, which comprises a point cloud feature extraction module and a response information generation module, to obtain an output result. Response information is then determined according to the output result and output. Because the trained question-answering model mines the information contained in the point cloud data, accurate responses based on point cloud data are achieved.

Drawings

Fig. 1a is a flowchart of a response method based on point cloud data according to an embodiment of the present invention;

Fig. 1b is a schematic structural diagram of a point cloud feature extraction module according to an embodiment of the present invention;

Fig. 1c is a schematic diagram of a single-frame feature extraction sub-module according to an embodiment of the present invention;

Fig. 1d is a schematic diagram of an original feature extraction network according to an embodiment of the present invention;

Fig. 1e is a schematic diagram of another single-frame feature extraction sub-module according to an embodiment of the present invention;

Fig. 1f is a schematic structural diagram of a response information generation module according to an embodiment of the present invention;

Fig. 1g is a schematic diagram of a question-answering model according to an embodiment of the present invention;

Fig. 2 is a flowchart of a response method based on point cloud data according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a response device based on point cloud data according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all structures related to the present invention are shown in the drawings.

Example One

Fig. 1a is a flowchart of a response method based on point cloud data according to an embodiment of the present invention. This embodiment is applicable to intelligent response scenarios, and in particular to intelligent response based on point cloud data. The method may be performed by a response device based on point cloud data, which may be implemented in software and/or hardware and may, for example, be configured in a computer device. As shown in Fig. 1a, the method comprises:

S110, acquiring information to be responded to and point cloud data corresponding to the information to be responded to.

In this embodiment, the manner of acquiring the information to be responded to and the corresponding point cloud data is not limited. It may be determined according to the application scenario of the intelligent response. For example, if the intelligent response is applied to a vehicle-mounted system, the information to be responded to may be a question input by the driver by voice, such as "What is the pedestrian wearing red clothes under the traffic light doing?". The corresponding point cloud data may then be the street-view point cloud data scanned by the vehicle-mounted system.

S120, inputting the information to be responded to and the corresponding point cloud data into a trained question-answering model to obtain an output result of the question-answering model.

In this embodiment, the trained question-answering model includes a point cloud feature extraction module and a response information generation module. The point cloud feature extraction module extracts point cloud features from the point cloud data. The response information generation module generates the output result according to the point cloud features and the information to be responded to. Optionally, the output result of the question-answering model may be one-hot codes corresponding to words in the text library, or it may be answer information in text form.

In one embodiment of the present invention, inputting the information to be responded to and the corresponding point cloud data into the trained question-answering model to obtain the output result includes two steps:

1. Inputting the information to be responded to and the corresponding point cloud data into the point cloud feature extraction module to obtain the point cloud feature output by that module.
2. Inputting the point cloud feature and the information to be responded to into the response information generation module to obtain its output result.

Optionally, to improve the accuracy of the response information, the point cloud feature extraction module may extract features from the point cloud data in combination with the information to be responded to. The extracted point cloud features then focus locally on the parts relevant to that information, and the response information generation module answers according to these focused features to generate the output result. Specifically, for each single-frame point cloud data in the point cloud data, features are extracted from that frame in combination with the information to be responded to, yielding a single-frame feature. The point cloud feature corresponding to the point cloud data is then obtained from the single-frame features of all frames.
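The two-stage flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The three callables (`extract_frame_feature`, `aggregate`, `generate_answer`) are hypothetical placeholders. They stand in for the single-frame feature extraction sub-modules, the recurrent sub-module and the response information generation module. The toy stubs at the bottom exist only to show how data moves between the stages.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Frame = Sequence[Vector]  # one single-frame point cloud: a list of points


def answer_with_point_cloud(
    question: str,
    frames: Sequence[Frame],
    extract_frame_feature: Callable[[Frame, str], Vector],
    aggregate: Callable[[List[Vector]], Vector],
    generate_answer: Callable[[Vector, str], str],
) -> str:
    """Question-conditioned feature extraction followed by answer generation.

    All three callables are hypothetical placeholders, not the patent's networks.
    """
    # Stage 1a: one question-conditioned feature per single-frame point cloud.
    frame_features = [extract_frame_feature(frame, question) for frame in frames]
    # Stage 1b: fuse the single-frame features into one point cloud feature.
    point_cloud_feature = aggregate(frame_features)
    # Stage 2: generate the output result from the point cloud feature and the question.
    return generate_answer(point_cloud_feature, question)


def mean_point(frame: Frame, question: str) -> Vector:
    """Toy stub: ignores the question and averages the points of the frame."""
    n = len(frame)
    return [sum(p[i] for p in frame) / n for i in range(len(frame[0]))]


result = answer_with_point_cloud(
    "What is the pedestrian doing?",
    frames=[[[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]], [[1.0, 1.0, 1.0]]],
    extract_frame_feature=mean_point,
    aggregate=lambda feats: feats[-1],  # toy stub: keep the last frame's feature
    generate_answer=lambda feat, q: f"feature={feat}",
)
print(result)  # feature=[1.0, 1.0, 1.0]
```

Passing the stages in as callables keeps the sketch agnostic about which networks fill each role.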

In this embodiment, point cloud data is typically large in scale. One common way to organize large-scale point cloud data is to splice single-frame point clouds along a time sequence, as with point clouds acquired by vehicle-mounted laser scanning. Spatio-temporal consistency relations exist among these single frames. A simple convolutional neural network has difficulty representing the temporal consistency among single frames. The question-answering model is therefore built on recurrent neural networks, which extract high-level features from point cloud data spliced from single frames and make feature extraction more accurate.

In one embodiment of the present invention, the point cloud data includes at least one single-frame point cloud data. The point cloud feature extraction module includes a recurrent sub-module and a plurality of single-frame feature extraction sub-modules. The recurrent sub-module includes a plurality of first recurrent neural networks connected in a chain, and the single-frame feature extraction sub-modules correspond one-to-one to the first recurrent neural networks. Inputting the information to be responded to and the corresponding point cloud data into the point cloud feature extraction module to obtain the point cloud feature includes two steps:

1. For each single-frame point cloud data, inputting that frame and the information to be responded to into a single-frame feature extraction sub-module to obtain the single-frame feature of that frame.
2. Following the connection order of the first recurrent neural networks, taking each one in turn as the current first recurrent neural network. Its inputs are the single-frame feature output by its corresponding single-frame feature extraction sub-module and the network extraction feature output by the preceding first recurrent neural network. From these it produces its own network extraction feature. The network extraction feature output by the last first recurrent neural network is taken as the point cloud feature.

Specifically, the single-frame feature of each single-frame point cloud data may be extracted by a single-frame feature extraction sub-module. The recurrent sub-module then derives the point cloud feature of the point cloud data from these single-frame features. Each single-frame feature extraction sub-module may extract the single-frame feature from its own single-frame point cloud data independently. Alternatively, it may also draw on the single-frame point cloud data corresponding to the other single-frame feature extraction sub-modules when extracting the feature of its own frame.

In one embodiment, the recurrent sub-module obtains the point cloud feature from the single-frame features as follows. Following the chain order of the first recurrent neural networks, each first recurrent neural network in turn receives two inputs: the single-frame feature output by its corresponding single-frame feature extraction sub-module, and the network extraction feature output by the preceding first recurrent neural network. The network extraction feature output by the last first recurrent neural network is taken as the point cloud feature of the point cloud data. This process can be implemented by chained first recurrent neural networks.

Fig. 1b is a schematic structural diagram of a point cloud feature extraction module according to an embodiment of the present invention. The point cloud feature extraction module includes a recurrent sub-module 120 and a plurality of single-frame feature extraction sub-modules (single-frame feature extraction sub-modules 111, 112, …, 113). Each of these sub-modules extracts the single-frame feature of its corresponding single-frame point cloud data. The recurrent sub-module 120 consists of a plurality of chained first recurrent neural networks (first recurrent neural networks 121, 122, …, 123). Single-frame feature extraction sub-module 111 corresponds to first recurrent neural network 121, sub-module 112 corresponds to network 122, …, and sub-module 113 corresponds to network 123.

The data flows through the chain as follows:

- First recurrent neural network 121 obtains its network extraction feature $h_1$ from $h_0$ and the single-frame feature extracted by sub-module 111, and outputs $h_1$ to first recurrent neural network 122.
- First recurrent neural network 122 obtains its network extraction feature $h_2$ from $h_1$ and the single-frame feature extracted by sub-module 112, and outputs $h_2$ to the next first recurrent neural network.
- This continues until the last first recurrent neural network 123 obtains its network extraction feature $h_T$ from $h_{T-1}$, output by the preceding network, and the single-frame feature extracted by sub-module 113. $h_T$ is taken as the point cloud feature of the point cloud data.

The first recurrent neural network may be a gated recurrent unit (GRU), a long short-term memory network (LSTM), any of various multi-layer or multi-directional recurrent neural networks, and the like.
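The chained recurrence over single-frame features can be illustrated with a minimal sketch. It rests on three assumptions:

- a plain tanh (Elman) recurrent cell replaces the GRU or LSTM mentioned above;
- the chained networks share one set of weights, as in an ordinary unrolled RNN;
- the initial state $h_0$ is zero.

The names `rnn_step` and `aggregate_frames` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for small lists-of-lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(x: Sequence[float], h_prev: Sequence[float], w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """Elman cell h_t = tanh(W_x x_t + W_h h_{t-1} + b); a stand-in for a GRU/LSTM cell."""
    return [math.tanh(a + c + bi) for a, c, bi in zip(matvec(w_x, x), matvec(w_h, h_prev), b)]


def aggregate_frames(frame_features: Sequence[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """Run the chain over single-frame features in time order and return h_T."""
    h = [0.0] * len(b)  # h_0: assumed zero initial state
    for f_t in frame_features:  # each step plays the role of one chained network
        h = rnn_step(f_t, h, w_x, w_h, b)
    return h  # h_T, used as the point cloud feature


identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
h_T = aggregate_frames([[0.5, -0.5], [1.0, 0.0]], identity, half, [0.0, 0.0])
print(h_T)
```

Because each step feeds its state forward, the final $h_T$ depends on every frame and on their order, not only on the last frame.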

In this embodiment, an attention mechanism can be used to extract single-frame features that focus locally on the parts relevant to the information to be responded to. In one embodiment, the single-frame feature extraction sub-module includes a first question feature extraction network, an attention module and an original feature extraction network. Inputting the single-frame point cloud data and the information to be responded to into this sub-module to obtain the single-frame feature includes three steps:

1. Inputting the information to be responded to into the first question feature extraction network to obtain the first question feature.
2. Inputting the single-frame point cloud data into the original feature extraction network to obtain the original feature of the single-frame point cloud data.
3. Inputting the first question feature and the original feature into the attention module to obtain the single-frame feature output by the attention module.

Fig. 1c is a schematic structural diagram of a single-frame feature extraction sub-module according to an embodiment of the present invention. In the figure, solid-line boxes represent network layers and dashed-line boxes represent data layers. As shown in Fig. 1c, the single-frame feature extraction sub-module includes a first question feature extraction network 1111, an attention module 1113 and an original feature extraction network 1112. The original feature extraction network 1112 extracts an original feature (1×2048) from the single-frame point cloud data (n1×7). The first question feature extraction network 1111 extracts a first question feature (1×1024) from the information to be responded to. The attention module 1113 applies a weighted transformation to the original feature (1×2048) according to the first question feature. The result is a single-frame feature (1×2048) that focuses locally on the parts relevant to the information to be responded to.
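The patent does not specify how the attention module 1113 computes its weights. One common choice, assumed here purely for illustration, is question-guided channel gating:

1. A hypothetical learned matrix `w` projects the question feature onto the channels of the original feature.
2. A sigmoid turns each projection into a weight in (0, 1).
3. The original feature is re-weighted element-wise.

Small vectors stand in for the 1×1024 and 1×2048 features.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def question_guided_attention(
    original: Sequence[float],
    question: Sequence[float],
    w: Sequence[Sequence[float]],
) -> List[float]:
    """Re-weight each channel of the original feature by a gate computed from the question.

    w has shape len(original) x len(question) and is a hypothetical learned projection.
    """
    gates = [sigmoid(sum(wi * qi for wi, qi in zip(row, question))) for row in w]
    return [g * x for g, x in zip(gates, original)]


original = [2.0, -4.0, 6.0]  # stands in for the 1x2048 original feature
question = [1.0, 1.0]        # stands in for the 1x1024 first question feature
w_zero = [[0.0, 0.0]] * 3    # zero projection: every gate is sigmoid(0) = 0.5
print(question_guided_attention(original, question, w_zero))  # [1.0, -2.0, 3.0]
```

With trained weights, channels the question marks as relevant keep most of their value, while the others are suppressed toward zero.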

Optionally, the original feature extraction network 1112 may be the spatial convolution module PointNet. PointNet includes three kinds of units: T-Net, matrix multiply and mlp.

- T-Net is a feature transformation unit. It learns a geometric transformation matrix (3×3) and a feature transformation matrix (64×64) for the input data.
- Matrix multiply is a matrix multiplication operation. Combining T-Net with matrix multiply makes the model invariant to certain spatial transformations.
- mlp is a multi-layer perceptron. mlp(64, 64) consists of two perceptron layers, 3×64 and 64×64. mlp(64, 128, 1024) consists of three perceptron layers, 64×64, 64×128 and 128×1024.

Each perceptron layer is a convolution whose shared weights are applied independently to every point. For example, convolving the 3×64 perceptron layer with a 1×3 data layer yields a 1×64 data layer, and convolving the 128×1024 perceptron layer with a 1×128 data layer outputs 1×1024.

In this embodiment, the point cloud data may be 7-dimensional (X, Y, Z, I, R, G, B), where (X, Y, Z) are the spatial coordinates, I is the intensity and (R, G, B) is the color. To make the point cloud convolution compatible with these multi-dimensional features, the 7-dimensional data (x, y, z, I, R, G, B) is split, each part is processed by its own PointNet, and the results are fused.

Fig. 1d is a schematic diagram of an original feature extraction network according to an embodiment of the invention. As shown in Fig. 1d, the original feature extraction network 1112 includes a data segmentation module 11121, two spatial convolution modules 11122 and a data fusion module 11123. It processes the data in three steps:

1. The data segmentation module 11121 splits the n×7 point cloud data by channel into two parts: (x, y, z) of size n×3 and (I, R, G, B) of size n×4.
2. Each part is processed by its own spatial convolution module to obtain a 1×1024 global feature.
3. The data fusion module concatenates the two 1×1024 global features into a 1×2048 feature of the single frame, which is the original feature.
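The split, per-branch PointNet-style extraction and fusion can be sketched as below. This is a simplified sketch that assumes:

- each branch uses a single shared perceptron layer with ReLU;
- channel-wise max pooling serves as PointNet's symmetric aggregation over points;
- the weights are hypothetical toy values.

The T-Net transforms and the deeper mlp(64, 64) and mlp(64, 128, 1024) stacks are omitted.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def shared_mlp_maxpool(points: Sequence[Sequence[float]], w: Matrix, b: Vector) -> Vector:
    """Apply one shared perceptron layer (ReLU) to every point, then max-pool over points."""
    per_point = [
        [max(0.0, sum(wij * xj for wij, xj in zip(row, p)) + bi) for row, bi in zip(w, b)]
        for p in points
    ]
    return [max(channel) for channel in zip(*per_point)]


def original_feature(points_7d, w_geo: Matrix, b_geo: Vector, w_attr: Matrix, b_attr: Vector) -> Vector:
    """Split (x, y, z | I, R, G, B), encode each part separately, then concatenate."""
    geometry = [p[:3] for p in points_7d]    # n x 3 spatial coordinates
    attributes = [p[3:] for p in points_7d]  # n x 4 intensity and color
    global_geo = shared_mlp_maxpool(geometry, w_geo, b_geo)
    global_attr = shared_mlp_maxpool(attributes, w_attr, b_attr)
    return global_geo + global_attr  # 1024 + 1024 -> 2048 in the text; 2 + 2 -> 4 here


points = [
    [1.0, 0.0, 0.0, 0.5, 1.0, 0.0, 0.0],  # x, y, z, I, R, G, B
    [0.0, 2.0, 0.0, 0.1, 0.0, 1.0, 0.0],
]
w_geo = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_attr = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
print(original_feature(points, w_geo, [0.0, 0.0], w_attr, [0.0, 0.0]))  # [1.0, 2.0, 0.5, 1.0]
```

Max pooling makes each branch's global feature independent of point order, which is why the result is unchanged if the points are shuffled.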

In one embodiment of the present invention, the first question feature extraction network is a second recurrent neural network. Inputting the information to be responded to into this network to obtain the first question feature proceeds as follows. Each character of the information to be responded to is taken in turn, in word order, as the current character. The current character and the character feature of the preceding character are input into the second recurrent neural network, which outputs the character feature of the current character. The character feature of the last character is taken as the first question feature.

To capture the sequential relations within the question, a recurrent neural network is used as the first question feature extraction network. Specifically, each character of the information to be responded to is taken in word order as the current character. The current character and the character feature of the preceding character are input into the network to obtain the character feature of the current character. This repeats until the character feature of the last character is obtained, and that feature is used as the first question feature.

For example, suppose the information to be responded to is the Chinese question "红绿灯下的人在干什么" ("What is the person under the traffic light doing?"). The process runs as follows:

1. The first character 红 ("red") is input into the second recurrent neural network to obtain its character feature.
2. The character feature of 红 and the next character 绿 ("green") are input to obtain the character feature of 绿.
3. This continues until the character feature of 什 and the last character 么 are input.
4. The character feature of 么 output by the second recurrent neural network is taken as the first question feature.

Optionally, the second recurrent neural network may be a GRU, an LSTM, any of various multi-layer or multi-directional recurrent neural networks, and the like.
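Character-by-character encoding of the question with the second recurrent neural network might look like the following sketch. The embedding table and weights are hypothetical, and a tanh recurrent cell stands in for the GRU or LSTM named above. The hidden state after the last character plays the role of the first question feature.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def encode_question(
    text: str,
    embed: Callable[[str], Vector],
    w_x: Sequence[Vector],
    w_h: Sequence[Vector],
    b: Vector,
) -> Vector:
    """Feed characters in word order; the state after the last one is the question feature."""
    h = [0.0] * len(b)  # assumed zero state before the first character
    for ch in text:
        x = embed(ch)
        # New character feature from the current character and the previous character feature.
        h = [
            math.tanh(sum(a * xi for a, xi in zip(rx, x)) + sum(c * hj for c, hj in zip(rh, h)) + bi)
            for rx, rh, bi in zip(w_x, w_h, b)
        ]
    return h


# Hypothetical 2-d embedding table; unknown characters map to [0.0, 1.0].
TABLE = {"红": [1.0, 0.0], "么": [0.5, 0.5]}


def embed(ch: str) -> Vector:
    return TABLE.get(ch, [0.0, 1.0])


identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
feature = encode_question("红绿灯下的人在干什么", embed, identity, half, [0.0, 0.0])
print(feature)
```

With a zero recurrent matrix, the result would depend only on the last character. The recurrent term `w_h` is what lets earlier characters such as 红 influence the final question feature.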

In this embodiment, the original feature of each single-frame point cloud data may be weighted and transformed by a spatial attention mechanism and a temporal attention mechanism in turn. The result is a single-frame feature that focuses on the information to be responded to in both time and space. In one embodiment, the attention module includes a spatial attention module and a temporal attention module. Inputting the first question feature and the original feature into the attention module to obtain the single-frame feature includes two steps:

1. Inputting the first question feature and the original feature into the spatial attention module to obtain the spatially weighted feature.
2. Inputting the first question feature, together with the spatially weighted features output by the spatial attention modules of all the single-frame feature extraction sub-modules, into the temporal attention module to obtain the single-frame feature output by the temporal attention module.

Fig. 1e is a schematic structural diagram of yet another single-frame feature extraction sub-module according to an embodiment of the present invention. In the figure, solid-line boxes represent network layers and dashed-line boxes represent data layers. Compared with Fig. 1c, Fig. 1e refines the attention module 1113 into a spatial attention module 1114 and a temporal attention module 1115.

Specifically, the spatial attention module 1114 applies a weighted transformation to the original feature (1×2048) extracted by the original feature extraction network 1112. The weighting is guided by the first question feature extracted by the first question feature extraction network 1111. The module outputs a spatially weighted feature (1×2048).

The temporal attention module 1115 then takes the spatially weighted features (1×2048) output by the spatial attention modules of all the single-frame feature extraction sub-modules. The spatial attention modules of the other sub-modules are not shown in the figure. It applies a weighted transformation to these features according to the first question feature (1×1024). It outputs a single-frame feature (1×2048) that focuses on the information to be responded to in both time and space.
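The temporal attention module 1115 is likewise not specified in detail. As an illustrative assumption, the sketch below works in three steps:

1. It scores each frame's spatially weighted feature by a dot product with a projected question feature.
2. It normalizes the scores with a softmax over frames.
3. It scales each frame's feature by its weight.

The projection matrix `w` is hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def temporal_attention(
    spatial_feats: Sequence[Vector],
    question: Vector,
    w: Sequence[Vector],
) -> List[Vector]:
    """Weight each frame's spatially weighted feature by a softmax over frames.

    w (feature_dim x question_dim) is a hypothetical projection of the question feature.
    """
    q_proj = [sum(wi * qi for wi, qi in zip(row, question)) for row in w]
    scores = [sum(s * q for s, q in zip(feat, q_proj)) for feat in spatial_feats]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [[a * x for x in feat] for a, feat in zip(alphas, spatial_feats)]


frames = [[1.0, 0.0], [0.0, 1.0]]  # two frames, 2-d stand-ins for 1x2048 features
out = temporal_attention(frames, question=[2.0], w=[[1.0], [0.0]])
print(out)  # the first frame, aligned with the projected question, gets the larger weight
```

Unlike the spatial gating, this step compares frames with each other, so it needs the spatially weighted features from every single-frame sub-module at once.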

When generating the output result, the response information generation module extracts a second question feature from the information to be responded to. It integrates the second question feature with the point cloud feature and obtains the output result from the integrated feature. Optionally, the response information generation module includes a second question feature extraction network, a data integration network and a third recurrent neural network. Inputting the point cloud feature and the information to be responded to into this module to obtain its output result includes three steps:

1. Inputting the information to be responded to into the second question feature extraction network to obtain the corresponding second question feature.
2. Inputting the second question feature and the point cloud feature into the data integration network to obtain the data integration feature.
3. Inputting the data integration feature into the third recurrent neural network to obtain the output result.

Fig. 1f is a schematic structural diagram of a response information generation module according to an embodiment of the present invention. As shown in Fig. 1f, the response information generation module includes a second question feature extraction network 131, a data integration network 132 and a third recurrent neural network 133:

- The second question feature extraction network 131 extracts a second question feature (1×1024) from the information to be responded to.
- The data integration network 132 integrates the second question feature (1×1024) with the point cloud feature (1×2048) output by the point cloud feature extraction module. It outputs a data integration feature and inputs it into the third recurrent neural network 133.
- The third recurrent neural network 133 generates and outputs the output result according to the data integration feature.

The data integration network may obtain the data integration feature as a weighted sum of the second question feature and the point cloud feature, for example $M = w_Q Q + w_h h_T$. Here $M$ is the data integration feature, $Q$ is the second question feature, $h_T$ is the point cloud feature, $w_Q$ is the weight of the second question feature, and $w_h$ is the weight of the point cloud feature. The third recurrent neural network may be a GRU, an LSTM, any of various multi-layer or multi-directional recurrent neural networks, and the like.
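Q (1×1024) and $h_T$ (1×2048) have different lengths, so in $M = w_Q Q + w_h h_T$ they cannot be added directly. A natural reading is that $w_Q$ and $w_h$ are learned matrices projecting both into a common space. The sketch below assumes this, uses hypothetical toy weights, and computes M as the sum of the two projections.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def project(w: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product: maps v into the row space of w."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def integrate(q: Sequence[float], h_t: Sequence[float], w_q: Matrix, w_h: Matrix) -> Vector:
    """M = W_Q Q + W_h h_T, with both projections landing in the same space."""
    proj_q = project(w_q, q)    # len(q) -> d
    proj_h = project(w_h, h_t)  # len(h_t) -> d
    if len(proj_q) != len(proj_h):
        raise ValueError("w_q and w_h must project into the same dimension")
    return [a + b for a, b in zip(proj_q, proj_h)]


q = [1.0, 2.0]              # stands in for the 1x1024 second question feature
h_t = [1.0, 1.0, 1.0, 1.0]  # stands in for the 1x2048 point cloud feature
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_h = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(integrate(q, h_t, w_q, w_h))  # [2.0, 3.0]
```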

To make the output result sequential and more accurate, a recurrent neural network is also used to generate it. In one embodiment, inputting the data integration feature into the third recurrent neural network to obtain the output result includes four steps:

1. Inputting the data integration feature into the third recurrent neural network to obtain the current predicted response character.
2. Inputting the data integration feature and the current predicted response character into the third recurrent neural network to obtain the next predicted response character.
3. Repeating step 2 until all predicted response characters output by the third recurrent neural network are obtained.
4. Splicing all the predicted response characters in order to generate the output result.

Specifically, the data integration feature is first input into the third recurrent neural network, which predicts the first response character of the output result. The first predicted response character and the data integration feature are then input into the third recurrent neural network again to obtain the second predicted response character. This continues until the third recurrent neural network outputs the last predicted response character. All the predicted response characters are then concatenated in order to obtain the output result.
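The character-by-character decoding loop can be sketched as greedy autoregressive decoding. The `step` callable is a hypothetical stand-in for one step of the third recurrent neural network. It receives the data integration feature, the previously predicted character and a recurrent state, and returns a score for each vocabulary entry plus the updated state. The end-of-sequence token `<eos>` and the `max_len` cap are assumptions added so the loop terminates.

```python
from typing import Callable, List, Optional, Sequence, Tuple

END = "<eos>"  # assumed end-of-sequence token

StepFn = Callable[[Sequence[float], Optional[str], object], Tuple[List[float], object]]


def greedy_decode(m: Sequence[float], step: StepFn, vocab: Sequence[str], max_len: int = 20) -> str:
    """Predict one character at a time, feeding the previous prediction back in."""
    chars: List[str] = []
    prev: Optional[str] = None  # no previous character on the first step
    state: object = None        # recurrent state carried between steps
    while len(chars) < max_len:
        scores, state = step(m, prev, state)
        token = vocab[max(range(len(vocab)), key=lambda i: scores[i])]
        if token == END:
            break
        chars.append(token)
        prev = token
    return "".join(chars)  # splice the predicted characters in order


# Toy stand-in for the third recurrent network: replays a fixed script.
vocab = ["w", "a", "l", "k", END]
script = ["w", "a", "l", "k", END]


def scripted_step(m, prev, state):
    i = 0 if state is None else state
    return [1.0 if tok == script[i] else 0.0 for tok in vocab], i + 1


print(greedy_decode([0.0], scripted_step, vocab))  # walk
```

Greedy selection is the simplest decoding choice. Beam search is a common alternative, but the text does not say which is used.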

Fig. 1g is a schematic diagram of a question-answering model according to an embodiment of the present invention. As shown in Fig. 1g, the question-answering model includes an encoding part and a decoding part.

The encoding part consists of the chained first recurrent neural networks and a plurality of single-frame feature extraction sub-modules. Each sub-module includes a first question feature extraction network, an original feature extraction network, a spatial attention module and a temporal attention module. The chained first recurrent neural networks output the encoded feature $h_T$ according to the features extracted by the single-frame feature extraction sub-modules, and input it into the decoding part.

The decoding part consists of a second question feature extraction network, a data integration network and a third recurrent neural network. The data integration network integrates the encoded feature $h_T$ with the second question feature extracted by the second question feature extraction network, and outputs the data integration feature. The third recurrent neural network obtains the output result according to the data integration feature.

S130, determining response information according to the output result and outputting the response information.

In this embodiment, the output result of the question-answering model may be a sequence of one-hot codes, each identifying a word in the text library, or it may be response information in text form. If the output result is a sequence of one-hot codes, the response information in text form is assembled from the one-hot code of each word and then output. If the question-answering model outputs response information in text form, that text can be output directly. The output mode of the response information is not limited here; for example, the response information may be broadcast by voice or displayed as text.
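Turning one-hot codes back into text amounts to finding, for each code, the position of its 1 bit and looking up the word at that position in the text library. A minimal Python sketch, assuming the library is an ordered list `vocab` (0-based here, whereas the text numbers words from W1); taking the arg-max also covers the case where the model emits a score vector rather than an exact one-hot vector. The toy vocabulary and the space separator are illustrative only.

```python
from typing import List, Sequence


def one_hot_to_text(codes: Sequence[Sequence[float]], vocab: Sequence[str], sep: str = " ") -> str:
    """Map each (near) one-hot code to the word at its largest entry and join the words."""
    words: List[str] = []
    for code in codes:
        j = max(range(len(code)), key=lambda k: code[k])  # index of the 1 bit (or highest score)
        words.append(vocab[j])
    return sep.join(words)


vocab = ["pedestrian", "red", "a", "is", "wearing"]  # toy text library
print(one_hot_to_text([[0, 0, 1, 0, 0], [1, 0, 0, 0, 0]], vocab))  # → a pedestrian
```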

In this embodiment of the invention, the information to be responded and the point cloud data corresponding to it are obtained; the information to be responded and the corresponding point cloud data are input into a trained question-answering model, which includes a point cloud feature extraction module and a response information generation module, to obtain the output result of the question-answering model; and response information is determined from the output result and output. By mining the information contained in the point cloud data with the trained question-answering model, accurate responses based on point cloud data are achieved.

Example Two

FIG. 2 is a flowchart of a response method based on point cloud data according to the second embodiment of the present invention. On the basis of the above solution, this embodiment adds the operation of training the question-answering model. As shown in FIG. 2, the method includes:

S210, obtaining sample point cloud data, sample question information corresponding to the sample point cloud data, and sample response information corresponding to the sample question information.

In this embodiment, training a question-answering model based on point cloud data requires a large amount of annotated point cloud data as training samples. Optionally, point cloud data can be collected in different ways and then annotated manually. By way of example, the sample point cloud data, the corresponding sample question information and the corresponding sample response information may be obtained in, but are not limited to, the following ways: (1) Real-scene acquisition: point cloud data is captured with a Kinect device or a laser scanner as the sample point cloud data, and text questions and answers are then manually annotated for the key frames of segments of the sample point cloud data, serving as the sample question information and the corresponding sample response information. (2) Virtual acquisition: an acquisition trajectory is set in a simulation environment, key-frame point clouds are captured at selected trajectory points, and text questions and answers are annotated for the segment key-frame point clouds, serving as the sample question information and the corresponding sample response information.

S220, generating training sample data based on the sample point cloud data, the sample question information corresponding to the sample point cloud data, and the sample response information corresponding to the sample question information.

After the sample point cloud data, the sample question information and the sample response information are obtained, training sample data is generated from them. Optionally, data processing operations such as format conversion may be performed on part of this data, and the processed sample point cloud data, sample question information and sample response information are used as the training sample data. For example, the sample response information may be converted into the codes of words in the text library. Suppose the text library is a fixed vocabulary of 1024 words {W1, W2, …, W1024} and the sample response information is the word sequence {L1, L2, …, LN}. Each word can then be represented by one-hot encoding: if the i-th word Li = Wj, its code is [0, …, 1, …, 0], with the j-th bit set to 1 and all other bits set to 0. The one-hot codes of the words in the sample response information are used as the processed sample response information, which, together with the sample point cloud data and the sample question information, forms the training sample data.
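The one-hot encoding of the sample response information can be sketched in a few lines of Python. This is an illustration under stated assumptions: the text library is a Python list whose 0-based index j corresponds to word W(j+1) in the text, and an answer word that is not in the library raises `KeyError`, since the source does not say how out-of-vocabulary words are handled. A three-word vocabulary stands in for the 1024-word library.

```python
from typing import Dict, List, Sequence


def one_hot_encode(answer_words: Sequence[str], vocab: Sequence[str]) -> List[List[int]]:
    """Encode each word L_i of an answer as a one-hot vector over the text library."""
    index: Dict[str, int] = {w: j for j, w in enumerate(vocab)}  # word -> bit position
    codes: List[List[int]] = []
    for word in answer_words:
        code = [0] * len(vocab)  # all bits 0 ...
        code[index[word]] = 1    # ... except the bit of the matching library word (KeyError if absent)
        codes.append(code)
    return codes


print(one_hot_encode(["red", "coat"], ["coat", "red", "light"]))  # → [[0, 1, 0], [1, 0, 0]]
```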

S230, training the pre-constructed question-answering model with the training sample data to obtain a trained question-answering model.

After the training sample data is obtained, the pre-constructed question-answering model is trained with it to obtain the trained question-answering model. During training, once the question-answering model has generated predicted response information for a piece of sample question information, a loss value is determined from the predicted response information and the sample response information corresponding to that sample question information, and the model is trained until the loss value satisfies a convergence condition, yielding the trained question-answering model. Optionally, the convergence condition may be that the difference between two adjacent loss values is smaller than a set threshold, or that the number of iterations reaches a set target number of iterations.
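The two convergence conditions named above (adjacent loss values differing by less than a threshold, or a target iteration count being reached) can be expressed as a small driver loop. A minimal Python sketch: `train_step` is a hypothetical callable that runs one training iteration and returns its loss, and the default `threshold` and `max_iters` values are placeholders, not values from the source.

```python
from typing import Callable, Tuple


def train_until_converged(
    train_step: Callable[[], float],  # hypothetical: one training iteration, returns the loss value
    threshold: float = 1e-4,
    max_iters: int = 10_000,
) -> Tuple[int, float]:
    """Run train_step until two adjacent losses differ by less than threshold
    or max_iters iterations have been run; return (iterations, last loss)."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    prev_loss = None
    loss = float("inf")
    for it in range(1, max_iters + 1):
        loss = train_step()
        if prev_loss is not None and abs(prev_loss - loss) < threshold:
            return it, loss           # condition 1: adjacent losses are close enough
        prev_loss = loss
    return max_iters, loss            # condition 2: target number of iterations reached


losses = iter([1.0, 0.5, 0.3, 0.29995, 0.2999])
print(train_until_converged(lambda: next(losses), threshold=1e-3))  # → (4, 0.29995)
```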

S240, obtaining the information to be responded and the point cloud data corresponding to the information to be responded.

S250, inputting the information to be responded and point cloud data corresponding to the information to be responded into the trained question-answering model, and obtaining an output result of the question-answering model.

S260, determining response information according to the output result and outputting the response information.

In this embodiment of the invention, the sample point cloud data, the corresponding sample question information and the corresponding sample response information are obtained; training sample data is generated from them; and a question-answering model pre-constructed on the basis of recurrent neural networks is trained with the training sample data to obtain the trained question-answering model. Because the model is built on recurrent neural networks, it can capture the temporal information contained in the point cloud data, achieving accurate responses based on point cloud data.

Example Three

FIG. 3 is a schematic structural diagram of a response device based on point cloud data according to the third embodiment of the present invention. The response device based on point cloud data may be implemented in software and/or hardware; for example, it may be configured in a computer device. As shown in FIG. 3, the device includes a to-be-responded information acquisition module 310, an output result acquisition module 320 and a response information output module 330, wherein:

the to-be-responded information acquisition module 310 is configured to acquire the information to be responded and the point cloud data corresponding to the information to be responded;

the output result acquisition module 320 is configured to input the information to be responded and the point cloud data corresponding to it into a trained question-answering model to obtain an output result of the question-answering model, wherein the trained question-answering model includes a point cloud feature extraction module and a response information generation module;

and the response information output module 330 is configured to determine and output response information according to the output result.

In this embodiment of the invention, the to-be-responded information acquisition module acquires the information to be responded and the corresponding point cloud data; the output result acquisition module inputs them into the trained question-answering model, which includes a point cloud feature extraction module and a response information generation module, to obtain the output result of the question-answering model; and the response information output module determines and outputs response information according to the output result. By mining the information contained in the point cloud data with the trained question-answering model, accurate responses based on point cloud data are achieved.

Optionally, on the basis of the above scheme, the output result obtaining module 320 includes:

the point cloud feature extraction unit is used for inputting the information to be responded and the point cloud data corresponding to the information to be responded into the point cloud feature extraction module to obtain the point cloud features output by the point cloud feature extraction module;

and the output result generating unit is used for inputting the point cloud feature and the information to be responded into the response information generation module to obtain the output result of the response information generation module.

Optionally, on the basis of the above solution, the point cloud data includes at least one piece of single-frame point cloud data; the point cloud feature extraction module includes a recurrent submodule and a plurality of single-frame feature extraction submodules; the recurrent submodule includes a plurality of first recurrent neural networks connected in a chain, and the single-frame feature extraction submodules correspond one-to-one to the first recurrent neural networks; and the point cloud feature extraction unit includes:

the single-frame feature subunit is configured to, for each piece of single-frame point cloud data, input the single-frame point cloud data and the information to be responded into the single-frame feature extraction submodule to obtain the single-frame feature of that single-frame point cloud data output by the single-frame feature extraction submodule;

the point cloud feature subunit is configured to take each first recurrent neural network in turn as the current first recurrent neural network, following the connection order of the first recurrent neural networks; input into the current first recurrent neural network the single-frame feature output by its corresponding single-frame feature extraction submodule and the network extraction feature output by the preceding first recurrent neural network, to obtain the network extraction feature output by the current first recurrent neural network; and take the network extraction feature output by the last first recurrent neural network as the point cloud feature.
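The chained computation performed by the point cloud feature subunit is the usual unrolled recurrence. Below is a minimal pure-Python sketch under assumptions the text leaves open: each first recurrent neural network is modelled as an Elman-style tanh cell, all links share one set of weights (`W_x`, `W_h`, `b`), and the chain starts from a zero state. A gated cell such as an LSTM or GRU, or separate weights per link, would slot in the same way.

```python
import math
from typing import List, Sequence

Vec = List[float]
Mat = Sequence[Sequence[float]]


def matvec(m: Mat, v: Sequence[float]) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(x: Sequence[float], h: Sequence[float], W_x: Mat, W_h: Mat, b: Sequence[float]) -> Vec:
    """One link of the chain: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [math.tanh(p + q + c) for p, q, c in zip(matvec(W_x, x), matvec(W_h, h), b)]


def point_cloud_feature(frame_feats: Sequence[Sequence[float]], W_x: Mat, W_h: Mat,
                        b: Sequence[float]) -> Vec:
    """Feed the single-frame features through the chain in connection order and
    return the output of the last link as the point cloud feature."""
    h: Vec = [0.0] * len(b)  # assumed zero state before the first link
    for x in frame_feats:
        h = rnn_step(x, h, W_x, W_h, b)
    return h


print(point_cloud_feature([[0.5], [0.2]], W_x=[[1.0]], W_h=[[1.0]], b=[0.0]))
```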

Optionally, on the basis of the above solution, the single-frame feature extraction submodule includes a first question feature extraction network, an attention module and an original feature extraction network, and the single-frame feature subunit is specifically configured for:

inputting the information to be responded into the first question feature extraction network to obtain the first question feature output by the first question feature extraction network;

inputting the single-frame point cloud data into an original feature extraction network to obtain original features of the single-frame point cloud data output by the original feature extraction network;

and inputting the first question feature and the original feature into the attention module to obtain the single-frame feature output by the attention module.

Optionally, on the basis of the above solution, the first question feature extraction network is a second recurrent neural network, and the single-frame feature subunit is specifically configured for:

taking each character of the information to be responded in turn as the current character, following its word order; inputting the current character and the character feature corresponding to the preceding character into the second recurrent neural network to obtain the character feature corresponding to the current character output by the second recurrent neural network; and taking the character feature corresponding to the last character as the first question feature.
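The character-by-character encoding of the question can be sketched in the same spirit. This pure-Python illustration assumes a lookup table `embed` that maps each character to a vector (unknown characters map to zeros) and, to keep it short, a recurrent cell with scalar weights `w_x` and `w_h` applied element-wise; a real second recurrent neural network would use full weight matrices. Only the control flow (word order, previous feature fed back, last feature kept) mirrors the text.

```python
import math
from typing import Dict, List, Sequence


def question_feature(question: str, embed: Dict[str, Sequence[float]],
                     w_x: float, w_h: float, dim: int) -> List[float]:
    """Return the character feature of the last character as the question feature."""
    h = [0.0] * dim                         # feature before the first character
    for ch in question:                     # characters in word order
        e = embed.get(ch, [0.0] * dim)      # unknown characters map to a zero vector
        h = [math.tanh(w_x * ei + w_h * hi) for ei, hi in zip(e, h)]
    return h


embed = {"a": [1.0, 0.0], "b": [0.0, 1.0]}  # toy character embeddings
fa = question_feature("ab", embed, w_x=1.0, w_h=0.5, dim=2)
fb = question_feature("ba", embed, w_x=1.0, w_h=0.5, dim=2)
print(fa != fb)  # → True (the feature depends on character order)
```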

Optionally, on the basis of the above solution, the attention module includes a spatial attention module and a temporal attention module, and the single-frame feature subunit is specifically configured for:

inputting the first question feature and the original feature into the spatial attention module to obtain the spatially weighted feature output by the spatial attention module;

and inputting the first question feature, together with the spatially weighted features output by the spatial attention modules of all the single-frame feature extraction submodules, into the temporal attention module to obtain the single-frame feature output by the temporal attention module.
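The text does not say how the two attention modules compute their weights. As one common choice, the sketch below assumes dot-product attention: the spatial attention module weights each point's original feature by the softmax of its similarity to the first question feature and sums them into a spatially weighted feature, and the temporal attention module scores each frame's spatially weighted feature against the same question feature and rescales it by its softmax weight across frames. It also assumes the question feature and the point features share one dimension; a learned projection would normally reconcile them.

```python
import math
from typing import List, Sequence

Vec = List[float]


def softmax(scores: Sequence[float]) -> Vec:
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def spatial_attention(q: Sequence[float], point_feats: Sequence[Sequence[float]]) -> Vec:
    """Question-guided weighted sum over the points of one frame."""
    weights = softmax([dot(q, p) for p in point_feats])
    dim = len(point_feats[0])
    return [sum(w * p[k] for w, p in zip(weights, point_feats)) for k in range(dim)]


def temporal_attention(q: Sequence[float], frame_feats: Sequence[Sequence[float]]) -> List[Vec]:
    """Rescale every frame's spatially weighted feature by its attention weight across frames."""
    weights = softmax([dot(q, f) for f in frame_feats])
    return [[w * x for x in f] for w, f in zip(weights, frame_feats)]


q = [1.0, 0.0]                                                  # toy first question feature
frames = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]   # two frames, two points each
single_frame_feats = temporal_attention(q, [spatial_attention(q, pts) for pts in frames])
```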

Optionally, on the basis of the above solution, the response information generation module includes a second question feature extraction network, a data integration network and a third recurrent neural network, and the output result generating unit includes:

the second question feature subunit is used for inputting the information to be responded into the second question feature extraction network to obtain the second question feature corresponding to the information to be responded, output by the second question feature extraction network;

the integrated feature subunit is used for inputting the second question feature and the point cloud feature into the data integration network to obtain the data integration feature output by the data integration network;

and the output result subunit is used for inputting the data integration feature into the third recurrent neural network to obtain the output result output by the third recurrent neural network.

Optionally, on the basis of the above solution, the output result subunit is specifically configured for:

inputting the data integration feature into the third recurrent neural network to obtain the current predicted response character output by the third recurrent neural network; inputting the data integration feature and the current predicted response character into the third recurrent neural network to obtain the next predicted response character, until all the predicted response characters output by the third recurrent neural network have been obtained; and concatenating all the predicted response characters in order to generate the output result.

Optionally, on the basis of the above scheme, the device further includes:

the model training module is used for, before the information to be responded and the corresponding point cloud data are input into the trained question-answering model, acquiring sample point cloud data, sample question information corresponding to the sample point cloud data, and sample response information corresponding to the sample question information;

generating training sample data based on the sample point cloud data, the sample question information corresponding to the sample point cloud data, and the sample response information corresponding to the sample question information;

and training the pre-constructed question-answering model with the training sample data to obtain a trained question-answering model.

The response device based on point cloud data provided by this embodiment of the invention can execute the response method based on point cloud data provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.

Example Four

Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary computer device 412 suitable for use in implementing embodiments of the invention. The computer device 412 shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the invention.

As shown in FIG. 4, computer device 412 is in the form of a general purpose computing device. Components of computer device 412 may include, but are not limited to: one or more processors 416, a system memory 428, and a bus 418 that connects the various system components (including the system memory 428 and the processors 416).

Bus 418 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 412 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 412 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 430 and/or cache memory 432. The computer device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage 434 may be used to read from or write to non-removable, non-volatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 418 via one or more data medium interfaces. The memory 428 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.

A program/utility 440 having a set (at least one) of program modules 442 may be stored in, for example, memory 428, such program modules 442 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 442 generally perform the functions and/or methodologies in the described embodiments of the invention.

The computer device 412 may also communicate with one or more external devices 414 (e.g., keyboard, pointing device, display 424, etc.), one or more devices that enable a user to interact with the computer device 412, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 412 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 422. Moreover, computer device 412 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 420. As shown, network adapter 420 communicates with other modules of computer device 412 over bus 418. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 412, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

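The processor 416 executes various functional applications and data processing by running programs stored in the system memory 428, for example implementing the response method based on point cloud data provided by an embodiment of the present invention, the method including: acquiring information to be responded and point cloud data corresponding to the information to be responded; inputting the information to be responded and the corresponding point cloud data into a trained question-answering model to obtain an output result of the question-answering model, wherein the question-answering model includes a point cloud feature extraction module and a response information generation module; and determining response information according to the output result and outputting the response information.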

Of course, it can be understood by those skilled in the art that the processor may also implement the technical solution of the response method based on the point cloud data provided by any embodiment of the present invention. In addition, the method for answering by using the trained question-answer model and the method for training the question-answer model in the answer method based on the point cloud data provided by any embodiment of the invention can be applied to the same computer equipment and also can be applied to different computer equipment.

Example Five

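The fifth embodiment of the present invention further provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the response method based on point cloud data provided by the embodiments of the present invention, the method comprising: acquiring information to be responded and point cloud data corresponding to the information to be responded; inputting the information to be responded and the corresponding point cloud data into a trained question-answering model to obtain an output result of the question-answering model, wherein the question-answering model comprises a point cloud feature extraction module and a response information generation module; and determining response information according to the output result and outputting the response information.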

Of course, the computer program stored on the computer readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations of the response method based on point cloud data provided by any embodiment of the present invention.

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of numerous obvious changes, rearrangements and substitutions without departing from the scope of the present invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. A response method based on point cloud data, characterized by comprising the following steps:

inputting the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answering model to obtain an output result of the question-answering model, wherein the question-answering model comprises a point cloud feature extraction module and a response information generation module;

determining response information according to the output result and outputting the response information;

wherein the point cloud data comprises at least one piece of single-frame point cloud data; the point cloud feature extraction module comprises a recurrent submodule and a plurality of single-frame feature extraction submodules; the recurrent submodule comprises a plurality of first recurrent neural networks connected in a chain, and the single-frame feature extraction submodules correspond one-to-one to the first recurrent neural networks; each single-frame feature extraction submodule comprises a first question feature extraction network, an attention module and an original feature extraction network, and the attention module comprises a spatial attention module and a temporal attention module; and inputting the information to be responded and the point cloud data corresponding to the information to be responded into the trained question-answering model to obtain the output result of the question-answering model comprises:

for each piece of single-frame point cloud data, inputting the information to be responded into the first question feature extraction network to obtain a first question feature output by the first question feature extraction network; inputting the single-frame point cloud data into the original feature extraction network to obtain an original feature of the single-frame point cloud data output by the original feature extraction network; inputting the first question feature and the original feature into the spatial attention module to obtain a spatially weighted feature output by the spatial attention module; and inputting the first question feature and the spatially weighted features output by the spatial attention modules of all the single-frame feature extraction submodules into the temporal attention module to obtain a single-frame feature output by the temporal attention module;

taking each first recurrent neural network in turn as the current first recurrent neural network according to the connection order of the first recurrent neural networks; inputting the single-frame feature output by the single-frame feature extraction submodule corresponding to the current first recurrent neural network, and the network extraction feature output by the first recurrent neural network preceding the current first recurrent neural network, into the current first recurrent neural network to obtain the network extraction feature output by the current first recurrent neural network; and taking the network extraction feature output by the last first recurrent neural network as the point cloud feature;

and inputting the point cloud feature and the information to be responded into the response information generation module to obtain the output result of the response information generation module.

2. The method of claim 1, wherein the first question feature extraction network is a second recurrent neural network, and inputting the information to be responded into the first question feature extraction network to obtain the first question feature output by the first question feature extraction network comprises:

taking each character of the information to be responded in turn as the current character according to the word order of the information to be responded; inputting the current character and the character feature corresponding to the character preceding the current character into the second recurrent neural network to obtain the character feature corresponding to the current character output by the second recurrent neural network; and taking the character feature corresponding to the last character as the first question feature.

3. The method according to claim 1, wherein the response information generation module comprises a second question feature extraction network, a data integration network and a third recurrent neural network, and inputting the point cloud feature and the information to be responded into the response information generation module to obtain the output result of the response information generation module comprises:

inputting the information to be responded into the second question feature extraction network to obtain a second question feature corresponding to the information to be responded, output by the second question feature extraction network;

inputting the second question feature and the point cloud feature into the data integration network to obtain a data integration feature output by the data integration network;

and inputting the data integration feature into the third recurrent neural network to obtain the output result output by the third recurrent neural network.

4. The method according to claim 3, wherein inputting the data integration feature into the third recurrent neural network to obtain the output result of the third recurrent neural network comprises:

inputting the data integration feature into the third recurrent neural network to obtain a current predicted response character output by the third recurrent neural network; inputting the data integration feature and the current predicted response character into the third recurrent neural network to obtain the next predicted response character output by the third recurrent neural network, and repeating until all predicted response characters output by the third recurrent neural network are obtained; and concatenating the predicted response characters in order to generate the output result.
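The step-by-step generation in claim 4 is an autoregressive decoding loop. Two details in the sketch below are assumptions, because the claim specifies neither: characters are chosen greedily (argmax), and a hypothetical `<end>` token stops generation. `greedy_decode` accepts any step function. `make_rnn_step` builds a toy tanh-RNN cell with random weights so the loop can run end to end.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

END = "<end>"  # hypothetical end-of-answer token

# step(state, integration_feature, previous_char) -> (new_state, predicted_char)
Step = Callable[[List[float], List[float], Optional[str]], Tuple[List[float], str]]


def greedy_decode(feature: List[float], step: Step, init_state: List[float],
                  max_len: int = 32) -> str:
    """Predict one response character at a time and concatenate them in order."""
    state, prev, chars = init_state, None, []
    for _ in range(max_len):
        # The first step sees only the integration feature (prev is None);
        # later steps also see the character predicted just before.
        state, ch = step(state, feature, prev)
        if ch == END:
            break
        chars.append(ch)
        prev = ch
    return "".join(chars)


def make_rnn_step(vocab: List[str], embed: Dict[str, List[float]],
                  w_in: List[List[float]], w_hh: List[List[float]],
                  w_out: List[List[float]]) -> Step:
    """Toy tanh-RNN cell with a linear output layer and argmax character choice."""
    emb_dim = len(next(iter(embed.values())))

    def step(state, feature, prev):
        prev_vec = embed[prev] if prev is not None else [0.0] * emb_dim
        x = feature + prev_vec  # concatenate integration feature and previous char
        h = [math.tanh(sum(a * b for a, b in zip(ri, x)) +
                       sum(a * b for a, b in zip(rh, state)))
             for ri, rh in zip(w_in, w_hh)]
        logits = [sum(a * b for a, b in zip(ro, h)) for ro in w_out]
        best = max(range(len(vocab)), key=lambda k: logits[k])
        return h, vocab[best]

    return step


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(1)
vocab = list("abc ") + [END]
FEAT, EMB, HID = 3, 2, 4
embed = {ch: [rng.uniform(-1, 1) for _ in range(EMB)] for ch in vocab}
step = make_rnn_step(vocab, embed, rand_matrix(HID, FEAT + EMB, rng),
                     rand_matrix(HID, HID, rng), rand_matrix(len(vocab), HID, rng))
answer = greedy_decode([0.5, -0.2, 0.1], step, [0.0] * HID, max_len=10)
```

With untrained random weights, `answer` is an arbitrary string of at most 10 characters. The loop is the point: a trained third recurrent network would plug in as `step` unchanged. Beam search is a common alternative to the greedy choice.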

5. The method according to claim 1, further comprising, before inputting the information to be responded to and the point cloud data corresponding to the information to be responded to into the trained question-answering model:

acquiring sample point cloud data, sample question information corresponding to the sample point cloud data, and sample response information corresponding to the sample question information, and using them as training sample data;

and training the pre-constructed question-answering model using the training sample data to obtain the trained question-answering model.
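Claim 5 does not name a training objective. The sketch below assumes a common choice for character-by-character answer generation: the teacher-forced cross-entropy (negative log-likelihood) of the reference answer. It shows only the loss an optimizer would minimize. `TrainingSample`, `Predictor` and the `<end>` token are hypothetical, and the model is abstracted as a function that returns next-character probabilities.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

END = "<end>"  # hypothetical end-of-answer token


@dataclass
class TrainingSample:
    point_cloud: List[List[List[float]]]  # frames -> points -> coordinates
    question: str                          # sample question information
    answer: str                            # sample response information


# predict(sample, answer_prefix) -> probability of each candidate next character
Predictor = Callable[[TrainingSample, str], Dict[str, float]]


def sequence_nll(sample: TrainingSample, predict: Predictor) -> float:
    """Teacher-forced negative log-likelihood of the reference answer."""
    targets = list(sample.answer) + [END]
    loss = 0.0
    for t, target in enumerate(targets):
        probs = predict(sample, sample.answer[:t])  # ground-truth prefix, not model output
        loss -= math.log(max(probs.get(target, 0.0), 1e-12))  # clamp avoids log(0)
    return loss


def batch_loss(samples: List[TrainingSample], predict: Predictor) -> float:
    """Mean loss over the training sample data."""
    return sum(sequence_nll(s, predict) for s in samples) / len(samples)


VOCAB = ["a", "b", END]


def uniform(sample: TrainingSample, prefix: str) -> Dict[str, float]:
    """Untrained baseline: every candidate character is equally likely."""
    return {ch: 1.0 / len(VOCAB) for ch in VOCAB}


samples = [TrainingSample([[[0.0, 0.0, 0.0]]], "q?", "ab")]
print(batch_loss(samples, uniform))  # 3 * ln(3), about 3.296
```

Teacher forcing feeds the ground-truth prefix at every position, so all positions can be scored in parallel during training. At inference time, the model instead consumes its own predictions.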

6. A response device based on point cloud data, comprising:

an information-to-be-responded acquisition module, configured to acquire information to be responded to and point cloud data corresponding to the information to be responded to;

an output result acquisition module, configured to input the information to be responded to and the point cloud data corresponding to the information to be responded to into a trained question-answering model to obtain an output result of the question-answering model, wherein the trained question-answering model comprises a point cloud feature extraction module and a response information generation module;

and a response information output module, configured to determine response information according to the output result and output the response information;

wherein the point cloud data comprises at least one frame of single-frame point cloud data; the point cloud feature extraction module comprises a recurrent sub-module and a plurality of single-frame feature extraction sub-modules; the recurrent sub-module comprises a plurality of first recurrent neural networks connected in a chain, the single-frame feature extraction sub-modules being in one-to-one correspondence with the first recurrent neural networks; each single-frame feature extraction sub-module comprises a first question feature extraction network, an attention module and an original feature extraction network, the attention module comprising a spatial attention module and a temporal attention module; and the output result acquisition module is specifically configured to:

7. A computer device, comprising:

one or more processors;

a storage apparatus configured to store one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the response method based on point cloud data according to any one of claims 1-5.

8. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the response method based on point cloud data according to any one of claims 1-5.