CN113590770A - Point cloud data-based response method, device, equipment and storage medium

Info

Publication number: CN113590770A
Application number: CN202010367528.8A
Authority: CN
Inventors: 李艳丽; 赫桂望; 蔡金华
Original assignee: Beijing Jingdong Qianshi Technology Co Ltd
Current assignee: Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2021-11-02
Anticipated expiration: 2040-04-30
Also published as: CN113590770B

Abstract

The embodiment of the invention discloses a response method, a response device, response equipment and a storage medium based on point cloud data. The method comprises the following steps: acquiring information to be responded and point cloud data corresponding to the information to be responded; inputting the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model to obtain an output result of the question-answer model, wherein the question-answer model comprises a point cloud feature extraction module and a response information generation module; and determining response information according to the output result and outputting the response information. In the response method provided by the embodiment of the invention, the trained question-answer model mines the information contained in the point cloud data, so that an accurate response based on the point cloud data is realized.

Description

Point cloud data-based response method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of computers, and in particular to a response method, a response device, response equipment and a storage medium based on point cloud data.

Background

With the advance of automatic driving, robots, city simulation and three-dimensional printing, intelligent understanding of point cloud data becomes more and more important. In the process of implementing the invention, the inventor found that at least the following technical problem exists in the prior art: how to accurately understand and respond to point cloud data is a technical problem to be solved urgently. For example, given a street view point cloud scanned by a vehicle-mounted system, how should the system respond accurately when the driver asks "What is the pedestrian in red under the traffic light doing?"

Disclosure of Invention

The embodiment of the invention provides a response method, a response device, response equipment and a storage medium based on point cloud data, so as to realize accurate response based on the point cloud data.

In a first aspect, an embodiment of the present invention provides a response method based on point cloud data, including:

acquiring information to be responded and point cloud data corresponding to the information to be responded;

inputting the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model to obtain an output result of the question-answer model, wherein the question-answer model comprises a point cloud feature extraction module and a response information generation module;

and determining response information according to the output result and outputting the response information.

In a second aspect, an embodiment of the present invention further provides a response apparatus based on point cloud data, including:

a to-be-responded information acquisition module, used for acquiring information to be responded and point cloud data corresponding to the information to be responded;

an output result acquisition module, used for inputting the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model to obtain an output result of the question-answer model, wherein the trained question-answer model comprises a point cloud feature extraction module and a response information generation module;

and the response information output module is used for determining and outputting the response information according to the output result.

In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:

one or more processors;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the point cloud data-based response method provided by any of the embodiments of the invention.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the point cloud data-based answering method provided in any embodiment of the present invention.

According to the embodiment of the invention, information to be responded and point cloud data corresponding to the information to be responded are acquired; the information to be responded and the corresponding point cloud data are input into a trained question-answer model to obtain an output result of the question-answer model, wherein the question-answer model comprises a point cloud feature extraction module and a response information generation module; and response information is determined according to the output result and output. The trained question-answer model mines the information contained in the point cloud data, so that an accurate response based on the point cloud data is realized.

Drawings

Fig. 1a is a flowchart of a point cloud data-based response method according to an embodiment of the present invention;

fig. 1b is a schematic structural diagram of a point cloud feature extraction module according to a first embodiment of the present invention;

fig. 1c is a schematic structural diagram of a single-frame feature extraction sub-module according to a first embodiment of the present invention;

fig. 1d is a schematic structural diagram of an original feature extraction network according to an embodiment of the present invention;

fig. 1e is a schematic structural diagram of another single-frame feature extraction sub-module according to a first embodiment of the present invention;

fig. 1f is a schematic structural diagram of a response information generation module according to a first embodiment of the present invention;

fig. 1g is a schematic structural diagram of another question-answering model provided in an embodiment of the present invention;

fig. 2 is a flowchart of a response method based on point cloud data according to a second embodiment of the present invention;

fig. 3 is a schematic structural diagram of a response apparatus based on point cloud data according to a third embodiment of the present invention;

fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It is to be further noted that, for the convenience of description, only a part of the structure relating to the present invention is shown in the drawings, not the whole structure.

Example one

Fig. 1a is a flowchart of a response method based on point cloud data according to an embodiment of the present invention. The embodiment is applicable to intelligent response, and in particular to intelligent response based on point cloud data. The method can be performed by a point cloud data-based response device, which can be implemented in software and/or hardware and can, for example, be configured in a computer device. As shown in fig. 1a, the method comprises:

S110, acquiring information to be responded and point cloud data corresponding to the information to be responded.

In this embodiment, the manner of acquiring the information to be responded and the corresponding point cloud data is not limited herein, and may be determined according to the scene in which the intelligent response is applied. For example, if the intelligent response is applied to a vehicle-mounted system, the information to be responded may be information input by the driver's voice, such as "What is the pedestrian wearing red clothes under the traffic light doing?", and the point cloud data corresponding to the information to be responded may be street view point cloud data scanned by the vehicle-mounted system.

S120, inputting the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model to obtain an output result of the question-answer model.

In this embodiment, the trained question-answering model includes a point cloud feature extraction module and a response information generation module, the point cloud feature extraction module is configured to extract point cloud features in the point cloud data, and the response information generation module is configured to generate an output result according to the point cloud features and information to be responded. Optionally, the output result of the question-answering model may be a one-dimensional effective code corresponding to each word in the text library, and may also be response information in the form of characters.

In an embodiment of the present invention, inputting the information to be responded and the point cloud data corresponding to the information to be responded into the trained question-answer model to obtain the output result of the question-answer model includes: inputting the information to be responded and the corresponding point cloud data into the point cloud feature extraction module to obtain the point cloud features output by the point cloud feature extraction module; and inputting the point cloud features and the information to be responded into the response information generation module to obtain the output result of the response information generation module. Optionally, in order to improve the accuracy of the response information, the point cloud feature extraction module may extract features of the point cloud data in combination with the information to be responded, so that the extracted point cloud features locally emphasize the parts related to the information to be responded; the response information generation module then responds to the information to be responded according to these point cloud features and generates the output result. Extracting such point cloud features may be done as follows: for each single-frame point cloud data in the point cloud data, feature extraction is performed on the single-frame point cloud data in combination with the information to be responded to obtain a single-frame feature, and the point cloud feature corresponding to the point cloud data is obtained from the single-frame features of all the single-frame point cloud data.
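The two-stage flow described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function `answer` and the stand-ins `toy_extractor` and `toy_generator` are hypothetical names, and the trained modules are replaced by trivial placeholder functions.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]   # one single-frame point cloud: n points x 7 values
Feature = List[float]


def answer(question: str,
           frames: Sequence[Frame],
           extract_point_cloud_feature: Callable[[str, Sequence[Frame]], Feature],
           generate_response: Callable[[Feature, str], str]) -> str:
    """Two-stage flow of S120: feature extraction, then response generation."""
    # Stage 1: the question is passed in as well, so that extraction can
    # emphasize the parts of the point cloud related to it.
    point_cloud_feature = extract_point_cloud_feature(question, frames)
    # Stage 2: the generator receives both the point cloud feature and the question.
    return generate_response(point_cloud_feature, question)


# Hypothetical stand-ins for the trained modules, for illustration only.
def toy_extractor(question: str, frames: Sequence[Frame]) -> Feature:
    return [float(sum(len(f) for f in frames))]   # total number of points


def toy_generator(feature: Feature, question: str) -> str:
    return f"{int(feature[0])} points seen for: {question}"


result = answer("what is ahead?", [[[0.0] * 7, [1.0] * 7]], toy_extractor, toy_generator)
```

The point of the sketch is only the data flow: the information to be responded enters both stages, while the point cloud data enters only the first.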

In this embodiment, the point cloud data is considered to be large in scale. One way of organizing large-scale point cloud data is to splice single-frame point cloud data in time order, as with point clouds acquired by a vehicle-mounted laser, and the single-frame point cloud data are related by spatio-temporal consistency. A simple convolutional neural network can hardly capture the temporal consistency between single-frame point cloud data, so the question-answer model is constructed on the basis of a recurrent neural network to extract high-level features from point cloud data formed by splicing single-frame point cloud data, which makes the feature extraction of the point cloud data more accurate. In an embodiment of the present invention, the point cloud data includes at least one single-frame point cloud data, the point cloud feature extraction module includes a cyclic submodule and a plurality of single-frame feature extraction submodules, the cyclic submodule includes a plurality of first recurrent neural networks connected in a chain, and the single-frame feature extraction submodules correspond one-to-one to the first recurrent neural networks. Inputting the information to be responded and the corresponding point cloud data into the point cloud feature extraction module to obtain the point cloud features output by the point cloud feature extraction module includes: for each single-frame point cloud data, inputting the single-frame point cloud data and the information to be responded into a single-frame feature extraction submodule to obtain the single-frame feature corresponding to the single-frame point cloud data output by the single-frame feature extraction submodule; and, following the connection order of the first recurrent neural networks, taking each first recurrent neural network in turn as the current first recurrent neural network, inputting into it the single-frame feature output by its corresponding single-frame feature extraction submodule and the network extraction feature output by the preceding first recurrent neural network, obtaining the network extraction feature output by the current first recurrent neural network, and taking the network extraction feature output by the last first recurrent neural network as the point cloud feature.

Optionally, the single-frame feature of each single-frame point cloud data may be extracted by the single-frame feature extraction submodules, and the point cloud feature corresponding to the point cloud data is then obtained by the cyclic submodule from the single-frame features of all the single-frame point cloud data. Each single-frame feature extraction submodule may extract the single-frame feature of its corresponding single-frame point cloud data independently, or may perform feature extraction on its corresponding single-frame point cloud data in combination with the single-frame point cloud data corresponding to other single-frame feature extraction submodules.

In one embodiment, obtaining, by the cyclic submodule, the point cloud feature corresponding to the point cloud data from the single-frame features of all the single-frame point cloud data may be done as follows: following the chain order of the first recurrent neural networks, the single-frame feature output by the single-frame feature extraction submodule corresponding to each first recurrent neural network and the network extraction feature output by the preceding first recurrent neural network are input into that first recurrent neural network in turn, and the network extraction feature output by the last first recurrent neural network is taken as the point cloud feature of the point cloud data. Specifically, this process may be implemented by chain-connected first recurrent neural networks. Fig. 1b is a schematic structural diagram of a point cloud feature extraction module according to a first embodiment of the present invention. As shown in fig. 1b, the point cloud feature extraction module includes a cyclic submodule 120 and a plurality of single-frame feature extraction submodules (single-frame feature extraction submodule 111, single-frame feature extraction submodule 112, ..., single-frame feature extraction submodule 113). Each single-frame feature extraction submodule is configured to extract the single-frame feature of its corresponding single-frame point cloud data. The cyclic submodule 120 is composed of a plurality of first recurrent neural networks connected in a chain (first recurrent neural network 121, first recurrent neural network 122, ..., first recurrent neural network 123), where the single-frame feature extraction submodule 111 corresponds to the first recurrent neural network 121, the single-frame feature extraction submodule 112 corresponds to the first recurrent neural network 122, ..., and the single-frame feature extraction submodule 113 corresponds to the first recurrent neural network 123. The first recurrent neural network 121 obtains its network extraction feature $h_1$ from the single-frame feature extracted by the single-frame feature extraction submodule 111 and $h_0$, and outputs $h_1$ to the first recurrent neural network 122. The first recurrent neural network 122 obtains its network extraction feature $h_2$ from the single-frame feature extracted by the single-frame feature extraction submodule 112 and $h_1$, and outputs $h_2$ to the next first recurrent neural network, and so on, until the last first recurrent neural network 123 obtains its network extraction feature $h_T$ from the network extraction feature $h_{T-1}$ output by the preceding first recurrent neural network and the single-frame feature extracted by its corresponding single-frame feature extraction submodule 113; $h_T$ is taken as the point cloud feature of the point cloud data. The first recurrent neural network may be a Gated Recurrent Unit (GRU), a Long Short-Term Memory network (LSTM), various multi-layer and multi-directional recurrent neural networks, and the like.
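The chain recurrence over frames can be illustrated with a short sketch. It assumes a deliberately simple elementwise tanh cell in place of a GRU or LSTM, and an all-zero initial feature $h_0$; the names `rnn_cell` and `chain_point_cloud_feature` and the weights are hypothetical and do not come from the source.

```python
import math
from typing import List

Vector = List[float]


def rnn_cell(x: Vector, h_prev: Vector, w_x: float = 0.5, w_h: float = 0.5) -> Vector:
    """Hypothetical elementwise cell: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    return [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h_prev)]


def chain_point_cloud_feature(single_frame_features: List[Vector]) -> Vector:
    """Run the chain h_0 -> h_1 -> ... -> h_T and return h_T."""
    h = [0.0] * len(single_frame_features[0])   # h_0, assumed here to be all zeros
    for x_t in single_frame_features:           # frames in time order
        h = rnn_cell(x_t, h)                    # current network sees x_t and h_{t-1}
    return h                                    # h_T is used as the point cloud feature


frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_T = chain_point_cloud_feature(frame_features)
```

Because each step consumes the previous network extraction feature, reordering the frames generally changes $h_T$, which is how the chain can reflect temporal consistency between frames.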

In the present embodiment, extraction of a single-frame feature that locally emphasizes the part related to the information to be responded may be realized by an attention mechanism. In one embodiment, the single-frame feature extraction submodule includes a first question feature extraction network, an attention module and an original feature extraction network, and inputting the single-frame point cloud data and the information to be responded into the single-frame feature extraction submodule to obtain the single-frame feature corresponding to the single-frame point cloud data includes: inputting the information to be responded into the first question feature extraction network to obtain a first question feature output by the first question feature extraction network; inputting the single-frame point cloud data into the original feature extraction network to obtain the original feature of the single-frame point cloud data output by the original feature extraction network; and inputting the first question feature and the original feature into the attention module to obtain the single-frame feature output by the attention module.

Fig. 1c is a schematic structural diagram of a single-frame feature extraction sub-module according to a first embodiment of the present invention. The solid-line boxes in the figure represent network layers and the dashed-line boxes represent data layers. As shown in fig. 1c, the single-frame feature extraction submodule includes a first question feature extraction network 1111, an attention module 1113 and an original feature extraction network 1112. The original feature extraction network 1112 extracts an original feature (1 × 2048) from the single-frame point cloud data (n1 × 7), the first question feature extraction network 1111 extracts a first question feature (1 × 1024) from the information to be responded, and the attention module 1113 performs a weighted transformation on the original feature (1 × 2048) according to the first question feature to obtain a single-frame feature (1 × 2048) that locally emphasizes the part related to the information to be responded.
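The source does not specify the form of the weighted transformation. As one possible reading, the sketch below assumes a sigmoid gate per feature channel computed from a linear projection of the question feature; the function name `question_gated_feature`, the projection matrix `w` and the tiny dimensions are all hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def question_gated_feature(original: Vector, question: Vector, w: Matrix) -> Vector:
    """Reweight each channel of the original feature by a gate derived from the question."""
    # One gate in (0, 1) per output channel: sigmoid of (row of w) . question.
    gates = [sigmoid(sum(w_jk * q_k for w_jk, q_k in zip(row, question))) for row in w]
    # Output keeps the shape of the original feature, as in Fig. 1c (1 x 2048 in, 1 x 2048 out).
    return [g * o for g, o in zip(gates, original)]


original_feature = [2.0, 4.0, 6.0]          # stands in for the 1 x 2048 original feature
question_feature = [1.0, -1.0]              # stands in for the 1 x 1024 first question feature
w = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]    # hypothetical learned projection (3 x 2)
single_frame_feature = question_gated_feature(original_feature, question_feature, w)
```

With these toy values the second channel is kept almost unchanged and the third is almost suppressed, which is the sense in which the output emphasizes question-related parts of the feature.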

Optionally, the original feature extraction network 1112 may be based on the spatial convolution module PointNet, which includes three types of neurons: T-Net, matrix multiply and mlp. The T-Net is a feature transformation unit that can learn a geometric transformation matrix (3x3 transform) and a feature transformation matrix (64x64 transform) of the input data; matrix multiply is the matrix multiplication operation, and the combination of T-Net and matrix multiply guarantees the invariance of the model to specific spatial transformations. mlp is a Multi-Layer Perceptron: mlp(64,64) consists of two perceptron layers, 3x64 and 64x64, and mlp(64,128,1024) consists of three perceptron layers, 64x64, 64x128 and 128x1024. Each perceptron layer is a convolution operation whose shared weights are applied to each point independently; for example, the 3x64 perceptron layer convolved with a 1x3 data layer yields a 1x64 data layer, and the 128x1024 perceptron layer convolved with a 1x128 data layer outputs 1x1024. In this embodiment, the point cloud data may be 7-dimensional data (X, Y, Z, I, R, G, B), where (X, Y, Z) is the spatial coordinate, I is the intensity, and (R, G, B) is the color. The 7-dimensional point cloud data is split by channel, each part is processed by its own PointNet, and the results are fused, so that a point cloud convolution operation compatible with multi-dimensional features is realized. Fig. 1d is a schematic structural diagram of an original feature extraction network according to an embodiment of the present invention. As shown in fig. 1d, the original feature extraction network 1112 includes a data segmentation module 11121, two spatial convolution modules 11122, and a data fusion module 11123. The data segmentation module 11121 is configured to segment the point cloud data (n × 7) by channel into two parts of segmented point cloud data, (X, Y, Z) of size n × 3 and (I, R, G, B) of size n × 4. Feature extraction is performed on each part with its corresponding spatial convolution module to obtain a corresponding global feature (1x1024), and the data fusion module then concatenates the two global features (1x1024) extracted by the two spatial convolution modules into a 1x2048 feature, which is the original feature of the single-frame point cloud data.
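The split, per-branch extraction and fusion can be sketched as below. This is a simplified illustration under stated assumptions: each branch is reduced to a single shared per-point layer followed by a per-channel maximum over points (the pooling PointNet uses for its global feature), the T-Net is omitted, and the names `shared_layer`, `global_max_pool`, `original_feature` and the weights are hypothetical.

```python
from typing import List

Point = List[float]
Matrix = List[List[float]]


def shared_layer(points: List[Point], w: Matrix) -> List[Point]:
    """Apply the same weights to every point independently, then ReLU."""
    return [[max(0.0, sum(p_k * w[k][j] for k, p_k in enumerate(p)))
             for j in range(len(w[0]))] for p in points]


def global_max_pool(features: List[Point]) -> Point:
    """Order-invariant pooling over points: per-channel maximum."""
    return [max(col) for col in zip(*features)]


def original_feature(frame: List[Point], w_xyz: Matrix, w_attr: Matrix) -> Point:
    xyz = [p[:3] for p in frame]      # (X, Y, Z) branch, n x 3
    attr = [p[3:] for p in frame]     # (I, R, G, B) branch, n x 4
    g_xyz = global_max_pool(shared_layer(xyz, w_xyz))
    g_attr = global_max_pool(shared_layer(attr, w_attr))
    return g_xyz + g_attr             # data fusion by concatenation


# Toy weights (hypothetical): each branch maps to 2 channels instead of 1024.
w_xyz = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
w_attr = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
frame = [[1.0, 2.0, 3.0, 0.5, 0.1, 0.2, 0.3],
         [4.0, -1.0, 0.0, 0.2, 0.9, 0.0, 0.0]]
feature = original_feature(frame, w_xyz, w_attr)
```

Because the pooling is a maximum over points, the fused feature does not depend on the order in which the points of a frame are listed.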

In an embodiment of the present invention, the first question feature extraction network is a second recurrent neural network, and inputting the information to be responded into the first question feature extraction network to obtain the first question feature output by the first question feature extraction network includes: following the character order of the information to be responded, taking each character in turn as the current character, inputting the current character and the character feature corresponding to the preceding character into the second recurrent neural network, obtaining the character feature corresponding to the current character output by the second recurrent neural network, and taking the character feature corresponding to the last character as the first question feature.

In order to mine the sequential relationship in the question information, a recurrent neural network is used as the first question feature extraction network. Specifically, each character in the information to be responded is taken in turn as the current character according to the character order, the current character and the character feature corresponding to the preceding character are used as the input of the first question feature extraction network, the character feature corresponding to the current character is obtained as output, and the character feature corresponding to the last character is used as the first question feature. For example, assume that the information to be responded is "what is the person under the traffic light doing" (in the original Chinese, the word for traffic light begins with the characters for "red" and "green"). First, "red" is input into the second recurrent neural network to obtain the character feature of "red"; then the character feature of "red" and the character "green" are input into the second recurrent neural network to obtain the character feature of "green"; and so on, until the character feature of the second-to-last character and the last character are input into the second recurrent neural network to obtain the character feature of the last character, which is taken as the first question feature. Optionally, the second recurrent neural network may be a GRU, an LSTM, various multi-layer multi-directional recurrent neural networks, and the like.
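A character-by-character encoding of this kind might look like the sketch below. The embedding `embed` (bits of the character's code point) and the tanh update with fixed weights are hypothetical placeholders for a trained GRU or LSTM; only the control flow follows the text.

```python
import math
from typing import List


def embed(ch: str, dim: int = 4) -> List[float]:
    """Hypothetical character embedding: the low `dim` bits of the code point."""
    code = ord(ch)
    return [float((code >> i) & 1) for i in range(dim)]


def encode_question(text: str, dim: int = 4) -> List[float]:
    feature = [0.0] * dim                 # feature before the first character (assumed zero)
    for ch in text:                       # characters in reading order
        x = embed(ch, dim)
        # The current character and the previous character feature give the new feature.
        feature = [math.tanh(0.5 * xi + 0.5 * fi) for xi, fi in zip(x, feature)]
    return feature                        # feature of the last character


first_question_feature = encode_question("what is the pedestrian doing")
```

Only the feature of the last character is returned, matching the description that the last character feature serves as the first question feature.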

In this embodiment, the original features of the single-frame point cloud data may be weighted and transformed by the attention mechanism in space and in time, respectively, to obtain single-frame features that locally emphasize the information to be responded in both time and space. In one embodiment, the attention module comprises a spatial attention module and a temporal attention module, and inputting the first question feature and the original feature into the attention module to obtain the single-frame feature output by the attention module includes: inputting the first question feature and the original feature into the spatial attention module to obtain a spatial weighting feature output by the spatial attention module; and inputting the first question feature and the spatial weighting features output by the spatial attention modules of all the single-frame feature extraction submodules into the temporal attention module to obtain the single-frame feature output by the temporal attention module.

Fig. 1e is a schematic structural diagram of another single-frame feature extraction sub-module according to a first embodiment of the present invention. The solid-line boxes in the figure represent network layers and the dashed-line boxes represent data layers. Compared with fig. 1c, fig. 1e embodies the attention module 1113 as a spatial attention module 1114 and a temporal attention module 1115. Specifically, the spatial attention module 1114 performs a weighted transformation on the original feature (1 × 2048) extracted by the original feature extraction network 1112 according to the first question feature extracted by the first question feature extraction network 1111, and outputs a spatial weighting feature (1 × 2048). The temporal attention module 1115 performs a weighted transformation on the spatial weighting features (1 × 2048) output by the spatial attention modules of all the single-frame feature extraction submodules according to the first question feature (1 × 1024) extracted by the first question feature extraction network (the spatial attention modules of the other single-frame feature extraction submodules are not shown in the figure), and outputs a single-frame feature (1 × 2048) that locally emphasizes the information to be responded in both time and space.
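The temporal step can be illustrated with a sketch in which each frame receives one question-dependent weight. The scoring rule (a dot product with a vector `question_key`, assumed to be a projection of the first question feature into the feature space) and the softmax normalization are assumptions; the source only states that a weighted transformation is applied across frames.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                                  # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(spatial_features: List[Vector], question_key: Vector) -> List[Vector]:
    """Scale each frame's spatially weighted feature by a question-dependent frame weight."""
    scores = [sum(f_i * k_i for f_i, k_i in zip(f, question_key)) for f in spatial_features]
    weights = softmax(scores)                        # one weight per frame, summing to 1
    return [[w * v for v in f] for w, f in zip(weights, spatial_features)]


spatial_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # one vector per frame
question_key = [2.0, 0.0]                                 # hypothetical projected question feature
single_frame_features = temporal_attention(spatial_features, question_key)
```

Each output keeps the shape of its input, so the results can be fed frame by frame into the chain of first recurrent neural networks.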

When generating the output result, the response information generation module extracts a second question feature from the information to be responded, integrates the second question feature with the point cloud feature, and obtains the output result from the integrated feature. Optionally, the response information generation module includes a second question feature extraction network, a data integration network, and a third recurrent neural network, and inputting the point cloud feature and the information to be responded into the response information generation module to obtain the output result of the response information generation module includes: inputting the information to be responded into the second question feature extraction network to obtain the second question feature corresponding to the information to be responded; inputting the second question feature and the point cloud feature into the data integration network to obtain the data integration feature output by the data integration network; and inputting the data integration feature into the third recurrent neural network to obtain the output result output by the third recurrent neural network.

Fig. 1f is a schematic structural diagram of a response information generation module according to a first embodiment of the present invention. As shown in fig. 1f, the response information generation module includes a second question feature extraction network 131, a data integration network 132, and a third recurrent neural network 133. The second question feature extraction network 131 extracts a second question feature (1 × 1024) from the information to be responded; the data integration network 132 integrates the second question feature (1 × 1024) with the point cloud feature (1 × 2048) output by the point cloud feature extraction module to obtain a data integration feature, and inputs the data integration feature into the third recurrent neural network 133; and the third recurrent neural network 133 generates and outputs the output result according to the data integration feature.

The data integration network may obtain the data integration feature as a weighted sum of the second question feature and the point cloud feature. For example, the data integration network may obtain the data integration feature by $M = w_Q Q + w_h h_T$, where $M$ is the data integration feature, $Q$ is the second question feature, $h_T$ is the point cloud feature, $w_Q$ is the weight of the second question feature, and $w_h$ is the weight of the point cloud feature. The third recurrent neural network may be a GRU, an LSTM, various multi-layer multi-directional recurrent neural networks, and the like.
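A small numeric sketch of the weighted sum follows. Since the second question feature (1 × 1024) and the point cloud feature (1 × 2048) differ in length, the sketch assumes that the weights are matrices that project both features to a common length before adding; this is an assumption for illustration, and the names `matvec` and `integrate` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product, one output entry per row of w."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def integrate(q: Vector, h_t: Vector, w_q: Matrix, w_h: Matrix) -> Vector:
    """M = w_Q Q + w_h h_T, with both terms projected to the same length."""
    return [a + b for a, b in zip(matvec(w_q, q), matvec(w_h, h_t))]


q = [1.0, 2.0]                              # stands in for the 1 x 1024 second question feature
h_t = [1.0, 0.0, -1.0]                      # stands in for the 1 x 2048 point cloud feature
w_q = [[1.0, 0.0], [0.0, 1.0]]              # hypothetical 2 x 2 weight
w_h = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]]    # hypothetical 2 x 3 weight
m = integrate(q, h_t, w_q, w_h)
```

Here `m` evaluates to `[1.0, 4.0]`: the first term contributes `[1.0, 2.0]` and the second contributes `[0.0, 2.0]`.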

In order to make the output result sequential and more accurate, a recurrent neural network is also used to generate the output result. In one embodiment, inputting the data integration feature into the third recurrent neural network to obtain the output result output by the third recurrent neural network includes: inputting the data integration feature into the third recurrent neural network to obtain the current predicted response character output by the third recurrent neural network; then inputting the data integration feature and the current predicted response character into the third recurrent neural network to obtain the next predicted response character, until all the predicted response characters output by the third recurrent neural network are obtained; and splicing the predicted response characters in order to generate the output result.

Illustratively, the data integration feature is first input into the third recurrent neural network, which obtains the first predicted response character of the output result from the input data integration feature. The first predicted response character and the data integration feature are then input into the third recurrent neural network again to obtain the second predicted response character, and so on, until the third recurrent neural network outputs the last predicted response character. The predicted response characters are connected in order to obtain the output result.
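The feedback loop can be sketched as follows. The step function stands in for the trained third recurrent neural network and is a stateless lookup table here, whereas a real GRU or LSTM would also carry a hidden state between steps; the end marker `END`, the length limit `max_len` and all names are hypothetical.

```python
from typing import Callable, List, Optional

END = "<end>"   # hypothetical end-of-answer marker


def decode(integrated: List[float],
           step: Callable[[List[float], Optional[str]], str],
           max_len: int = 20) -> str:
    """Feed each predicted character back in together with the data integration feature."""
    chars: List[str] = []
    prev: Optional[str] = None          # there is no previous character at the first step
    for _ in range(max_len):            # guard against a model that never emits END
        nxt = step(integrated, prev)
        if nxt == END:
            break
        chars.append(nxt)
        prev = nxt
    return "".join(chars)               # splice the predicted characters in order


# Toy stand-in for the trained network: it ignores the feature and follows a table.
def toy_step(integrated: List[float], prev: Optional[str]) -> str:
    table = {None: "w", "w": "a", "a": "l", "l": "k", "k": END}
    return table[prev]


answer_text = decode([0.1, 0.2], toy_step)
```

The data integration feature is passed at every step, as in the text, so that each predicted character stays conditioned on both the question and the point cloud.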

Fig. 1g is a schematic structural diagram of another question-answering model provided in an embodiment of the present invention. As shown in fig. 1g, the question-answering model includes an encoding portion and a decoding portion. The encoding portion consists of first recurrent neural networks connected in a chain and a plurality of single-frame feature extraction submodules, and each single-frame feature extraction submodule includes a first question feature extraction network, an original feature extraction network, a spatial attention module and a temporal attention module. The first recurrent neural networks output the encoded feature h_T according to the features extracted by the single-frame feature extraction submodules and input it to the decoding portion. The decoding portion consists of a second question feature extraction network, a data integration network and a third recurrent neural network, wherein the data integration network integrates the encoded feature h_T with the second question feature extracted by the second question feature extraction network and outputs the data integration feature, and the third recurrent neural network obtains the output result according to the data integration feature.

S130, determining response information according to the output result and outputting the response information.

In this embodiment, the output result of the question-answering model may be a sequence of one-hot codes, each corresponding to a word in the text library, or may be response information in text form. If the output result is a sequence of one-hot codes, text-form response information is assembled from the word corresponding to each one-hot code and then output. If the output result is already response information in text form, it can be output directly. The output mode of the response information is not limited herein; optionally, the response information may be output by voice broadcast or by text display.
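Turning one-hot codes back into text can be sketched as a lookup of the position of the largest entry in each code. The vocabulary below is a hypothetical four-word text library, and joining words with spaces is an assumption that suits English rather than Chinese output.

```python
def one_hot_to_words(codes, vocabulary):
    """Map each code to the vocabulary word at the index of its largest entry."""
    words = []
    for code in codes:
        if len(code) != len(vocabulary):
            raise ValueError("code length must equal vocabulary size")
        index = max(range(len(code)), key=code.__getitem__)
        words.append(vocabulary[index])
    return words


vocabulary = ["a", "pedestrian", "car", "tree"]  # hypothetical text library
codes = [[1, 0, 0, 0], [0, 1, 0, 0]]             # model output as one-hot codes
text = " ".join(one_hot_to_words(codes, vocabulary))
```

Taking the largest entry also works when the model emits per-word scores instead of exact 0/1 codes.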

According to the embodiment of the invention, information to be responded and point cloud data corresponding to the information to be responded are acquired; the information to be responded and the point cloud data corresponding to the information to be responded are input into a trained question-answer model to obtain an output result of the question-answer model, wherein the question-answer model comprises a point cloud feature extraction module and a response information generation module; and response information is determined according to the output result and output. The information contained in the point cloud data is mined through the trained question-answer model, so that accurate response based on the point cloud data is realized.

Example two

Fig. 2 is a flowchart of a response method based on point cloud data according to a second embodiment of the present invention. On the basis of the scheme, the operation of training the question-answering model is added. As shown in fig. 2, the method includes:

S210, obtaining sample point cloud data, sample question information corresponding to the sample point cloud data and sample response information corresponding to the sample question information.

In this embodiment, in order to train a question-answering model based on point cloud data, a large amount of labeled point cloud data needs to be acquired as training samples for the model. Optionally, the point cloud data may be acquired in different manners, and the acquired point cloud data is manually labeled to obtain the labeled point cloud data. For example, the sample point cloud data, the sample question information corresponding to the sample point cloud data, and the sample response information corresponding to the sample question information may be obtained in manners including, but not limited to: (1) real-scene acquisition, in which point cloud data is collected with a Kinect device or a laser scanner as the sample point cloud data, and text questions and answers for key-frame segments of the sample point cloud data are labeled manually as the sample question information corresponding to the sample point cloud data and the sample response information corresponding to the sample question information; (2) virtual acquisition, in which an acquisition trajectory is set in a simulation environment, key-frame point clouds are captured at certain trajectory points, text questions and answers are produced for the key-frame point cloud segments, and these text questions and answers are taken as the sample question information corresponding to the sample point cloud data and the sample response information corresponding to the sample question information.

S220, generating training sample data based on the sample point cloud data, the sample question information corresponding to the sample point cloud data and the sample response information corresponding to the sample question information.

After the sample point cloud data, the sample question information corresponding to the sample point cloud data and the sample response information corresponding to the sample question information are obtained, training sample data are generated based on them. Optionally, data processing operations such as format conversion may be performed on part of the sample point cloud data, the sample question information and the sample response information, and the processed sample point cloud data, sample question information and sample response information are used as the training sample data. For example, the sample response information may be converted into codes corresponding to words in the text library. Assuming that the text library is a fixed word library containing 1024 words {W1, W2, …, W1024} and the sample response information is a word sequence {L1, L2, …, LN}, each word may be encoded by one-hot encoding: for the i-th word Li = Wj, the code is [0, 0, …, 1, …, 0], in which the j-th bit is 1 and all other bits are 0. The one-hot code of each word of the sample response information is thus obtained and used as the processed sample response information, which, together with the sample point cloud data and the sample question information, forms the training sample data.
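The one-hot encoding of a response word sequence can be sketched as follows. A four-word vocabulary stands in for the 1024-word text library, and the function name `one_hot_encode` is hypothetical; indices here are 0-based, whereas the text counts bits from 1.

```python
def one_hot_encode(words, vocabulary):
    """Encode each word Li = Wj as a vector with a single 1 at position j."""
    index_of = {word: j for j, word in enumerate(vocabulary)}
    codes = []
    for word in words:
        code = [0] * len(vocabulary)
        code[index_of[word]] = 1  # raises KeyError for out-of-vocabulary words
        codes.append(code)
    return codes


vocabulary = ["W1", "W2", "W3", "W4"]  # stand-in for {W1, ..., W1024}
encoded = one_hot_encode(["W3", "W1"], vocabulary)
```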

S230, training the pre-constructed question-answer model by using the training sample data to obtain the trained question-answer model.

After the training sample data are obtained, the pre-constructed question-answer model is trained by using the training sample data to obtain the trained question-answer model. During training, after the question-answering model generates predicted response information corresponding to the sample question information, a loss value is determined according to the predicted response information and the sample response information corresponding to the sample question information, and the question-answering model is trained with the goal of the loss value reaching a convergence condition, so as to obtain the trained question-answering model. Optionally, the convergence condition may be that the difference between two adjacent loss values is smaller than a set threshold, or that the number of iterations reaches a set target number of iterations.
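The two stopping criteria named above (a small difference between adjacent loss values, or a target iteration count) can be sketched with a generic training loop. `ToyModel` is a hypothetical one-parameter model trained by gradient descent on a squared error; it only stands in for the question-answer model, whose loss function the text does not specify.

```python
def train_until_converged(train_step, threshold=1e-6, max_iterations=1000):
    """Call train_step until adjacent losses differ by less than threshold
    or max_iterations is reached. Returns (iterations_run, last_loss)."""
    previous_loss = None
    for iteration in range(1, max_iterations + 1):
        loss = train_step()
        if previous_loss is not None and abs(previous_loss - loss) < threshold:
            return iteration, loss
        previous_loss = loss
    return max_iterations, previous_loss


class ToyModel:
    """One-parameter stand-in model minimising (w - target)^2."""

    def __init__(self):
        self.w = 0.0

    def step(self, target=3.0, learning_rate=0.1):
        loss = (self.w - target) ** 2
        self.w -= learning_rate * 2 * (self.w - target)  # gradient step
        return loss


model = ToyModel()
iterations, final_loss = train_until_converged(model.step)
```

Here the loop stops on the loss-difference criterion well before the iteration cap; a loss that keeps changing would instead run until `max_iterations`.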

S240, obtaining the information to be responded and the point cloud data corresponding to the information to be responded.

S250, inputting the information to be responded and the point cloud data corresponding to the information to be responded into the trained question-answer model to obtain an output result of the question-answer model.

S260, determining response information according to the output result and outputting the response information.

The method comprises the steps of obtaining sample point cloud data, sample problem information corresponding to the sample point cloud data and sample response information corresponding to the sample problem information; generating training sample data based on the sample point cloud data, the sample problem information corresponding to the sample point cloud data and the response information corresponding to the sample problem information; training a question-answer model constructed in advance based on a cyclic neural network by using training sample data to obtain the trained question-answer model, and acquiring time sequence information contained in point cloud data through the question-answer model constructed based on the cyclic neural network to realize accurate response based on the point cloud data.

Example three

Fig. 3 is a schematic structural diagram of a response apparatus based on point cloud data according to a third embodiment of the present invention. The point cloud data-based response apparatus can be implemented in software and/or hardware; for example, it can be configured in a computer device. As shown in fig. 3, the apparatus includes a to-be-responded information acquiring module 310, an output result obtaining module 320, and a response information output module 330, wherein:

the to-be-responded information acquiring module 310 is configured to acquire to-be-responded information and point cloud data corresponding to the to-be-responded information;

an output result obtaining module 320, configured to input the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model, and obtain an output result of the question-answer model, where the trained question-answer model includes a point cloud feature extraction module and a response information generation module;

and the response information output module 330 is configured to determine response information according to the output result and output the response information.
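The division into modules 310, 320 and 330 can be sketched as a small class. This is only an illustrative wiring under assumptions: the class name `ResponseDevice`, the request dictionary layout and `stub_model` are hypothetical, and the stub replaces the trained question-answer model.

```python
class ResponseDevice:
    """Hypothetical wiring of modules 310, 320 and 330."""

    def __init__(self, qa_model):
        self.qa_model = qa_model  # callable: (question, point_cloud) -> output

    def acquire(self, request):
        """Module 310: obtain the question and its point cloud data."""
        return request["question"], request["point_cloud"]

    def infer(self, question, point_cloud):
        """Module 320: run the trained question-answer model."""
        return self.qa_model(question, point_cloud)

    def respond(self, output_result):
        """Module 330: turn the output result into response information."""
        return str(output_result)

    def handle(self, request):
        question, point_cloud = self.acquire(request)
        return self.respond(self.infer(question, point_cloud))


def stub_model(question, point_cloud):
    # Stand-in for a trained model: reports how many frames it received.
    return f"{len(point_cloud)} frame(s) received"


device = ResponseDevice(stub_model)
reply = device.handle(
    {"question": "What is ahead?", "point_cloud": [[(0.0, 0.0, 0.0)]]}
)
```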

In the embodiment of the invention, the to-be-responded information acquiring module acquires the information to be responded and the point cloud data corresponding to the information to be responded; the output result obtaining module inputs the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model to obtain an output result of the question-answer model, wherein the question-answer model comprises a point cloud feature extraction module and a response information generation module; and the response information output module determines response information according to the output result and outputs the response information. The information contained in the point cloud data is mined through the trained question-answer model, so that accurate response based on the point cloud data is realized.

Optionally, on the basis of the foregoing scheme, the output result obtaining module 320 includes:

the point cloud feature extraction unit is used for inputting the information to be responded and point cloud data corresponding to the information to be responded into the point cloud feature extraction module to obtain point cloud features output by the point cloud feature extraction module;

and the output result generating unit is used for inputting the point cloud characteristics and the information to be responded into the response information generating module to obtain the output result of the response information generating module.

Optionally, on the basis of the above scheme, the point cloud data includes at least one single-frame point cloud data, the point cloud feature extraction module includes a recurrent submodule and a plurality of single-frame feature extraction submodules, the recurrent submodule includes a plurality of first recurrent neural networks connected in a chain, the single-frame feature extraction submodules correspond to the first recurrent neural networks one to one, and the point cloud feature extraction unit includes:

the single-frame feature subunit is used for, for each single-frame point cloud data, inputting the single-frame point cloud data and the information to be responded into the single-frame feature extraction submodule to obtain a single-frame feature corresponding to the single-frame point cloud data output by the single-frame feature extraction submodule;

and the point cloud feature subunit is used for sequentially taking each first recurrent neural network as the current first recurrent neural network according to the connection sequence of the first recurrent neural networks, inputting the single-frame features output by the single-frame feature extraction submodule corresponding to the current first recurrent neural network and the network extraction features output by the previous first recurrent neural network of the current first recurrent neural network into the current first recurrent neural network, obtaining the network extraction features output by the current first recurrent neural network, and taking the network extraction features output by the last first recurrent neural network as the point cloud features.
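The chained processing performed by the point cloud feature subunit can be sketched as a fold over the single-frame features, where each step receives the current frame's feature and the previous step's output. The cell below is a deliberately simple element-wise tanh update with fixed scalar weights, an assumption made for illustration; the text allows GRU, LSTM and other recurrent cells.

```python
import math


def rnn_cell(frame_feature, previous_state, w_in=0.5, w_rec=0.5):
    """Toy recurrent update: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    return [math.tanh(w_in * x + w_rec * h)
            for x, h in zip(frame_feature, previous_state)]


def encode_frames(single_frame_features):
    """Run the chain in connection order and return the last output, h_T."""
    if not single_frame_features:
        raise ValueError("at least one single-frame feature is required")
    state = [0.0] * len(single_frame_features[0])  # initial state
    for feature in single_frame_features:
        state = rnn_cell(feature, state)
    return state


frames = [[0.2, -0.1], [0.4, 0.0], [0.1, 0.3]]  # hypothetical frame features
h_T = encode_frames(frames)
```

Because each step depends on the previous output, reversing the frame order generally changes h_T, which is how the chain captures temporal order.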

Optionally, on the basis of the above scheme, the single-frame feature extraction sub-module includes a first problem feature extraction network, an attention module, and an original feature extraction network, and the single-frame feature sub-unit is specifically configured to:

inputting the information to be responded into a first problem feature extraction network to obtain a first problem feature output by the first problem feature extraction network;

inputting the single-frame point cloud data into an original feature extraction network to obtain original features of the single-frame point cloud data output by the original feature extraction network;

and inputting the first problem characteristic and the original characteristic into the attention module, and obtaining the single-frame characteristic output by the attention module.

Optionally, on the basis of the above scheme, the first problem feature extraction network is a second recurrent neural network, and the single-frame feature subunit is specifically configured to:

and according to the word order of the information to be responded, sequentially taking each character in the information to be responded as a current character, inputting the current character and the character feature corresponding to the character before the current character into the second recurrent neural network, obtaining the character feature corresponding to the current character output by the second recurrent neural network, and taking the character feature corresponding to the last character as the first question feature.

Optionally, on the basis of the above scheme, the attention module includes a spatial attention module and a temporal attention module, and the single-frame feature subunit is specifically configured to:

inputting the first problem characteristic and the original characteristic into a space attention module to obtain a space weighting characteristic output by the space attention module;

and inputting the first problem feature and the spatial weighting feature output by the spatial attention module in each single-frame feature extraction submodule into the temporal attention module to obtain the single-frame feature output by the temporal attention module.
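The two-stage attention can be sketched with generic dot-product attention. This is an assumption for illustration only: the scoring function, the softmax normalisation and the representation of an original feature as a list of per-region vectors are not taken from this section, and all names are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spatial_attention(question, region_features):
    """Weight the regions of one frame by relevance to the question and sum."""
    weights = softmax([dot(question, r) for r in region_features])
    return [sum(w * r[k] for w, r in zip(weights, region_features))
            for k in range(len(question))]


def temporal_attention(question, spatial_features):
    """Scale each frame's spatially weighted feature by its relevance
    relative to the other frames."""
    weights = softmax([dot(question, s) for s in spatial_features])
    return [[w * v for v in s] for w, s in zip(weights, spatial_features)]


question = [1.0, 0.0]          # hypothetical first question feature
frames = [                     # two frames, two region features each
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
spatial = [spatial_attention(question, regions) for regions in frames]
single_frame_features = temporal_attention(question, spatial)
```

In this toy input the first frame contains a region aligned with the question, so it receives the larger temporal weight.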

Optionally, on the basis of the above scheme, the response information generating module includes a second problem feature extraction network, a data integration network, and a third recurrent neural network, and the output result generating unit includes:

the second question feature subunit is used for inputting the information to be responded into a second question feature extraction network to obtain a second question feature corresponding to the information to be responded output by the second question feature extraction network;

the integrated characteristic subunit is used for inputting the second problem characteristic and the point cloud characteristic into the data integration network to obtain a data integration characteristic output by the data integration network;

and the output result subunit is used for inputting the data integration characteristics into the third recurrent neural network to obtain an output result output by the third recurrent neural network.

Optionally, on the basis of the above scheme, the output result subunit is specifically configured to:

inputting the data integration feature into the third recurrent neural network to obtain a current predicted response character output by the third recurrent neural network, then inputting the data integration feature and the current predicted response character into the third recurrent neural network to obtain a next predicted response character output by the third recurrent neural network, until all predicted response characters output by the third recurrent neural network are obtained, and splicing the predicted response characters in sequence to generate the output result.

Optionally, on the basis of the above scheme, the apparatus further includes:

the model training module is used for acquiring sample point cloud data, sample question information corresponding to the sample point cloud data and sample response information corresponding to the sample question information before inputting the information to be responded and the point cloud data corresponding to the information to be responded into the trained question-answer model;

generating training sample data based on the sample point cloud data, sample problem information corresponding to the sample point cloud data and sample response information corresponding to the sample problem information;

and training the pre-constructed question-answering model by using the training sample data to obtain the trained question-answering model.

The point cloud data-based response device provided by the embodiment of the invention can execute the point cloud data-based response method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Example four

Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary computer device 412 suitable for use in implementing embodiments of the present invention. The computer device 412 shown in FIG. 4 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the present invention.

As shown in FIG. 4, computer device 412 is in the form of a general purpose computing device. Components of computer device 412 may include, but are not limited to: one or more processors 416, a system memory 428, and a bus 418 that couples the various system components (including the system memory 428 and the processors 416).

Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.

Computer device 412 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 412 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 430 and/or cache memory 432. The computer device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, a storage system 434 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 418 by one or more data media interfaces. Memory 428 can include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 440 having a set (at least one) of program modules 442 may be stored, for instance, in memory 428. Such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may include an implementation of a network environment. The program modules 442 generally perform the functions and/or methodologies of the described embodiments of the invention.

The computer device 412 may also communicate with one or more external devices 414 (e.g., keyboard, pointing device, display 424, etc.), with one or more devices that enable a user to interact with the computer device 412, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 412 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interfaces 422. Also, computer device 412 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) through network adapter 420. As shown, network adapter 420 communicates with the other modules of computer device 412 over bus 418. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

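The processor 416 executes various functional applications and data processing by executing programs stored in the system memory 428, for example, implementing the point cloud data-based response method provided by the embodiment of the present invention, the method including: acquiring information to be responded and point cloud data corresponding to the information to be responded; inputting the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model to obtain an output result of the question-answer model, wherein the question-answer model comprises a point cloud feature extraction module and a response information generation module; and determining response information according to the output result and outputting the response information.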

Of course, those skilled in the art can understand that the processor may also implement the technical solution of the point cloud data-based response method provided in any embodiment of the present invention. In addition, the method for responding by using the trained question-answer model and the method for training the question-answer model in the response method based on the point cloud data provided by any embodiment of the invention can be applied to the same computer equipment and can also be applied to different computer equipment.

Example five

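The fifth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the point cloud data-based response method provided in the embodiments of the present invention, where the method includes: acquiring information to be responded and point cloud data corresponding to the information to be responded; inputting the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model to obtain an output result of the question-answer model, wherein the question-answer model comprises a point cloud feature extraction module and a response information generation module; and determining response information according to the output result and outputting the response information.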

Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations of the point cloud data-based response method provided by any embodiments of the present invention.

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions without departing from the scope of the invention. Therefore, although the present invention has been described in more detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

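1. A response method based on point cloud data is characterized by comprising the following steps:

acquiring information to be responded and point cloud data corresponding to the information to be responded;

inputting the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model to obtain an output result of the question-answer model, wherein the question-answer model comprises a point cloud feature extraction module and a response information generation module;

and determining response information according to the output result and outputting the response information.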

2. The method according to claim 1, wherein the inputting the information to be answered and the point cloud data corresponding to the information to be answered into a trained question-answering model to obtain an output result of the question-answering model comprises:

inputting the information to be responded and point cloud data corresponding to the information to be responded into the point cloud feature extraction module to obtain point cloud features output by the point cloud feature extraction module;

and inputting the point cloud characteristics and the information to be responded into the response information generation module to obtain the output result of the response information generation module.

3. The method according to claim 2, wherein the point cloud data includes at least one single-frame point cloud data, the point cloud feature extraction module includes a recurrent submodule and a plurality of single-frame feature extraction submodules, the recurrent submodule includes a plurality of first recurrent neural networks connected in a chain, the single-frame feature extraction submodules correspond to the first recurrent neural networks one to one, and the inputting the information to be responded and the point cloud data corresponding to the information to be responded into the point cloud feature extraction module to obtain the point cloud features output by the point cloud feature extraction module includes:

for each single-frame point cloud data, inputting the single-frame point cloud data and the information to be responded into the single-frame feature extraction submodule to obtain a single-frame feature corresponding to the single-frame point cloud data output by the single-frame feature extraction submodule;

and according to the connection sequence of the first recurrent neural networks, sequentially taking each first recurrent neural network as a current first recurrent neural network, inputting the single-frame features output by the single-frame feature extraction submodule corresponding to the current first recurrent neural network and the network extraction features output by the previous first recurrent neural network of the current first recurrent neural network into the current first recurrent neural network, obtaining the network extraction features output by the current first recurrent neural network, and taking the network extraction features output by the last first recurrent neural network as the point cloud features.

4. The method according to claim 3, wherein the single-frame feature extraction submodule includes a first problem feature extraction network, an attention module and an original feature extraction network, and the inputting the single-frame point cloud data and the information to be responded into the single-frame feature extraction submodule to obtain the single-frame feature corresponding to the single-frame point cloud data output by the single-frame feature extraction submodule includes:

inputting the information to be responded into the first question feature extraction network to obtain a first question feature output by the first question feature extraction network;

inputting the single-frame point cloud data into the original feature extraction network to obtain original features of the single-frame point cloud data output by the original feature extraction network;

inputting the first question feature and the original feature into the attention module, and obtaining the single-frame feature output by the attention module.

5. The method according to claim 4, wherein the first question feature extraction network is a second recurrent neural network, and the inputting the information to be responded into the first question feature extraction network to obtain the first question feature output by the first question feature extraction network comprises:

according to the order of the characters in the information to be responded, sequentially taking each character in the information to be responded as a current character, inputting the current character and the character feature corresponding to the character preceding the current character into the second recurrent neural network to obtain the character feature corresponding to the current character output by the second recurrent neural network, and taking the character feature corresponding to the last character as the first question feature.
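The character-by-character encoding recited in claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the claimed network: `char_to_vector` is a hypothetical toy embedding derived from the character's code point, and the tanh update with fixed weights of 0.5 stands in for a trained second recurrent neural network.

```python
import math

def char_to_vector(ch, dim=4):
    """Hypothetical toy embedding: map the low bits of the code point to +/-1."""
    code = ord(ch)
    return [((code >> i) & 1) * 2.0 - 1.0 for i in range(dim)]

def encode_question(text, dim=4):
    """Feed characters in order; each step sees the previous character feature.

    Returns the character feature of the last character, which plays the
    role of the first question feature.
    """
    # Assumption: the first character is paired with an all-zero feature.
    char_feature = [0.0] * dim
    for ch in text:
        x = char_to_vector(ch, dim)
        char_feature = [math.tanh(0.5 * xi + 0.5 * hi)
                        for xi, hi in zip(x, char_feature)]
    return char_feature

question_feature = encode_question("what is under the traffic light?")
```

Because each step depends on the preceding character feature, the result is sensitive to character order, which is the property the recurrent formulation is meant to provide.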

6. The method of claim 4, wherein the attention module comprises a spatial attention module and a temporal attention module, and wherein the inputting the first question feature and the original feature into the attention module to obtain the single-frame feature output by the attention module comprises:

inputting the first question feature and the original feature into the spatial attention module to obtain a spatial weighted feature output by the spatial attention module;

and inputting the first question feature and the spatial weighted feature output by the spatial attention module in each single-frame feature extraction submodule into the temporal attention module to obtain the single-frame feature output by the temporal attention module.
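The two-stage attention recited in claim 6 can be sketched as below. The claim does not fix the attention formula, so everything here is an assumption: dot-product scoring against the question feature, softmax normalization, pooling over points in the spatial stage, and per-frame scaling in the temporal stage. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spatial_attention(question_feature, point_features):
    """Weight each point's original feature by similarity to the question, then sum."""
    weights = softmax([dot(question_feature, p) for p in point_features])
    dim = len(question_feature)
    return [sum(w * p[i] for w, p in zip(weights, point_features))
            for i in range(dim)]

def temporal_attention(question_feature, spatial_features):
    """Scale each frame's spatial weighted feature by its weight across frames."""
    weights = softmax([dot(question_feature, f) for f in spatial_features])
    return [[w * v for v in f] for w, f in zip(weights, spatial_features)]

question = [1.0, 0.0]
frames = [
    [[1.0, 0.0], [0.0, 1.0]],  # frame 0: two points with different features
    [[0.0, 1.0], [0.0, 1.0]],  # frame 1: two points with identical features
]
spatial = [spatial_attention(question, pts) for pts in frames]
single_frame_features = temporal_attention(question, spatial)
```

The temporal stage needs the spatial weighted features of all frames at once, which matches the claim's wording that the outputs of the spatial attention module in each submodule are input together.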

7. The method according to claim 2, wherein the answer information generation module comprises a second question feature extraction network, a data integration network and a third recurrent neural network, and the inputting the point cloud feature and the information to be responded into the answer information generation module to obtain the output result of the answer information generation module comprises:

inputting the information to be responded into the second question feature extraction network to obtain a second question feature corresponding to the information to be responded output by the second question feature extraction network;

inputting the second question feature and the point cloud feature into the data integration network to obtain a data integration feature output by the data integration network;

inputting the data integration feature into the third recurrent neural network to obtain the output result output by the third recurrent neural network.

8. The method of claim 7, wherein the inputting the data integration feature into the third recurrent neural network to obtain the output result output by the third recurrent neural network comprises:

inputting the data integration feature into the third recurrent neural network to obtain a current predicted response character output by the third recurrent neural network, then inputting the data integration feature and the current predicted response character into the third recurrent neural network to obtain a next predicted response character output by the third recurrent neural network, until all predicted response characters output by the third recurrent neural network are obtained, and sequentially splicing the predicted response characters to generate the output result.
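The character-by-character generation recited in claim 8 follows a standard autoregressive loop, sketched below. The stopping rule is an assumption: the claim only says generation continues until all predicted response characters are obtained, so the `end_token` marker and the `max_len` cap are hypothetical. `toy_step` is a stand-in for the third recurrent neural network and simply replays a fixed answer.

```python
def decode(data_integration_feature, step_fn, end_token="<eos>", max_len=32):
    """Generate response characters one at a time and splice them in order.

    step_fn(feature, prev_char) stands in for the third recurrent neural
    network. prev_char is None on the first call, when only the data
    integration feature is available.
    """
    chars = []
    prev_char = None
    while len(chars) < max_len:
        next_char = step_fn(data_integration_feature, prev_char)
        if next_char == end_token:  # assumed end-of-answer marker
            break
        chars.append(next_char)
        prev_char = next_char
    return "".join(chars)

def toy_step(feature, prev_char):
    """Hypothetical network stub: replays "car" (works because no character repeats)."""
    answer = "car"
    if prev_char is None:
        return answer[0]
    idx = answer.index(prev_char) + 1
    return answer[idx] if idx < len(answer) else "<eos>"

output_result = decode([0.1, 0.2], toy_step)
```

A trained decoder would usually also carry a hidden state between steps; the stub omits it to keep the control flow of the loop visible.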

9. The method according to claim 1, wherein before the inputting the information to be responded and the point cloud data corresponding to the information to be responded into the trained question-answer model, the method further comprises:

acquiring training sample data comprising sample point cloud data, sample question information corresponding to the sample point cloud data, and sample response information corresponding to the sample question information;

and training a pre-constructed question-answer model by using the training sample data to obtain the trained question-answer model.

10. A point cloud data-based response device, comprising:

an output result obtaining module, configured to input the information to be responded and the point cloud data corresponding to the information to be responded into a trained question-answer model, and obtain an output result of the question-answer model, wherein the trained question-answer model includes a point cloud feature extraction module and an answer information generation module;

and a response information output module, configured to determine response information according to the output result and output the response information.

11. A computer device, the device comprising:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the point cloud data-based response method of any one of claims 1-9.

12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the point cloud data-based response method according to any one of claims 1 to 9.