CN111241244A - Big data-based answer position acquisition method, device, equipment and medium - Google Patents

Big data-based answer position acquisition method, device, equipment and medium

Info

Publication number
CN111241244A
Authority
CN
China
Prior art keywords
information
text
word
answer
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010037661.7A
Other languages
Chinese (zh)
Inventor
陈桢博
金戈
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010037661.7A priority Critical patent/CN111241244A/en
Priority to PCT/CN2020/093349 priority patent/WO2021143021A1/en
Publication of CN111241244A publication Critical patent/CN111241244A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3346 - Query execution using probabilistic model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3347 - Query execution using vector based model

Abstract

The invention discloses an answer position obtaining method based on big data, which comprises the following steps: performing vectorization to obtain a word vector corresponding to the text information to be processed and a word vector corresponding to the question information; performing feature extraction and dimension compression on the word vector corresponding to the question information through a Bi-LSTM network to obtain question coding information; adding a position vector and the question coding information to each word vector in the text information to obtain text coding information; performing feature extraction on the text coding information through preset multilayer convolution layers to obtain text feature information corresponding to the text coding information; performing sequence labeling on the text feature information through a Bi-LSTM network to obtain a first probability and a second probability for each word in the text information; and acquiring the word corresponding to the maximum value of the first probability as the answer starting position and the word corresponding to the maximum value of the second probability as the answer ending position. The invention solves the problems that existing question-answer models have a large number of parameters and require a long algorithm training time.

Description

Big data-based answer position acquisition method, device, equipment and medium
Technical Field
The invention relates to the technical field of information, in particular to a big data-based answer position acquisition method, device, equipment and medium.
Background
Existing question-answer models are mainly implemented with recurrent neural networks or with more complex deep learning language models. However, the algorithmic precision of a recurrent neural network is limited, and such a model cannot perform parallel computation, so it requires a long training time. A deep learning language model allows parallel computation and can achieve higher precision, but because it is designed to handle many natural language processing tasks, its structure is complex, its number of parameters is large, and its training is time-consuming. Therefore, whether a question-answer model is based on a recurrent neural network or on a deep learning language model, algorithm training takes a long time, and a lightweight question-answer model cannot be constructed.
Therefore, finding a method that solves the problems of the large number of parameters and the long algorithm training time of existing question-answer models has become an urgent technical problem for those skilled in the art.
Disclosure of Invention
The embodiments of the invention provide an answer position acquisition method, device, equipment and medium based on big data, so as to solve the problems that existing question-answer models have a large number of parameters and require a long algorithm training time.
An answer position obtaining method based on big data comprises the following steps:
acquiring text information and question information to be processed, and respectively performing vectorization on each word in the text information and the question information to obtain a word vector corresponding to the text information and a word vector corresponding to the question information;
performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and compressing the extracted multi-dimensional feature matrix to a preset dimension to obtain question coding information;
adding a position vector and the question coding information to each word vector in the text information to obtain text coding information corresponding to the text information;
performing feature extraction on the text coding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information;
performing sequence labeling on the text feature information through a Bi-directional recurrent neural network Bi-LSTM to obtain a first probability and a second probability for each word in the text information;
and acquiring the word corresponding to the maximum value of the first probability as the answer starting position and the word corresponding to the maximum value of the second probability as the answer ending position.
Further, the performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and compressing the extracted multi-dimensional feature matrix to a preset dimension to obtain the question coding information includes:
performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM to obtain an m x n dimensional feature matrix;
compressing the m x n dimensional feature matrix to a 1 x n dimensional feature matrix by an additive attention mechanism;
and converting the 1 x n dimensional feature matrix into a 1 x k dimensional feature matrix through a preset full connection layer, and taking the 1 x k dimensional feature matrix as the question coding information.
Further, the adding a position vector and the question coding information to each word vector in the text information to obtain text coding information corresponding to the text information includes:
generating a position vector corresponding to each word vector in the text information through a preset sine function and a preset cosine function;
and sequentially adding the position vector and the question coding information to the word vector to obtain text coding information corresponding to the word vector.
Further, the performing feature extraction on the text coding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information includes:
performing feature extraction on the text coding information through a preset first convolution layer to obtain first text feature information corresponding to the text coding information;
respectively carrying out feature extraction on the first text feature information through three layers of parallel second convolution layers to obtain three groups of second text feature information;
matrix multiplication operation is carried out on any two groups of second text characteristic information, normalization processing is carried out on the operation result through a softmax function, and weight information is obtained;
adjusting another group of second text characteristic information according to the weight information to obtain third text characteristic information;
and performing summation processing on the third text characteristic information and the first text characteristic information, and taking an obtained result as text characteristic information corresponding to the text information.
Further, after obtaining text feature information corresponding to the text information, the method further includes:
and transmitting the text characteristic information into a preset full-connection layer to obtain a binary classification result indicating whether answer information corresponding to the question information exists in the text information to be processed.
An answer position acquisition apparatus based on big data, comprising:
the vectorization module is used for acquiring text information and question information to be processed, and respectively executing vectorization on each word in the text information and the question information to obtain a word vector corresponding to the text information and a word vector corresponding to the question information;
the first feature extraction module is used for performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and compressing the extracted multi-dimensional feature matrix to a preset dimension to obtain question coding information;
the information adding module is used for adding a position vector and the question coding information to each word vector in the text information to obtain text coding information corresponding to the text information;
the second feature extraction module is used for performing feature extraction on the text coding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information;
the probability acquisition module is used for carrying out sequence labeling on the text characteristic information through a Bi-directional recurrent neural network Bi-LSTM to obtain a first probability and a second probability of each word in the text information;
and the answer obtaining module is used for obtaining a word corresponding to the first maximum probability value as an answer starting position and a word corresponding to the second maximum probability value as an answer ending position.
Further, the information adding module comprises:
the position vector generating unit is used for generating a position vector corresponding to each word vector in the text information through a preset sine function and a preset cosine function;
and the adding unit is used for sequentially adding the position vector and the question coding information to the word vector to obtain text coding information corresponding to the word vector.
Further, the second feature extraction module includes:
the first extraction unit is used for extracting the characteristics of the text coding information through a preset first convolution layer to obtain first text characteristic information corresponding to the text coding information;
the second extraction unit is used for respectively carrying out feature extraction on the first text feature information through three layers of parallel second convolution layers to obtain three groups of second text feature information;
the weight obtaining unit is used for performing matrix multiplication on any two groups of second text characteristic information and performing normalization processing on the operation result through a softmax function to obtain weight information;
the adjusting unit is used for adjusting the other group of second text characteristic information according to the weight information to obtain third text characteristic information;
and the text characteristic acquisition unit is used for performing summation processing on the third text characteristic information and the first text characteristic information, and taking an obtained result as text characteristic information corresponding to the text information.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the answer position acquisition method based on big data when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the big-data-based answer position acquisition method described above.
In the method, vectorization is performed on each word in the text information and the question information to obtain a word vector corresponding to the text information and a word vector corresponding to the question information; then, feature extraction is performed on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and the extracted multi-dimensional feature matrix is compressed to a preset dimension to obtain question coding information; a position vector and the question coding information are added to each word vector in the text information to obtain text coding information corresponding to the text information; then, feature extraction is performed on the text coding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information; finally, sequence labeling is performed on the text feature information through a Bi-directional recurrent neural network to obtain a first probability and a second probability for each word in the text information; and the word corresponding to the maximum value of the first probability is acquired as the answer starting position and the word corresponding to the maximum value of the second probability as the answer ending position. Compared with existing question-answer models, the question-answer model constructed in this way is lightweight, with a short training time and a relatively small number of parameters: it can be trained with relatively few parameters in a short time, which improves the efficiency of question-answer model training and solves the problems that existing question-answer models have a large number of parameters and require a long algorithm training time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a flowchart of a big data based answer position obtaining method according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S102 of the big data based answer position obtaining method according to another embodiment of the present invention;
FIG. 3 is a flowchart of step S103 of the big data based answer position obtaining method according to another embodiment of the present invention;
FIG. 4 is a flowchart of step S104 of the big data based answer position obtaining method according to another embodiment of the present invention;
FIG. 5 is a flowchart of a big data based answer position obtaining method according to another embodiment of the present invention;
FIG. 6 is a schematic block diagram of an answer position obtaining apparatus based on big data according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The answer position obtaining method based on big data provided in this embodiment is described in detail below. The big-data-based method provided by the embodiment of the invention is used to construct a question-answer model based on a Long Short-Term Memory network (LSTM) and a Convolutional Neural Network (CNN). The LSTM is a recurrent neural network suitable for processing and predicting events with relatively long intervals and delays in a time series. The question-answer model is equivalent to a reading comprehension model: the input of the model is text information and question information, and the output of the model is the position of the answer found in the text information according to the question information. Here, the question information refers to a word, sentence or paragraph containing the content of a question, for example "what is the date today" or "who likes to eat apples". The text information refers to a sentence or paragraph containing the answer to the question. Optionally, the answer position obtaining method based on big data provided by the embodiment of the invention is applied to a server. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers. In one embodiment, as shown in fig. 1, an answer position obtaining method based on big data is provided, which includes the following steps:
in step S101, text information and question information to be processed are obtained, and vectorization is performed on each word in the text information and the question information, so as to obtain a word vector corresponding to the text information and a word vector corresponding to the question information.
Here, the embodiment of the present invention performs vectorization on the text information and the question information respectively, so that each word in the text information and the question information is represented by a vector. Optionally, for the text information, a corpus is formed from the text information, a Word2Vec algorithm is used to train on this corpus to obtain a word vector corresponding to each word in the text information, and the word vectors corresponding to the words in the text information are combined to obtain the word vector corresponding to the text information. For the question information, a corpus is formed from the question information, the Word2Vec algorithm is used to train on this corpus to obtain a word vector corresponding to each word in the question information, and the word vectors corresponding to the words in the question information are combined to obtain the word vector corresponding to the question information.
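As a minimal sketch of this vectorization step, and not the patent's own implementation, the gensim library (4.x API) can be used; the toy corpora, the vector size of 100, and the English tokens are illustrative assumptions:

```python
from gensim.models import Word2Vec

# Illustrative corpora: each document is a list of tokens (words).
# Real Chinese text would first be segmented by a tokenizer before this step.
text_tokens = ["the", "capital", "of", "france", "is", "paris"]
question_tokens = ["what", "is", "the", "capital", "of", "france"]

# Train one small Word2Vec model per corpus.
text_model = Word2Vec(sentences=[text_tokens], vector_size=100, window=5, min_count=1)
question_model = Word2Vec(sentences=[question_tokens], vector_size=100, window=5, min_count=1)

# Combine the per-word vectors into one matrix per sequence: one row per word.
text_vectors = [text_model.wv[w] for w in text_tokens]              # len(text) x 100
question_vectors = [question_model.wv[w] for w in question_tokens]  # len(question) x 100
```

In practice the corpora and the Chinese word segmentation would of course come from the application's own data rather than the toy lists above.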
In step S102, feature extraction is performed on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and the extracted multi-dimensional feature matrix is compressed to a preset dimension, so as to obtain question encoding information.
Here, the question encoding information is a feature matrix of the question information. The embodiment of the invention uses a Bi-directional recurrent neural network Bi-LSTM, obtained by combining a forward LSTM and a backward LSTM, to perform feature extraction on the word vector corresponding to the question information and obtain a multi-dimensional feature matrix.
The multi-dimensional feature matrix obtained above is the preliminarily extracted question encoding. The dimensions of this question encoding are related to the length of the question information: the longer the question information, the larger the dimensions of the question encoding, and vice versa. In order that the final question encoding information is not limited by the length of the question information, the embodiment of the invention further compresses the extracted multi-dimensional feature matrix to a preset dimension so as to unify the dimensions of the question encoding. Optionally, fig. 2 shows a specific implementation flow of step S102 provided in the embodiment of the present invention. As shown in fig. 2, the step S102 of performing feature extraction on the word vector corresponding to the question information through the Bi-directional recurrent neural network Bi-LSTM, and compressing the extracted multi-dimensional feature matrix to a preset dimension to obtain question encoding information includes:
In step S201, feature extraction is performed on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, so as to obtain an m × n dimensional feature matrix.
The Bi-directional recurrent neural network Bi-LSTM performs feature extraction on the word vector corresponding to the question information and outputs an m × n dimensional feature matrix, where m and n are positive integers, m represents the number of word vectors corresponding to the question information, and n represents the feature dimension. It can be seen that this is a multi-dimensional feature matrix whose dimensions depend on the length of the question information (i.e. on the number m of word vectors) and are therefore not uniform.
In step S202, the m × n dimensional feature matrix is compressed to a 1 × n dimensional feature matrix by an additive attention mechanism.
In order to solve the problem that the dimensions of the multi-dimensional feature matrix corresponding to the question information are not uniform, the embodiment of the invention compresses the m × n feature matrix to a 1 × n feature matrix by an additive attention mechanism. Specifically, a feedforward full connection layer is established, whose algorithm formula can be expressed as wx + b, where x represents the input (in this embodiment, the m × n dimensional feature matrix) and w and b represent learned parameters. The m × n dimensional feature matrix is passed through the feedforward full connection layer and then normalized through a softmax function to obtain a 1 × m dimensional attention weight. The 1 × m dimensional attention weight is applied to the m × n dimensional feature matrix (a weighted sum over the m rows), thereby compressing the original m dimensions to 1 and obtaining the 1 × n dimensional feature matrix.
In step S203, the 1 × n dimensional feature matrix is converted into a 1 × k dimensional feature matrix through a preset full connection layer, and the 1 × k dimensional feature matrix is used as the question encoding information.
The 1 × n dimensional feature matrix obtained by the additive attention mechanism of step S202 is still related to the number of features and is not yet of a unified dimension. In this regard, the embodiment of the present invention further constructs a full connection layer whose output dimension k is user-defined, and converts the 1 × n dimensional feature matrix into a 1 × k dimensional feature matrix, thereby obtaining question encoding information of unified dimension, where k is a positive integer.
Through the additive attention mechanism and the full connection layer, the embodiment of the invention converts the question encodings of varying dimension output by the Bi-directional recurrent neural network into question encoding information of a unified dimension, which facilitates the subsequent construction of the text encoding information, reduces the number of parameters of the question-answer model, shortens the training time of the question-answer model, and helps realize a lightweight question-answer model.
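The following PyTorch sketch illustrates steps S201 to S203; the hidden size, the output dimension k, and the class and variable names are illustrative assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionEncoder(nn.Module):
    """Bi-LSTM feature extraction + additive-attention compression + FC projection."""
    def __init__(self, emb_dim=100, hidden=64, k=128):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # feedforward layer wx + b -> one score per word
        self.proj = nn.Linear(2 * hidden, k)   # preset full connection layer, output dimension k

    def forward(self, question_vectors):                   # (batch, m, emb_dim)
        feats, _ = self.bilstm(question_vectors)           # (batch, m, n) with n = 2 * hidden
        scores = self.attn(feats).squeeze(-1)              # (batch, m)
        weights = F.softmax(scores, dim=-1)                # 1 x m attention weights
        pooled = torch.bmm(weights.unsqueeze(1), feats)    # (batch, 1, n): weighted sum over m
        return self.proj(pooled)                           # (batch, 1, k): question encoding

# Usage with dummy data: a question of m = 6 words, 100-dimensional word vectors.
encoder = QuestionEncoder()
question = torch.randn(1, 6, 100)
print(encoder(question).shape)   # torch.Size([1, 1, 128])
```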
In step S103, a position vector and the question encoding information are added to each word vector in the text information, so as to obtain text encoding information corresponding to the text information.
After the question coding information is obtained, the embodiment of the invention constructs text coding information based on the question coding information, and the question coding information represents the question information so as to search corresponding answers in the text information.
Here, the position vector indicates the position, in the text information, of the word corresponding to the word vector. In the subsequent steps, feature extraction is performed on the text encoding information through the convolutional neural network, which ignores word-order information; the position vector is therefore added to the text encoding information as supplementary information, so that the question-answer model can make use of the order of the word-vector sequence. Optionally, fig. 3 shows a specific implementation flow of step S103 provided in the embodiment of the present invention. As shown in fig. 3, the step S103 of adding a position vector and the question encoding information to each word vector in the text information to obtain the text encoding information corresponding to the text information includes:
in step S301, for each word vector in the text information, a position vector corresponding to the word vector is generated through a preset sine function and a preset cosine function.
The embodiment of the invention selects the sine function and the cosine function with different frequencies to generate the position vector corresponding to the word vector in the text information. The sine function and the cosine function are respectively as follows:
PE(pos, 2i) = sin(pos / 10000^(2i/d))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
In the above formulas, PE represents the position vector corresponding to a word vector, pos represents the position number of the word vector in the text information, i indexes the dimensions of the position vector (i is a natural number), and d represents the total dimension of the generated position vector; PE(pos, 2i) represents the 2i-th dimension of the position vector corresponding to the pos-th word vector, and PE(pos, 2i+1) represents the (2i+1)-th dimension of the position vector corresponding to the pos-th word vector.
Optionally, in an embodiment of the present invention, the total dimension d is 100, that is, the dimension of the position vector is 100. Then, for the 10th word vector in the text information (pos = 10), its 1st-dimension component is PE(pos, 2i+1) = PE(10, 1) = cos(10 / 10000^(0/100)), obtained by setting 2i + 1 = 1, i.e. i = 0, in cos(pos / 10000^(2i/d));
its 2nd-dimension component is PE(pos, 2i) = PE(10, 2) = sin(10 / 10000^(2/100)), obtained by setting 2i = 2, i.e. i = 1, in sin(pos / 10000^(2i/d));
its 3rd-dimension component is PE(pos, 2i+1) = PE(10, 3) = cos(10 / 10000^(2/100)), obtained by setting 2i + 1 = 3, i.e. i = 1, in cos(pos / 10000^(2i/d)).
The components of the other dimensions are obtained in the same way.
In step S302, after the position vector and the question encoding information are sequentially added to the word vector, text encoding information corresponding to the word vector is obtained.
After the position vector corresponding to each word vector in the text information is obtained, the position vector is added to the word vector, the question encoding information is then added to the result, and the vector obtained is used as the text encoding information corresponding to that word vector. The text encoding information represents the order of the word vectors in the text information through the position vectors, thereby supplementing the word-order information missing from the features extracted by the convolutional neural network; the question encoding information represents the question information, which makes it easier to find the corresponding answer in the text information. The text encoding information corresponding to all the word vectors in the text information is combined to obtain the text encoding information corresponding to the text information.
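Putting steps S301 and S302 together, a minimal numpy sketch is given below. The shapes, the dummy word vectors and question encoding, and the assumption that the question encoding dimension k equals the word-vector dimension (so the three terms can be summed) are illustrative; the dimension numbering is 1-based as in the worked example above:

```python
import numpy as np

def position_vector(pos: int, d: int = 100) -> np.ndarray:
    """Sinusoidal position vector: odd dimensions (2i+1) use cos, even dimensions (2i) use sin."""
    pe = np.zeros(d)
    for j in range(1, d + 1):                  # 1-based dimension index
        if j % 2:                              # odd: j = 2i + 1, so 2i = j - 1
            pe[j - 1] = np.cos(pos / 10000 ** ((j - 1) / d))
        else:                                  # even: j = 2i
            pe[j - 1] = np.sin(pos / 10000 ** (j / d))
    return pe

# For pos = 10 and d = 100, the first three components match the worked example:
# pe[0] = cos(10 / 10000**(0/100)), pe[1] = sin(10 / 10000**(2/100)), pe[2] = cos(10 / 10000**(2/100))
pe10 = position_vector(10)

# Step S302: add the position vector, then the question encoding, to each word vector.
M, k = 50, 100                                     # assumed: 50 words, 100-dimensional vectors
word_vectors = np.random.randn(M, k)               # word vectors of the text (dummy data)
question_encoding = np.random.randn(1, k)          # 1 x k question encoding from step S203 (dummy)
position_vectors = np.stack([position_vector(p, d=k) for p in range(1, M + 1)])
text_encoding = word_vectors + position_vectors + question_encoding   # (M, k) via broadcasting
```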
Alternatively, in order to improve the training efficiency of the question-answer model and shorten the training time, step S102 and step S103 may be executed in parallel as two parallel branches of the question-answer model.
In step S104, feature extraction is performed on the text encoding information through a preset multilayer convolution layer, so as to obtain text feature information corresponding to the text information.
After the text information is encoded, the question-answer model provided by the embodiment of the invention performs feature extraction on the text encoding information through multiple convolution layers, to which a self-attention mechanism is added on top of the original convolution layer. The self-attention mechanism is a structure of three parallel convolution layers additionally constructed on the basis of the convolution layer: the outputs of two of the parallel convolution layers generate weight information that is used to adjust the output of the remaining parallel convolution layer, and the weight-adjusted output of that layer is added to the output of the original convolution layer; the result is taken as the text feature information corresponding to the text encoding information. Because parts of the text may be interrelated, the self-attention mechanism can assign higher attention weights to key words in the text information, which improves the classification effect.
Optionally, fig. 4 shows a specific implementation flow of step S104 provided in the embodiment of the present invention. As shown in fig. 4, the step S104 of performing feature extraction on the text encoding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information includes:
in step S401, feature extraction is performed on the text encoding information through a preset first convolution layer, so as to obtain first text feature information corresponding to the text encoding information.
Here, the first convolution layer has only one layer. The text encoding information is passed into the first convolution layer for feature extraction, and the feature information output by the first convolution layer is taken as the first text feature information corresponding to the text encoding information. The first text feature information is the feature information obtained by further refining the text encoding information.
In step S402, feature extraction is performed on the first text feature information by three parallel second convolutional layers, so as to obtain three groups of second text feature information.
The first text feature information is then passed into the three parallel second convolution layers respectively for feature extraction, and the feature information output by each second convolution layer is taken as one group of second text feature information. Here, the second text feature information is the feature information obtained by further refining the first text feature information. The three parallel second convolution layers may be the same, in which case the three groups of second text feature information obtained are also the same.
In step S403, matrix multiplication is performed on any two sets of second text feature information, and normalization processing is performed on the operation result by a softmax function, resulting in weight information.
Here, it is assumed that the obtained second text feature information is an M × N dimensional feature matrix, two groups of second text feature information are arbitrarily selected from the three groups of second text feature information, one group of second text feature information is transposed to obtain an N × M dimensional feature matrix, and matrix multiplication is performed on the selected second text feature information and the transposed second text feature information to obtain an M × M dimensional feature matrix. And then, performing normalization processing on the M-dimensional feature matrix through a softmax function to obtain M-dimensional weight information. Wherein M, N is a positive integer; the matrix multiplication is prior art and is not described herein.
In step S404, another group of second text feature information is adjusted according to the weight information, so as to obtain third text feature information.
Here, in the embodiment of the present invention, matrix multiplication is performed on the M × M-dimensional weight information and another set of second text feature information (M × N-dimensional feature matrix), so as to adjust the another set of second text feature information, and obtain an M × N-dimensional feature matrix as third text feature information. And the third text characteristic information is the characteristic information obtained by further purifying the second text characteristic information.
In step S405, a summation process is performed on the third text characteristic information and the first text characteristic information, and an obtained result is used as text characteristic information corresponding to the text information.
In the embodiment of the present invention, the summation operation refers to summing the corresponding positions of two matrices with the same dimensions and outputting a matrix of the same dimensions. The corresponding positions of the third text feature information and the first text feature information are summed, and the resulting M × N dimensional feature matrix is used as the text feature information corresponding to the text encoding information.
In steps S401 to S405, features are extracted through convolution layers that can be computed in parallel; compared with a structure that computes sequentially over the input sequence, this effectively increases the training speed of the question-answer model and reduces the time consumed by its training.
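The following PyTorch sketch illustrates steps S401 to S405 under assumed channel sizes and kernel widths; these, and the class name, are illustrative assumptions rather than values given in the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvSelfAttentionBlock(nn.Module):
    """First conv layer + three parallel second conv layers + self-attention + residual sum."""
    def __init__(self, channels=128):
        super().__init__()
        # 1-D convolutions over the word dimension; input/output: (batch, channels, M)
        self.first_conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv_a = nn.Conv1d(channels, channels, kernel_size=1)
        self.conv_b = nn.Conv1d(channels, channels, kernel_size=1)
        self.conv_c = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, text_encoding):                      # (batch, channels, M)
        first = self.first_conv(text_encoding)             # first text feature information
        a = self.conv_a(first).transpose(1, 2)             # (batch, M, N)
        b = self.conv_b(first).transpose(1, 2)             # (batch, M, N)
        c = self.conv_c(first).transpose(1, 2)             # (batch, M, N)
        weights = F.softmax(torch.bmm(a, b.transpose(1, 2)), dim=-1)  # (batch, M, M) weight info
        third = torch.bmm(weights, c)                      # adjusted features, (batch, M, N)
        return third + first.transpose(1, 2)               # residual sum: text feature information

block = ConvSelfAttentionBlock()
out = block(torch.randn(2, 128, 50))    # batch of 2 texts, M = 50 words, N = 128 channels
print(out.shape)                        # torch.Size([2, 50, 128])
```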
In step S105, the text feature information is subjected to sequence labeling through a Bi-directional recurrent neural network Bi-LSTM, so as to obtain a first probability and a second probability of each word in the text information.
The obtained text feature information is passed directly into the Bi-directional recurrent neural network Bi-LSTM. In the embodiment of the invention, the Bi-directional recurrent neural network Bi-LSTM performs sequence labeling on the input text feature information and then outputs a target sequence. The target sequence contains two probabilities for each word in the text information: a first probability that the word is the start of the answer and a second probability that the word is the end of the answer.
In step S106, a word corresponding to the first maximum probability value is acquired as an answer start position and a word corresponding to the second maximum probability value is acquired as an answer end position.
After the target sequence is obtained, the maximum value of the first probability and the maximum value of the second probability are selected, the word corresponding to the maximum value of the first probability is taken as the answer starting position, the word corresponding to the maximum value of the second probability is taken as the answer ending position, and the content between the answer starting position and the answer ending position is combined to form the answer, within the text information, to the question information. This effectively improves the accuracy of answer labeling.
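A PyTorch sketch of steps S105 and S106 is given below; the sizes and names are illustrative assumptions, since the patent does not fix them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnswerPointer(nn.Module):
    """Bi-LSTM sequence labeling producing per-word start/end probabilities."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)   # two scores per word: start and end

    def forward(self, text_features):                   # (batch, M, feat_dim)
        seq, _ = self.bilstm(text_features)              # (batch, M, 2*hidden)
        logits = self.head(seq)                          # (batch, M, 2)
        # First/second probability of each word, normalized over the words of the text.
        start_prob = F.softmax(logits[..., 0], dim=-1)   # (batch, M)
        end_prob = F.softmax(logits[..., 1], dim=-1)     # (batch, M)
        return start_prob, end_prob

pointer = AnswerPointer()
start_prob, end_prob = pointer(torch.randn(1, 50, 128))
answer_start = int(start_prob.argmax(dim=-1))   # word with the maximum first probability
answer_end = int(end_prob.argmax(dim=-1))       # word with the maximum second probability
print(answer_start, answer_end)
```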
In summary, compared with a question-answer model based only on a convolutional neural network, the improved attention mechanism effectively improves the precision and the training convergence efficiency of the question-answer model. The model is lightweight, with a short training time and a relatively small number of parameters, so it can be trained with relatively few parameters in a short time; this improves the efficiency of question-answer model training and solves the problems that existing question-answer models have a large number of parameters and require a long algorithm training time. In addition, compared with outputting the labels directly through a full connection layer, performing the sequence labeling through the LSTM structure gives a better labeling effect.
Optionally, as a preferred example of the present invention, a result indicating whether an answer can be found in the text information may also be obtained from the text feature information corresponding to the text encoding information output in step S104. Fig. 5 shows another implementation flow of the answer position obtaining method based on big data according to the embodiment of the present invention, including steps S501 to S504, where steps S501 to S504 are the same as steps S101 to S104 described in the embodiment of fig. 1; reference is made to the description of the above embodiment, which is not repeated here. The method further includes the following step:
in step S505, the text feature information is transmitted to a preset full connection layer, and a result of classifying whether answer information corresponding to the question information exists in the text information to be processed is obtained.
Here, the text feature information output after passing through the plurality of convolution layers is an M × N matrix, where M denotes the number of word vectors in the text information. The embodiment of the invention compresses the text feature information into an N-dimensional vector by additive attention, and then feeds this N-dimensional vector into a preset full connection layer. The output of the full connection layer has two dimensions, representing respectively that an answer can be found and that an answer cannot be found. Through the full connection layer, the text information to be processed can thus be classified as to whether it contains answer information corresponding to the question information, that is, either answer information corresponding to the question information exists in the text information or it does not, and a classification result is obtained.
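A sketch of this optional classification head, again with illustrative sizes; the additive-attention pooling mirrors the one used above for the question encoding:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HasAnswerClassifier(nn.Module):
    """Additive-attention pooling over the M words, then a 2-way full connection layer."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)     # wx + b -> one score per word
        self.fc = nn.Linear(feat_dim, 2)       # two outputs: answer found / not found

    def forward(self, text_features):                                       # (batch, M, N)
        weights = F.softmax(self.attn(text_features).squeeze(-1), dim=-1)   # (batch, M)
        pooled = torch.bmm(weights.unsqueeze(1), text_features).squeeze(1)  # (batch, N)
        return self.fc(pooled)                                              # (batch, 2) logits

clf = HasAnswerClassifier()
print(clf(torch.randn(1, 50, 128)).argmax(dim=-1))  # 0 or 1: whether an answer exists
```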
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, an answer position obtaining device based on big data is provided, and the answer position obtaining device based on big data corresponds to the answer position obtaining method based on big data in the above embodiment one to one. As shown in fig. 6, the big data-based answer position obtaining apparatus includes a vectorization module 61, a first feature extraction module 62, an information adding module 63, a second feature extraction module 64, a probability obtaining module 65, and an answer obtaining module 66. The functional modules are explained in detail as follows:
the vectorization module 61 is configured to obtain text information and question information to be processed, and perform vectorization on each word in the text information and the question information respectively to obtain a word vector corresponding to the text information and a word vector corresponding to the question information;
the first feature extraction module 62 is configured to perform feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and compress the extracted multi-dimensional feature matrix to a preset dimension to obtain question coding information;
the information adding module 63 is configured to add a position vector and the question coding information to each word vector in the text information to obtain text coding information corresponding to the text information;
a second feature extraction module 64, configured to perform feature extraction on the text coding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information;
the probability obtaining module 65 is configured to perform sequence labeling on the text feature information through a Bi-directional recurrent neural network Bi-LSTM to obtain a first probability and a second probability of each word in the text information;
and the answer obtaining module 66 is configured to obtain a word corresponding to the first maximum probability value as an answer starting position and a word corresponding to the second maximum probability value as an answer ending position.
Optionally, the first feature extraction module 62 includes:
the feature extraction unit is used for performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM to obtain an m x n dimensional feature matrix;
the compression unit is used for compressing the m x n dimensional feature matrix to a 1 x n dimensional feature matrix by an additive attention mechanism;
and the conversion unit is used for converting the 1 x n dimensional feature matrix into a 1 x k dimensional feature matrix through a preset full connection layer, and taking the 1 x k dimensional feature matrix as the question coding information.
Optionally, the information adding module 63 includes:
the position vector generating unit is used for generating a position vector corresponding to each word vector in the text information through a preset sine function and a preset cosine function;
and the adding unit is used for sequentially adding the position vector and the question coding information to the word vector to obtain text coding information corresponding to the word vector.
Optionally, the second feature extraction module 64 includes:
the first extraction unit is used for extracting the characteristics of the text coding information through a preset first convolution layer to obtain first text characteristic information corresponding to the text coding information;
the second extraction unit is used for respectively carrying out feature extraction on the first text feature information through three layers of parallel second convolution layers to obtain three groups of second text feature information;
the weight obtaining unit is used for performing matrix multiplication on any two groups of second text characteristic information and performing normalization processing on the operation result through a softmax function to obtain weight information;
the adjusting unit is used for adjusting the other group of second text characteristic information according to the weight information to obtain third text characteristic information;
and the text characteristic acquisition unit is used for performing summation processing on the third text characteristic information and the first text characteristic information, and taking an obtained result as text characteristic information corresponding to the text information.
Optionally, the apparatus further comprises:
and the classification module is used for transmitting the text characteristic information into a preset full connection layer to obtain a classification result of whether answer information corresponding to the question information exists in the text information to be processed or not.
For the specific definition of the big data based answer position obtaining device, reference may be made to the above definition of the big data based answer position obtaining method, which is not described herein again. The modules in the big data-based answer position obtaining device may be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a big data based answer position acquisition method.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring text information and question information to be processed, and respectively performing vectorization on each word in the text information and the question information to obtain a word vector corresponding to the text information and a word vector corresponding to the question information;
performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and compressing the extracted multi-dimensional feature matrix to a preset dimension to obtain question coding information;
adding a position vector and the question coding information to each word vector in the text information to obtain text coding information corresponding to the text information;
performing feature extraction on the text coding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information;
performing sequence labeling on the text feature information through a Bi-directional recurrent neural network Bi-LSTM to obtain a first probability and a second probability for each word in the text information;
and acquiring the word corresponding to the maximum value of the first probability as the answer starting position and the word corresponding to the maximum value of the second probability as the answer ending position.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. An answer position obtaining method based on big data is characterized by comprising the following steps:
acquiring text information and question information to be processed, and respectively performing vectorization on each word in the text information and the question information to obtain a word vector corresponding to the text information and a word vector corresponding to the question information;
performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and compressing the extracted multi-dimensional feature matrix to a preset dimension to obtain question coding information;
adding a position vector and the question coding information to each word vector in the text information to obtain text coding information corresponding to the text information;
performing feature extraction on the text coding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information;
performing sequence labeling on the text feature information through a Bi-directional recurrent neural network Bi-LSTM to obtain a first probability and a second probability for each word in the text information;
and acquiring the word corresponding to the maximum value of the first probability as the answer starting position and the word corresponding to the maximum value of the second probability as the answer ending position.
2. The big-data-based answer position obtaining method according to claim 1, wherein the performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and compressing the extracted multi-dimensional feature matrix to a preset dimension to obtain question coding information comprises:
performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM to obtain an m x n dimensional feature matrix;
compressing the m x n dimensional feature matrix to a 1 x n dimensional feature matrix by an additive attention mechanism;
and converting the 1 x n dimensional feature matrix into a 1 x k dimensional feature matrix through a preset full connection layer, and taking the 1 x k dimensional feature matrix as the question coding information.
3. The big-data-based answer position obtaining method according to claim 1 or 2, wherein the adding a position vector and the question coding information to each word vector in the text information to obtain the text coding information corresponding to the text information comprises:
generating a position vector corresponding to each word vector in the text information through a preset sine function and a preset cosine function;
and sequentially adding the position vector and the question coding information to the word vector to obtain text coding information corresponding to the word vector.
4. The big-data-based answer position obtaining method according to claim 1 or 2, wherein the obtaining of the text feature information corresponding to the text information by feature extraction of the text coding information through a preset multilayer convolution layer comprises:
performing feature extraction on the text coding information through a preset first convolution layer to obtain first text feature information corresponding to the text coding information;
respectively carrying out feature extraction on the first text feature information through three layers of parallel second convolution layers to obtain three groups of second text feature information;
matrix multiplication operation is carried out on any two groups of second text characteristic information, normalization processing is carried out on the operation result through a softmax function, and weight information is obtained;
adjusting another group of second text characteristic information according to the weight information to obtain third text characteristic information;
and performing summation processing on the third text characteristic information and the first text characteristic information, and taking an obtained result as text characteristic information corresponding to the text information.
5. The big-data-based answer position acquisition method according to claim 1 or 2, wherein after obtaining text feature information corresponding to the text information, the method further comprises:
and transmitting the text characteristic information into a preset full-connection layer to obtain a binary classification result indicating whether answer information corresponding to the question information exists in the text information to be processed.
6. An answer position acquisition apparatus based on big data, comprising:
the vectorization module is used for acquiring text information and question information to be processed, and respectively executing vectorization on each word in the text information and the question information to obtain a word vector corresponding to the text information and a word vector corresponding to the question information;
the first feature extraction module is used for performing feature extraction on the word vector corresponding to the question information through a Bi-directional recurrent neural network Bi-LSTM, and compressing the extracted multi-dimensional feature matrix to a preset dimension to obtain question coding information;
the information adding module is used for adding a position vector and the question coding information to each word vector in the text information to obtain text coding information corresponding to the text information;
the second feature extraction module is used for performing feature extraction on the text coding information through a preset multilayer convolution layer to obtain text feature information corresponding to the text information;
the probability acquisition module is used for performing sequence labeling on the text feature information through a bidirectional long short-term memory (Bi-LSTM) network to obtain a first probability and a second probability of each word in the text information;
and the answer acquisition module is used for taking the word corresponding to the maximum value of the first probability as the answer starting position and the word corresponding to the maximum value of the second probability as the answer ending position.
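To make the last two modules of claim 6 concrete, the following hedged sketch labels each word with a first probability (answer start) and a second probability (answer end) via a Bi-LSTM, then takes the arg-max of each distribution as the answer boundaries; the hidden size and the use of a single LSTM layer are assumptions.

```python
import torch
import torch.nn as nn

class AnswerSpanTagger(nn.Module):
    """Bi-LSTM sequence labelling that yields start/end probabilities per word."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.bilstm = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.start = nn.Linear(2 * hidden, 1)
        self.end = nn.Linear(2 * hidden, 1)

    def forward(self, text_features: torch.Tensor):
        # text_features: (1, seq_len, dim)
        out, _ = self.bilstm(text_features)                            # (1, seq_len, 2*hidden)
        p_start = torch.softmax(self.start(out).squeeze(-1), dim=-1)   # first probability
        p_end = torch.softmax(self.end(out).squeeze(-1), dim=-1)       # second probability
        # words with the maximum first/second probability mark the answer span
        return p_start.argmax(dim=-1), p_end.argmax(dim=-1)
```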
7. The big data-based answer position acquisition apparatus according to claim 6, wherein the information adding module includes:
the position vector generating unit is used for generating a position vector corresponding to each word vector in the text information through a preset sine function and a preset cosine function;
and the adding unit is used for sequentially adding the position vector and the question coding information to the word vector to obtain the text coding information corresponding to the word vector.
8. The big data-based answer position acquisition apparatus according to claim 6 or 7, wherein the second feature extraction module includes:
the first extraction unit is used for performing feature extraction on the text coding information through a preset first convolution layer to obtain first text feature information corresponding to the text coding information;
the second extraction unit is used for respectively performing feature extraction on the first text feature information through three parallel second convolution layers to obtain three groups of second text feature information;
the weight obtaining unit is used for performing matrix multiplication on two of the three groups of second text feature information and normalizing the operation result through a softmax function to obtain weight information;
the adjusting unit is used for adjusting the remaining group of second text feature information according to the weight information to obtain third text feature information;
and the text feature acquisition unit is used for summing the third text feature information and the first text feature information, and taking the obtained result as the text feature information corresponding to the text information.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the big data-based answer position acquisition method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the big data-based answer position acquisition method according to any one of claims 1 to 5.
CN202010037661.7A 2020-01-14 2020-01-14 Big data-based answer position acquisition method, device, equipment and medium Pending CN111241244A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010037661.7A CN111241244A (en) 2020-01-14 2020-01-14 Big data-based answer position acquisition method, device, equipment and medium
PCT/CN2020/093349 WO2021143021A1 (en) 2020-01-14 2020-05-29 Big data-based answer position acquisition method, apparatus, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010037661.7A CN111241244A (en) 2020-01-14 2020-01-14 Big data-based answer position acquisition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN111241244A (en) 2020-06-05

Family

ID=70880924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010037661.7A Pending CN111241244A (en) 2020-01-14 2020-01-14 Big data-based answer position acquisition method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN111241244A (en)
WO (1) WO2021143021A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949791A (en) * 2020-07-28 2020-11-17 中国工商银行股份有限公司 Text classification method, device and equipment
CN112632256A (en) * 2020-12-29 2021-04-09 平安科技(深圳)有限公司 Information query method and device based on question-answering system, computer equipment and medium
CN112732942A (en) * 2021-01-16 2021-04-30 江苏网进科技股份有限公司 User-oriented multi-turn question-answer legal document entity relationship extraction method
CN113239165A (en) * 2021-05-17 2021-08-10 山东新一代信息产业技术研究院有限公司 Reading understanding method and system based on cloud robot and storage medium
US20210406467A1 (en) * 2020-06-24 2021-12-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating triple sample, electronic device and computer storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3454260A1 (en) * 2017-09-11 2019-03-13 Tata Consultancy Services Limited Bilstm-siamese network based classifier for identifying target class of queries and providing responses thereof
US11501076B2 (en) * 2018-02-09 2022-11-15 Salesforce.Com, Inc. Multitask learning as question answering
CN109033068B (en) * 2018-06-14 2022-07-12 北京慧闻科技(集团)有限公司 Method and device for reading and understanding based on attention mechanism and electronic equipment
CN110083682B (en) * 2019-04-19 2021-05-28 西安交通大学 Machine reading comprehension answer obtaining method based on multi-round attention mechanism
CN110334184A (en) * 2019-07-04 2019-10-15 河海大学常州校区 The intelligent Answer System understood is read based on machine
CN110442691A (en) * 2019-07-04 2019-11-12 平安科技(深圳)有限公司 Machine reads the method, apparatus and computer equipment for understanding Chinese
CN110347802B (en) * 2019-07-17 2022-09-02 北京金山数字娱乐科技有限公司 Text analysis method and device
CN110502627A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of answer generation method based on multilayer Transformer polymerization encoder

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210406467A1 (en) * 2020-06-24 2021-12-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating triple sample, electronic device and computer storage medium
CN111949791A (en) * 2020-07-28 2020-11-17 中国工商银行股份有限公司 Text classification method, device and equipment
CN111949791B (en) * 2020-07-28 2024-01-30 中国工商银行股份有限公司 Text classification method, device and equipment
CN112632256A (en) * 2020-12-29 2021-04-09 平安科技(深圳)有限公司 Information query method and device based on question-answering system, computer equipment and medium
CN112732942A (en) * 2021-01-16 2021-04-30 江苏网进科技股份有限公司 User-oriented multi-turn question-answer legal document entity relationship extraction method
CN113239165A (en) * 2021-05-17 2021-08-10 山东新一代信息产业技术研究院有限公司 Reading understanding method and system based on cloud robot and storage medium

Also Published As

Publication number Publication date
WO2021143021A1 (en) 2021-07-22

Similar Documents

Publication Publication Date Title
CN111241244A (en) Big data-based answer position acquisition method, device, equipment and medium
CN110413785B (en) Text automatic classification method based on BERT and feature fusion
CN108427771B (en) Abstract text generation method and device and computer equipment
CN108563782B (en) Commodity information format processing method and device, computer equipment and storage medium
WO2020258506A1 (en) Text information matching degree detection method and apparatus, computer device and storage medium
CN111985228B (en) Text keyword extraction method, text keyword extraction device, computer equipment and storage medium
CN109034378A (en) Network representation generation method, device, storage medium and the equipment of neural network
CN107395211B (en) Data processing method and device based on convolutional neural network model
CN111143563A (en) Text classification method based on integration of BERT, LSTM and CNN
US11544542B2 (en) Computing device and method
WO2020204904A1 (en) Learning compressible features
CN113741858B (en) Memory multiply-add computing method, memory multiply-add computing device, chip and computing equipment
CN112528634A (en) Text error correction model training and recognition method, device, equipment and storage medium
CN109918507B (en) textCNN (text-based network communication network) improved text classification method
CN113157919B (en) Sentence text aspect-level emotion classification method and sentence text aspect-level emotion classification system
CN111767697B (en) Text processing method and device, computer equipment and storage medium
CN113886550A (en) Question-answer matching method, device, equipment and storage medium based on attention mechanism
CN112732864A (en) Document retrieval method based on dense pseudo query vector representation
CN115017178A (en) Training method and device for data-to-text generation model
CN114332500A (en) Image processing model training method and device, computer equipment and storage medium
CN116434741A (en) Speech recognition model training method, device, computer equipment and storage medium
CN111832303A (en) Named entity identification method and device
CN111737406B (en) Text retrieval method, device and equipment and training method of text retrieval model
CN113282707A (en) Data prediction method and device based on Transformer model, server and storage medium
CN109472366B (en) Coding and decoding method and device of machine learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination