WO2021082086A1 - 机器阅读方法、系统、装置及存储介质 - Google Patents

机器阅读方法、系统、装置及存储介质 Download PDF

Info

Publication number
WO2021082086A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
question
matrix
probability
word
Prior art date
Application number
PCT/CN2019/118501
Other languages
English (en)
French (fr)
Inventor
周宸
骆加维
周宝
陈远旭
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021082086A1 publication Critical patent/WO2021082086A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a machine reading method, system, electronic device and storage medium.
  • Machine reading is a branch of natural language processing. Its main function is to take a question posed by a user together with a text and search that text for the answer. Machine reading technology has evolved from the initial seq2seq approach borrowed from machine translation, through RNN-based models such as Bidaf, mlstm and r-net, to the transformer-based QAnet and BERT models, all of which have contributed greatly to machine reading.
  • The well-known machine reading data sets include Stanford University's SQuAD and Microsoft's MS MARCO in English, and Baidu's dureader in Chinese.
  • most of the technical research is based on the SQuAD data set.
  • the current popular Bidaf, QAnet and BERT have all made great progress on the SQuAD data set.
  • Current machine reading models are based either on RNN frameworks, such as mlstm and bidaf, or on transformer frameworks. The applicant realizes that although some models can reflect the context of the text and other models can extract the overall relevance of the sentence, there is currently no method that can obtain the overall relevance and the local relevance of a sentence at the same time.
  • This application provides a machine reading method, system, electronic device and computer-readable storage medium. Its main purpose is to combine the transformer and the lstm model into a new machine reading network structure, solving the current problem that the overall relevance and the local relevance of a sentence cannot be obtained at the same time.
  • this application provides a machine reading method, including:
  • the input vector processed by the highway nonlinear conversion layer is processed by lstm to obtain text with local features;
  • the input vector processed by the highway nonlinear conversion layer is processed by the transformer to obtain text with overall features, and the text with local features and the text with overall features are fused to form text with both local and overall features;
  • lstm is used to process all the associated information between the question and the answer in the obtained text, a start probability and an end probability are output and multiplied together, and the sentence with the highest probability after the multiplication is taken as the answer to the question in the text.
  • this application also provides a machine reading system, including:
  • the word vector acquisition module is used to use the glove word vector training model to pre-train all the words in the text to be processed, and to obtain the word vectors of all words in the text to be processed that are mapped in the same vector space;
  • the field embedding acquisition module is used to process the word vectors of all words obtained by using a character-level convolutional neural network to obtain field embeddings;
  • An input vector forming module which is used for splicing the word vector and the field embedding to form an input vector, and processing the input vector through a highway nonlinear conversion layer;
  • the local feature and overall feature acquisition module is used to process the input vector processed by the highway nonlinear conversion layer through lstm to obtain text with local features, to process the same input vector through the transformer to obtain text with overall features, and to fuse the text with local features and the text with overall features to form text with both local and overall features;
  • the associated information acquisition module is used to process the acquired text with local features and overall features through the Bidirectional Attention Flow model and transformer, and obtain all relevant information about the questions and answers in the text;
  • the answer acquisition module is used to process all the associated information between the question and the answer in the obtained text with lstm, output a start probability and an end probability, multiply the two probabilities, and take the sentence with the highest probability after the multiplication as the answer to the question in the text.
  • the present application also provides an electronic device, which includes a memory and a processor; the memory includes a machine reading program based on transformer and lstm, and when the machine reading program based on transformer and lstm is executed by the processor, the following steps are implemented:
  • the input vector processed by the highway nonlinear conversion layer is processed by lstm to obtain text with local features, the same input vector is processed by the transformer to obtain text with overall features, and the text with local features and the text with overall features are fused;
  • lstm is used to process all the associated information between the question and the answer in the obtained text, a start probability and an end probability are output and multiplied together, and the sentence with the highest probability after the multiplication is taken as the answer to the question in the text.
  • this application also provides a computer-readable storage medium that includes a machine reading program based on transformer and lstm; when the machine reading program based on transformer and lstm is executed by a processor, any step of the machine reading method described above is implemented.
  • the machine reading method, system, electronic device and storage medium proposed in this application construct a machine reading network structure from a transformer and an lstm, obtaining the local information in the text through the lstm and the overall information in the text through the transformer;
  • the constructed machine reading network structure therefore solves the current problem that the overall relevance and the local relevance of a sentence cannot be obtained at the same time.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of the machine reading method of this application
  • FIG. 2 is a schematic diagram of modules of a preferred embodiment of the machine reading program based on transformer and lstm in FIG. 1;
  • FIG. 3 is a flowchart of a preferred embodiment of the machine reading method of this application.
  • This application provides a machine reading method, which is applied to an electronic device 1.
  • Referring to FIG. 1, there is shown a schematic diagram of the application environment of the preferred embodiment of the machine reading method of this application.
  • the electronic device 1 may be a terminal device with computing capability, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the electronic device 1 includes a processor 12, a memory 11, a network interface 14 and a communication bus 15.
  • the memory 11 includes at least one type of readable storage medium.
  • the readable storage medium of the memory 11 is generally used to store a machine reading program 10 based on transformer and lstm installed in the electronic device 1 and the like.
  • the memory 11 can also be used to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip, and is used to run the program code stored in the memory 11 or to process data, for example the machine reading program 10 based on transformer and lstm.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the communication bus 15 is used to realize the connection and communication between these components.
  • FIG. 1 only shows the electronic device 1 with the components 11-15, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 1 may also include a user interface, a display, a touch sensor, and a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
  • the memory 11, as a computer storage medium, may include an operating system and the machine reading program 10 based on transformer and lstm; when the processor 12 executes the machine reading program 10 based on transformer and lstm, the following steps are implemented:
  • the input vector processed by the highway nonlinear conversion layer is processed by lstm to obtain text with local features;
  • the input vector processed by the highway nonlinear conversion layer is processed by the transformer to obtain text with overall features, and the text with local features and the text with overall features are fused to form text with both local and overall features;
  • lstm is used to process all the associated information between the question and the answer in the obtained text, a start probability and an end probability are output and multiplied together, and the sentence with the highest probability after the multiplication is taken as the answer to the question in the text.
  • the formula of the glove word vector training model is expressed in terms of the co-occurrence probability P_ij, the word vectors υ_i and υ_j, and the weight function f.
  • the field embedding is C_θ ∈ B × D, where C_θ represents any character vector after embedding and its dimension satisfies B x D.
  • the processing of the acquired text with local features and overall features through the Bidirectional Attention Flow model and transformer includes the following steps:
  • processing the question and the answer in the text through the Bidirectional Attention Flow model, where t represents each piece of text, j represents each question, S_tj (a t*j matrix) represents the relevance between text t and question j, +_m denotes addition performed by way of matrix multiplication, the i in t_i denotes the i-th word of the question being indexed, and the i in j_i denotes the attention weight of the i-th question word within the text;
  • computing the weight of each question word in each answer, performing a weighted average over the question to obtain a t*d matrix, and fusing it with H_t through the G function to obtain a matrix G of dimension t*4d, where the matrix G contains all the fused associated information between the question and the answer in the text.
  • using lstm to process all the associated information between the question and the answer in the obtained text, outputting the start probability and the end probability, multiplying them, and taking the sentence with the highest probability after the multiplication as the answer to the question in the text includes the following steps:
  • the electronic device 1 proposed in the above embodiment constructs a machine reading network structure from a transformer and an lstm;
  • within this network structure, the local information in the text is obtained through the lstm and the overall information in the text is obtained through the transformer, so the machine reading network structure constructed in this application solves the current problem that the overall relevance and the local relevance of a sentence cannot be obtained at the same time.
  • the machine reading program 10 based on the transformer and lstm may also be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by the processor 12 to complete the application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • Referring to FIG. 2, there is shown a program module diagram of a preferred embodiment of the machine reading program 10 based on transformer and lstm in FIG. 1.
  • the machine reading program 10 based on transformer and lstm can be divided into: the word vector acquisition module 110, the field embedding acquisition module 120, the input vector formation module 130, the local feature and overall feature acquisition module 140, the associated information acquisition module 150 and the answer acquisition module 160.
  • The functions or operation steps implemented by the modules 110-160 are all similar to those described above and will not be described in detail here; illustratively:
  • the word vector acquisition module 110 is configured to use the glove word vector training model to pre-train all the words in the text to be processed, and to obtain the word vectors of all the words in the text to be processed that are mapped in the same vector space;
  • the field embedding obtaining module 120 is used to process the word vectors of all the words obtained by using a character-level convolutional neural network to obtain field embeddings;
  • the input vector forming module 130 is configured to splice the word vector and the field embedding to form an input vector, and process the input vector through a highway nonlinear conversion layer;
  • the local feature and overall feature acquisition module 140 is used to process the input vector processed by the highway nonlinear conversion layer through lstm to obtain text with local features, to process the same input vector through the transformer to obtain text with overall features, and to fuse the text with local features and the text with overall features to form text with both local and overall features;
  • the associated information acquisition module 150 is configured to process the acquired text with local features and overall features through the Bidirectional Attention Flow model and transformer, and acquire all associated information about the questions and answers in the text;
  • the answer acquisition module 160 is configured to use lstm to process all the associated information between the question and the answer in the obtained text, output the start probability and the end probability, multiply the two probabilities, and take the sentence with the highest probability after the multiplication as the answer to the question in the text.
  • this application also provides a machine reading method.
  • Referring to FIG. 3, there is shown a flowchart of a preferred embodiment of the machine reading method based on transformer and lstm of this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the machine reading method includes: step S10-step S60.
  • Step S10 Use the glove word vector training model to pre-train all the words in the text to be processed, and obtain the word vectors of all the words in the text to be processed that are mapped in the same vector space;
  • Step S20 Use the character-level convolutional neural network to process the word vectors of all the words obtained, and obtain the field embeddings;
  • Step S30 Splicing the word vector and the field embedding to form an input vector, and processing the input vector through a highway nonlinear conversion layer;
  • Step S40 Process the input vector processed by the highway nonlinear conversion layer through lstm to obtain text with local features, process the same input vector through a transformer to obtain text with overall features, and fuse the text with local features and the text with overall features to form text with both local and overall features;
  • Step S50 Use the Bidirectional Attention Flow model and transformer to process the obtained text with local and overall features, and obtain all relevant information about the question and the answer in the text;
  • Step S60 Use lstm to process all the associated information between the question and the answer in the obtained text, output the start probability and the end probability, multiply the start probability and the end probability, and take the sentence with the highest probability after the multiplication as the answer to the question in the text.
  • In step S10, the glove word vector training model is used to pre-train the word vectors and obtain the word-embedding (word vectorization) of all words, W_e ∈ B × H, where B is the number of words and H is the dimension of the embedding vector; W_e represents the word vector of a word, and the dimension of this word vector is B x H.
  • Specifically, word vector models can be divided into statistics-based models and prediction-based models.
  • The former is represented by the LSA model based on SVD decomposition, but the semantic vectors obtained by this type of model often have difficulty capturing the linear relationships between words (such as the famous King, Queen, Man, Woman equations).
  • The latter is represented by the neural-network-based Skip-gram model, which obtains the embedding word vectors by predicting the probability of a word appearing in its context.
  • The drawback of this type of model is that it makes insufficient use of statistical information and its training time is closely tied to the size of the corpus, although its performance on many tasks is still slightly better than that of the SVD model.
  • For the glove word vector training model, the Skip-gram model is first used to uncover the cause behind the linear relationships between words; then, by constructing similar conditions on the co-occurrence matrix, a word vector model based on global information is obtained, namely the glove word vector training model.
  • The formula of the glove word vector training model is expressed in terms of the co-occurrence probability P_ij, the word vectors υ_i and υ_j, and the weight function f.
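  • As an illustration only, the weighted least-squares criterion commonly associated with glove can be sketched as follows; since the patent's formula image is not reproduced here, the bias terms and the exact form of the weight function f are assumptions based on the standard glove formulation, not the applicant's exact formula:

```python
# Hypothetical sketch of the GloVe weighted least-squares objective (standard
# formulation); symbol names and bias terms are assumptions.
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weight function f: damps very frequent co-occurrences, zeroes none."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(X, W, W_ctx, b, b_ctx):
    """X: co-occurrence statistics (V x V); W, W_ctx: word/context vectors (V x d)."""
    i, j = np.nonzero(X)                      # only observed co-occurrences contribute
    dots = np.sum(W[i] * W_ctx[j], axis=1)    # upsilon_i . upsilon_j
    residual = dots + b[i] + b_ctx[j] - np.log(X[i, j])
    return np.sum(glove_weight(X[i, j]) * residual ** 2)
```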
  • In step S20, the word vectors pre-trained by the glove word vector training model are processed with a char-CNN (character-level convolutional neural network); a filter of size [H=5, W=the embedding dimension of a character, OC=64] is chosen, where H is the height of the filter, W is the embedding dimension of a character and OC is the number of output channels, and the resulting char-embedding is C_e ∈ B × D.
  • Char-embedding is an embedding based on the letters of each word: after obtaining all the letter vectors of a word, the word vector of that word is obtained by a weighted average.
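  • A minimal character-level CNN sketch of this step is shown below; the character-embedding size, kernel width and the average pooling over letters are illustrative assumptions, not the applicant's exact filter configuration:

```python
# Minimal character-CNN sketch (assumed shapes). Each word is a sequence of
# character ids; the convolution output is pooled into one char-embedding per word.
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, out_channels=64, kernel=5):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, out_channels, kernel_size=kernel, padding=kernel // 2)

    def forward(self, char_ids):                 # (batch_words, max_chars)
        x = self.char_emb(char_ids)              # (B, L, char_dim)
        x = self.conv(x.transpose(1, 2))         # (B, out_channels, L)
        return torch.relu(x).mean(dim=2)         # average over letters -> (B, out_channels)
```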
  • In step S30, the input vector is spliced: it is formed by concatenating the word vector in the leading position with the contextual word vector in the trailing position, and it serves as the input to the model.
  • Specifically, the char-embedding obtained in step S20 is spliced with the pre-trained glove word-embedding to produce a contextual embedding, Cont_e ∈ B × (H + D).
  • A highway layer is then used to process the contextual embedding to prevent its gradients from exploding or vanishing.
  • The highway layer is a nonlinear conversion layer used to avoid gradient explosion and gradient vanishing after the input weights are updated.
  • As a model-structure optimization, the highway nonlinear conversion layer is connected after the contextual embedding layer to avoid gradient vanishing or gradient explosion during back-propagation (BP), while the structure of the classic machine reading model bidaf that is connected afterwards remains the same.
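  • A minimal sketch of step S30, assuming a simple two-layer gated highway transform over the concatenated word-embedding and char-embedding, might look as follows (layer sizes are hypothetical):

```python
# Sketch of step S30: concatenate word-embedding and char-embedding, then apply a
# highway (gated) transform to keep gradients stable. Layer count is an assumption.
import torch
import torch.nn as nn

class Highway(nn.Module):
    def __init__(self, dim, n_layers=2):
        super().__init__()
        self.transforms = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_layers)])
        self.gates = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_layers)])

    def forward(self, x):
        for transform, gate in zip(self.transforms, self.gates):
            g = torch.sigmoid(gate(x))           # how much of the transformed path to use
            h = torch.relu(transform(x))
            x = g * h + (1.0 - g) * x            # gated residual keeps gradients flowing
        return x

def build_input_vector(word_emb, char_emb, highway):
    # word_emb: (B, T, H), char_emb: (B, T, D) -> contextual input (B, T, H + D)
    return highway(torch.cat([word_emb, char_emb], dim=-1))
```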
  • In step S40, an lstm and a transformer are used to encode the contextual embedding at the same time, and their outputs are then spliced together.
  • The lstm performs local feature extraction and the transformer performs overall feature extraction, so a contextual embedding that fuses local features and overall features is obtained.
  • The contextual embedding of the content is H_t (a t*d matrix) and the contextual embedding of the question is U_j (a j*d matrix).
  • The local features are obtained by the question sentence using feature vectors in three dimensions to reinforce the question sentence itself, and the overall features are used to extract the representation between the question sentence and the original text.
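  • A minimal sketch of step S40 under these assumptions (hidden sizes, projection layer and concatenation-based fusion are all hypothetical) is:

```python
# Sketch of step S40: encode the same contextual embedding with an lstm (local
# features) and a transformer encoder (overall features), then splice them.
import torch
import torch.nn as nn

class LocalGlobalEncoder(nn.Module):
    def __init__(self, in_dim, d_model=128, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        self.lstm = nn.LSTM(d_model, d_model // 2, batch_first=True, bidirectional=True)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):                        # x: (B, T, in_dim)
        x = self.proj(x)
        local, _ = self.lstm(x)                  # (B, T, d_model): neighbourhood context
        overall = self.transformer(x)            # (B, T, d_model): sentence-wide relevance
        return torch.cat([local, overall], dim=-1)   # fused local + overall features
```

  • In this sketch the two encodings are fused by concatenation; the description below also allows a weighted average as an alternative splicing scheme.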
  • In step S50, the question and answer sentence vectors in the text are cross-characterized, which strengthens the features of the keywords of a sentence within the sentence vector.
  • The Bidirectional Attention Flow model, or BiDAF model for short, is a classic reading comprehension model; its biggest feature is the introduction of a bidirectional attention mechanism in the interaction layer, which computes both Query2Context and Context2Query attention and uses these attentions to compute a query-aware representation of the original text.
  • the first step is to process the questions and answers in the text through the Bidirectional Attention Flow model.
  • the specific formula is as follows:
  • In this formula, t represents each piece of text, j represents each question, S_tj (a t*j matrix) represents the relevance between text t and question j, +_m denotes addition performed by way of matrix multiplication, the i in t_i denotes the i-th word of the question being indexed, and the i in j_i denotes the attention weight of the i-th question word within the text.
  • The second step computes the weight of each question word in each answer, a_t = softmax(S_t:).
  • The third step performs a weighted average over the question, which yields a t*d matrix.
  • The fourth step selects, for each word of the content, the most important question word via b = softmax(max_row(S)), tiles the resulting attention vector into a t*d matrix, and fuses it together with the preceding H_t through a G function to obtain a matrix of dimension t*4d; this matrix contains all the associated information of question-to-content and content-to-question, and an lstm encoding is additionally applied to this representation.
  • the input of this layer is the original text H and the question U vector
  • the output is the query-aware vector of context words and the contextual-embeddings passed down from the previous layer.
  • Step 1: perform 'attention' in both the context-to-query and query-to-context directions, first computing the similarity matrix S;
  • Step 2: normalize S and compute the attention scores a_t, then perform a weighted average according to the obtained a_t.
  • The attention vector at each moment is related to the embeddings of the previous layer, and it can flow on to the subsequent network layers.
  • This design alleviates the loss of information caused by summarizing too early.
  • Step 3: concatenate the representation-extracted H with the U obtained from the weighted computation to obtain G.
  • Each column vector in G can be regarded as a query-aware representation of each context word.
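  • The interaction layer can be sketched as below; the trilinear similarity function and the concatenation of H, the attended question, and their elementwise products follow the standard BiDAF formulation and are assumptions, since the patent's own formula images are not reproduced here:

```python
# Sketch of the interaction layer (step S50) following the standard BiDAF scheme:
# similarity matrix S, context-to-query attention, query-to-context attention,
# and the fused matrix G of size (t, 4d).
import torch
import torch.nn as nn

class BiAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w = nn.Linear(3 * d, 1, bias=False)  # alpha(h, u) = w^T [h; u; h*u]

    def forward(self, H, U):                      # H: (B, t, d) text, U: (B, j, d) question
        B, t, d = H.shape
        j = U.size(1)
        Hx = H.unsqueeze(2).expand(B, t, j, d)
        Ux = U.unsqueeze(1).expand(B, t, j, d)
        S = self.w(torch.cat([Hx, Ux, Hx * Ux], dim=-1)).squeeze(-1)      # (B, t, j)

        a = torch.softmax(S, dim=-1)              # a_t = softmax(S_t:)
        U_tilde = torch.bmm(a, U)                 # (B, t, d): weighted average of question

        b = torch.softmax(S.max(dim=-1).values, dim=-1)    # most important context words
        h_tilde = torch.bmm(b.unsqueeze(1), H)             # (B, 1, d)
        H_tilde = h_tilde.expand(B, t, d)                   # tiled to (B, t, d)

        return torch.cat([H, U_tilde, H * U_tilde, H * H_tilde], dim=-1)  # (B, t, 4d)
```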
  • In step S60, after the F matrix is passed through a transformer layer, the start probability is output through an lstm, and the end probability position is then output according to the start probability and the result of the previous lstm layer; finally, the start probability and the end probability are multiplied and the sentence with the highest probability is taken as the answer.
  • Concretely, the model structure of the decoder layer is upgraded: a single-layer bidirectional LSTM is used first, followed by softmax, to obtain the start probability and the end probability; the end probability position is then output according to the start probability and the result of the previous lstm layer; finally, the start probability and the end probability are multiplied and the sentence with the highest probability is taken as the answer.
  • Step S60 also includes the following steps:
  • Step 1, the modeling layer: the attention matrix G obtained in step S50 is input into a bidirectional lstm to obtain a matrix M that captures the relationships between context words under the given query.
  • Step 2, the decoder layer, whose process is as follows:
  • first, the input parameter G of the decoder layer (the query-aware representation of the words in the context) is spliced with the query-context-word matrix obtained in the steps above as the first input of the decoder layer;
  • second, the spliced matrix is input into a unidirectional lstm and softmax is applied to the result; this step obtains P1, the maximum probability of the start position of the answer in the answer text;
  • third, the position of the maximum probability, the G matrix obtained from S150 and the M matrix obtained from S161 are used as input parameters and fed into a new unidirectional lstm layer to find the end position of the answer; a unidirectional lstm is used because the search for the answer should match human reading habits, searching in order from front to back;
  • fourth, the end probability position P2 is output according to the start probability and the result of the previous lstm layer;
  • fifth, finally, the start probability and the end probability are multiplied and the sentence with the highest probability is taken as the answer.
  • the output layer is oriented to specific tasks, so it can be modified according to specific tasks.
  • The start position p1 and the end position p2 are predicted as follows: a unidirectional LSTM structure is used to characterize and integrate the sentence vectors output by the decoder, yielding the strength of the influence of each word in the text on the question (the probability of being related to the answer), and softmax then selects the maximum probability (the word most correlated with the answer) as the probability that the answer starts at that word in the text.
  • The end probability is generated in the same way, and the training principle is identical for the start probability and the output position probability.
  • Supervised learning is carried out on the labeled data set, so that the model learns to find the position of the answer to the question in the text.
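  • A minimal sketch of the decoder described above is given below; how the start probability is fed into the second unidirectional lstm is an assumption, since the description only states that it is used as an input parameter:

```python
# Sketch of step S60: G feeds a bidirectional lstm giving M; a unidirectional lstm
# plus softmax gives the start probability P1; a second unidirectional lstm gives
# the end probability P2; the span maximising P1 * P2 is returned.
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.model_lstm = nn.LSTM(4 * d, d, batch_first=True, bidirectional=True)
        self.start_lstm = nn.LSTM(4 * d + 2 * d, d, batch_first=True)
        self.end_lstm = nn.LSTM(4 * d + 2 * d + 1, d, batch_first=True)
        self.start_out = nn.Linear(d, 1)
        self.end_out = nn.Linear(d, 1)

    def forward(self, G):                                   # G: (B, t, 4d)
        M, _ = self.model_lstm(G)                           # (B, t, 2d): word-word relations
        GM = torch.cat([G, M], dim=-1)

        s, _ = self.start_lstm(GM)
        p_start = torch.softmax(self.start_out(s).squeeze(-1), dim=-1)   # P1 over positions

        e_in = torch.cat([GM, p_start.unsqueeze(-1)], dim=-1)            # condition on P1
        e, _ = self.end_lstm(e_in)
        p_end = torch.softmax(self.end_out(e).squeeze(-1), dim=-1)       # P2 over positions

        joint = p_start.unsqueeze(2) * p_end.unsqueeze(1)                # (B, t, t) span scores
        joint = torch.triu(joint)                                        # keep end >= start only
        flat = joint.flatten(1).argmax(dim=-1)
        start = torch.div(flat, joint.size(2), rounding_mode="floor")
        return start, flat % joint.size(2)                               # start, end indices
```

  • Restricting the span search to end positions at or after the start position (the triangular mask) mirrors the front-to-back reading order mentioned above.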
  • In summary, lstm and transformer are used to jointly encode the contextual embedding at the beginning; after the bidirectional attention processing, a transformer is used to fuse all the information, and an lstm is then used to output the start and end probabilities. The final output therefore considers not only the local relevance of the text but also the relevance of the text as a whole.
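  • Putting the pieces together, an end-to-end forward pass might be assembled as in the following sketch, which reuses the hypothetical modules from the sketches above (CharCNN, Highway, LocalGlobalEncoder, BiAttention, Decoder); all dimensions are illustrative:

```python
# End-to-end sketch reusing the hypothetical modules defined above; the extra
# transformer layer that fuses G before decoding follows the step S60 description.
import torch
import torch.nn as nn

class MachineReader(nn.Module):
    def __init__(self, vocab_size, word_dim=100, char_channels=64, d_model=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)   # in practice initialised from glove
        self.char_cnn = CharCNN(out_channels=char_channels)
        self.highway = Highway(word_dim + char_channels)
        self.encoder = LocalGlobalEncoder(word_dim + char_channels, d_model)
        self.attention = BiAttention(2 * d_model)
        fuse_layer = nn.TransformerEncoderLayer(8 * d_model, nhead=4, batch_first=True)
        self.fuse = nn.TransformerEncoder(fuse_layer, num_layers=1)
        self.decoder = Decoder(2 * d_model)

    def forward(self, text_words, text_chars, q_words, q_chars):
        def embed(words, chars):
            B, T, L = chars.shape
            c = self.char_cnn(chars.view(B * T, L)).view(B, T, -1)
            return self.highway(torch.cat([self.word_emb(words), c], dim=-1))

        H = self.encoder(embed(text_words, text_chars))       # (B, t, 2*d_model)
        U = self.encoder(embed(q_words, q_chars))             # (B, j, 2*d_model)
        G = self.attention(H, U)                              # (B, t, 8*d_model)
        G = self.fuse(G)                                      # transformer fusion of all info
        return self.decoder(G)                                # start / end indices
```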
  • the machine reading method proposed in the above embodiment constructs a machine reading network structure from a transformer and an lstm;
  • within this network structure, the local information in the text is obtained through the lstm and the overall information in the text is obtained through the transformer, so the machine reading network structure constructed in this application solves the current problem that the overall relevance and the local relevance of a sentence cannot be obtained at the same time.
  • this application also provides a machine reading system whose logical structure is similar to the module composition of the machine reading program 10 based on transformer and lstm in the aforementioned electronic device (shown in Figure 2);
  • the functions or operation steps implemented by the word vector acquisition module 110, the field embedding acquisition module 120, the input vector formation module 130, the local feature and overall feature acquisition module 140, the associated information acquisition module 150 and the answer acquisition module 160 are similar to the logical composition of the machine reading system of this embodiment, for example:
  • the word vector acquisition module is used to use the glove word vector training model to pre-train all the words in the text to be processed, and to obtain the word vectors of all words in the text to be processed that are mapped in the same vector space;
  • the field embedding acquisition module is used to process the word vectors of all words acquired by the word vector acquisition module by using a character-level convolutional neural network to obtain field embeddings;
  • the input vector forming module is used for splicing the word vector obtained by the word vector obtaining module and the field embedding obtained by the field embedding obtaining module to form an input vector, and processing the input vector through the highway nonlinear conversion layer;
  • the local feature and overall feature acquisition module is used to process the input vector processed by the highway nonlinear conversion layer through lstm to obtain text with local features, to process the same input vector through the transformer to obtain text with overall features, and to fuse the text with local features and the text with overall features to form text with both local and overall features;
  • the associated information acquisition module is used to process the acquired text with local features and overall features through the Bidirectional Attention Flow model and transformer, and obtain all relevant information about the questions and answers in the text;
  • the answer acquisition module is used to process all the associated information between the question and the answer in the obtained text with lstm, output the start probability and the end probability, multiply the two probabilities, and take the sentence with the highest probability after the multiplication as the answer to the question in the text.
  • the machine reading system of this embodiment may also include a glove word vector training model acquisition module (not shown in the figure);
  • the glove word vector training model acquisition module uses the Skip-gram model to uncover the cause behind the linear relationships between words, and then, based on that cause, constructs similar conditions on the co-occurrence matrix to obtain a word vector model based on global information, namely the glove word vector training model.
  • the formula of the glove word vector training model is expressed in terms of the co-occurrence probability P_ij, the word vectors υ_i and υ_j, and the weight function f.
  • the field embedding acquisition module processes the character vectors pre-trained by glove with a char-CNN (character-level convolutional neural network) to obtain the char-embedding C_e ∈ B × D;
  • char-embedding is an embedding based on the letters of each word: after obtaining all the letter vectors of a word, the word vector of that word is obtained by a weighted average.
  • the local features are obtained by the question sentence using feature vectors in three dimensions to reinforce the question sentence itself, and the overall features are used to extract the representation between the question sentence and the original text.
  • Local features and overall features are spliced together by weighted averaging or in series.
  • the local feature and overall feature acquisition module can well extract the contextual relationship of the text through lstm, and the transformer can extract the overall relevance of the sentence.
  • the associated information acquisition module may also include the following components (not shown in the figure):
  • the preprocessing unit is used to process the questions and answers in the text through the Bidirectional Attention Flow model, and its formula is
  • where t represents each piece of text, j represents each question, S_tj (a t*j matrix) represents the relevance between text t and question j, +_m denotes addition performed by way of matrix multiplication, the i in t_i denotes the i-th word of the question being indexed, and the i in j_i denotes the attention weight of the i-th question word within the text;
  • the weight processing unit is used to calculate the weight of each question word in each answer.
  • the weighted average unit is used to perform weighted average processing on the question, producing a t*d matrix;
  • the fusion unit is used to fuse H_t with the weighted question representation through the G function to obtain a matrix G of dimension t*4d, where the matrix G contains all the fused associated information between the question and the answer in the text.
  • the answer obtaining module may further include: a relation matrix obtaining unit, configured to input the obtained matrix G into the bidirectional lstm to obtain the relationship between the words under the question in the text Matrix M; a splicing unit, used to splice the context information representation with the matrix M to obtain a spliced matrix; a start probability acquisition unit, used to input the acquired splicing matrix into the first unidirectional lstm, and compare the first one-way lstm The result of a one-way lstm processing is subjected to softmax processing to obtain the start probability of the answer in the text; the end probability acquisition unit is used to input the start probability, the matrix G, and the matrix M as input parameters to the second One-way lstm processes to obtain the end probability of the answer in the text; the integration unit is used to multiply the start probability and the end probability, and according to the result of the multiplication, the sentence with the highest probability is used as the answer
  • this application also proposes a computer-readable storage medium that includes a machine reading program based on transformer and lstm; this program is similar to the machine reading program 10 based on transformer and lstm of the second embodiment above, and when executed by a processor it can realize the steps of the machine reading method described above and the operations of the machine reading system described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A machine reading method, system and device, and a storage medium. The method includes: obtaining word vectors through a glove word vector training model; obtaining field embeddings with a character-level convolutional neural network; splicing the word vectors and the field embeddings to form an input vector; extracting features from the input vector through lstm and a transformer to obtain text having local features and overall features; processing the obtained text through the Bidirectional Attention Flow model and the transformer to obtain all the associated information between the question and the answer in the text; and processing all the associated information between the question and the answer in the obtained text with lstm, taking the sentence with the highest probability as the answer to the question in the text. By combining the transformer and the lstm model into a new machine reading network structure, the method solves the problem that the overall relevance and the local relevance of a sentence currently cannot be obtained at the same time.

Description

机器阅读方法、系统、装置及存储介质
本申请要求申请号为201911037790.X,申请日为2019年10月29日,发明创造名称为“基于transformer和lstm的机器阅读方法、电子装置及可读存储介质”的专利申请的优先权。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种机器阅读方法、系统、电子装置及存储介质。
背景技术
机器阅读是自然语言处理的一个分支,主要的作用是根据用户提出的问题和文本,带着问题去文本中寻找答案。目前机器阅读的技术从最初的根据机器翻译的seq2seq到Bidaf、mlstm和r-net等依靠RNN为基础的模型到依靠transformer的QAnet和BERT模型,都为机器阅读做出了巨大的贡献。
目前著名的机器阅读的数据集,英文的有斯坦福大学的SQuAD和微软的MS MARCO,中文的有百度的dureader。总的来说大部分技术的研究是基于SQuAD的数据集。目前流行的Bidaf、QAnet和BERT都是在SQuAD数据集上取得了巨大的进展,其中,目前的机器阅读模型都是基于RNN,如mlstm和bidaf或者transformer框架等等,申请人意识到,虽然有的模型可以体现文本的上下文关系,有的模型可以提取句子的整体相关性,但是目前还没有一种方法,能够同时获取句子的整体性相关性和局部相关性。
为了解决上述问题,亟需一种可以同时让句子获得整体相关性和局部相关性的方法。
发明内容
本申请提供一种机器阅读方法、系统、电子装置及计算机可读存储介质,其主要目的在于通过将transformer和lstm模型相互结合形成的新的机器阅读网络结构,解决目前不能同时获取句子的整体相关性和局部相关性的问题。
为实现上述目的,本申请提供一种机器阅读方法,包括:
采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
相应的,本申请还提供一种机器阅读系统,包括:
词向量获取模块,用于采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
字段嵌入获取模块,用于采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
输入向量形成模块,用于将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
局部特征和整体特征获取模块,用于通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
关联信息获取模块,用于通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
答案获取模块,用于采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
此外,为实现上述目的,本申请还提供一种电子装置,该电子装置包括:存储器、处理器,所述存储器中包括基于transformer和lstm的机器阅读程序,所述基于transformer和lstm的机器阅读程序被所述处理器执行时实现如下步骤:
采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成;
通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
此外,为实现上述目的,本申请还提供一种计算机可读存储介质,该计算机可读存储介质中包括基于transformer和lstm的机器阅读程序,所述基于transformer和lstm的机器阅读程序被处理器执行时,实现如上所述的机器阅读方法中的任意步骤。
本申请提出的机器阅读方法、系统、电子装置及存储介质,通过transformer和lstm构建一个器阅读网络结构,通过lstm获取文本中的局部信息,通过transformer获取文本中的整体信息,因此,本申请的构建的器阅读网络结构,解决目前不能同时获取句子的整体相关性和局部相关性的问题。
附图说明
图1为本申请的机器阅读方法较佳实施例的应用环境示意图;
图2为图1中基于transformer和lstm的机器阅读程序较佳实施例的模块示意图;
图3为本申请的机器阅读方法较佳实施例的流程图。
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
具体实施方式
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
实施例一
本申请提供一种机器阅读方法,应用于一种电子装置1。参照图1所示,为本申请的机器阅读方法较佳实施例的应用环境示意图。
在本实施例中,电子装置1可以是服务器、智能手机、平板电脑、便携计算机、桌上型计算机等具有运算功能的终端设备。
该电子装置1包括:处理器12、存储器11、网络接口14及通信总线15。
存储器11包括至少一种类型的可读存储介质。在本实施例中,所述存储器11的可读存储介质通常用于存储安装于所述电子装置1的基于transformer和lstm的机器阅读程序10等。所述存储器11还可以用于暂时地存储已经输出或者将要输出的数据。
处理器12在一些实施例中可以是一中央处理器(Central Processing Unit,CPU),微处理器或其他数据处理芯片,用于运行存储器11中存储的程序代码或处理数据,例如基于transformer和lstm的机器阅读程序10等。
网络接口14可选地可以包括标准的有线接口、无线接口(如WI-FI接口), 通常用于在该电子装置1与其他电子设备之间建立通信连接。
通信总线15用于实现这些组件之间的连接通信。
图1仅示出了具有组件11-15的电子装置1,但是应理解的是,并不要求实施所有示出的组件,可以替代的实施更多或者更少的组件。
可选地,该电子装置1还可以包括用户接口、显示器、触摸传感器以及射频(Radio Frequency,RF)电路,传感器、音频电路等等,在此不再赘述。
在图1所示的装置实施例中,作为一种计算机存储介质的存储器11中可以包括操作系统以及基于transformer和lstm的机器阅读程序10;处理器12执行基于transformer和lstm的机器阅读程序10时实现如下步骤:
采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
优选地,所述glove词向量训练模型公式为:
Figure PCTCN2019118501-appb-000001
其中,P ij为共现概率;υ i、υ j为词向量;f为权重函数。
优选地,所述字段嵌入为:
C θ∈B×D
其中,C θ表示任意一个经过embedded的字符向量,其维度满足B x D。
优选地,所述通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理包括如下步骤:
通过所述Bidirectional Attention Flow模型对所述文本中的问题和答案进行处理,其公式如下:
Figure PCTCN2019118501-appb-000002
其中,t代表每一段文本,j代表每一个问题,S tj(t*j的矩阵)代表t文本和j问题的相关度,+ m表示按照矩阵乘法的方式进行加法,ti中i表示下标问题中的第i个单词,ji中i表示文本中对于问题中的第i个单词的注意力权重值;
计算出每一个问题的字在每一个答案中的权重,其公式如下:
a t=softmax(S t:)
对所述问题进行加权平均处理,其公式为:
Figure PCTCN2019118501-appb-000003
是一个t*d的矩阵;
将H t:
Figure PCTCN2019118501-appb-000004
通过G函数进行融合获取维度为t*4d的矩阵G,其中,所述矩阵G为融合文本中问题与答案所有关联信息。
优选地,所述采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案包括如下步骤:
将获取的矩阵G输入双向lstm,得到所述文本中问题下的单词之间的关系的矩阵M;
将上下文信息表征与所述矩阵M进行拼接,获取到拼接矩阵;
将获取到的拼接矩阵输入第一单向lstm,并对所述第一单向lstm处理后的结果进行softmax处理,获取文本中答案的开始概率;
将所述开始概率、所述矩阵G以及所述矩阵M作为输入参数,输入到第二单向lstm进行处理,获取文本中答案的结束概率;
将所述开始概率和所述结束概率相乘,根据相乘的结果,将概率最高的那一句话作为答案。
上述实施例提出的电子装置1,通过transformer和lstm构建一个器阅读 网络结构,在网络结构中,通过lstm获取文本中的局部信息,通过transformer获取文本中的整体信息,因此,本申请的构建的器阅读网络结构,解决目前不能同时获取句子的整体相关性和局部相关性的问题。
实施例二
在其他实施例中,基于transformer和lstm的机器阅读程序10还可以被分割为一个或者多个模块,一个或者多个模块被存储于存储器11中,并由处理器12执行,以完成本申请。本申请所称的模块是指能够完成特定功能的一系列计算机程序指令段。参照图2所示,为图1中基于transformer和lstm的机器阅读程序10较佳实施例的程序模块图。所述基于transformer和lstm的机器阅读程序10可以被分割为:词向量获取模块110、字段嵌入获取模块120、输入向量形成模块130、局部特征和整体特征获取模块140、关联信息获取模块150和答案获取模块160。所述模块110-160所实现的功能或操作步骤均与上文类似,此处不再详述,示例性地,例如其中:
词向量获取模块110,用于采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
字段嵌入获取模块120,用于采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
输入向量形成模块130,用于将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
局部特征和整体特征获取模块140,用于通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
关联信息获取模块150,用于通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
答案获取模块160,用于采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率 相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
实施例三
此外,本申请还提供一种机器阅读方法。参照图3所示,为本申请基于transformer和lstm的机器阅读方法较佳实施例的流程图。该方法可以由一个装置执行,该装置可以由软件和/或硬件实现。
在本实施例中,机器阅读方法包括:步骤S10-步骤S60。
步骤S10:采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
步骤S20:采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
步骤S30:将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
步骤S40:通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
步骤S50:通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
步骤S60:采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
在步骤S10中,采用glove词向量训练模型对词向量进行预训练得到所有单词的word-embedding(单词向量化),W e∈B×H,这里B为单词个数,H为embedding向量的维度大小;W e表示一个单词的词向量,此词向量的维度是B x H。
具体地,基于统计的词向量模型和基于预测的词向量模型。前者以基于SVD分解技术的LSA模型为代表,但是这类模型得到的语义向量往往很难把握词与词之间的线性关系(例如著名的King、Queen、Man、Woman等式)。 后者则以基于神经网络的Skip-gram模型为代表,通过预测一个词出现在上下文里的概率得到embedding词向量。这类模型的缺陷在于其对统计信息的利用不充分,训练时间与语料大小息息相关,因此在很多任务上的表现都要略优于SVD模型。
其中,对于glove词向量训练模型来说,首先利用Skip-gram模型能够挖掘出词与词之间线性关系的背后成因,然后通过在共现矩阵上构造相似的条件,得到一个基于全局信息的词向量模型——glove词向量训练模型,glove词向量训练模型公式为:
Figure PCTCN2019118501-appb-000005
其中,P ij为共现概率;υ i、υ j为词向量;f为权重函数。
在步骤S20中,将glove词向量训练模型预训练的字向量用char-CNN(字符级卷积神经网络)进行处理,这里选择了一个尺寸为:[H=5,W=一个字的embedding的维度,OC=64]的filter,这里H为filter的高度,W为一个字embedding的维度,OC为输出通道,最后得到的char-embedding,C e∈B×D,其中,C e表示任意一个经过embedded(嵌入处理)的字符向量,其维度满足B x D。
其中,在本申请的实施例中,使用了两个颗粒度的嵌入操作:
1.直接在单词维度计算的word embedding词向量;
2.char-embedding是基于每个单词的字母进行embedding,在得到一个单词的所有字母向量后加权平均得到这个单词的词向量。
在步骤S30中,对输入向量进行拼接,输入向量由前置位置的词向量加上后置位置语境词向量拼接而成。其作为模型的输入。
具体地,将步骤S20得到的char-embedding和glove预训练的word-embedding进行拼接产生一个contextual embedding,Cont e∈B×(H+D)。再用一个highway层对contextual embedding进行处理防止其梯度爆炸或者梯度消失。
highway层就是非线性转换层,用于避免输入权重更新后的梯度爆炸和梯 度消失的情况。模型结构优化,在contextual embedding层后接入highway非线性转换层避免在BP过程中的梯度消失或梯度爆炸的问题。但是后续接入的机器阅读的经典模型bidaf结构是一样的。
在步骤S40中,同时用一个lstm和一个transformer对contextual embedding进行编码,再将其拼接起来,也就说,lstm进行了一个局部的特征提取,transformer进行了一个整体的特征提取,所以一个融合了局部特征和整体特征的contextual embedding就完成了,针对内容的contextual embedding是H_t(t*d的矩阵),针对问题的contextual embedding是U_j(j*d的矩阵)。
在本申请的实施例中,局部特征是问句利用三个维度的特征向量对问句本身进行一个特征强化。整体特征是用于问句和原文之间的表征提取。
此外,局部特征和整体特征通过加权求平均的方式或者串联的方式拼接在一起。在此步骤中,通过lstm能够很好的提取文本的上下文关系,transformer可以提取句子的整体相关性。
在步骤S50中,文本中的问题与答案句子向量经过交叉表征。提升句中关键词在句子向量中的特征强度。
其中,Bidirectional AttentionFlow模型,简称为:BiDAF模型,是一种经典的阅读理解模型,BiDAF模型最大的特点是在interaction层引入了双向注意力机制,计算Query2Context和Context2Query两种注意力,并基于注意力计算query-aware的原文表示。具体地,第一步:通过所述Bidirectional Attention Flow模型对所述文本中的问题和答案进行处理,具体公式如下,
Figure PCTCN2019118501-appb-000006
t代表每一段文本,j代表每一个问题,S tj(t*j的矩阵)代表t文本和j问题的相关度,+ m表示按照矩阵乘法的方式进行加法,ti中i表示下标问题中的第i个单词,ji中i表示文本中对于问题中的第i个单词的注意力权重值。
第二步:a t=softmax(S t:)求出每一个问题的字在每一个答案中的权重,
第三步:对问题进行加权平均,
Figure PCTCN2019118501-appb-000007
它是一个t*d的矩阵。
第四步:再选择出对于内容中每个词,问题哪个词最重要所以有了b=softmax(max row(S)),
Figure PCTCN2019118501-appb-000008
它是一个问题对内容的向量为d的attention,这意味着,对于问题来说,它已经抓住了内容中对于它最重要的词,再将
Figure PCTCN2019118501-appb-000009
复制扩大为一个
Figure PCTCN2019118501-appb-000010
的矩阵。将前面的H t:
Figure PCTCN2019118501-appb-000011
用一个G函数 进行融合得到一个维度为t*4d的矩阵,此矩阵包含了融合了问题对内容和内容对问题的所有关联信息。其中对于
Figure PCTCN2019118501-appb-000012
需要对其进行一个lstm编码。
也就是说,该层的输入是原文H和问句U向量,输出是context words的query-aware vector,以及上一层传下来的contextual-embeddings。
具体地,步骤一:做context-to-query以及query-to-context两个方向的‘attention’,先计算相似度矩阵S;
步骤二:再归一化计算attention分数at,根据得到的at进行加权平均。
也就是说,每个时刻的注意力向量都与其之前层的嵌入相关,且都可以流向之后的网络层。这种设计方案可以减缓由于过早归纳总结而导致的信息缺失。
步骤三:使用表征提取后的H和加权计算得到的U拼接起来得到G。G中每个列向量可以视为每个contex word的query-aware表征。
在步骤S60中,将F矩阵通过一个transformer层之后通过一个lstm进行开始概率输出,再根据开始概率和前层lstm结果对结束概率位置输出。最后将开始概率和结束概率相乘,取出概率最高的那一句话作为答案。
具体地,对decoder解码层的模型结构进行升级。首先使用单层双向LSTM后进行softmax得到开始概率和结束概率。再根据开始概率和前层lstm结果对结束概率位置输出。最后将开始概率和结束概率相乘,取出概率最高的那一句话作为答案。
其中,步骤S60还包括如下步骤:
步骤一:建模层
首先将步骤S50中得到的注意力矩阵G输入一个双向lstm得到一个捕捉的是在给定query下contexwords之间的关系的矩阵M。
步骤二:decoder层,其中第二步的过程如下:
第一步:decoder层的输入参数G即context中单词的query-aware representation结果),与上述步骤中得到的query-contextword矩阵进行拼接,作为decoder层的首次输入;
第二步:将拼接后的矩阵输入单向lstm,再对结果做softmax这一步是为了得到答案文本中答案的开始位置的最大概率的P1;
第三步:随后将最大概率的位置与从S150得到的G矩阵以及S161得到 的M矩阵作为输入参数,放入新的单向lstm层找到答案结束位置;使用的单向的原因是,机器翻译答案的搜寻应该符合人类阅读习惯,从前到后顺序寻找。
第四步:再根据开始概率和前层lstm结果对结束概率位置输出P2;
第五步:最后将开始概率和结束概率相乘,取出概率最高的那一句话作为答案。
其中,输出层是面向具体任务的,所以可以根据具体任务而做相应修改。预测开始位置p1和结束位置p2,具体公式如下:
Figure PCTCN2019118501-appb-000013
Figure PCTCN2019118501-appb-000014
也就是,使用单向LTSM结构对解码器输出的句子向量进行表征整合,得到的是文本中每一个单词对于问句的影响强度(与答案相关的概率)再通过softmax得到概率最大(答案相关性最高的单词)作为答案从该文本中词语开始的概率。同理生成结束概率,开始概率和输出位置概率,整个的训练原理是相同的,通过已标注好的数据集进行监督学习,让模型学会去寻找针对问句的文本中答案的位置。
在本申请的实施例中,使用单向lstm的原因有两个:
1.在略微降低准确率的情况下使得计算量(相对双层lstm)减半。
2.基于设计目的,是为了从文本中找到答案,因此更注重文本中的词语对问句的相关性(单向)。
综上所述,本申请提出的机器阅读方法中,最开始用lstm和transformer联合编码contextual embedding,再经过bidirectional attention的处理以后,用一个transformer去融合所有信息,再用lstm去输出开始和结束概率。所以,最后的输出不止考虑了文本局部的相关性,也考虑了文本整体的相关性。
上述实施例提出的机器阅读方法,通过transformer和lstm构建一个器阅读网络结构,在网络结构中,通过lstm获取文本中的局部信息,通过transformer获取文本中的整体信息,因此,本申请的构建的器阅读网络结构,解决目前不能同时获取句子的整体相关性和局部相关性的问题。
实施例四
与前述机器阅读方法相对应,本申请还提供一种机器阅读系统,其逻辑结构与前述电子装置中基于transformer和lstm的机器阅读程序10(如图2所示)的模块构成相类似,词向量获取模块110、字段嵌入获取模块120、输入向量形成模块130、局部特征和整体特征获取模块140、关联信息获取模块150和答案获取模块160所实现的功能或操作步骤均与本实施例的机器阅读系统的逻辑构成类似。例如其中:
词向量获取模块,用于采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
字段嵌入获取模块,用于采用字符级卷积神经网络对词向量获取模块所获取的所有单词的词向量进行处理,获取字段嵌入;
输入向量形成模块,用于将词向量获取模块所获取的词向量和字段嵌入获取模块所获取的字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对该输入向量进行处理;
局部特征和整体特征获取模块,用于通过lstm对通过highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
关联信息获取模块,用于通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
答案获取模块,用于采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
此外,本实施例的机器阅读系统还可以包括glove词向量训练模型获取模块(图中未示出),该glove词向量训练模型获取模块利用Skip-gram模型挖掘出词与词之间线性关系的背后成因;然后根据词与词之间线性关系的背后成因,通过在共现矩阵上构造相似的条件,得到一个基于全局信息的词向量模型—glove词向量训练模型。
其中,glove词向量训练模型公式为:
Figure PCTCN2019118501-appb-000015
其中,P ij为共现概率;υ i、υ j为词向量;f为权重函数。
字段嵌入获取模块将glove预训练的字向量用char-CNN(字符级卷积神经网络)进行处理,这里选择了一个尺寸为:[H=5,W=一个字的embedding的维度,OC=64]的filter,这里H为filter的高度,W为一个字embedding的维度,OC为输出通道,最后得到的char-embedding,C e∈B×D,其中,C e表示任意一个经过embedded(嵌入处理)的字符向量,其维度满足B x D。
其中,在本发明的实施例中,使用了两个颗粒度的嵌入操作:
1.直接在单词维度计算的word embedding词向量;
2.char-embedding是基于每个单词的字母进行embedding,在得到一个单词的所有字母向量后加权平均得到这个单词的词向量。
在本申请的实施例中,局部特征是问句利用三个维度的特征向量对问句本身进行一个特征强化。整体特征是用于问句和原文之间的表征提取。局部特征和整体特征通过加权求平均的方式或者串联的方式拼接在一起。局部特征和整体特征获取模块通过lstm能够很好的提取文本的上下文关系,transformer可以提取句子的整体相关性。
在本申实施例的一个具体实施方式中,关联信息获取模块还可以包括如下组成部分(图中未示出):
预处理单元,用于通过所述Bidirectional Attention Flow模型对所述文本中的问题和答案进行处理,其公式为
Figure PCTCN2019118501-appb-000016
其中,t代表每一段文本,j代表每一个问题,S tj(t*j的矩阵)代表t文本和j问题的相关度,+ m表示按照矩阵乘法的方式进行加法,ti中i表示下标问题中的第i个单词,ji中i表示文本中对于问题中的第i个单词的注意力权重值;
权重处理单元,用于计算出每一个问题的字在每一个答案中的权重,其公式如下:a t=softmax(S t:);
加权平均单元,用于对所述问题进行加权平均处理,其公式为:
Figure PCTCN2019118501-appb-000017
Figure PCTCN2019118501-appb-000018
是一个t*d的矩阵;
融合单元,用于将H t:
Figure PCTCN2019118501-appb-000019
通过G函数进行融合获取维度为t*4d的矩阵G,其中,所述矩阵G为融合文本中问题与答案所有关联信息。
在本申请实施例的另一具体实施方式中,答案获取模块还可以包括:关系矩阵获取单元,用于将获取的矩阵G输入双向lstm,得到所述文本中问题下的单词之间的关系的矩阵M;拼接单元,用于将上下文信息表征与所述矩阵M进行拼接,获取到拼接矩阵;开始概率获取单元,用于将获取到的拼接矩阵输入第一单向lstm,并对所述第一单向lstm处理后的结果进行softmax处理,获取文本中答案的开始概率;结束概率获取单元,用于将所述开始概率、所述矩阵G以及所述矩阵M作为输入参数,输入到第二单向lstm进行处理,获取文本中答案的结束概率;整合单元,用于将所述开始概率和所述结束概率相乘,根据相乘的结果,将概率最高的那一句话作为答案。
应当明了，上述实施方式并非本实施例四的所有实施方式，本实施例四的具体实施方式与前述机器阅读方法、电子装置的具体实施方式大致相同，在此不再赘述。
实施例五
此外,本申请还提出一种计算机可读存储介质,所述计算机可读存储介质中包括基于transformer和lstm的机器阅读程序,该基于transformer和lstm的机器阅读程序和前述实施例二中的基于transformer和lstm的机器阅读程序10相同,被处理器执行时能够实现如所述的机器阅读方法的步骤以及如前所述的机器阅读系统的操作。
本申请之计算机可读存储介质的具体实施方式与上述机器阅读方法、系统、电子装置的具体实施方式大致相同,在此不再赘述。
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。

Claims (20)

  1. 一种机器阅读方法,应用于电子装置,其特征在于,所述方法包括:
    采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
    采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
    将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
    通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
    通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
    采用所述lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
  2. 根据权利要求1所述的机器阅读方法,其特征在于,在采用glove词向量训练模型对待处理文本中所有的单词进行预训练之前,还包括:利用Skip-gram模型挖掘出词与词之间线性关系的背后成因;
    根据所述词与词之间线性关系的背后成因,通过在共现矩阵上构造相似的条件,得到所述glove词向量训练模型。
  3. 根据权利要求1所述的机器阅读方法,其特征在于,
    所述glove词向量训练模型公式为:
    Figure PCTCN2019118501-appb-100001
    其中,P ij为共现概率;υ i、υ j为词向量;f为权重函数。
  4. 根据权利要求1所述的机器阅读方法,其特征在于,所述字段嵌入为:
    C θ∈B×D
    其中,C θ表示任意一个经过嵌入处理的字符向量,其维度满足B x D。
  5. 根据权利要求4所述的机器阅读方法,其特征在于,所述字段嵌入包括如下两个颗粒度的嵌入操作:
    直接在单词维度计算的word embedding词向量;以及
    基于每个单词的字母进行embedding得到char-embedding词向量,在得到一个单词的所有字母向量后加权平均得到所述单词的词向量。
  6. 根据权利要求1~5中任一项所述的机器阅读方法,其特征在于,
    所述局部特征为问句利用三个维度的特征向量对问句本身进行的特征强化,所述整体特征为用于问句和原文之间的表征提取;并且,
    所述局部特征和所述整体特征通过加权求平均的方式或者串联的方式拼接在一起。
  7. 根据权利要求1所述的机器阅读方法,其特征在于,所述通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理包括如下步骤:
    通过所述Bidirectional Attention Flow模型对所述文本中的问题和答案进行处理,其公式如下:
    Figure PCTCN2019118501-appb-100002
    其中,t代表每一段文本,j代表每一个问题,S tj(t*j的矩阵)代表t文本和j问题的相关度,+ m表示按照矩阵乘法的方式进行加法,ti中i表示下标问题中的第i个单词,ji中i表示文本中对于问题中的第i个单词的注意力权重值;
    计算出每一个问题的字在每一个答案中的权重,其公式如下:
    a t=softmax(S t:)
    对所述问题进行加权平均处理,其公式为:
    Figure PCTCN2019118501-appb-100003
    是一个t*d的矩阵;
    将H t:
    Figure PCTCN2019118501-appb-100004
    通过G函数进行融合获取维度为t*4d的矩阵G,其中,所述矩阵G为融合文本中问题与答案所有关联信息。
  8. 根据权利要求7所述的机器阅读方法,其特征在于,所述采用lstm 对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案包括如下步骤:
    将获取的矩阵G输入双向lstm,得到所述文本中问题下的单词之间的关系的矩阵M;
    将上下文信息表征与所述矩阵M进行拼接,获取到拼接矩阵;
    将获取到的拼接矩阵输入第一单向lstm,并对所述第一单向lstm处理后的结果进行softmax处理,获取文本中答案的开始概率;
    将所述开始概率、所述矩阵G以及所述矩阵M作为输入参数,输入到第二单向lstm进行处理,获取文本中答案的结束概率;
    将所述开始概率和所述结束概率相乘,根据相乘的结果,将概率最高的那一句话作为答案。
  9. 一种机器阅读系统,其特征在于,所述系统包括:
    词向量获取模块,用于采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
    字段嵌入获取模块,用于采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
    输入向量形成模块,用于将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
    局部特征和整体特征获取模块,用于通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
    关联信息获取模块,用于通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
    答案获取模块,用于采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘, 并将相乘后概率最高的一句话作为文本中问题的答案。
  10. 根据权利要求9所述的机器阅读系统,其特征在于,还包括glove词向量训练模型获取模块,其中,所述glove词向量训练模型获取模块利用Skip-gram模型挖掘出词与词之间线性关系的背后成因;然后根据所述词与词之间线性关系的背后成因,通过在共现矩阵上构造相似的条件,得到所述glove词向量训练模型。
  11. 根据权利要求9所述的机器阅读系统,其特征在于,所述glove词向量训练模型公式为:
    Figure PCTCN2019118501-appb-100005
    其中,P ij为共现概率;υ i、υ j为词向量;f为权重函数。
  12. 根据权利要求9所述的机器阅读系统,其特征在于,所述字段嵌入为:
    C θ∈B×D
    其中,C θ表示任意一个经过嵌入处理的字符向量,其维度满足B x D。
  13. 根据权利要求9~12中任一项所述的机器阅读系统,其特征在于,
    所述局部特征为问句利用三个维度的特征向量对问句本身进行的特征强化,所述整体特征为用于问句和原文之间的表征提取;并且,
    所述局部特征和所述整体特征通过加权求平均的方式或者串联的方式拼接在一起。
  14. 根据权利要求9所述的机器阅读系统,其特征在于,所述关联信息获取模块包括:
    预处理单元,用于通过所述Bidirectional Attention Flow模型对所述文本中的问题和答案进行处理,其公式如下:
    Figure PCTCN2019118501-appb-100006
    其中,t代表每一段文本,j代表每一个问题,S tj(t*j的矩阵)代表t文本和j问题的相关度,+ m表示按照矩阵乘法的方式进行加法,ti中i表示下标问题中的第i个单词,ji中i表示文本中对于问题中的第i个单词的注意力 权重值;
    权重处理单元,用于计算出每一个问题的字在每一个答案中的权重,其公式如下:
    a t=softmax(S t:)
    加权平均单元,用于对所述问题进行加权平均处理,其公式为:
    Figure PCTCN2019118501-appb-100007
    Figure PCTCN2019118501-appb-100008
    是一个t*d的矩阵;
    融合单元,用于将H t:
    Figure PCTCN2019118501-appb-100009
    通过G函数进行融合获取维度为t*4d的矩阵G,其中,所述矩阵G为融合文本中问题与答案所有关联信息。
  15. 根据权利要求9所述的机器阅读系统,其特征在于,所述答案获取模块包括:
    关系矩阵获取单元,用于将获取的矩阵G输入双向lstm,得到所述文本中问题下的单词之间的关系的矩阵M;
    拼接单元,用于将上下文信息表征与所述矩阵M进行拼接,获取到拼接矩阵;
    开始概率获取单元,用于将获取到的拼接矩阵输入第一单向lstm,并对所述第一单向lstm处理后的结果进行softmax处理,获取文本中答案的开始概率;
    结束概率获取单元,用于将所述开始概率、所述矩阵G以及所述矩阵M作为输入参数,输入到第二单向lstm进行处理,获取文本中答案的结束概率;
    整合单元,用于将所述开始概率和所述结束概率相乘,根据相乘的结果,将概率最高的那一句话作为答案。
  16. 一种电子装置,其特征在于,该电子装置包括:存储器、处理器,所述存储器中包括基于transformer和lstm的机器阅读程序,所述基于transformer和lstm的机器阅读程序被所述处理器执行时实现如下步骤:
    采用glove词向量训练模型对待处理文本中所有的单词进行预训练,获取映射在同一向量空间中的待处理文本中所有单词的词向量;
    采用字符级卷积神经网络对获取的所有单词的词向量进行处理,获取字段嵌入;
    将所述词向量和所述字段嵌入进行拼接,形成输入向量,并通过highway非线性转换层对所述输入向量进行处理;
    通过lstm对通过所述highway非线性转换层处理过的输入向量进行处理,获取局部特征的文本,通过transformer对通过所述highway非线性转换层处理过的输入向量进行处理,获取整体特征的文本,并对所述局部特征的文本与所述整体特征的文本进行融合形成具有局部特征和整体特征的文本;
    通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理,获取文本中问题与答案所有关联信息;
    采用所述lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案。
  17. 根据权利要求6所述的电子装置,其特征在于,
    所述glove词向量训练模型公式为:
    Figure PCTCN2019118501-appb-100010
    其中,P ij为共现概率;υ i、υ j为词向量;f为权重函数。
  18. 根据权利要求6所述的电子装置,其特征在于,
    所述通过Bidirectional Attention Flow模型以及transformer对获取的具有局部特征和整体特征的文本进行处理包括如下步骤:
    通过所述Bidirectional Attention Flow模型对所述文本中的问题和答案进行处理,其公式如下:
    Figure PCTCN2019118501-appb-100011
    其中,t代表每一段文本,j代表每一个问题,S tj(t*j的矩阵)代表t文本和j问题的相关度,+ m表示按照矩阵乘法的方式进行加法,ti中i表示下标问题中的第i个单词,ji中i表示文本中对于问题中的第i个单词的注意力权重值;
    计算出每一个问题的字在每一个答案中的权重,其公式如下:
    a t=softmax(S t:)
    对所述问题进行加权平均处理,其公式为:
    Figure PCTCN2019118501-appb-100012
    是一个t*d的矩阵;
    将H t:
    Figure PCTCN2019118501-appb-100013
    通过G函数进行融合获取维度为t*4d的矩阵G,其中, 所述矩阵G为融合文本中问题与答案所有关联信息。
  19. 根据权利要求8所述的电子装置,其特征在于,
    所述采用lstm对获取的文本中问题与答案所有关联信息进行处理,并输出开始概率和结束概率,并将所述开始概率和结束概率相乘,并将相乘后概率最高的一句话作为文本中问题的答案包括如下步骤:
    将获取的矩阵G输入双向lstm,得到所述文本中问题下的单词之间的关系的矩阵M;
    将上下文信息表征与所述矩阵M进行拼接,获取到拼接矩阵;
    将获取到的拼接矩阵输入第一单向lstm,并对所述第一单向lstm处理后的结果进行softmax处理,获取文本中答案的开始概率;
    将所述开始概率、所述矩阵G以及所述矩阵M作为输入参数,输入到第二单向lstm进行处理,获取文本中答案的结束概率;
    将所述开始概率和所述结束概率相乘,根据相乘的结果,将概率最高的那一句话作为答案。
  20. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中包括基于transformer和lstm的机器阅读程序,所述基于transformer和lstm的机器阅读程序被处理器执行时,实现如权利要求1至8中任一项所述的机器阅读方法的步骤。
PCT/CN2019/118501 2019-10-29 2019-11-14 机器阅读方法、系统、装置及存储介质 WO2021082086A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911037790.X 2019-10-29
CN201911037790.XA CN110866098B (zh) 2019-10-29 2019-10-29 基于transformer和lstm的机器阅读方法、装置及可读存储介质

Publications (1)

Publication Number Publication Date
WO2021082086A1 true WO2021082086A1 (zh) 2021-05-06

Family

ID=69652976

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118501 WO2021082086A1 (zh) 2019-10-29 2019-11-14 机器阅读方法、系统、装置及存储介质

Country Status (2)

Country Link
CN (1) CN110866098B (zh)
WO (1) WO2021082086A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536798A (zh) * 2021-07-16 2021-10-22 北京易道博识科技有限公司 一种多实例文档关键信息抽取方法和系统
US20210406467A1 (en) * 2020-06-24 2021-12-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating triple sample, electronic device and computer storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476031A (zh) * 2020-03-11 2020-07-31 重庆邮电大学 一种基于Lattice-LSTM的改进中文命名实体识别方法
CN111582020B (zh) * 2020-03-25 2024-06-18 平安科技(深圳)有限公司 信号处理方法、装置、计算机设备及存储介质
CN112100328B (zh) * 2020-08-31 2023-05-30 广州探迹科技有限公司 一种基于多轮对话的意向判断方法
CN113743118B (zh) * 2021-07-22 2024-06-21 武汉工程大学 基于融合关系信息编码的法律文书中的实体关系抽取方法
CN113850078B (zh) * 2021-09-29 2024-06-18 平安科技(深圳)有限公司 基于机器学习的多意图识别方法、设备及可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460553A (zh) * 2018-11-05 2019-03-12 中山大学 一种基于门限卷积神经网络的机器阅读理解方法
CN109492227A (zh) * 2018-11-16 2019-03-19 大连理工大学 一种基于多头注意力机制和动态迭代的机器阅读理解方法
CN109933661A (zh) * 2019-04-03 2019-06-25 上海乐言信息科技有限公司 一种基于深度生成模型的半监督问答对归纳方法和系统
US20190251168A1 (en) * 2018-02-09 2019-08-15 Salesforce.Com, Inc. Multitask Learning As Question Answering
CN110222152A (zh) * 2019-05-29 2019-09-10 北京邮电大学 一种基于机器阅读理解的问题答案获取方法及系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10540967B2 (en) * 2016-11-14 2020-01-21 Xerox Corporation Machine reading method for dialog state tracking
CN110162636B (zh) * 2019-05-30 2020-05-19 中森云链(成都)科技有限责任公司 基于d-lstm的情绪原因识别方法
CN110222349B (zh) * 2019-06-13 2020-05-19 成都信息工程大学 一种深度动态上下文词语表示的方法及计算机

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190251168A1 (en) * 2018-02-09 2019-08-15 Salesforce.Com, Inc. Multitask Learning As Question Answering
CN109460553A (zh) * 2018-11-05 2019-03-12 中山大学 一种基于门限卷积神经网络的机器阅读理解方法
CN109492227A (zh) * 2018-11-16 2019-03-19 大连理工大学 一种基于多头注意力机制和动态迭代的机器阅读理解方法
CN109933661A (zh) * 2019-04-03 2019-06-25 上海乐言信息科技有限公司 一种基于深度生成模型的半监督问答对归纳方法和系统
CN110222152A (zh) * 2019-05-29 2019-09-10 北京邮电大学 一种基于机器阅读理解的问题答案获取方法及系统

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210406467A1 (en) * 2020-06-24 2021-12-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating triple sample, electronic device and computer storage medium
CN113536798A (zh) * 2021-07-16 2021-10-22 北京易道博识科技有限公司 一种多实例文档关键信息抽取方法和系统
CN113536798B (zh) * 2021-07-16 2024-05-31 北京易道博识科技有限公司 一种多实例文档关键信息抽取方法和系统

Also Published As

Publication number Publication date
CN110866098B (zh) 2022-10-28
CN110866098A (zh) 2020-03-06

Similar Documents

Publication Publication Date Title
WO2021082086A1 (zh) 机器阅读方法、系统、装置及存储介质
CN110717017B (zh) 一种处理语料的方法
CN112487182B (zh) 文本处理模型的训练方法、文本处理方法及装置
CN112668671B (zh) 预训练模型的获取方法和装置
WO2022007823A1 (zh) 一种文本数据处理方法及装置
CN111241237B (zh) 一种基于运维业务的智能问答数据处理方法及装置
CN109871538A (zh) 一种中文电子病历命名实体识别方法
CN111985240B (zh) 命名实体识别模型的训练方法、命名实体识别方法及装置
CN112287069B (zh) 基于语音语义的信息检索方法、装置及计算机设备
CN112288075A (zh) 一种数据处理方法及相关设备
CN113239169A (zh) 基于人工智能的回答生成方法、装置、设备及存储介质
JP2022006173A (ja) 知識事前訓練モデルの訓練方法、装置及び電子機器
CN113707299A (zh) 基于问诊会话的辅助诊断方法、装置及计算机设备
CN113657105A (zh) 基于词汇增强的医学实体抽取方法、装置、设备及介质
JP2022145623A (ja) ヒント情報を提示する方法及び装置並びにコンピュータプログラム
CN111881292A (zh) 一种文本分类方法及装置
CN116204674A (zh) 一种基于视觉概念词关联结构化建模的图像描述方法
CN115455169A (zh) 一种基于词汇知识和语义依存的知识图谱问答方法和系统
CN112800205B (zh) 基于语义变化流形分析获取问答相关段落的方法、装置
CN114445832A (zh) 基于全局语义的文字图像识别方法、装置及计算机设备
CN117094395B (zh) 对知识图谱进行补全的方法、装置和计算机存储介质
CN112199954B (zh) 基于语音语义的疾病实体匹配方法、装置及计算机设备
CN116386895B (zh) 基于异构图神经网络的流行病舆情实体识别方法与装置
CN116414988A (zh) 基于依赖关系增强的图卷积方面级情感分类方法及系统
Pourkeshavarz et al. Stacked cross-modal feature consolidation attention networks for image captioning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19950833

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19950833

Country of ref document: EP

Kind code of ref document: A1