CN114265924A - Method and device for retrieving associated table according to question - Google Patents

Method and device for retrieving associated table according to question

Info

Publication number
CN114265924A
CN114265924A (application CN202111586986.1A)
Authority
CN
China
Prior art keywords
question
frequency
vector
title
participle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111586986.1A
Other languages
Chinese (zh)
Inventor
刘星光
程振波
肖刚
孟航程
李琴
孙力
张皓鑫
王亚明
徐雪松
陆佳炜
张元鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202111586986.1A priority Critical patent/CN114265924A/en
Publication of CN114265924A publication Critical patent/CN114265924A/en
Pending legal-status Critical Current

Abstract

Aimed at the table-retrieval component of a table question-answering system, a method and a device are provided for retrieving the table most relevant to a question. The method comprises the following steps: according to the question and the table, respectively calculating the frequency with which words in the question occur in the table and the frequency with which words in the table occur in the question; obtaining word-embedding vector representations of the question and the table; fusing the word-embedding representations of the question and the table with the calculated frequencies to obtain a fused vector of the question and the table; and finally using the fused vector to calculate the similarity between the question and the table.

Description

Method and device for retrieving associated table according to question
Technical Field
The application relates to the fields of computer natural-language question answering and information retrieval, and more particularly to a method of retrieving the table most relevant to a question described in natural language.
Background
To represent information more precisely, tables are often used to organize it. A table generally comprises a title, a header, and contents. The title is a sentence describing the function of the table. The header represents the attributes of the stored content and is typically described by short phrases. The contents are the information stored in the table, i.e., instantiations of the header attributes, often expressed as numbers or short phrases. Tables appear in many kinds of documents and reports, and professional documents, such as design standards in the construction and machinery fields, often include many tables. When constructing a question-answering system for professional documents, it is often necessary to obtain the answer s of an input question Q from a table. To do so, one must first determine, from a set of tables T = [T_1, T_2, …, T_j], the table T_j most closely related to the question Q, and then determine the answer s of the question from that table, where j denotes the number of tables and s ∈ T_j. Retrieving the table most relevant to a question Q is therefore one of the most important steps in the design of a table-retrieval-based question-answering system. Existing methods, such as those of publications CN109670028A, CN110737671A and CN107203528B, implement retrieval by keyword matching. Retrieval of this type simply treats each keyword as an independent object, and loses the semantic information of the sentence and of the table contents associated with the keyword.
Therefore, the invention proposes a new representation that semantically fuses the question Q with the table title, header, and contents, thereby improving the accuracy of retrieving the associated table from the question.
Disclosure of Invention
The invention provides a method and a device for retrieving an associated table from a question, aimed at the table-retrieval problem in table question-answering systems, so as to improve the accuracy of matching a question to its associated table.
The general flow of the invention comprises: calculating a pre-attention vector between a question and a table; obtaining word-embedding vector representations of the question and the table; fusing the word-embedding representations of the question and the table with the pre-attention vector to obtain a fused vector representation of the question and the table; and calculating the similarity between the question and the table from the fused vector.
A method for retrieving an association table based on a question, comprising the steps of:
Step 1: respectively calculate the frequency with which words in the question occur in the table and the frequency with which words in the table occur in the question, and record the calculated frequencies as the pre-attention vector;
Step 2: obtain word-embedding representation vectors of the question and the table;
Step 3: fuse the word-embedding representation vectors of the question and the table with the calculated frequencies to obtain a fused vector representation of the question and the table;
Step 4: calculate the similarity between the question and the table from the fused vector.
Preferably, calculating the pre-attention vector between the question and the table in step 1 specifically comprises:
Step 2.1: a question is denoted by the symbol Q. First, the question is segmented into participles to obtain Q = {q_1, q_2, …, q_n}, where q_n denotes each participle in the question and n denotes the number of participles in the question.
Step 2.2: construct the feature information of the table, which comprises the title and the header of the table and is denoted T_j, where j denotes the index of the table. The title of the table is denoted title_j and is segmented into participles title_j = {t_1, t_2, …, t_m}, where t_m denotes each participle in the title and m denotes the number of participles in the title. The header of the table is denoted head_j = {s_1, s_2, …, s_k}, where s_k denotes each attribute participle in the header and k denotes the number of attribute participles in the header. Finally, the title and the header information are concatenated to obtain the table feature information T_j = {t_1, t_2, …, t_m, s_1, s_2, …, s_k}.
Step 2.3: calculate the pre-attention vector between the question Q and the table. A function Match(x, y) is defined to compute the frequency of occurrence of a participle x in a text y: the function returns 1 when the text y contains the participle x and 0 otherwise, and it returns 0 when x is a common stop word. Using another method to compute the frequency of occurrence of participles in text does not affect the result of the invention. The specific method of computing the pre-attention vector is as follows:
2.3.1) First calculate the frequency with which each participle q_n of Q appears in the table title title_j. According to the definition of Match(x, y), the frequencies of the participles of Q in title_j are expressed as:
F_Q^title = [Match(q_1, title_j), Match(q_2, title_j), …, Match(q_n, title_j)]  (1)
In addition, the frequency with which each participle t_m of title_j appears in Q is calculated:
F_title^Q = [Match(t_1, Q), Match(t_2, Q), …, Match(t_m, Q)]  (2)
2.3.2) Calculate the frequency with which each participle q_n of Q appears in the table header head_j:
F_Q^head = [Match(q_1, head_j), Match(q_2, head_j), …, Match(q_n, head_j)]  (3)
Also calculate the frequency with which the participles of Q appear in the table contents corresponding to head_j. The table contents corresponding to head_j are denoted C_j; the contents of all the headers of the table are denoted C_j = {c_1, c_2, …, c_k}, where c_k denotes the content of each cell and k denotes the number of cells. The frequencies of the participles of Q in C_j are expressed as:
F_Q^C = [Match(q_1, C_j), Match(q_2, C_j), …, Match(q_n, C_j)]  (4)
The frequency with which each participle q_n of Q appears in head_j is finally expressed as the element-wise maximum of (3) and (4):
F_Q^head' = max(F_Q^head, F_Q^C)  (5)
Further, the frequencies with which the attribute participles of head_j appear in Q are obtained:
F_head^Q = [Match(s_1, Q), Match(s_2, Q), …, Match(s_k, Q)]  (6)
2.3.3) From the frequencies computed in steps 2.3.1) and 2.3.2), namely F_Q^title, the frequency with which the participles of question Q appear in the table title, and F_Q^head', the frequency with which they appear in the table header, the frequency with which the participles of Q appear in the whole table is obtained as the element-wise maximum:
F_Q^T = max(F_Q^title, F_Q^head')  (7)
2.3.4) The pre-attention vector between the question Q and the table, obtained by the above steps, is the concatenation of the frequency with which each participle of the question appears in the table and the frequencies with which the title participles and header attribute participles of the table appear in the question; it is denoted M_ij:
M_ij = [F_Q^T ; F_title^Q ; F_head^Q]
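As a minimal sketch of step 2.3, with an illustrative stop-word list and toy question/table data (neither is specified by the invention), the Match function and the concatenated pre-attention vector can be written as:

```python
# Hedged sketch of Match(x, y) and the pre-attention vector M_ij of step 2.3.
# The stop-word list and the toy question/table below are illustrative only.
STOP_WORDS = {"of", "what"}

def match(x, y) -> int:
    """Return 1 if the text/token collection y contains participle x, else 0;
    stop words always score 0."""
    if x in STOP_WORDS:
        return 0
    return int(x in y)

def pre_attention(q, title, head, contents):
    """Concatenate the frequency vectors of eqs. (1)-(7) into M_ij."""
    # eq (7): each question token, matched against title, header, or cell contents
    f_q_table = [max(match(x, title), match(x, head), match(x, contents)) for x in q]
    f_title_q = [match(t, q) for t in title]   # eq (2): title tokens in the question
    f_head_q = [match(s, q) for s in head]     # eq (6): header tokens in the question
    return f_q_table + f_title_q + f_head_q    # length n + m + k

M = pre_attention(q=["alpha", "beta", "of"],
                  title=["alpha"],
                  head=["gamma"],
                  contents=["beta"])
# M = [1, 1, 0] + [1] + [0]
```

The length n + m + k of M matches, token for token, the length of the embedded sequence [Q : T_j] used in step 2, which is what makes the element-wise fusion of step 3 possible.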
Preferably, obtaining the word-embedding representation vectors of the question and the table in step 2 specifically comprises:
The question Q and the table feature information T_j are directly concatenated and represented as [Q : T_j] = [q_1, q_2, …, q_n, t_1, t_2, …, t_m, s_1, s_2, …, s_k]. Then, using a generic word-embedding model, [Q : T_j] is represented as a vector sequence Z_ij of length n + m + k, where the embedding length l of each participle is determined by the word-embedding model (the size of l does not affect the result of the invention). The self-attention score of Z_ij is then obtained through an existing self-attention mechanism and denoted A_ij. The attention mechanism can quickly extract the important features of sparse data and is widely used in natural-language-processing tasks; the self-attention mechanism is an improvement of the attention mechanism that reduces the reliance on external information and is better at capturing the internal correlations of data or features.
Preferably, fusing the word-embedding representation vectors of the question and the table with the calculated frequencies in step 3 to obtain the fused vector representation of the question and the table specifically comprises:
The pre-attention vector M_ij and the self-attention score A_ij obtained in step 2 are fused according to the following formula:
A_ij * M_ij + M_ij  (8)
The fused vector representation obtained is denoted M'_ij:
M'_ij = A_ij * M_ij + M_ij  (9)
Preferably, calculating the similarity between the question and the table from the fused vector in step 4 specifically comprises:
The similarity is calculated using the existing sigmoid activation function, which maps a real number x to the interval (0, 1) and can be used for classification. The sigmoid function is defined as:
S(x) = 1 / (1 + e^(-x))  (10)
The sigmoid function maps the fused vector to the interval (0, 1) to represent the degree of similarity between the question and the table: a mapped value close to 1 indicates that the question and the table are very similar, and a value close to 0 indicates that they are not similar.
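A sketch of step 4, under the assumption that the fused vector is pooled into a single logit by summation before the sigmoid (the text fixes the sigmoid but not the pooling):

```python
import math

# Sigmoid of formula (10) applied to a pooled fused vector; summing the fused
# vector into one logit is an assumption of this sketch, not fixed by the text.
def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def similarity(fused_vector) -> float:
    return sigmoid(sum(fused_vector))

score = similarity([1.3, 1.5, 0.0, 0.0, 1.4])  # large positive logit -> near 1
```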
The invention also relates to a device for retrieving an associated table from a question, comprising: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above; the processor implements the steps of the method of the invention when executing the computer program.
The advantage of the invention is that the method and device for retrieving an associated table from a question make full use of the association information between the question and the table. They overcome the problem of existing keyword-based table-retrieval methods, which treat each keyword as an independent object and so lose the semantic information of the sentence and of the table contents associated with the keyword, and thereby improve the accuracy of matching a question to its associated table.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a fused self attention score and pre-attention vector flow diagram of the method of the present invention;
FIG. 3 is a flow chart of a fusion vector representation of a question and a table of the method of the present invention;
FIG. 4 is a schematic of the process of the present invention;
FIG. 5 is an example of a process for performing pre-attention vector calculation between a natural question and a form according to the present invention;
FIG. 6 is a schematic diagram of the calculation and fusion of self-attention scores and pre-attention vectors according to the present invention;
fig. 7 is a block diagram of the apparatus of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The invention uses the international ASME Boiler and Pressure Vessel Code as the reference document and builds a training data set from its design-standard tables: the design-by-analysis tables of its fifth chapter were extracted, and mechanical-engineering professionals were asked to pose questions against these tables (using other documents as reference data does not affect the result of the invention). A question posed against its corresponding table is treated as a positive sample; 1500 positive samples were created, and in the experiment each question was additionally assigned 2 randomly chosen tables as negative samples, yielding a training data set of 4500 examples in total. The experimental task matches each question against all tables, and the method described herein matches the most relevant table to the question. The specific experimental procedure is as follows:
the structures, proportions, and dimensions shown in the drawings and detailed description are for understanding and reading only, and are not intended to limit the scope of the invention, which is defined in the claims, but are not necessarily essential to the skilled in the art, and any structural modifications, changes in proportions, or adjustments in size, without affecting the efficacy and attainment of the same, are intended to be included within the scope of the invention. The specific values given for the parameters in the present invention are only for clarity of description, and are not intended to limit the scope of the invention, and the changes or adjustments of the relative relationship should be considered as the scope of the invention without substantial technical changes.
A method for retrieving an association table based on a question, comprising the steps of:
1. Calculate the pre-attention vector M_ij between a question Q and a table, as shown in FIG. 2. Specifically:
Step 1.1: as shown at reference numeral S41 in FIG. 4, the question is segmented to obtain the participle set Q = {q_1, q_2, …, q_n} of the question Q, where q_n denotes each participle and n denotes the number of participles in the question.
Step 1.2: as shown at reference numeral S42 in FIG. 4, the title participles of the table are denoted title_j = {t_1, t_2, …, t_m} and the header attribute participles of the table are denoted head_j = {s_1, s_2, …, s_k}. The title and head_j of the table are concatenated to obtain the table feature information T_j = {t_1, t_2, …, t_m, s_1, s_2, …, s_k}.
Step 1.3: from the Q and T_j obtained in steps 1.1 and 1.2, the pre-attention vector between the question and the table is calculated. The invention defines a function Match(x, y) that computes the frequency of occurrence of a participle x in a text y: the function returns 1 when the text y contains the participle x and 0 otherwise, and it returns 0 when x is a common stop word.
1.3.1) As shown at reference numeral S43 in FIG. 4, the frequencies of occurrence of the participles between title_j and Q are calculated in both directions. According to the definition of Match(x, y), the frequencies of the participles of Q in title_j are expressed as:
F_Q^title = [Match(q_1, title_j), Match(q_2, title_j), …, Match(q_n, title_j)]  (1)
Likewise, the frequencies of the participles t_m of title_j in Q are obtained:
F_title^Q = [Match(t_1, Q), Match(t_2, Q), …, Match(t_m, Q)]  (2)
1.3.2) As shown at reference numeral S44 in FIG. 4, the frequencies of occurrence of the participles between Q and the table header head_j are calculated. Using the function Match(x, y) defined in step 1.3, the frequencies of the participles of Q in head_j are first calculated:
F_Q^head = [Match(q_1, head_j), Match(q_2, head_j), …, Match(q_n, head_j)]  (3)
Considering that some information in the question may appear in the table contents corresponding to the header, the invention also calculates the frequency with which the participles of Q appear in the contents corresponding to head_j. The table contents corresponding to head_j are denoted C_j; the contents of all the headers of the table are denoted C_j = {c_1, c_2, …, c_k}, where c_k denotes the content of each cell and k denotes the number of cells. The frequencies of the participles of Q in C_j are expressed as:
F_Q^C = [Match(q_1, C_j), Match(q_2, C_j), …, Match(q_n, C_j)]  (4)
The frequency with which each participle q_n of Q appears in head_j is finally expressed as the element-wise maximum of (3) and (4):
F_Q^head' = max(F_Q^head, F_Q^C)  (5)
In the same way, the frequencies of the attribute participles of head_j in Q are obtained:
F_head^Q = [Match(s_1, Q), Match(s_2, Q), …, Match(s_k, Q)]  (6)
1.3.3) From F_Q^title, the frequency with which the participles of question Q appear in the table title, computed in 1.3.1, and F_Q^head', the frequency with which they appear in the table header, computed in 1.3.2, the frequency with which the participles of Q appear in the whole table is obtained as the element-wise maximum:
F_Q^T = max(F_Q^title, F_Q^head')  (7)
1.3.4) As shown at reference numeral S45 in FIG. 4, the pre-attention vector between the question Q and the table is expressed as the concatenation of the frequency with which each participle of the question appears in the table and the frequencies with which the title participles and header attribute participles of the table appear in the question, and is denoted M_ij:
M_ij = [F_Q^T ; F_title^Q ; F_head^Q]
2. Obtain the word-embedding representation vectors of the question and the table, as shown in FIG. 3. Specifically:
As shown at S46 in FIG. 4, the question Q and the table feature information T_j described above are directly concatenated and represented as [Q : T_j] = [q_1, q_2, …, q_n, t_1, t_2, …, t_m, s_1, s_2, …, s_k]. A generic word-embedding model is then used, such as the word2vec word-vector representation proposed by T. Mikolov et al. in "Efficient Estimation of Word Representations in Vector Space", or the BERT model proposed by J. Devlin et al. in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", either of which can vectorize a text or word; using any word-embedding representation method here does not affect the result of the invention. [Q : T_j] is represented as a vector sequence Z_ij of length n + m + k, where the embedding length l of each participle is determined by the word-embedding model (the size of l does not affect the result of the invention). The self-attention score of Z_ij is then obtained by an existing self-attention mechanism and denoted A_ij.
3. As shown at reference numeral S47 in FIG. 4, the self-attention feature vector and the pre-attention vector of the question and the table are fused to obtain the fused vector representation of the question and the table:
The pre-attention vector M_ij and the self-attention score A_ij are fused according to the following formula:
A_ij * M_ij + M_ij  (8)
The fused vector representation obtained is denoted M'_ij:
M'_ij = A_ij * M_ij + M_ij  (9)
4. Calculate the similarity between the question and the table from the fused vector:
The invention uses the existing sigmoid activation function to map the fused vector to the interval (0, 1) to represent the degree of similarity between the question and the table; a mapped value close to 1 indicates similarity and a value close to 0 indicates dissimilarity.
A device for retrieving an associated table from a question comprises: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above; the processor implements the method when executing the computer program.
Examples of the invention are as follows:
1. The experiment takes as the analysis target a table titled "Examples of stress classification" and poses the natural question "What classification does a concave head whose stress type is membrane stress belong to?". As shown at reference numeral S51 in FIG. 5, the natural question is segmented; the experiment uses jieba, a Chinese word-segmentation tool commonly used in natural-language processing, to obtain the participle set Q of the question:
Q = {stress type; membrane stress; of; concave head; belongs to; what; classification}  (11)
2. As shown at reference numeral S52 in FIG. 5, the table feature information T of the table "Examples of stress classification" is obtained: the title of the table is denoted title = {stress classification; of; examples}, and the header information of the table is denoted head = {container part; location; origin of stress; stress type; classification}. The table feature information is therefore:
T = {stress classification; of; examples; container part; location; origin of stress; stress type; classification}  (12)
3. The pre-attention vector between Q and T is computed crosswise, as shown at reference numeral S53 in FIG. 5. The frequencies of occurrence of the participles between the question and the table are calculated with the function Match(x, y) defined in the invention. The word "of" in the participle sets Q and T and the word "what" can be regarded as stop words in the text, so these words score 0 in the Match computation.
As shown at reference numerals S53 and S54 in FIG. 5, take the question participle q_3 = "membrane stress" as an example. Since q_3 is not contained in the title,
Match(q_3, title) = 0  (13)
As shown at reference numeral S54 in FIG. 5, the header information head = {container part; location; origin of stress; stress type; classification} does not contain q_3 either, i.e.,
Match(q_3, head) = 0  (14)
However, the table contents under the header attribute s_4 "stress type" contain "membrane stress", and thus Match(q_3, s_4) = 1, i.e.,
Match(q_3, C) = 1  (15)
The final frequency of occurrence of the participle q_3 in T_j is therefore:
max(Match(q_3, title), Match(q_3, head), Match(q_3, C)) = 1  (16)
Calculating in this way between the natural question Q = {stress type; membrane stress; of; concave head; belongs to; what; classification} and the table T = {stress classification; of; examples; container part; location; origin of stress; stress type; classification} yields the pre-attention vector:
M = [F_Q^T ; F_title^Q ; F_head^Q]  (17)
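The computation of eqs. (13)-(16) for q_3 can be checked with a few lines of code; English glosses of the Chinese participles are used, and the cell contents under the "stress type" header are assumed for illustration:

```python
# Reproducing eqs. (13)-(16) for q_3 = "membrane stress". English glosses of the
# Chinese participles are used; the cell contents shown are assumed, not the
# full table of the patent's figure.
def match(x, texts):
    return int(any(x in t for t in texts))

title = ["stress classification", "of", "examples"]
head = ["container part", "location", "origin of stress", "stress type", "classification"]
cells_s4 = ["membrane stress", "bending stress"]  # assumed contents under s_4 "stress type"

q3 = "membrane stress"
f_title = match(q3, title)               # eq (13): 0, q_3 is not in the title
f_head = match(q3, head)                 # eq (14): 0, q_3 is not a header attribute
f_cells = match(q3, cells_s4)            # eq (15): 1, a cell under s_4 contains q_3
f_table = max(f_title, f_head, f_cells)  # eq (16): 1
```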
4. The fusion of the self-attention score and the pre-attention vector in the method of the application is shown in FIG. 6. The natural question Q and the table feature information T are concatenated as [Q : T], as shown at reference numeral S61 in FIG. 6. As shown at reference numeral S62 in FIG. 6, the pre-attention vector M of [Q : T] is computed. As shown at reference numeral S63 in FIG. 6, word2vec performs word-vector embedding of [Q : T]. As shown at reference numeral S64 in FIG. 6, the experiment uses an existing Bi-LSTM neural network to further extract feature information from the vector-represented text, so that the vector representation of the text attends to context information; using another semantic-information extraction method does not affect the result of the invention. This yields the semantic information Z of [Q : T]. As shown at reference numeral S65 in FIG. 6, the self-attention score A of Z is computed in the self-attention layer. Finally, as shown at reference numeral S66 in FIG. 6, the obtained self-attention score is combined with the pre-attention vector M to obtain the fused vector:
M' = A * M + M
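Formula (8) itself reduces to an element-wise multiply-and-add; a sketch with made-up per-token attention scores (the real A comes from the Bi-LSTM and self-attention layers above):

```python
import numpy as np

# Sketch of formula (8): fusing per-token self-attention scores A with the
# binary pre-attention vector M; all numbers are made up for illustration.
M = np.array([1, 1, 0, 0, 0, 0, 1], dtype=float)   # pre-attention for 7 question tokens
A = np.array([0.3, 0.5, 0.1, 0.2, 0.1, 0.1, 0.4])  # stand-in self-attention scores
fused = A * M + M   # tokens matched across question and table are amplified;
                    # unmatched tokens stay at 0
```

The "+ M" term keeps every matched token alive even when its attention score is small, which is the point of fusing the two signals rather than using the attention score alone.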
5. The experiment trains and tests on the 4500 examples built from the international ASME Boiler and Pressure Vessel Code. The accuracy of the present method in matching tables reaches 90.6%. In a comparative experiment, a semantic-matching model that only extracts feature vectors with the self-attention mechanism and then computes similarity achieves an accuracy of 73.7% on the table-matching task. The experiment also compares a method that performs statistical keyword matching over the tables: the number of occurrences of the question keywords in each table is counted, all tables are ranked by this count, and the top-ranked table is returned as the most relevant table for the question. This keyword-statistics method reaches an accuracy of only 87.3%. The results show that the present method is effective for computing the similarity between a question and a table.
A device for retrieving an associated table from a question comprises: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above. A block diagram of the device for retrieving an associated table from a question is shown in FIG. 7; the processor implements the method when executing the computing program.
It should be understood that the data-processing device corresponds one-to-one with the data-processing method, which is therefore not described again here. It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division into the above functional units and modules is illustrated; in practical applications, the above functions may be distributed to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above.

Claims (6)

1. A method for retrieving an association table based on a question, comprising the steps of:
step 1: respectively calculating the frequency of the words in the question in the table and the frequency of the words in the table in the question, and recording the calculated frequencies as pre-attention vectors;
step 2: obtaining word embedding expression vectors of a question and a table;
and step 3: the method comprises the steps of fusing word embedding expression vectors and calculation frequency of a question and a table to obtain fusion vector expression of the question and the table;
and 4, step 4: and calculating the similarity between the question and the table according to the fusion vector.
2. The method for retrieving an associated table from a question according to claim 1, characterized in that calculating the pre-attention vector between the question and the table in step 1 specifically comprises:
Step 2.1: a question is denoted by the symbol Q. First, the question is segmented into participles to obtain Q = {q_1, q_2, …, q_n}, where q_n denotes each participle in the question and n denotes the number of participles in the question.
Step 2.2: construct the feature information of the table, which comprises the title and the header of the table and is denoted T_j, where j denotes the index of the table. The title of the table is denoted title_j and is segmented into participles title_j = {t_1, t_2, …, t_m}, where t_m denotes each participle in the title and m denotes the number of participles in the title. The header of the table is denoted head_j = {s_1, s_2, …, s_k}, where s_k denotes each attribute participle in the header and k denotes the number of attribute participles in the header. Finally, the title and the header information are concatenated to obtain the table feature information T_j = {t_1, t_2, …, t_m, s_1, s_2, …, s_k}.
Step 2.3: calculate the pre-attention vector between the question Q and the table. A function Match(x, y) is defined to compute the frequency of occurrence of a participle x in a text y: the function returns 1 when the text y contains the participle x and 0 otherwise, and it returns 0 when x is a common stop word. Using another method to compute the frequency of occurrence of participles in text does not affect the result of the invention. The specific method of computing the pre-attention vector is as follows:
2.3.1) First calculate the frequency with which each participle q_n of Q appears in the table title title_j. According to the definition of Match(x, y), the frequencies of the participles of Q in title_j are expressed as:
F_Q^title = [Match(q_1, title_j), Match(q_2, title_j), …, Match(q_n, title_j)]  (1)
In addition, the frequency with which each participle t_m of title_j appears in Q is calculated:
F_title^Q = [Match(t_1, Q), Match(t_2, Q), …, Match(t_m, Q)]  (2)
2.3.2) calculating the participles Q in QnHead at table headjThe frequency of occurrence of (1):
Figure FDA0003428218650000022
Next, calculate the frequency with which the participles in Q appear in the table contents corresponding to head_j. The table contents corresponding to head_j are denoted C_j; the contents of all cells under the header are denoted C_j = {c_1, c_2, …, c_k}, where c_k represents the content of each cell and k represents the number of cells. The frequency with which the participle q_n in Q appears in C_j is expressed as:

f(q_n, C_j) = Match(q_n, C_j)
The frequency with which the participle q_n in Q appears in head_j is finally expressed by combining its frequency in head_j with its frequency in the corresponding table contents C_j [the combining formula is an equation image in the original].
Further, the frequency with which each attribute participle in head_j appears in Q is obtained:

f(s_k, Q) = Match(s_k, Q)
2.3.3) By combining the frequency with which the participles in question Q appear in the table title, calculated in step 2.3.1), with the frequency with which they appear in the table header, calculated in step 2.3.2), the frequency with which the participles in question Q appear in the entire table is obtained [equation images in the original].
2.3.4) According to the above steps, the pre-attention vector between the question Q and the table is obtained: the frequency with which each participle in the question appears in the table, spliced with the frequencies with which the title participles and the header attribute participles of the table appear in the question. The pre-attention vector is recorded as M_ij [equation image in the original].
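The frequency calculations of step 2.3 can be sketched as follows. The binary Match function and the concatenation into M_ij follow the claim text; the stop-word list is hypothetical, and the way the header and cell-content frequencies are merged (here, a maximum) is an assumption, since the original formulas are equation images:

```python
# Binary containment function of step 2.3: returns 1 when text y contains
# participle x, 0 otherwise; common stop words always return 0.
STOP_WORDS = {"the", "of", "a"}  # hypothetical stop-word list

def match(x, y):
    return 0 if x in STOP_WORDS else int(x in y)

def pre_attention_vector(question, title, header, contents):
    """Concatenate question-side and table-side frequencies into M_ij."""
    q_freqs = []
    for q in question:
        f_title = match(q, title)
        # merging header and cell-content frequencies via max is an assumption
        f_head = max(match(q, header), match(q, contents))
        q_freqs.append(max(f_title, f_head))  # frequency in the whole table
    t_freqs = [match(t, question) for t in title]   # title participles in Q
    s_freqs = [match(s, question) for s in header]  # header participles in Q
    return q_freqs + t_freqs + s_freqs              # length n + m + k
```

The returned vector has one entry per question participle followed by one entry per title and header participle, matching the splicing described in 2.3.4).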
3. The method of retrieving an association table according to a question as claimed in claim 2, characterized in that obtaining the word-embedding representation vectors of the question and the table in step 2 specifically comprises:
The question Q and the table characteristic information T_j are spliced directly: [Q; T_j] = [q_1, q_2, …, q_n, t_1, t_2, …, t_m, s_1, s_2, …, s_k]. Then, using a generic word-embedding model, [Q; T_j] is represented as the vector Z_ij, where each Z_ij sequence has length n + m + k. The self-attention feature vector of Z_ij, denoted A_ij, is then obtained through the existing self-attention mechanism. The attention mechanism can quickly extract important features from sparse data and is widely used in natural language processing tasks, while the self-attention mechanism is an improvement of the attention mechanism that reduces reliance on external information and is better at capturing the internal correlations of data or features.
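A minimal single-head scaled dot-product self-attention over the spliced sequence [Q; T_j] can be sketched as follows; the random matrix stands in for the "generic word embedding model", which the claim does not fix to any particular implementation:

```python
import numpy as np

def self_attention(Z):
    """Z: (seq_len, d) embedding of [Q; T_j]; returns the attention output A_ij."""
    d = Z.shape[1]
    scores = Z @ Z.T / np.sqrt(d)                    # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ Z                               # contextualized token vectors

rng = np.random.default_rng(0)
Z_ij = rng.standard_normal((7, 8))   # n + m + k = 7 tokens, embedding size 8
A_ij = self_attention(Z_ij)
```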
4. The method of retrieving an association table according to a question as claimed in claim 3, characterized in that in step 3, fusing the word-embedding representation vectors of the question and the table with the calculated frequencies to obtain the fusion vector representation of the question and the table specifically comprises:
The pre-attention vector M_ij and the self-attention feature vector A_ij obtained in claim 3 are fused according to the following formula:

A_ij * M_ij + M_ij (8)
A fused vector representation is obtained and is recorded here as F_ij [the original notation is an equation image]:

F_ij = A_ij * M_ij + M_ij (9)
5. The method of retrieving an association table according to a question as claimed in claim 1 or 4, characterized in that calculating the similarity between the question and the table according to the fusion vector in step 4 specifically comprises:
Similarity calculation is carried out using the existing sigmoid activation function. The sigmoid function maps a real number x into the interval (0, 1) and can be used for classification; the formula of the sigmoid function is defined as:

S(x) = 1 / (1 + e^(-x)) (10)

The sigmoid function is used to map the fusion vector into the (0, 1) interval to represent the degree of similarity between the question and the table: a mapped value close to 1 indicates that the question and the table are very similar, and a mapped value close to 0 indicates that they are not similar.
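The sigmoid mapping of step 4 can be sketched as follows; reducing the fused vector to a single real score by summation is an assumption of this sketch, since the claim does not specify the reduction:

```python
import math

def sigmoid(x):
    """S(x) = 1 / (1 + e^-x), maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def similarity(fused_vector):
    # reducing the fused vector to one score by summation is an assumption
    return sigmoid(sum(fused_vector))
```

A score near 1 marks the table as relevant to the question; a score near 0 marks it as irrelevant.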
6. An apparatus for retrieving an association table according to a question, characterized by comprising: one or more processors; and a storage device for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above, the processor implementing the steps of claim 1 when executing the computer program.
CN202111586986.1A 2021-12-23 2021-12-23 Method and device for retrieving associated table according to question Pending CN114265924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111586986.1A CN114265924A (en) 2021-12-23 2021-12-23 Method and device for retrieving associated table according to question


Publications (1)

Publication Number Publication Date
CN114265924A true CN114265924A (en) 2022-04-01

Family

ID=80828953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111586986.1A Pending CN114265924A (en) 2021-12-23 2021-12-23 Method and device for retrieving associated table according to question

Country Status (1)

Country Link
CN (1) CN114265924A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049354A (en) * 2023-01-28 2023-05-02 北京原子回声智能科技有限公司 Multi-table retrieval method and device based on natural language


Similar Documents

Publication Publication Date Title
CN107436864B (en) Chinese question-answer semantic similarity calculation method based on Word2Vec
CN109344236B (en) Problem similarity calculation method based on multiple characteristics
JP6618735B2 (en) Question answering system training apparatus and computer program therefor
Mollá et al. Question answering in restricted domains: An overview
US10503828B2 (en) System and method for answering natural language question
US20150227505A1 (en) Word meaning relationship extraction device
CN112069298A (en) Human-computer interaction method, device and medium based on semantic web and intention recognition
CN111506721A (en) Question-answering system and construction method for domain knowledge graph
CN111797245B (en) Knowledge graph model-based information matching method and related device
JP2011118689A (en) Retrieval method and system
CN112434533A (en) Entity disambiguation method, apparatus, electronic device, and computer-readable storage medium
Hassani et al. LVTIA: A new method for keyphrase extraction from scientific video lectures
Sukumar et al. Semantic based sentence ordering approach for multi-document summarization
Gasmi Medical text classification based on an optimized machine learning and external semantic resource
CN112711666B (en) Futures label extraction method and device
Albeer et al. Automatic summarization of YouTube video transcription text using term frequency-inverse document frequency
CN114265924A (en) Method and device for retrieving associated table according to question
Schirmer et al. A new dataset for topic-based paragraph classification in genocide-related court transcripts
Karpagam et al. Deep learning approaches for answer selection in question answering system for conversation agents
Alwaneen et al. Stacked dynamic memory-coattention network for answering why-questions in Arabic
CN111858885B (en) Keyword separation user question intention identification method
CN114298020A (en) Keyword vectorization method based on subject semantic information and application thereof
Bulfamante Generative enterprise search with extensible knowledge base using AI
CN112905752A (en) Intelligent interaction method, device, equipment and storage medium
Sheikh et al. Improved neural bag-of-words model to retrieve out-of-vocabulary words in speech recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination