CN116610791B - Semantic analysis-based question answering method, system and equipment for structured information - Google Patents

Semantic analysis-based question answering method, system and equipment for structured information

Info

Publication number
CN116610791B
CN116610791B (application CN202310889872.7A)
Authority
CN
China
Prior art keywords
data
model
layer
attribute
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310889872.7A
Other languages
Chinese (zh)
Other versions
CN116610791A (en)
Inventor
姚锋
张忠山
王涛
沈大勇
陈英武
吕济民
何磊
陈宇宁
陈盈果
刘晓路
杜永浩
闫俊刚
王沛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202310889872.7A priority Critical patent/CN116610791B/en
Publication of CN116610791A publication Critical patent/CN116610791A/en
Application granted granted Critical
Publication of CN116610791B publication Critical patent/CN116610791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a semantic analysis-based question-answering method, system and device for structured information, wherein the method comprises the following steps: receiving structured data, preprocessing the structured data, and storing the preprocessed data into a database; annotating the preprocessed data to form annotation data, and training a semantic analysis model based on the annotation data; receiving a user question, inputting it into the semantic analysis model, and obtaining an output answer; training a semantic judgment model based on the annotation data, the semantic judgment model receiving the output answer and generating a query statement; and querying for and outputting a precise answer based on the query statement. The scheme ensures the accuracy of answers from secondary data retrieval, consumes few resources, and avoids the information confusion and uncontrollable answers caused by pushing many related answers.

Description

Semantic analysis-based question answering method, system and equipment for structured information
Technical Field
The invention relates to the field of structured information processing and semantic analysis, in particular to semantic analysis and structured data processing and query technology, and more particularly to a question-answering system, method and device based on semantic analysis for structured information.
Background
To make the data in an information system available to ordinary users through a computer system without the assistance of professionals, helping everyone solve problems in their respective fields or work more efficiently, the most common approach at present is to provide search services or intelligent question-answering services.
A search service mainly returns related data, ranked by similarity to key words and key information, and the user then looks for the desired answer or organizes the needed data from the returned result set. Search thus helps narrow the data search range and improves efficiency, but the user must still spend considerable time on secondary retrieval and analysis of the data, and often cannot obtain the desired data, or cannot obtain it completely, because of how the key information is phrased and similar reasons.
Intelligent question-answering services currently comprise traditional question-answering systems built on search and, at the cutting edge, generative question-answering systems based on large models. Traditional search-based question-answering systems usually need manually written rules or predefined grammar templates for question parsing and answer extraction; these may not cover all cases when facing complex questions and thus cannot provide accurate answers. They also do not understand the background and semantics of the user's question well, so misunderstood or even completely irrelevant answers are easily produced. In particular, updating and maintaining the structured data requires much manpower and material resources, so long waits are often needed to obtain the latest information.
If cutting-edge large-model technology is used, the question-answering system can return accurate answers directly, but the resource investment is large and the answers risk being uncontrollable.
In daily use, demand for search and retrieval services over structured data remains large. How to build a question-answering system for structured data that provides answers as accurate as possible with a small investment of system resources, and that solves the problems of time-consuming secondary retrieval and of inaccurate or uncontrollable answers, is therefore still an urgent problem, and one of great significance for specific systems (such as internal systems) and specific application scenarios.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an intelligent question-answering system, method and equipment established for structured information or structured data based on deep learning. Specifically, the invention discloses the following technical scheme:
In one aspect, the present invention provides a semantic analysis-based question-answering method for structured information, the method comprising:
S1, receiving structured data, preprocessing the structured data, and storing the preprocessed data into a database; the preprocessing comprises determining the topics and topic attributes of the structured data and merging data;
S2, annotating the preprocessed data to form annotation data, and training a semantic analysis model based on the annotation data; receiving a user question, inputting it into the semantic analysis model, and obtaining an output answer;
S3, training a semantic judgment model based on the annotation data; the semantic judgment model receives the output answer and generates a query statement;
S4, querying for and outputting a precise answer based on the query statement.
Preferably, in the semantic analysis model, 6 cascaded encoder structures encode input data to form encoded information; the coding information is simultaneously and respectively input into 6 cascaded decoders, and meanwhile, the upper-stage decoder inputs the processed data into the lower-stage decoder, and the final-stage decoder outputs an output answer of the semantic analysis model;
wherein the input matrix dimension of the encoder is the same as the output matrix dimension.
Preferably, after receiving the user question in S2, the method further includes:
S21, obtaining a representation vector X of each word in the user question, wherein the representation vector X comprises a word vector and a word position vector;
S22, inputting the representation vector X into the encoders in the semantic analysis model to obtain an encoding information matrix C;
S23, sending the encoding information matrix C to the 6 mutually cascaded decoders in the semantic analysis model, the decoders predicting the next word from the currently translated words until a complete output answer is generated.
Preferably, the word position vector is calculated as:

PE(pos, 2i) = sin( pos / 10000^(2i/d_model) )
PE(pos, 2i+1) = cos( pos / 10000^(2i/d_model) )

where pos represents the position of the word in the sentence, d_model represents the total dimension of the word position vector, 2i indexes the even dimensions, 2i+1 indexes the odd dimensions, and i indicates that the word is at the i-th position in the sequence.
Preferably, the encoder introduces a self-attention calculation to weight the input of the encoder, the self-attention output being:

Attention(Q, K, V) = softmax( Q·K^T / √d_k ) · V

wherein Q is a query matrix, K is a key matrix, V is a value matrix, and d_k represents the dimension of the key vectors.
Preferably, the decoder calculates a query matrix Q based on the input data, calculates the similarity between the query matrix Q and the key matrix K, and normalizes the similarity scores through a softmax function to obtain the attention weights; the attention weights are used to compute a weighted sum over the encoding information matrix C so as to predict the next word, until a complete output answer is generated.
Preferably, the representation vector X is passed into three different fully connected layers, yielding the query matrix Q, the key matrix K and the value matrix V respectively.
Preferably, the data annotation in S2 includes an aggregation type, a condition and a relationship of the annotation data.
Preferably, in step S2, the loss function used when training the semantic analysis model is:

Loss = -Σ_{i=1}^{N} y_i · log(p_i)

wherein y represents the true label, p represents the prediction probability generated by the model, N represents the number of categories, y_i represents the probability of the i-th category of the true label, and p_i represents the i-th category of the model-generated prediction probability.
Preferably, the semantic analysis model is evaluated with a BLEU score, namely:

BLEU = BP · exp( sum( log p_n ) / N ), n = 1 … N

wherein:
BP represents the penalty factor; p_n represents the exact match rate of the n-grams; N represents the maximum length of the n-grams.
Preferably, the semantic judgment model is: a multi-head self-attention layer serves as the input layer of the model and is connected to a first residual connection layer; the input data of the model are fed simultaneously into the multi-head self-attention layer and the first residual connection layer; the first residual connection layer is connected in sequence to a forward propagation layer and a second residual connection layer, and the output of the first residual connection layer is fed simultaneously into the forward propagation layer and the second residual connection layer;
the output of the second residual connection layer is mapped through a fully connected layer to an output layer with two nodes, and the output data of the output layer are converted into a probability distribution through a softmax function for binary classification.
Preferably, the semantic judgment model adopts an adaptive learning rate during training, in the following specific manner:
initializing the historical gradient square-sum accumulation variable G of each parameter w to 0;
in each training iteration, calculating the gradient g_t of the parameter w;
updating the gradient square-sum accumulation variable by accumulating the square of the current gradient, g_t^2, into the historical variable G:

G_t = G_{t-1} + g_t^2

and updating the parameter w with the resulting learning rate LR_t:

w_{t+1} = w_t - LR_t · g_t,  LR_t = η / √(G_t + ε)

where η is the base learning rate and ε is a small constant that avoids division by zero.
preferably, generating the query statement further comprises:
analyzing the theme from the output answers obtained by the semantic analysis model, and judging whether corresponding theme data exist or not;
analyzing attribute names and attribute values from the output answers;
analyzing the aggregation type, condition and relation as the query key word of the database;
and generating a database query statement based on the parsed theme, the attribute name, the attribute value, the aggregation type, the relationship and the condition.
On the other hand, the invention also provides a semantic analysis-based question-answering system for structured information, the system comprising:
the data processing module, used for receiving the structured data, preprocessing the structured data and storing the preprocessed data into the database; the preprocessing comprises determining the topics and topic attributes of the structured data and merging data;
The data labeling module is used for labeling the preprocessed data to form labeling data;
the semantic analysis module is used for receiving a user question and obtaining an output answer;
the semantic judgment module is used for receiving the output answer and generating a query statement;
the query module is used for querying and outputting accurate answers based on query sentences;
and the database is used for storing the historical data and the intermediate data.
In yet another aspect, the present invention also provides a semantic analysis-based question-answering device for structured information, the device comprising a processor and a memory, the processor invoking instructions in the memory to perform the semantic analysis-based question-answering method for structured information described above.
Compared with the prior art, the scheme has the following advantages: based on semantic understanding technology, the scheme can deeply understand the background and semantics of the user's question and provide more accurate and targeted answers. Meanwhile, structured data is used for question-answer matching, so updating and maintaining the structured data does not require large amounts of manpower and material resources. The method not only improves user satisfaction and experience, but also saves operating costs and improves working efficiency. In addition, the scheme ensures the accuracy of answers from secondary data retrieval, consumes few resources, and avoids the information confusion and uncontrollable answers caused by pushing many related answers.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system execution flow according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a semantic analysis model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a text encoding process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a semantic judgment model according to an embodiment of the present invention;
fig. 5 is a block diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the described embodiments are only some, but not all, of the embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It will be appreciated by those skilled in the art that the following specific embodiments or implementations are a series of preferred arrangements of the present invention for further explaining the specific disclosure, and that these arrangements may be used in combination or in association with one another, unless it is expressly stated that some of them cannot be associated or used with other embodiments or arrangements. Meanwhile, the following specific examples or embodiments are provided only as optimized arrangements and are not to be construed as limiting the scope of the present invention.
In a specific embodiment, referring to fig. 1, the flow of the semantic analysis-based question-answering method for structured information provided by the present invention is as follows:
1. Structured data modeling
First, we need to model and basically process the structured data to be processed or retrieved. In a preferred embodiment, for structured data, we perform validation of data topics, reasonable merging of data, and attribute determination, to serve as a basis for subsequent data warehousing and model construction.
1. Data topic validation. The topic of each type of data is determined, wherein the topic comprises Chinese names and English names, the data structures of the same topic are consistent, and the topic can correspond to a table in a traditional database system.
2. Data merging. Topics whose fields differ only slightly are merged. The field difference can be judged quantitatively: for example, if the number of differing fields is below 10% or 15% of the topic's total field count, the difference is considered small; alternatively, a specific threshold on the number of differing fields can serve as the criterion, the difference being considered small when it is at or below that threshold. The larger the semantic differences between topics and between the texts their data represent, the more accurate the query statements generated by the subsequent models.
3. Topic attribute determination. For each type of topic, the English and Chinese names of the corresponding attributes are determined; the English names are used in the information system and the Chinese names in the semantic model, in one-to-one correspondence. The Chinese attributes and their relations must be parsed from Chinese sentences: for example, for the Chinese attribute 'position', colloquial expressions such as 'place' or 'where to' may appear in different sentences, and mapping these different wordings to the same 'position' attribute gives the model stronger convergence and generalization ability.
4. Data warehousing. The data with confirmed topics and attributes are entered into the database of the information system.
2. Semantic analysis model construction and model training
1. In a preferred embodiment, as shown in FIG. 2, the semantic analysis model of the present invention requires an attention mechanism. In the network structure of the semantic analysis model, text information is processed through 6 cascaded encoders to form encoded information. Each encoder module comprises two sublayers, a multi-head self-attention layer and a fully connected feed-forward layer, and adopts residual connections and layer normalization. The input to the encoder is a sequence, e.g. a sentence of text, which is first vectorized through the word embedding layer, mapping each word into a fixed-length vector. The vector sequence is then processed by the stack of encoder modules to generate the encoder output, i.e. the encoded information. In each encoder module, the vector sequence passes through the multi-head self-attention layer and the fully connected feed-forward layer; their outputs, after residual connection and layer normalization, serve as the input of the next encoder module. The output of the entire encoder is the output of the last encoder module.
The decoding portion of the model consists of 6 decoders in series, each decoder module including a multi-headed self-attention layer, an encoder-decoder attention layer, and a fully-connected feed-forward layer. The input to the decoder comprises two parts: firstly, word sequences in a target language are converted into vector sequences through a word embedding layer, and secondly, the output of an encoder, namely the encoding information of a source language, is obtained. Wherein the encoder-decoder attention layer calculates an attention score based on the output of the encoder and the input of the current decoder, thereby obtaining an attention representation of the current decoder input by the encoder. The decoder inputs the output of the encoder and the word sequence of the target language together through the processing of a plurality of decoder modules, and sequentially generates the representation of each word, so that the output of the decoder, namely the translation result of the target language, is finally obtained. The specific data processing process of the model is as follows:
step 1, obtaining a representation vector X of each word in an input sentence, wherein X is obtained by adding a word vector (Embedding) and a word position vector (Embedding). Word vectors are vector representations that map words into a high-dimensional space, enabling the capture of semantic information of the words. The word vector (training) obtained through retraining under the government affair data can be better suitable for the language characteristics in the government affair field, so that the performance of the model is improved.
In addition to word vectors, the model also requires word position vectors to represent the position of each word in the sentence. The network model used in the present invention needs this additional position encoding to help it understand where words sit in the sequence.
Position encoding models each position in the input sequence so that the model can process positional information. Absolute position encoding maps position information into fixed vectors, establishing an explicit correspondence between different positions; in the present invention, sine and cosine functions encode the sequence positions. For a sequence of length N, the encoding of each position is a fixed vector whose dimension d_model (the total dimension of the representation vector) is a hyperparameter of the model, which can be set to a default value such as 512. Specifically, for the word at position pos in the sentence, its absolute position encoding is:

PE(pos, 2i) = sin( pos / 10000^(2i/d_model) )
PE(pos, 2i+1) = cos( pos / 10000^(2i/d_model) )

where pos represents the position of the word in the sentence, d_model is the dimension of the position encoding PE (the same as the word embedding dimension), 2i indexes the even dimensions and 2i+1 the odd dimensions, with 2i ≤ d_model and 2i+1 ≤ d_model.
This way of calculating PE has several advantages. First, it lets the model handle sentences longer than any sentence in the training set: if the longest training sentence has 20 words and a sentence of length 21 suddenly arrives, the position code of the 21st position can still be computed directly from the formula, so longer sentences can be processed.
Second, this encoding makes relative positions easy to compute. For a fixed offset k, PE(pos+k) can be expressed as a linear function of PE(pos), yielding a relative position encoding. This is very important for the model, since relative position information is critical for understanding context and semantic relationships when processing sequence data.
Finally, because sine and cosine functions are introduced, the PE dimensions interact through the identities sin(A+B) = sin(A)cos(B) + cos(A)sin(B) and cos(A+B) = cos(A)cos(B) - sin(A)sin(B). This design effectively embeds position information into the input vector, allowing the model to distinguish different positions and improving its ability to model positional information.
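As a concrete illustration, the following sketch (ours, not the patent's) computes this sinusoidal absolute position encoding in vectorized form:

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int = 512) -> np.ndarray:
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); odd dimensions use cos."""
    pe = np.zeros((seq_len, d_model))
    pos = np.arange(seq_len)[:, None]            # word positions 0..seq_len-1
    i2 = np.arange(0, d_model, 2)                # the even dimension indices 2i
    angle = pos / np.power(10000.0, i2 / d_model)
    pe[:, 0::2] = np.sin(angle)                  # even dimensions
    pe[:, 1::2] = np.cos(angle)                  # odd dimensions
    return pe

# Because PE depends only on the formula, a 21-word sentence can be encoded
# even if the training set never exceeded 20 words.
pe = positional_encoding(21)
```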
Step 2: referring to FIG. 3, the obtained word representation matrix X (of shape n × d, where n is the number of words in the sentence and d is the vector dimension) is passed into the Encoder; after passing through the 6 Encoder blocks, the encoding information matrix C (n × d) of all words in the sentence is obtained. The output matrix of each Encoder block has exactly the same dimensions as its input.
In the Encoder block we introduce a self-attention mechanism (Self-Attention) to weight the inputs so as to better capture the relationships among them. Specifically, the input matrix X is first passed into three different fully connected layers, yielding three new matrices: the query matrix Q, the key matrix K and the value matrix V. Assuming the input matrix X has shape (n, d) and the weight matrices of the fully connected layers are W_q, W_k and W_v, then Q = X·W_q, K = X·W_k, V = X·W_v, and Q, K and V all have shape (n, d), the same as X. The self-attention mechanism computes the similarity between the queries (Q) and the keys (K) via matrix multiplication, using the transpose of K. The similarity is scaled by a square-root factor, processed by the softmax function, and multiplied by the value matrix V to obtain the final self-attention output. Such an attention mechanism lets the model automatically weight different positions of the input and thus better capture long-range dependencies. The self-attention output is computed as:

Attention(Q, K, V) = softmax( Q·K^T / √d_k ) · V
in the above, d K Representing the dimensions of the key vector. The self-attention mechanism generates a new representation by computing the similarity between the query, the key and the value and weighting the value (the weighting may be, for example, by weighting the value matrix V according to the weight distribution of the query and the key), resulting in the encoded information for each word in the sentence. The use of self-attention mechanisms in the Encoder unit (i.e., the encodable block) has a significant impact on the performance of the text processing task. The method can allow the model to automatically weight different input positions, so that long-distance dependency relations in sentences can be captured better, and the representation capability and generalization performance of the model are improved. Meanwhile, the self-attention mechanism has higher calculation efficiency and is suitable for processing longer text sequences.
The use of self-attention mechanisms in the encodable block has an important role in generating the encoded information matrix C of sentences. By calculating the similarity between the query, key and value, and weighting and summing the values, the self-attention mechanism can allow the model to automatically weight different locations of the input, thereby better capturing long-distance dependencies and generating codes with rich semantic information.
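The scaled dot-product self-attention just described can be sketched as follows; the weight-matrix names and the NumPy formulation are illustrative assumptions:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for input X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # three fully connected projections
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # query-key similarity, scaled
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # weighted sum of values

n, d = 5, 64                                     # illustrative sizes
rng = np.random.default_rng(0)
X, Wq, Wk, Wv = (rng.normal(size=s) for s in [(n, d), (d, d), (d, d), (d, d)])
C_block = self_attention(X, Wq, Wk, Wv)          # shape (n, d), same as the input
```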
Step 3: in the encoding stage, the input sequence undergoes a series of encoding operations to obtain the encoding information matrix C; in the decoding stage, the encoding information matrix C is passed to the Decoder, which predicts the next word from the words translated so far, masking the untranslated words with a Mask operation, until a complete answer is generated.
In the Decoder, a question start symbol is first input to initialize the decoder state. The Decoder then uses an attention mechanism to weight the encoding information matrix C so as to better capture the correspondence between input and output. Specifically, the Decoder computes a query matrix Q, calculates its similarity with the key matrix K derived from the encoding information matrix C, and normalizes the similarity scores with a softmax function to obtain the attention weights. These weights are used to compute a weighted sum over the encoding information matrix C, yielding a context vector that assists in predicting the current word.
When generating the next word, a Mask operation must hide the words after the current one to avoid leaking future information. This means the Decoder can only predict the next word on the basis of the words translated so far (i.e. the vectors of the words predicted up to now are fed back into the Decoder and the next word is output), without access to future words, which preserves the soundness of the model. As the Decoder generates words step by step, the top-k most likely words are selected from the probability distribution as candidates; each candidate is stored in a list together with the current hidden state and cell state for a Beam Search, which selects a word sequence with a higher generation probability. The beam search shrinks the search space and improves efficiency by considering only the top-k candidates at each step. At each time step, the decoder computes the current hidden state and cell state from the current input and the previous hidden state; the current hidden state contains the previous inputs and state information and can be seen as a memory unit that helps the decoder produce the correct output at the current time step. The input and previous state information are fed, at each decoder time step, into a gated recurrent unit or long short-term memory unit to compute the new state information, i.e. the current cell state. Solving for these states is well known to those skilled in the art. A sketch of such a beam search is given below.
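A minimal sketch of this beam search; it assumes a hypothetical seq2seq model callable as model(src_ids, tgt_ids) returning per-position vocabulary logits, with illustrative start and end token IDs:

```python
import torch

def beam_search(model, src_ids, bos_id, eos_id, beam_width=3, max_len=50):
    """Keep only the top-k candidate sequences at each step, as described above."""
    beams = [(torch.tensor([[bos_id]]), 0.0)]            # (sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[0, -1].item() == eos_id:              # finished beams pass through
                candidates.append((seq, score))
                continue
            logits = model(src_ids, seq)[0, -1]          # logits for the next word only
            logp = torch.log_softmax(logits, dim=-1)
            topv, topi = logp.topk(beam_width)           # consider only top-k words
            for v, i in zip(topv, topi):
                candidates.append(
                    (torch.cat([seq, i.view(1, 1)], dim=1), score + v.item()))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(s[0, -1].item() == eos_id for s, _ in beams):
            break
    return beams[0][0]                                   # most probable word sequence
```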
Through the above encoding and decoding operations, the Encoder-Decoder model may predict the next word step by step during the generation process until a complete answer is generated.
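For orientation, here is a minimal sketch of the overall 6-encoder/6-decoder stack built from PyTorch's stock transformer layers; it realizes the model(src_ids, tgt_ids) interface assumed in the beam-search sketch above, and the vocabulary size, head count and class names are assumptions rather than the patent's specification:

```python
import torch
import torch.nn as nn

class SemanticAnalysisModel(nn.Module):
    """6 cascaded encoders + 6 cascaded decoders, as in the text above."""
    def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        # Shared embedding for source and target words (an assumption); the
        # sinusoidal position encodings from Step 1 would be added to these
        # embeddings and are omitted here for brevity.
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Causal mask: each target position sees only earlier positions (Step 3).
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        memory = self.encoder(self.embed(src_ids))        # encoding information matrix C
        h = self.decoder(self.embed(tgt_ids), memory, tgt_mask=tgt_mask)
        return self.out(h)                                # per-position vocabulary logits
```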
2. Data collection. Question sentences related to the structured data are collected.
3. Data annotation.
After data collection is completed, the collected historical questions are annotated. In this embodiment, the preferred annotation format is as follows:
aggregation type: Chinese attribute name # Chinese attribute name condition Chinese value relationship Chinese attribute name condition Chinese value # topic
The aggregation type is an optional label; the available aggregation types are: average, maximum, minimum, total number, sum.
The conditions include: equal to, not equal to, greater than, less than.
The relationships include: and, or.
Here is a practical example:
Question: How many patents has Zhang San filed from 2000 to the present?
Annotation: total number # release date greater than 2000 and name equal to Zhang San # patent library
4. Model training.
In model training, the annotated historical data serve as the model's input; the model generates analysis results from the input data, loss values are calculated by comparing them with the annotation information using a loss function, and the model is trained by optimizing those loss values. Training typically runs for multiple rounds, each iteration updating the model's weights, until the model's results on the training set converge.
Wherein the loss function measures the accuracy of the prediction by calculating the cross entropy between the probability distribution generated by the model and the probability distribution of the real label. Specifically, for each predicted word or token, the loss function computes the negative logarithm of its corresponding probability value in the real label, which is then summed as a whole. The loss function is specifically set as follows:
Loss = -Σ_{i=1}^{N} y_i · log(p_i)

where y is the true label (0 or 1), p is the prediction probability generated by the model (its value ranging between 0 and 1), N is the number of categories, y_i is the probability (0 or 1) of the i-th category of the true label, p_i is the i-th category of the model-generated prediction probability (ranging between 0 and 1), and log denotes the natural logarithm.
A loss function threshold is set and it is checked at each iteration whether the value of the loss function is below the threshold. If the threshold is reached, training is stopped. This ensures that the model has converged to a better state.
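A small numeric sketch of this cross-entropy loss and the threshold-based stopping check; the threshold value and the example vectors are illustrative:

```python
import numpy as np

def cross_entropy(y: np.ndarray, p: np.ndarray, eps: float = 1e-12) -> float:
    """Loss = -sum_i y_i * log(p_i) for one-hot labels y and predictions p."""
    return float(-np.sum(y * np.log(p + eps)))

y = np.array([0.0, 1.0, 0.0, 0.0])     # true label: category 2 of N=4
p = np.array([0.1, 0.7, 0.1, 0.1])     # model's predicted distribution
loss = cross_entropy(y, p)             # = -log(0.7), about 0.357

loss_threshold = 0.05                  # assumed convergence threshold
if loss < loss_threshold:
    print("loss below threshold: model has converged, stop training")
```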
5. Model evaluation
In model evaluation, we use the BLEU criterion to measure the similarity between the analysis results and the annotation data. BLEU is based on N-gram matching (N consecutive words) and considers both exact and fuzzy matches between the analysis result and the annotation data. For an analysis result and its annotation, the N-gram match counts are computed first: an N-gram match is the number of identical runs of N consecutive words shared by the analysis result and the annotation data. The occurrences of each N-gram in the analysis result are counted and clipped to the maximum number of occurrences of the same N-gram in the annotation data; this prevents a longer analysis result from obtaining an inflated score.
Exact matching refers to the number of identical N-grams in the analysis result and the annotation data, while fuzzy matching refers to the number of N-grams with partial identity in the analysis result and the annotation data.
The BLEU score distinguishes the matching of different N-grams by weighting the number of N-gram matches. In general, shorter N-grams (e.g., 1-gram and 2-gram) will get higher weights in the computation because they are more critical to grammar and vocabulary accuracy. And accumulating the weighted N-gram matching numbers to obtain a final BLEU score. In general, the BLEU score ranges from 0 to 1, with a value closer to 1 indicating that the analysis result is closer to the labeling data.
The score is computed as:

BLEU = BP · exp( sum( log p_n ) / N ), n = 1 … N

wherein:
BP: the brevity penalty factor in the evaluation index, used to penalize machine translation results that are too short. In a preferred embodiment, BP can be calculated as BP = exp(min(0, 1 - reference translation length / machine translation length)), where the reference translation length is the total number of words in the reference translation and the machine translation length is the total number of words in the machine translation result;
p_n: the exact match rate of the n-grams;
N: the maximum length of the n-grams;
sum(): a summation.
The BLEU score is calculated by comparing the similarity between sentences generated by the machine translation system and one or more reference translations, yielding a score between 0 and 1; the higher the score, the closer the system output is to the references. The closer the model's result is to 1, the better the model; when the result is poor, the model parameters are adjusted or the data set is augmented and optimized.
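The BLEU computation described above, clipped N-gram precision combined with the brevity penalty BP, can be sketched as follows, assuming uniform N-gram weights:

```python
import math
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision between two token lists, as described above."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    match = sum(min(c, ref[g]) for g, c in cand.items())   # clip to reference counts
    total = max(sum(cand.values()), 1)
    return match / total

def bleu(candidate, reference, max_n=4):
    """BLEU = BP * exp(sum(log p_n) / N), uniform weights assumed."""
    bp = math.exp(min(0.0, 1.0 - len(reference) / len(candidate)))
    logs = [math.log(max(ngram_precision(candidate, reference, n), 1e-12))
            for n in range(1, max_n + 1)]
    return bp * math.exp(sum(logs) / max_n)
```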
3. Semantic judgment model construction and model training
In the invention, a semantic judgment model is added after the semantic analysis model. Its main functions are as follows:
results screening: the text results generated by the semantic analysis model may contain a number of possible options, but in some cases the analysis results of the model are not business-demanding and even the analysis results themselves are erroneous. By adding a classification model, the text result of the semantic analysis result is screened, and only the result which meets the expectations is selected, so that the quality and the accuracy of the final output result are ensured.
Result controllability: the semantic analysis model is generative and may have some randomness and uncertainty in the output. By introducing a classification model, the generated results can be further screened.
Prediction confidence: the classification model may output a predictive probability or confidence level for each option to help evaluate the reliability of each option. This can be used to measure confidence in the results of the semantic analysis model and can be used as a basis for decision making, particularly in question-answering systems, where high reliability and certainty are required.
Error propagation is reduced: by adding a classification model, the generated result can be subjected to primary screening, and the risk of backward propagation of an error result is reduced, so that the robustness and reliability of the whole system are improved. The architecture of the semantic judgment model is shown in fig. 4.
In a preferred embodiment, the semantic judgment model can be set as follows:
the multi-head self-attention layer serves as the input layer of the model and is connected to the first residual connection layer; the model's input data are fed simultaneously into the multi-head self-attention layer and the first residual connection layer; the first residual connection layer is connected in sequence to the forward propagation layer and the second residual connection layer, and the output of the first residual connection layer is fed simultaneously into the forward propagation layer and the second residual connection layer;
the hidden representation output by the second residual connection layer (also called the pooling vector) is used as input and mapped by a fully connected layer to an output layer with two nodes, and finally converted into a probability distribution using the softmax function for classification prediction.
Let the pooling vector output by the second residual connection layer be h ∈ R^d, where d is the vector dimension. Taking h as input, a fully connected layer with two nodes computes the output vector o ∈ R^2, where each node represents one class of the binary classification task: o = W·h + b, where W is the weight matrix and b is the bias vector. The output vector o of this fully connected layer can be viewed as a score for each class, i.e. the scores of the positive and negative classes of the binary classification problem. To convert the output into a probability distribution, the softmax function is applied: y = softmax(o). The prediction results are 0 and 1: when the prediction is 1, the generated question-answer result is usable; when it is 0, it is not.
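The two-node classification head just described can be sketched as follows; the pooling-vector dimension and the tensor shapes are illustrative:

```python
import torch
import torch.nn as nn

d = 512                                   # pooling-vector dimension (assumed)
head = nn.Linear(d, 2)                    # o = W h + b, one score per class

h = torch.randn(1, d)                     # pooled output of the 2nd residual layer
o = head(h)                               # class scores, shape (1, 2)
y = torch.softmax(o, dim=-1)              # probability distribution over {0, 1}
usable = y.argmax(dim=-1).item() == 1     # 1: generated answer is usable; 0: discard
```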
1. Data collection
The annotated historical data set serves as positive samples. Negative samples are produced by randomly replacing the aggregation type, condition, relation, etc. of part of the data with values outside their candidate sets (i.e. with wrong values). Other unexpected prediction results found during training and testing are also collected as negative samples and added to the negative sample set.
2. Data annotation
We mark the positive samples as 1 and the negative samples as 0. Of course, we can label the positive and negative samples as other numbers to distinguish them for subsequent sample training.
3. Model training
An adaptive learning rate method is adopted during training: the learning rate is adjusted automatically according to the historical gradient information of each parameter, realizing per-parameter adaptive learning rates so that parameters with smaller gradients receive more attention. The detailed steps are as follows:
(1) Initialization: for each parameter w, its historical gradient square-sum accumulation variable G is initialized to 0.
(2) In each training iteration, the gradient g_t of the parameter is calculated.
(3) Update the gradient square-sum accumulation variable by accumulating the square of the current gradient into the historical variable: G_t = G_{t-1} + g_t^2.
(4) Update the parameter using the computed learning rate: w_{t+1} = w_t - ( η / √(G_t + ε) ) · g_t, where w_{t+1} is the updated model parameter, w_t is the current model parameter, η / √(G_t + ε) is the computed learning rate (η being the base learning rate and ε a small constant that avoids division by zero), and g_t is the gradient.
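A sketch of this AdaGrad-style per-parameter update; the base learning rate eta and smoothing constant eps are assumed hyperparameters:

```python
import numpy as np

def adagrad_step(w, g_t, G, eta=0.01, eps=1e-8):
    """One update: G_t = G_{t-1} + g_t^2;  w_{t+1} = w_t - eta/sqrt(G_t+eps) * g_t."""
    G = G + g_t ** 2                     # accumulate the squared gradient history
    lr_t = eta / np.sqrt(G + eps)        # per-parameter adaptive learning rate LR_t
    w = w - lr_t * g_t                   # parameters with small past gradients move more
    return w, G

w = np.array([0.5, -0.3])
G = np.zeros_like(w)                     # initialization, step (1)
w, G = adagrad_step(w, g_t=np.array([0.1, -0.2]), G=G)
```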
4. Model evaluation
After model training, the semantic judgment model needs to be evaluated, so that the accuracy and the reliability of the semantic judgment model are ensured.
Since the distribution of positive and negative samples in the data is often unbalanced, the F1 score is used as the evaluation result:

F1 = 2 · P · R / (P + R), with precision P = TP / (TP + FP) and recall R = TP / (TP + FN)
where TP denotes true positives: the number of samples that are actually positive and predicted positive;
TN denotes true negatives: the number of samples that are actually negative and predicted negative;
FP denotes false positives: the number of samples that are actually negative but predicted positive;
FN denotes false negatives: the number of samples that are actually positive but predicted negative.
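A direct sketch of the F1 computation from the confusion-matrix counts defined above, with precision P = TP/(TP+FP) and recall R = TP/(TP+FN):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R) with P = TP/(TP+FP), R = TP/(TP+FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative counts from an imbalanced test set:
print(f1_score(tp=80, fp=10, fn=30))   # about 0.8
```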
4. Query statement generation
After the models are built, the query statement generation stage begins. This stage mainly comprises the following steps.
1. Topic parsing
The Chinese topic name is the topic parsed out by the semantic analysis model. We need to convert the Chinese topic name in the model result into the English topic name and judge whether corresponding topic data exists; if not, the process ends.
2. Attribute resolution
The attribute resolution includes attribute name resolution and attribute value resolution.
Attribute name parsing converts the Chinese attribute names in the semantic analysis model result into English attribute names; if a corresponding attribute does not exist under the corresponding topic, the process ends.
After the attribute names are parsed, the corresponding attribute values are parsed based on the attribute type: for example, a time-type field parses its value into a timestamp, and a numeric value is parsed into the corresponding numeric type.
3. Aggregation type, condition and relationship resolution
The aggregation types, conditions and relations generated by the semantic analysis model are parsed into the query keywords of the corresponding information system database. Taking MySQL as an example:
the aggregation types average, maximum, minimum, total number and sum are parsed into avg, max, min, count and sum;
the conditions equal to, not equal to, greater than and less than are parsed into =, !=, > and <;
the relations and / or are parsed into AND / OR.
4. Generating query statements
A query statement is generated according to the grammar rules of the actual database system by combining the parsed topic, attributes, aggregation type, conditions and relations.
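To make the mapping concrete, here is a sketch of query-statement generation under the MySQL keyword mappings above; the table and column names, the input structure, and the helper function itself are illustrative assumptions, not the patent's implementation:

```python
AGG = {"average": "avg", "maximum": "max", "minimum": "min",
       "total number": "count", "sum": "sum"}
COND = {"equal to": "=", "not equal to": "!=", "greater than": ">", "less than": "<"}
REL = {"and": "AND", "or": "OR"}

def build_query(parsed: dict) -> str:
    """Assemble a SQL statement from the parsed topic, attributes,
    aggregation type, conditions and relation."""
    select = f'{AGG[parsed["aggregation"]]}({parsed["attribute"]})'
    where = f' {REL[parsed["relation"]]} '.join(
        f'{attr} {COND[cond]} {val!r}' for attr, cond, val in parsed["conditions"])
    return f'SELECT {select} FROM {parsed["subject"]} WHERE {where};'

# For the worked annotation example above ("total number # release date greater
# than 2000 and name equal to Zhang San # patent library"):
query = build_query({
    "aggregation": "total number", "attribute": "*", "subject": "patent_library",
    "relation": "and",
    "conditions": [("release_date", "greater than", 2000),
                   ("name", "equal to", "Zhang San")],
})
# -> SELECT count(*) FROM patent_library WHERE release_date > 2000 AND name = 'Zhang San';
```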
5. Accurate answer query
The generated query statement is used to query the information system for the precise answer, which is then returned.
In addition, in another specific embodiment, the technical solution of the present invention may also be implemented by a semantic analysis based question-answering system for structured information, where the system includes:
the data processing module is used for receiving the structured data, preprocessing the structured data and storing the preprocessed data into the database; the preprocessing comprises determining the topics and topic attributes of the structured data and merging data;
The data labeling module is used for labeling the preprocessed data to form labeling data;
the semantic analysis module is used for receiving a user question and obtaining an output answer;
the semantic judgment module is used for receiving the output answer and generating a query statement;
the query module is used for querying and outputting accurate answers based on query sentences;
and the database is used for storing the historical data and the intermediate data.
In another specific embodiment, the solution of the present invention may also be implemented by means of a device, i.e. configured as an electronic device. Fig. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
As shown in fig. 5, the device 600 includes one or more processors 601 and memory 602.
The processor 601 may be a central processing unit (CPU) or another form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 600 to perform desired functions.
The memory 602 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer programs may be stored on the computer-readable storage medium, and the processor 601 may run the program to implement the semantic analysis-based question-answering method for structured information of the embodiments of the invention described above, or other desired functions.
In one example, the device 600 may further include: input device 603 and output device 604, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those reasonably skilled in the art. The processor performs the various methods and processes described above. For example, the method embodiments of the present solution may be implemented as a software program tangibly embodied on a machine-readable medium, such as a memory. In some embodiments, part or all of the software program may be loaded and/or installed via the memory and/or a communication interface. When the software program is loaded into the memory and executed by the processor, one or more steps of the methods described above may be performed. Alternatively, in other embodiments, the processor may be configured to perform one of the methods described above in any other suitable manner (e.g., by means of firmware).
Logic and/or steps represented in the flowcharts or otherwise described herein may be embodied in any readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (8)

1. A semantic analysis-based question-answering method for structured information, the method comprising:
S1, receiving structured data, preprocessing the structured data, and storing the preprocessed data into a database; the preprocessing comprises determining the topics and topic attributes of the structured data and merging data; each topic corresponds to a table in a conventional database system;
S2, annotating the preprocessed data to form annotation data, and training a semantic analysis model based on the annotation data; receiving a user question, inputting it into the semantic analysis model, and obtaining an output answer;
S3, training a semantic judgment model based on the annotation data; the semantic judgment model receives the output answer and generates a query statement;
S4, querying for and outputting a precise answer based on the query statement;
in the semantic analysis model, 6 cascade encoder structures encode input data to form encoded information; the coding information is simultaneously and respectively input into 6 cascaded decoders, and meanwhile, the upper-stage decoder inputs the processed data into the lower-stage decoder, and the final-stage decoder outputs an output answer of the semantic analysis model; wherein the input matrix dimension of the encoder is the same as the output matrix dimension;
the semantic judgment model is as follows: the multi-head self-attention layer is used as an input layer of the model, the multi-head self-attention layer is connected with the first residual error connecting layer, and input data of the model are simultaneously input into the multi-head self-attention layer and the first residual error connecting layer; the first residual error connecting layer is sequentially connected with the forward propagation layer and the second residual error connecting layer, and the output of the first residual error connecting layer is simultaneously input into the forward propagation layer and the second residual error connecting layer;
The output of the second residual error connecting layer is mapped to an output layer with two nodes through a full connecting layer, and the output data of the output layer is converted into probability distribution through a softmax function so as to carry out two classification;
in step S3, generating the query statement further comprises:
S31, topic parsing: the Chinese topic name is the topic parsed out by the semantic analysis model; the Chinese topic name is converted into an English topic name, and whether corresponding topic data exists is judged; if not, the process ends;
s32, analyzing the attribute: the attribute analysis comprises attribute name analysis and attribute value analysis; the attribute name analysis is to convert the Chinese attribute name in the semantic analysis model result into an English attribute name, and if no corresponding attribute exists under the corresponding theme, ending the flow; after the attribute name analysis, carrying out corresponding attribute value analysis;
s33, analyzing aggregation type, conditions and relations: analyzing the aggregation type, condition and relation generated by the semantic analysis model into query keywords corresponding to the information system database system;
s34, generating a query statement: and generating a query statement according to the grammar rule of the actual database system by combining the analyzed theme, the analyzed attribute and the analyzed aggregation type, the analyzed condition and the analyzed relation.
2. The method according to claim 1, wherein, after the user question is received in S2, the method further comprises:
S21, obtaining a representation vector X of each word in the user question, wherein the representation vector X comprises a word vector and a word position vector;
S22, inputting the representation vector X into the encoders in the semantic analysis model to obtain an encoding information matrix C;
S23, sending the encoding information matrix C to the 6 mutually cascaded decoders in the semantic analysis model, the decoders predicting the next word from the currently translated words until a complete output answer is generated.
3. The method of claim 2, wherein the word position vector is calculated as:

PE(pos, 2i) = sin( pos / 10000^(2i/d_model) )
PE(pos, 2i+1) = cos( pos / 10000^(2i/d_model) )

where pos represents the position of the word in the sentence, d_model represents the total dimension of the word position vector, 2i indexes the even dimensions, 2i+1 indexes the odd dimensions, and i indicates that the word is at the i-th position in the sequence.
4. The method of claim 2, wherein the encoder introduces a self-attention calculation to weight the input to the encoder, the self-attention output being:

Attention(Q, K, V) = softmax( Q·K^T / √d_k ) · V

wherein Q is a query matrix, K is a key matrix, V is a value matrix, and d_k represents the dimension of the key vectors.
5. The method of claim 4, wherein the decoder calculates a query matrix Q based on the input data, calculates its similarity with the key matrix K, and normalizes the similarity scores by a softmax function to obtain the attention weights; the attention weights are used to compute a weighted sum over the encoding information matrix C so as to predict the next word, until a complete output answer is generated.
6. The method according to claim 1, wherein in S2, the loss function used when training the semantic analysis model is:

Loss = -Σ_{i=1}^{N} y_i · log(p_i)

wherein y represents the true label, p represents the prediction probability generated by the model, N represents the number of categories, y_i represents the probability of the i-th category of the true label, and p_i represents the i-th category of the model-generated prediction probability.
7. A semantic analysis based question-answering system for structured information, the system comprising:
the data processing module is used for receiving the structured data, preprocessing the structured data and storing the preprocessed data into the database; the preprocessing comprises determining the topics and topic attributes of the structured data and merging data; each topic corresponds to a table in a conventional database system;
The data labeling module is used for labeling the preprocessed data to form labeling data;
the semantic analysis module is used for receiving a user question and obtaining an output answer;
the semantic judgment module is used for receiving the output answer and generating a query statement;
the query module is used for querying and outputting accurate answers based on the query statements;
the database is used for storing historical data and intermediate data;
in the semantic analysis module, 6 cascaded encoder structures encode the input data to form encoded information; the encoded information is simultaneously input into each of 6 cascaded decoders, each upper-stage decoder also feeds its processed data into the next-stage decoder, and the final-stage decoder produces the output answer of the semantic analysis model; the input matrix dimension of the encoder is the same as its output matrix dimension;
the semantic judgment module is configured such that: a multi-head self-attention layer serves as the input layer of the model and is connected to a first residual connection layer, with the input data of the model fed simultaneously into the multi-head self-attention layer and the first residual connection layer; the first residual connection layer is connected in sequence to a forward propagation layer and a second residual connection layer, with the output of the first residual connection layer fed simultaneously into the forward propagation layer and the second residual connection layer;
the output of the second residual connection layer is mapped through a fully connected layer to an output layer with two nodes, and the output data of the output layer are converted into a probability distribution by a softmax function for binary classification;
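For illustration, the classifier just described might be sketched in PyTorch as follows; the layer sizes, mean pooling, and ReLU activation are assumptions, not the claimed configuration:

```python
# Sketch of the semantic judgment module's binary classifier (assumed sizes).
import torch
import torch.nn as nn

class SemanticJudge(nn.Module):
    def __init__(self, d_model: int = 512, nhead: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))      # forward propagation layer
        self.fc = nn.Linear(d_model, 2)                        # fully connected -> two nodes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.self_attn(x, x, x)   # multi-head self-attention input layer
        h = x + attn_out                        # first residual connection
        h = h + self.ff(h)                      # forward propagation + second residual
        logits = self.fc(h.mean(dim=1))         # pool over tokens, map to output layer
        return torch.softmax(logits, dim=-1)    # probability distribution, two classes

print(SemanticJudge()(torch.randn(1, 16, 512)))  # e.g. tensor([[0.49, 0.51]], ...)
```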
in the semantic judgment module, the process of generating the query statement further comprises:
S31, topic parsing: the topic parsed out by the semantic analysis model is a Chinese topic name; the Chinese topic name is converted into an English topic name, and whether corresponding topic data exist is judged; if no corresponding topic data exist, the process ends;
S32, attribute parsing: attribute parsing comprises attribute name parsing and attribute value parsing; attribute name parsing converts the Chinese attribute name in the semantic analysis model result into an English attribute name, and if no corresponding attribute exists under the corresponding topic, the process ends; after attribute name parsing, the corresponding attribute value parsing is carried out;
S33, parsing the aggregation type, conditions and relations: the aggregation type, conditions and relations generated by the semantic analysis model are parsed into the query keywords of the corresponding database system;
S34, generating a query statement: a query statement is generated according to the grammar rules of the database system, combining the parsed topic, attributes, aggregation type, conditions and relations.
8. A semantic-analysis-based question answering device for structured information, the device comprising a processor and a memory, wherein the processor invokes instructions in the memory to perform the semantic-analysis-based question answering method for structured information of any one of claims 1-6.
CN202310889872.7A 2023-07-20 2023-07-20 Semantic analysis-based question answering method, system and equipment for structured information Active CN116610791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310889872.7A CN116610791B (en) 2023-07-20 2023-07-20 Semantic analysis-based question answering method, system and equipment for structured information

Publications (2)

Publication Number Publication Date
CN116610791A CN116610791A (en) 2023-08-18
CN116610791B true CN116610791B (en) 2023-09-29

Family

ID=87680417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310889872.7A Active CN116610791B (en) 2023-07-20 2023-07-20 Semantic analysis-based question answering method, system and equipment for structured information

Country Status (1)

Country Link
CN (1) CN116610791B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102033395B1 (en) * 2014-11-20 2019-10-18 Electronics and Telecommunications Research Institute Question answering system and method for structured knowledge-base using deep natural language question analysis
WO2021000362A1 (en) * 2019-07-04 2021-01-07 Zhejiang University Deep neural network model-based address information feature extraction method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766417A (en) * 2018-11-30 2019-05-17 Zhejiang University Construction method of a local-chronicles literature question answering system based on knowledge graph
WO2021139283A1 (en) * 2020-06-16 2021-07-15 Ping An Technology (Shenzhen) Co., Ltd. Knowledge graph question-answer method and apparatus based on deep learning technology, and device
CN115309879A (en) * 2022-08-05 2022-11-08 China University of Petroleum (East China) Multi-task semantic parsing model based on BART
CN116303971A (en) * 2023-03-29 2023-06-23 Chongqing Jiaotong University Few-shot table question-answering method for the bridge management and maintenance field
CN116414962A (en) * 2023-04-11 2023-07-11 Nanjing University of Posts and Telecommunications Question-answer matching method based on attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Bi Mingwen; Bian Yufang; Zuo Min; Zhang Qingchuan. Simulation research on semantic analysis with BLSTM-PA in the food safety field. Computer Simulation. 2020, (03), 343-348. *
Zhao Man. Research on structured query generation methods for complex questions in knowledge base question answering. China Masters' Theses Full-text Database, Information Science and Technology. 2022, I138-665. *

Also Published As

Publication number Publication date
CN116610791A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN111310438B (en) Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model
CN111651589B (en) Two-stage text abstract generation method for long document
CN114548101B (en) Event detection method and system based on backtracking sequence generation method
CN116662582B (en) Specific domain business knowledge retrieval method and retrieval device based on natural language
Wang et al. A transfer-learnable natural language interface for databases
CN117076653B Knowledge base question-answering method based on chain-of-thought and visually enhanced in-context learning
CN116719520B (en) Code generation method and device
CN114510946B (en) Deep neural network-based Chinese named entity recognition method and system
CN110633473B (en) Implicit discourse relation identification method and system based on conditional random field
Xiao et al. Cqr-sql: Conversational question reformulation enhanced context-dependent text-to-sql parsers
CN116342167B (en) Intelligent cost measurement method and device based on sequence labeling named entity recognition
CN111581365B (en) Predicate extraction method
CN116861269A (en) Multi-source heterogeneous data fusion and analysis method in engineering field
CN117193823A (en) Code workload assessment method, system and equipment for software demand change
CN116610791B (en) Semantic analysis-based question answering method, system and equipment for structured information
CN110287487A Subject-predicate recognition method, apparatus, device and computer-readable storage medium
CN113076089B (en) API (application program interface) completion method based on object type
CN115437626A (en) OCL statement automatic generation method and device based on natural language
CN115238705A (en) Semantic analysis result reordering method and system
CN115203206A (en) Data content searching method and device, computer equipment and readable storage medium
Wang et al. Event extraction via dmcnn in open domain public sentiment information
Zhou et al. RNN-based sequence-preserved attention for dependency parsing
CN117407051B (en) Code automatic abstracting method based on structure position sensing
CN115114915B (en) Phrase identification method, device, equipment and medium
Yin et al. Probabilistic graph attention for relation extraction for domain of geography

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant