WO2021082982A1 - Question answering method, equipment, storage medium and device based on a graphed knowledge base - Google Patents

Info

Publication number
WO2021082982A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
text information
answer
graphed
knowledge base
Prior art date
Application number
PCT/CN2020/122136
Other languages
English (en)
French (fr)
Inventor
余文礼
杨坤
许开河
王少军
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2021082982A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the technical field of artificial intelligence, and in particular to a question and answer method, equipment, storage medium and device based on a graphed knowledge base.
  • the question and answer method based on a graphed knowledge base includes the following steps:
  • The question answering device based on a graphed knowledge base includes a memory, a processor, and a question and answer program based on the graphed knowledge base that is stored in the memory and can run on the processor; the question and answer program based on the graphed knowledge base is configured to implement the following steps:
  • This application also proposes a storage medium that stores a question and answer program based on a graphed knowledge base, and the following steps are implemented when the question and answer program based on a graphed knowledge base is executed by a processor:
  • This application also proposes a question answering device based on a graphed knowledge base, and the question answering device based on a graphed knowledge base includes:
  • the voice recognition module is configured to perform voice detection, and when the user's question voice is detected, perform voice recognition on the question voice to obtain question text information;
  • the element recognition module is configured to perform element recognition on the question text information through a preset element recognition model, and obtain the question elements corresponding to the question text information;
  • the acquisition module is used to acquire the knowledge elements of multiple stored questions in the preset graphed knowledge base;
  • the matching module is used to match the question elements of the question text information with the knowledge elements of each of the stored questions;
  • the display module is used to display the answer corresponding to the successfully matched stored question if the matching is successful.
  • FIG. 1 is a schematic structural diagram of a question and answer device based on a graphed knowledge base in a hardware operating environment involved in a solution of an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of the first embodiment of the question and answer method based on the graphed knowledge base of this application;
  • FIG. 3 is a schematic flowchart of a second embodiment of the question and answer method based on the graphed knowledge base of this application;
  • FIG. 4 is a schematic flowchart of a third embodiment of the question and answer method based on the graphed knowledge base of this application;
  • Fig. 5 is a structural block diagram of the first embodiment of the question answering device based on the graphed knowledge base of the present application.
  • FIG. 1 is a schematic diagram of the structure of a question answering device based on a graphed knowledge base in a hardware operating environment involved in a solution of an embodiment of the application.
  • the question and answer device based on the graphed knowledge base may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • The user interface 1003 may include a display screen (Display); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the wired interface of the user interface 1003 may be a USB interface in this application.
  • The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a wireless fidelity (Wireless Fidelity, Wi-Fi) interface).
  • The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) or a non-volatile memory (Non-volatile Memory, NVM), such as disk storage.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • The structure shown in FIG. 1 does not constitute a limitation on the question and answer device based on the graphed knowledge base, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a question and answer program based on a graphed knowledge base.
  • the network interface 1004 is mainly used to connect to a back-end server to communicate data with the back-end server;
  • the user interface 1003 is mainly used to connect to user equipment;
  • The question answering device based on the graphed knowledge base calls, through the processor 1001, the question answering program based on the graphed knowledge base stored in the memory 1005, and executes the question answering method based on the graphed knowledge base provided in the embodiments of the present application.
  • the question and answer method based on the graphed knowledge base includes the following steps:
  • Step S10 Perform voice detection. When the user's question voice is detected, voice recognition is performed on the question voice to obtain question text information.
  • the execution subject of this embodiment is the question answering device based on the graphed knowledge base, where the question answering device based on the graphed knowledge base may be an electronic device such as a smart phone, a personal computer, or a server.
  • The intelligent question answering system in the question answering device based on the graphed knowledge base can obtain a voice signal from a speaker through a microphone. Digital signal processing first detects whether there is voice in the audio signal captured by the microphone, and then analyzes the audio signal to predict the text spoken in the received audio signal. Voice activity detection (Voice Activity Detection, abbreviated VAD) can be used for voice detection.
  • The computer preprocesses the detected question voice of the user, extracts the features of the voice, and obtains the pre-established voice recognition templates.
  • According to the voice recognition model, the computer compares the voice templates stored in the computer with the features of the input question voice, and finds a series of optimal templates that match the input voice according to a certain search and matching strategy. Then, according to the definition of these templates, the recognition result is obtained by table lookup.
  • A common transformation method is to extract Mel-frequency cepstral coefficient (MFCC) features.
  • The sound becomes a matrix with 12 rows (assuming that the acoustic features are 12-dimensional) and N columns, which is called the observation sequence, where N is the total number of frames.
  • After feature extraction, the audio data is converted to text through the acoustic model, the dictionary, and the language model to obtain the question text information.
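  • The voice detection stage of step S10 can be illustrated with a minimal sketch. The application only says that VAD can be used; the short-time energy threshold below is a simple stand-in chosen for illustration, and the names `frame_signal` and `detect_voice`, the 16 kHz sample rate, the frame sizes, and the threshold value are all assumptions rather than details from the application.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def detect_voice(samples, frame_len=400, hop=160, energy_threshold=0.01):
    """Return one flag per frame: True when the mean energy exceeds the threshold.

    Hypothetical energy-based VAD; real systems usually use more robust detectors.
    """
    flags = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = sum(x * x for x in frame) / len(frame)
        flags.append(energy > energy_threshold)
    return flags

# Toy input: 0.1 s of silence followed by 0.1 s of a 440 Hz tone at 16 kHz.
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
flags = detect_voice(silence + tone)
```

  • Frames flagged True would then go on to feature extraction (for example the 12-dimensional MFCC vectors mentioned above), giving the 12-row, N-column observation sequence.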
  • Step S20 Perform element recognition on the question text information through a preset element recognition model, and obtain question elements corresponding to the question text information.
  • The long short-term memory network (Long short-term memory, abbreviated LSTM) is mainly intended to solve the vanishing-gradient and exploding-gradient problems that arise when training on long sequences.
  • The conditional random field (Conditional Random Fields, abbreviated CRF) layer can add constraints to the final predicted tags to ensure that the predicted tags are legal.
  • LSTM is used to solve the problem of extracting sequence features. Under the LSTM-CRF model, the output is the best tag sequence.
  • The QEAC element system is designed for question sentences, that is, the question elements include at least one of the QEAC elements, where element Q represents the subject question word, element C represents the target of the question word's direct action, element E represents the center of the subject question sentence, and element A represents a modifier or attributive component, which can modify E or another A. For example, for the sentence "Which are the purchase channels of wealth management products", the result of the element system marking is " ⁇ #A# ⁇ # Channel#E# ⁇ #C# ⁇ #Q#".
  • the QEAC question element corresponding to the question text information is obtained by labeling the words satisfying the characteristics of each element among the words in the question text information through a preset element recognition model.
  • Step S30 Obtain the knowledge elements of multiple stored questions in the preset graphed knowledge base.
  • The preset graphed knowledge base is established in advance from a large number of existing question and answer pairs: the existing question and answer pairs are obtained and classified according to a preset classification algorithm to obtain type question and answer pairs of preset types; element recognition is performed on each type question and answer pair through the preset element recognition model to obtain the sample elements of each type question and answer pair; and the preset graphed knowledge base is established from the sample elements. That is, a large number of stored questions and their corresponding answers are stored in the preset graphed knowledge base.
  • the similarity between the question element of the question text information and the knowledge element of each stored question in the preset graphed knowledge base can be calculated, and when the similarity exceeds a preset threshold, it is determined that the matching is successful.
  • Step S40 Match the question elements of the question text information with the knowledge elements of each of the stored questions.
  • The question elements and the knowledge elements in the preset graphed knowledge base are represented as vectors; term frequency (Term Frequency, abbreviated TF) or term frequency-inverse document frequency (Term Frequency-Inverse Document Frequency, abbreviated TF-IDF) may be used to convert them into vector form.
  • the cosine distance between the question element in the form of a vector and each knowledge element in the preset graphed knowledge base is calculated, and the cosine distance is used as the similarity.
  • The preset threshold is set according to empirical values, for example 90%.
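  • The matching in step S40 can be sketched as follows, assuming plain term-frequency vectors and treating the cosine of the angle between two vectors as the similarity (what the text calls the cosine distance). The function names, the `(elements, answer)` layout of the stored questions, and the sample data are hypothetical; only the TF representation, the cosine measure, and the 90% threshold come from the description.

```python
import math
from collections import Counter

def tf_vector(elements):
    """Term-frequency vector: element word -> count."""
    return Counter(elements)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse TF vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_question(question_elements, stored_questions, threshold=0.9):
    """Return (answer, score) for the best stored question, or (None, score)
    when no similarity exceeds the threshold.

    stored_questions is a list of (knowledge_elements, answer) pairs.
    """
    query = tf_vector(question_elements)
    best_score, best_answer = 0.0, None
    for elements, answer in stored_questions:
        score = cosine_similarity(query, tf_vector(elements))
        if score > best_score:
            best_score, best_answer = score, answer
    if best_score > threshold:
        return best_answer, best_score
    return None, best_score

stored = [
    (["wealth", "product", "purchase", "channel"], "Answer about purchase channels"),
    (["loan", "interest", "rate"], "Answer about loan rates"),
]
answer, score = match_question(["wealth", "product", "purchase", "channel"], stored)
```

  • When `match_question` returns `None`, the flow would fall through to the follow-up handling described in the second embodiment.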
  • Step S50 If the matching is successful, the answer corresponding to the successfully matched stored question is displayed.
  • When the similarity exceeds the preset threshold, the knowledge element is deemed to be a successfully matched knowledge element. The preset graphed knowledge base includes the answers corresponding to the questions composed of the elements, so the answer corresponding to the successfully matched knowledge element is obtained and displayed.
  • The display method includes at least one of playing the answer by voice and showing it on the display interface of the question and answer device based on the graphed knowledge base.
  • In this embodiment, voice recognition is performed on the question voice to obtain the question text information; element recognition is performed on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information; the knowledge elements of multiple stored questions in the preset graphed knowledge base are obtained; the question elements of the question text information are matched with the knowledge elements of each of the stored questions; and if the matching is successful, the answer corresponding to the successfully matched stored question is displayed.
  • Matching on elements improves the accuracy of matching between sentences, so that the corresponding answer can be found more accurately in the preset graphed knowledge base, improving the accuracy and efficiency of intelligent question answering.
  • FIG. 3 is a schematic flowchart of the second embodiment of the question and answer method based on the graphed knowledge base of the present application. Based on the first embodiment shown in FIG. 2 above, the second embodiment of the question and answer method based on the graphed knowledge base of the present application is proposed.
  • the method further includes:
  • Step S401 If the matching fails, calculate the similarity between the question element of the question text information and the knowledge element of each stored question.
  • When the similarity between the question element of the question text information and the knowledge element of each of the stored questions is less than the preset threshold, it is determined that the question element of the question text information fails to match the knowledge element of each of the stored questions, indicating that the corresponding answer cannot be found in the preset graphed knowledge base.
  • The question element of the question text information and the knowledge elements of the stored questions in the preset graphed knowledge base can be expressed in vector form; TF or TF-IDF may be used to convert the question element and each of the knowledge elements in the preset graphed knowledge base into vector form.
  • Step S402 Obtain the stored question with the highest similarity as the closest question.
  • Step S403 Generate follow-up information according to the closest question, and display the follow-up information.
  • the closest question is compared with the question text information to obtain difference content, and follow-up information is generated according to the difference content.
  • Each element of the closest question is matched with the question elements of the question text information in the context; that is, the elements of the closest question and the elements of the question text information are matched separately by element type. Elements whose content is the same in the closest question and the question text information match successfully, and elements that fail to match are the difference content.
  • Generating the follow-up information based on the closest question includes: matching each element in the closest question with each element in the question text information according to element type, and taking the elements that fail to match as the difference content; and generating the follow-up information based on the difference content.
  • Alternatively, the closest question and the question text information can be segmented separately to obtain all the first words of the closest question and all the second words of the question text information, and the first words are matched with the second words. Words that match successfully are the content shared by the closest question and the question text information, and words that fail to match are the difference content.
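  • The per-type comparison used to obtain the difference content can be sketched as below. The dictionary layout (element type mapped to a list of words), the function names, and the wording of the follow-up prompt are illustrative assumptions; the application does not specify how the follow-up sentence is phrased.

```python
ELEMENT_TYPES = ("Q", "E", "A", "C")

def difference_content(closest, asked):
    """Compare two questions element type by element type.

    Both arguments map a QEAC element type to a list of words. Words of the
    closest question with no counterpart of the same type in the asked
    question are returned as the difference content.
    """
    diff = {}
    for elem_type in ELEMENT_TYPES:
        given = set(asked.get(elem_type, []))
        missing = [w for w in closest.get(elem_type, []) if w not in given]
        if missing:
            diff[elem_type] = missing
    return diff

def follow_up(diff):
    """Build a follow-up prompt from the difference content (hypothetical template)."""
    if not diff:
        return None
    words = [w for t in ELEMENT_TYPES for w in diff.get(t, [])]
    return "Do you mean: " + ", ".join(words) + "?"

closest = {"Q": ["which"], "E": ["channel"],
           "A": ["wealth management product", "purchase"], "C": ["are"]}
asked = {"Q": ["which"], "E": ["channel"], "A": ["purchase"], "C": ["are"]}
diff = difference_content(closest, asked)
prompt = follow_up(diff)
```

  • Here only the A element "wealth management product" is unmatched, so the follow-up asks the user to confirm it.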
  • the method further includes:
  • Step S404 Perform voice detection, and when the user's reply voice information based on the follow-up information is detected, perform voice recognition on the reply voice information to obtain reply text information.
  • Digital signal processing first detects whether there is voice in the audio signal captured by the microphone, and then analyzes the audio signal to predict the text in the received audio signal, so as to obtain the reply text information.
  • Step S405 Determine whether the closest question matches the question text information according to the reply text information.
  • The difference content between the closest question and the question text information is obtained, and the reply text information is matched with the difference content. This can be done by expressing the reply text information and the difference content in vector form; TF or TF-IDF can be used for the conversion. The cosine distance between the reply text information and the difference content in vector form is calculated and used as the similarity between the reply text information and the difference content. When this similarity exceeds a preset similarity threshold, it is determined that the reply text information matches the difference content successfully.
  • The preset similarity threshold is set according to empirical values, for example 80%.
  • Step S406 If the closest question matches the question text information, the answer corresponding to the closest question is obtained and displayed.
  • If the matching is successful, it is determined that the closest question matches the question text information, and the answer corresponding to the closest question can be obtained and displayed, or played by voice. If the matching fails, it is determined that the closest question does not match the question text information. In that case, the unmatched question text information can be recorded for supplementing and updating the stored questions and corresponding answers in the preset graphed knowledge base.
  • In this embodiment, follow-up information is generated, and whether the closest question matches the question text information is judged according to the user's reply voice information to the follow-up information, so that the corresponding question and answer pair can be found more accurately in the preset graphed knowledge base, improving the user experience.
  • FIG. 4 is a schematic flowchart of the third embodiment of the question and answer method based on the graphed knowledge base of the present application.
  • The third embodiment of the question and answer method based on the graphed knowledge base of the present application is proposed. This embodiment is described based on the above-mentioned first embodiment.
  • In the third embodiment, before the step S30, the method further includes:
  • Step S201 Obtain an existing question and answer pair, classify the existing question and answer pair according to a preset classification algorithm, and obtain a type question and answer pair of a preset type.
  • The preset types include 4 types: specific questions, yes-no questions, why-type questions, and choice questions.
  • The existing question and answer pairs raised by a large number of users are summarized and analyzed, and the existing question and answer pairs are classified according to the preset classification algorithm. The preset classification algorithm may be a text classification algorithm that uses a convolutional neural network (Text Convolutional Neural Networks, abbreviated TextCNN).
  • step S201 includes:
  • the vectorized question and answer pair sequentially passes through the input layer, the convolution layer, the pooling layer and the fully connected layer of the convolutional neural network text classification algorithm to obtain a preset type of type question and answer pair.
  • The existing question and answer pairs are processed to a fixed length. For example, with a fixed length n, where n is a positive integer, a pair is truncated if it exceeds n and padded with 0 if it is shorter than n, so as to obtain fixed-length question and answer pairs. The added 0 has no effect on the subsequent results, because the subsequent max-pooling layer only outputs the maximum value, and the zero-filled items are filtered out.
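  • The fixed-length processing amounts to truncation plus zero padding. A minimal sketch follows, assuming the question and answer pair has already been converted to a list of integer token ids; the helper name `to_fixed_length` and the use of id 0 for padding are assumptions.

```python
def to_fixed_length(token_ids, n, pad_id=0):
    """Truncate token_ids to n items, or right-pad with pad_id up to n items."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return token_ids[:n] + [pad_id] * max(0, n - len(token_ids))

padded = to_fixed_length([5, 3, 8], 5)
truncated = to_fixed_length([1, 2, 3, 4, 5, 6], 4)
```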
  • The fixed-length question and answer pairs are one-hot encoded and projected into a low-dimensional space, with semantic features encoded in a specified dimension, to obtain vectorized question and answer pairs.
  • A convolutional layer is established for filters of different sizes, so there will be multiple feature maps.
  • An image is two-dimensional data composed of pixels, and its convolution kernels are at least two-dimensional. Max-pooling only outputs the maximum value and filters out the zeros in the input.
  • Finally, a fully connected layer with a normalized exponential (softmax) function is attached, which outputs the probability of each category, so as to obtain the preset type corresponding to each existing question and answer pair.
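  • The last two stages of the classifier (max-pooling over each feature map, then a fully connected softmax layer) can be sketched in plain Python. The weights, feature maps, and function names below are made-up illustrations rather than trained parameters or the application's implementation. Note that max-pooling discards padded zeros only when some activation in the feature map is positive.

```python
import math

def max_over_time(feature_map):
    """Max-pooling: keep only the largest activation of one feature map."""
    return max(feature_map)

def softmax(scores):
    """Normalized exponential function, shifted by the maximum for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature_maps, weights, biases, labels):
    """Pool each feature map, apply one fully connected layer, then softmax."""
    pooled = [max_over_time(fm) for fm in feature_maps]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

labels = ["specific question", "yes-no question", "why question", "choice question"]
feature_maps = [[0.1, 0.9, 0.0], [0.3, 0.2, 0.0]]   # trailing zeros mimic padding
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
biases = [0.0, 0.0, 0.0, 0.0]
label, probs = classify(feature_maps, weights, biases, labels)
```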
  • Step S202 Perform element identification on each type of question and answer pair through a preset element recognition model, obtain sample elements of each type of question and answer pair, and establish the preset graphical knowledge base based on the sample elements.
  • The output of the model is the best tag sequence, and the words that meet the element characteristics of each type question and answer pair are labeled through the preset element recognition model.
  • The QEAC element system is designed for question sentences, that is, the question elements include at least one of the QEAC elements. The elements are extracted to establish the graphed knowledge base; element extraction follows the principle of conciseness, extracting the elements that can convey the meaning of the sentence, and the preset graphed knowledge base is established from the sample elements.
  • step S20 includes:
  • the character vectors and the word vectors pass through the bidirectional long short-term memory layer of the preset element recognition model to obtain the respective scores of all tags of each word of the question text information;
  • the respective scores of all tags of each word of the question text information pass through the conditional random field layer of the preset element recognition model to obtain the probability of the tag sequence, and use the tag with the highest probability as the corresponding question element to obtain the The question element corresponding to the question text information.
  • Each element in the QEAC element system designed for question sentences can be used as a tag; the full set of tags is the full set of elements.
  • The question elements include at least one of the QEAC elements, where element Q represents the subject question word, and element C represents the target of the question word's direct action.
  • The final output vector of the LSTM unit can be regarded as a representation of the question text information, which is then used in the labeling process.
  • The softmax function is generally used for labeling, but this approach has limited effect when processing data whose output labels are directly related to one another.
  • Because the neural network structure depends heavily on data, and the size and quality of the data strongly affect model training, methods that combine existing linear statistical models with neural network structures have emerged.
  • Among these combinations, LSTM combined with CRF works well: the softmax function and the CRF are combined on the output side, LSTM solves the problem of extracting sequence features, and the CRF makes effective use of sentence-level tagging information.
  • Each sentence is represented as character vectors and word vectors by converting words to vectors (word2vector).
  • The second layer is the bidirectional long short-term memory (Bi-directional Long Short-Term Memory, abbreviated BiLSTM) layer.
  • The character vectors and word vectors are input to the BiLSTM layer of the model, and the output of this layer is the respective scores of all tags for each word in the sentence.
  • The respective scores of the tags serve as the non-normalized emission probabilities in the CRF model.
  • The third layer is the conditional random field (Conditional Random Fields, abbreviated CRF) layer.
  • This layer takes the output of the BiLSTM layer, namely the respective scores of all tags for each word (the emission probability matrix), together with the transition probability matrix as the parameters of the CRF model, and finally obtains the probability of the tag sequence.
  • The output probability matrix of the LSTM can be defined as $P_{n \times k}$, where $n$ is the number of words, $k$ is the number of output labels, and $P_{i,j}$ is the probability that the $i$-th word is labeled with the $j$-th label.
  • The tag sequence is $y = (y_1, y_2, \ldots, y_n)$.
  • $A$ is the state transition matrix.
  • $A_{i,j}$ represents the probability of transition from the $i$-th tag to the $j$-th tag.
  • The CRF is introduced to model pairs of adjacent output tags; dynamic programming is then used for the calculation, and labels are finally assigned according to the optimal path obtained. That is, the words in the question text information that meet the characteristics of each element are labeled, the probability that each word in the question text information belongs to each element is obtained, and the tag with the highest probability is used as the corresponding question element, thereby obtaining the QEAC question elements corresponding to the question text information.
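  • The dynamic-programming step can be illustrated with a Viterbi decoder over an emission matrix P (one row per word, one column per tag) and a transition matrix A. This is a sketch under the common BiLSTM-CRF assumption that a tag sequence is scored by summing its emission and transition scores; the application does not spell out the scoring formula, and the function name and toy numbers are hypothetical.

```python
def viterbi(emissions, transitions):
    """Find the highest-scoring tag path.

    emissions[i][j]: score of tag j for word i (n x k matrix P).
    transitions[p][j]: score of moving from tag p to tag j (k x k matrix A).
    Returns (best_path, best_score), with tags given as column indices.
    """
    n, k = len(emissions), len(transitions)
    score = list(emissions[0])            # best score of a path ending in each tag
    backpointers = []
    for i in range(1, n):
        new_score, pointers = [], []
        for j in range(k):
            best_prev = max(range(k), key=lambda p: score[p] + transitions[p][j])
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[i][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    last = max(range(k), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):   # walk the back-pointers to word 0
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Toy example with 3 words and 2 tags.
P = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.4]]
A = [[0.0, 1.0], [-5.0, 0.0]]
path, best = viterbi(P, A)
```

  • In this toy example the strongly negative score for moving from tag 1 back to tag 0 keeps the third word on tag 1 even though its own emission score slightly favors tag 0, which is the kind of sequence-level constraint the CRF layer contributes.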
  • In this embodiment, the existing question and answer pairs are obtained and classified according to the preset classification algorithm to obtain type question and answer pairs of preset types; element recognition is performed on each type question and answer pair through the preset element recognition model to obtain the sample elements, and the preset graphed knowledge base is established from the sample elements, so that the corresponding answer can be found in the preset graphed knowledge base more accurately, improving the user experience.
  • an embodiment of the present application also proposes a storage medium.
  • the storage medium may be volatile or non-volatile.
  • the storage medium stores a question and answer program based on a graphed knowledge base.
  • When the question and answer program based on the graphed knowledge base is executed by a processor, the steps of the question and answer method based on the graphed knowledge base described above are implemented.
  • an embodiment of the present application also proposes a question answering device based on a graphed knowledge base, and the question answering device based on a graphed knowledge base includes:
  • the voice recognition module 10 is configured to perform voice detection, and when a user's question voice is detected, voice recognition is performed on the question voice to obtain question text information.
  • the voice signal can be acquired from the speaker through a microphone.
  • Digital signal processing first detects whether there is voice in the audio signal captured by the microphone, and then analyzes the audio signal to predict the text spoken in the received audio signal.
  • Voice activity detection (Voice Activity Detection, abbreviated VAD) can be used for voice detection.
  • The computer preprocesses the detected question voice of the user, extracts the features of the voice, and obtains the pre-established voice recognition templates.
  • According to the voice recognition model, the computer compares the voice templates stored in the computer with the features of the input question voice, and finds a series of optimal templates that match the input voice according to a certain search and matching strategy. Then, according to the definition of these templates, the recognition result is obtained by table lookup.
  • A common transformation method is to extract Mel-frequency cepstral coefficient (MFCC) features.
  • The sound becomes a matrix with 12 rows (assuming that the acoustic features are 12-dimensional) and N columns, which is called the observation sequence, where N is the total number of frames.
  • After feature extraction, the audio data is converted to text through the acoustic model, the dictionary, and the language model to obtain the question text information.
  • the element recognition module 20 is configured to perform element recognition on the question text information through a preset element recognition model to obtain question elements corresponding to the question text information.
  • The long short-term memory network (Long short-term memory, abbreviated LSTM) is a special kind of recurrent neural network (RNN), mainly intended to solve the vanishing-gradient and exploding-gradient problems that arise when training on long sequences.
  • The conditional random field (Conditional Random Fields, abbreviated CRF) layer can add constraints to the final predicted tags to ensure that the predicted tags are legal.
  • LSTM is used to solve the problem of extracting sequence features. Under the LSTM-CRF model, the output is the best tag sequence.
  • The QEAC element system is designed for question sentences, that is, the question elements include at least one of the QEAC elements, where element Q represents the subject question word, element C represents the target of the question word's direct action, element E represents the center of the subject question sentence, and element A represents a modifier or attributive component, which can modify E or another A. For example, for the sentence "Which are the purchase channels of wealth management products", the result of the element system marking is " ⁇ #A# ⁇ # Channel#E# ⁇ #C# ⁇ #Q#".
  • the QEAC question element corresponding to the question text information is obtained by labeling the words satisfying the characteristics of each element among the words in the question text information through a preset element recognition model.
  • the obtaining module 30 is used to obtain the knowledge elements of multiple stored questions in the preset graphed knowledge base.
  • the preset graphed knowledge base is established in advance from a large number of existing question-and-answer pairs: the existing question-and-answer pairs are obtained and classified according to a preset classification algorithm to obtain type question-and-answer pairs of preset types; element recognition is performed on each type of question-and-answer pair through the preset element recognition model to obtain the sample elements of each type of question-and-answer pair; and the preset graphed knowledge base is established based on the sample elements, that is, the preset graphed knowledge base stores a large number of stored questions and their corresponding answers.
  • the similarity between the question element of the question text information and the knowledge element of each stored question in the preset graphed knowledge base can be calculated, and when the similarity exceeds a preset threshold, it is determined that the matching is successful.
  • the matching module 40 is configured to match the question elements of the question text information with the knowledge elements of each of the stored questions.
  • the question elements and the knowledge elements in the preset graphed knowledge base are first represented as vectors, using either term frequency (Term Frequency, abbreviated TF) or term frequency-inverse document frequency (Term Frequency-Inverse Document Frequency, abbreviated TF-IDF).
  • the cosine distance between the question element in the form of a vector and each knowledge element in the preset graphed knowledge base is calculated, and the cosine distance is used as the similarity.
  • the preset threshold can be set to 90%, etc., and set according to empirical values.
  • the display module 50 is configured to display the answer corresponding to the successfully matched stored question if the matching is successful.
  • if a knowledge element whose similarity exceeds the preset threshold is retrieved from the preset graphed knowledge base, that knowledge element is deemed successfully matched; the preset graphed knowledge base includes the answers corresponding to the questions composed of the elements, so the answer corresponding to the successfully matched knowledge element is obtained and displayed.
  • the display method includes at least one of playing the answer by voice and showing it on the display interface of the question answering device based on the graphed knowledge base.
  • voice recognition is performed on the question voice to obtain the question text information.
  • element recognition is performed on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information; the knowledge elements of multiple stored questions in the preset graphed knowledge base are obtained; the question elements of the question text information are matched with the knowledge elements of each stored question; if the matching succeeds, the answer corresponding to the successfully matched stored question is displayed.
  • the sentence is divided into elements and the elements are matched against each other, which improves the accuracy of matching between sentences, so that the corresponding answer can be found more accurately in the preset graphed knowledge base, improving the accuracy and efficiency of intelligent question answering.
  • the question answering device based on the graphed knowledge base further includes:
  • the calculation module is configured to calculate the similarity between the question element of the question text information and the knowledge element of each stored question if the matching fails;
  • the obtaining module 30 is also configured to obtain the stored question with the highest similarity as the closest question;
  • the generating module is used to generate follow-up information according to the closest question, and display the follow-up information.
  • the voice recognition module 10 is also used to perform voice detection and, when the user's reply voice information given in response to the follow-up information is detected, to perform voice recognition on the reply voice information to obtain the reply text information;
  • the matching module 40 is further configured to determine whether the closest question matches the question text information according to the reply text information;
  • the display module 50 is further configured to obtain an answer corresponding to the closest question for display if the closest question matches the question text information successfully.
  • the matching module 40 is further configured to match each element in the closest question with each element in the question text information according to element type, and the element that fails to match is regarded as the difference content;
  • the generating module is also used to generate follow-up information according to the difference content.
  • the question answering device based on the graphed knowledge base further includes:
  • the classification module is used to obtain existing question and answer pairs, classify the existing question and answer pairs according to a preset classification algorithm, and obtain type question and answer pairs of preset types;
  • the establishment module is used to identify the elements of each type of question and answer pair through a preset element recognition model, obtain sample elements of each type of question and answer pair, and establish the preset graphed knowledge base based on the sample elements.
  • the question answering device based on the graphed knowledge base further includes:
  • the fixed-length processing module is used to obtain an existing question and answer pair, perform fixed-length processing on the existing question and answer pair, and obtain a fixed-length question and answer pair;
  • the encoding module is used to perform one-hot encoding on the fixed-length question and answer pairs to obtain vectorized question and answer pairs;
  • the classification module is also used for the vectorized question and answer pair to pass through the input layer, the convolution layer, the pooling layer and the fully connected layer of the convolutional neural network text classification algorithm in sequence to obtain the preset type of type question and answer pair.
  • the element recognition module 20 is further configured to represent the question text information as word vectors and character vectors through the representation layer of the preset element recognition model; the word vectors and character vectors pass through the bidirectional long short-term memory layer of the preset element recognition model to obtain the score of every tag for each word of the question text information; these scores pass through the conditional random field layer of the preset element recognition model to obtain the probabilities of tag sequences, and the tag with the highest probability is taken as the corresponding question element, thereby obtaining the question elements corresponding to the question text information.
  • the computer software product is stored in a storage medium (such as a read-only memory image (Read Only Memory image, ROM)/random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to perform the methods described in each embodiment of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

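This application discloses a question answering method, device, storage medium, and apparatus based on a graphed knowledge base. The method performs voice detection and, when a user's question voice is detected, performs voice recognition on the question voice to obtain question text information. Based on artificial intelligence, it performs element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information, obtains the knowledge elements of multiple stored questions in a preset graphed knowledge base, and matches the question elements of the question text information with the knowledge elements of each stored question. If the matching succeeds, the answer corresponding to the successfully matched stored question is displayed. Matching on elements improves the accuracy of question matching and thereby the accuracy and efficiency of intelligent question answering.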

Description

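Question answering method, device, storage medium, and apparatus based on a graphed knowledge base
This application claims priority to Chinese patent application No. CN201911041316.4, filed with the China Patent Office on October 29, 2019 and entitled "基于图谱化知识库的问答方法、设备、存储介质及装置" (Question answering method, device, storage medium, and apparatus based on a graphed knowledge base), the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of artificial intelligence, and in particular to a question answering method, device, storage medium, and apparatus based on a graphed knowledge base.
Background
The inventors realized that intelligent question answering systems in the prior art mainly use the following two approaches:
(1) A sequence-to-sequence model (seq2seq) generates a suitable reply during the dialogue. Its drawbacks are that it requires a large-scale training corpus, places high demands on model precision, and often generates meaningless replies, resulting in low accuracy and low efficiency of intelligent question answering and a poor user experience;
(2) An index search over a predefined question-and-answer library selects a suitable reply from the existing dialogues in the library. Its drawbacks are that it depends heavily on the quality of the existing data and on the retrieval algorithm, and requires a large amount of manual annotation; if the selected data is of poor quality, no correct reply to the user's question can be found, resulting in low accuracy and low efficiency of intelligent question answering.
The above content is only intended to assist in understanding the technical solution of this application and does not constitute an admission that the above content is prior art.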
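Technical Solution
This application provides a question answering method based on a graphed knowledge base, which includes the following steps:
performing voice detection and, when a user's question voice is detected, performing voice recognition on the question voice to obtain question text information;
performing element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information;
obtaining the knowledge elements of multiple stored questions in a preset graphed knowledge base;
matching the question elements of the question text information with the knowledge elements of each of the stored questions;
if the matching succeeds, displaying the answer corresponding to the successfully matched stored question.
This application also proposes a question answering device based on a graphed knowledge base, which includes a memory, a processor, and a question answering program based on a graphed knowledge base that is stored in the memory and executable on the processor, the question answering program being configured to implement the following steps:
performing voice detection and, when a user's question voice is detected, performing voice recognition on the question voice to obtain question text information;
performing element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information;
obtaining the knowledge elements of multiple stored questions in a preset graphed knowledge base;
matching the question elements of the question text information with the knowledge elements of each of the stored questions;
if the matching succeeds, displaying the answer corresponding to the successfully matched stored question.
This application also proposes a storage medium storing a question answering program based on a graphed knowledge base which, when executed by a processor, implements the following steps:
performing voice detection and, when a user's question voice is detected, performing voice recognition on the question voice to obtain question text information;
performing element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information;
obtaining the knowledge elements of multiple stored questions in a preset graphed knowledge base;
matching the question elements of the question text information with the knowledge elements of each of the stored questions;
if the matching succeeds, displaying the answer corresponding to the successfully matched stored question.
This application also proposes a question answering apparatus based on a graphed knowledge base, which includes:
a voice recognition module, used to perform voice detection and, when a user's question voice is detected, perform voice recognition on the question voice to obtain question text information;
an element recognition module, used to perform element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information;
an obtaining module, used to obtain the knowledge elements of multiple stored questions in a preset graphed knowledge base;
a matching module, used to match the question elements of the question text information with the knowledge elements of each of the stored questions;
a display module, used to display, if the matching succeeds, the answer corresponding to the successfully matched stored question.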
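Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the question answering device based on a graphed knowledge base in the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of a first embodiment of the question answering method based on a graphed knowledge base of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the question answering method based on a graphed knowledge base of this application;
FIG. 4 is a schematic flowchart of a third embodiment of the question answering method based on a graphed knowledge base of this application;
FIG. 5 is a structural block diagram of a first embodiment of the question answering apparatus based on a graphed knowledge base of this application.
The realization of the objectives, functional characteristics, and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.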
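Embodiments of the Invention
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the question answering device based on a graphed knowledge base in the hardware operating environment involved in the embodiments of this application.
As shown in FIG. 1, the question answering device based on a graphed knowledge base may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display (Display); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface, and in this application the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a wireless fidelity (WIreless-FIdelity, WI-FI) interface). The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) or a non-volatile memory (Non-volatile Memory, NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage apparatus independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the question answering device based on a graphed knowledge base, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a question answering program based on a graphed knowledge base.
In the device shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to user equipment; the device calls, through the processor 1001, the question answering program based on a graphed knowledge base stored in the memory 1005 and executes the question answering method based on a graphed knowledge base provided in the embodiments of this application.
Based on the above hardware structure, embodiments of the question answering method based on a graphed knowledge base of this application are proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the question answering method based on a graphed knowledge base of this application, on the basis of which the first embodiment is proposed.
In the first embodiment, the question answering method based on a graphed knowledge base includes the following steps: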
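Step S10: Perform voice detection and, when a user's question voice is detected, perform voice recognition on the question voice to obtain question text information.
It should be understood that this embodiment is executed by the question answering device based on a graphed knowledge base, which may be an electronic device such as a smartphone, a personal computer, or a server; this embodiment places no restriction on this. The intelligent question answering system in the device can acquire the speaker's voice signal through a microphone. Digital signal processing first detects whether speech is present in the audio signal captured by the microphone, and then analyzes that audio signal to predict the words spoken in it. Voice activity detection (Voice Activity Detection, abbreviated VAD) may be used for the voice detection.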
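The source names voice activity detection but does not say how it is implemented. As a rough illustration only, the following sketch assumes a simple short-time energy rule; the function names, the frame length, and the threshold value are all hypothetical choices, not taken from the application.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in samples) / len(samples)


def detect_voice(signal, frame_len=4, energy_threshold=0.01):
    """Return True if any full frame's energy exceeds the threshold.

    Energy thresholding is an assumed stand-in for whatever VAD
    method a real system would use.
    """
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        if frame_energy(signal[start:start + frame_len]) > energy_threshold:
            return True
    return False


# Toy signals: near-silence, and the same silence followed by a loud frame.
silence = [0.0, 0.01, -0.01, 0.0] * 4
speech = silence + [0.5, -0.4, 0.6, -0.5]

print(detect_voice(silence))  # False
print(detect_voice(speech))   # True
```

A production system would typically work on much longer frames and adapt the threshold to background noise; the sketch only shows where the "is speech present" decision sits before recognition.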
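The detected question voice is first preprocessed, its speech features are then extracted, and the pre-established speech recognition templates are obtained. During recognition, the computer compares the stored speech templates with the features of the input question voice according to the speech recognition model and, following a given search and matching strategy, finds a series of optimal templates that match the input speech. The recognition result is then obtained by table lookup according to the definition of these templates.
To analyze the question voice, it must be divided into frames, that is, cut into short segments, each of which is called a frame. After framing, the question voice becomes many short segments. However, the waveform has almost no descriptive power in the time domain, so it must be transformed. A common transformation is to extract MFCC features. The sound then becomes a matrix with 12 rows (assuming the acoustic features are 12-dimensional) and N columns, called the observation sequence, where N is the total number of frames. The acoustic model, the dictionary, and the language model then convert the feature-extracted audio data to text, yielding the question text information.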
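The framing step and the shape of the observation sequence can be sketched as follows. This is an illustration under stated assumptions: `toy_features` is a hypothetical stand-in that is not MFCC (real MFCC extraction involves a Fourier transform, a mel filter bank, and a discrete cosine transform), and the frame and hop lengths are arbitrary.

```python
def split_into_frames(signal, frame_len, hop_len):
    """Cut the signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop_len)]


def toy_features(frame, dims=12):
    """Stand-in for MFCC: mean absolute amplitude of `dims` equal chunks."""
    chunk = len(frame) // dims
    return [sum(abs(s) for s in frame[k * chunk:(k + 1) * chunk]) / chunk
            for k in range(dims)]


# A synthetic 100-sample signal, used only to exercise the functions.
signal = [((i % 10) - 5) / 5 for i in range(100)]

frames = split_into_frames(signal, frame_len=24, hop_len=12)
per_frame = [toy_features(f) for f in frames]         # N vectors of 12 values
observation = [list(row) for row in zip(*per_frame)]  # 12 rows, N columns

print(len(observation), len(observation[0]))  # 12 7
```

Each column of `observation` describes one frame, which matches the "12 rows, N columns" layout described above.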
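Step S20: Perform element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information.
It can be understood that the long short-term memory network (Long short-term memory, abbreviated LSTM) is designed mainly to address the vanishing-gradient and exploding-gradient problems that arise when training on long sequences. Conditional random fields (Conditional Random Fields, abbreviated CRF) make effective use of sentence-level tag information, and the CRF layer can add constraints to the final predicted tags to ensure that they are legal. A large number of sample question-and-answer pairs and their corresponding sample elements are first obtained, a basic LSTM-CRF model is built, and the basic LSTM-CRF model is trained on the sample question-and-answer pairs and the corresponding sample elements to obtain the preset element recognition model.
In a specific implementation, LSTM is used to extract sequence features, and under the LSTM-CRF model the output is the best tag sequence. The QEAC element system is designed for question sentences, that is, the question elements include at least one of the QEAC elements, where element Q represents the main question word, element C represents the direct target of the question word, element E represents the center of the main question sentence, and element A represents a modifying or attributive component, which can modify E or another A. For example, for the sentence "理财产品的购买渠道有哪些" ("What are the purchase channels for wealth management products"), the result of the element system marking is "理财产品#A#的购买#渠道#E#有#C#哪些#Q#". The preset element recognition model labels those words in the question text information that satisfy the characteristics of each element, thereby obtaining the QEAC question elements corresponding to the question text information.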
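One way to picture the output of element recognition is as a list of (word, tag) pairs grouped by element type. The sketch below is only an illustration: the helper name `group_elements` and the tag "O" for words that carry no element tag are assumptions, and the tags shown are limited to those visible in the example annotation above.

```python
from collections import defaultdict

QEAC_TAGS = {"Q", "E", "A", "C"}


def group_elements(tagged_words):
    """Group tagged words by QEAC element type, skipping untagged words."""
    elements = defaultdict(list)
    for word, tag in tagged_words:
        if tag in QEAC_TAGS:
            elements[tag].append(word)
    return dict(elements)


# Tags follow the example annotation in the text; "O" (an assumed label)
# marks words for which the annotation shows no element tag.
tagged = [
    ("理财产品", "A"),
    ("的", "O"),
    ("购买", "O"),
    ("渠道", "E"),
    ("有", "C"),
    ("哪些", "Q"),
]

print(group_elements(tagged))
# {'A': ['理财产品'], 'E': ['渠道'], 'C': ['有'], 'Q': ['哪些']}
```

Keeping elements keyed by type makes the later type-by-type comparison between a user question and a stored question straightforward.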
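Step S30: Obtain the knowledge elements of multiple stored questions in the preset graphed knowledge base.
It should be noted that the preset graphed knowledge base is established in advance from a large number of existing question-and-answer pairs: the existing question-and-answer pairs are obtained and classified according to a preset classification algorithm to obtain type question-and-answer pairs of preset types; element recognition is performed on each type of question-and-answer pair through the preset element recognition model to obtain the sample elements of each type of question-and-answer pair; and the preset graphed knowledge base is established according to the sample elements. That is, the preset graphed knowledge base stores a large number of stored questions and their corresponding answers. The similarity between the question elements of the question text information and the knowledge elements of each stored question in the preset graphed knowledge base can be calculated, and the matching is deemed successful when the similarity exceeds a preset threshold.
Step S40: Match the question elements of the question text information with the knowledge elements of each of the stored questions.
It should be understood that the question elements and the knowledge elements in the preset graphed knowledge base are first represented as vectors; term frequency (Term Frequency, abbreviated TF) or term frequency-inverse document frequency (Term Frequency-Inverse Document Frequency, abbreviated TF-IDF) may be used to convert them into vector form. The cosine distance between the vectorized question elements and each knowledge element in the preset graphed knowledge base is calculated and used as the similarity, and the matching is deemed successful when the similarity exceeds the preset threshold. The preset threshold may be set to, for example, 90%, according to empirical values.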
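The matching step can be sketched with plain term-frequency vectors. The code below is a minimal illustration, not the application's implementation: it computes cosine similarity (which is what a "similarity exceeds the threshold" rule implies, although the text calls it cosine distance), and the knowledge base contents, the dictionary layout, and the function names are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(words):
    """Term-frequency vector: word -> count."""
    return Counter(words)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(question_elements, knowledge_base, threshold=0.9):
    """Return (entry, similarity, matched) for the most similar stored question."""
    q_vec = tf_vector(question_elements)
    scored = [(cosine_similarity(q_vec, tf_vector(entry["elements"])), entry)
              for entry in knowledge_base]
    score, entry = max(scored, key=lambda pair: pair[0])
    return entry, score, score > threshold


# Hypothetical knowledge base entries, for illustration only.
knowledge_base = [
    {"elements": ["transfer-in", "income", "rule", "what"],
     "answer": "answer about the transfer-in income rule"},
    {"elements": ["purchase", "channel", "which"],
     "answer": "answer about purchase channels"},
]

entry, score, matched = best_match(["purchase", "channel", "which"], knowledge_base)
if matched:
    print(entry["answer"])
```

Swapping `tf_vector` for a TF-IDF weighting would leave `cosine_similarity` and the threshold test unchanged.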
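Step S50: If the matching succeeds, display the answer corresponding to the successfully matched stored question.
It can be understood that if a knowledge element whose similarity exceeds the preset threshold is retrieved from the preset graphed knowledge base, that knowledge element is deemed successfully matched. The preset graphed knowledge base includes the answers corresponding to the questions composed of the elements, so the answer corresponding to the successfully matched knowledge element is obtained and displayed. The display method includes at least one of playing the answer by voice and showing it on the display interface of the question answering device based on a graphed knowledge base.
In this embodiment, voice detection is performed; when the user's question voice is detected, voice recognition is performed on it to obtain question text information; based on artificial intelligence, element recognition is performed on the question text information through the preset element recognition model to obtain the corresponding question elements; the knowledge elements of multiple stored questions in the preset graphed knowledge base are obtained; the question elements are matched with the knowledge elements of each stored question; and if the matching succeeds, the answer corresponding to the successfully matched stored question is displayed. Dividing the sentence into elements and matching element against element improves the accuracy of matching between sentences, so that the corresponding answer is found more accurately in the preset graphed knowledge base, improving the accuracy and efficiency of intelligent question answering.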
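Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the question answering method based on a graphed knowledge base of this application. The second embodiment is proposed on the basis of the first embodiment shown in FIG. 2.
In the second embodiment, after step S40, the method further includes:
Step S401: If the matching fails, calculate the similarity between the question elements of the question text information and the knowledge elements of each stored question.
It should be understood that if the similarities between the question elements of the question text information and the knowledge elements of every stored question are all below the preset threshold, the matching is deemed to have failed, which means that no corresponding answer was found in the preset graphed knowledge base. In that case the question elements of the question text information and the knowledge elements of each stored question in the preset graphed knowledge base can be represented as vectors, using TF or TF-IDF for the conversion. The cosine distance between the vectorized question elements and each knowledge element in the preset graphed knowledge base is calculated and used as the similarity between the question elements and that knowledge element.
Step S402: Obtain the stored question with the highest similarity as the closest question.
It can be understood that the similarities between the question elements and the knowledge elements in the preset graphed knowledge base are obtained, and the stored question with the highest similarity is deemed the closest question, that is, the one nearest to the question text information.
Step S403: Generate follow-up information according to the closest question and display the follow-up information.
It should be noted that the closest question is compared with the question text information to obtain the difference content, and the follow-up information is generated according to the difference content. The elements of the closest question are matched with the question elements of the question text information in context, that is, each element in the closest question is matched with each element in the question text information according to element type; the elements that match are the content shared by the closest question and the question text information, and the elements that fail to match are the difference content. In this embodiment, generating follow-up information according to the closest question includes: matching each element in the closest question with each element in the question text information according to element type, the elements that fail to match being taken as the difference content; and generating follow-up information according to the difference content.
In a specific implementation, word segmentation may instead be performed on the closest question and on the question text information, yielding all first words of the closest question and all second words of the question text information. The first words are matched with the second words; the words that match are the content shared by the closest question and the question text information, and the words that fail to match are the difference content.
It can be understood that, based on the user's question text information and the stored questions in the preset graphed knowledge base, a follow-up question is asked when the user's question is ambiguous and cannot be answered directly. For example, when the user asks what the income rule is, a search of the preset graphed knowledge base finds both a transfer-in income rule and a transfer-out income rule, so the user is asked whether the transfer-in income rule or the transfer-out income rule is meant.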
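The type-by-type comparison that yields the difference content can be sketched as below. This is an illustration under assumptions: elements are held as a dictionary from element type to words, and the follow-up wording in `follow_up` is a hypothetical template, since the source does not specify how the follow-up text is phrased.

```python
def difference_content(closest_elements, question_elements):
    """Per element type, words of the closest question missing from the user's question."""
    diff = {}
    for tag, words in closest_elements.items():
        missing = [w for w in words if w not in question_elements.get(tag, [])]
        if missing:
            diff[tag] = missing
    return diff


def follow_up(diff):
    """Build a follow-up question from the difference content (assumed template)."""
    words = [w for tag in sorted(diff) for w in diff[tag]]
    return "Did you mean: " + ", ".join(words) + "?"


# The user asked about "income rule"; the closest stored question is about
# the "transfer-in income rule" (example adapted from the text above).
closest = {"A": ["transfer-in"], "E": ["income rule"], "Q": ["what"]}
asked = {"E": ["income rule"], "Q": ["what"]}

diff = difference_content(closest, asked)
print(diff)             # {'A': ['transfer-in']}
print(follow_up(diff))  # Did you mean: transfer-in?
```

The user's reply can then be compared against `diff` to decide whether the closest question is the one intended.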
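In this embodiment, after step S403, the method further includes:
Step S404: Perform voice detection and, when reply voice information given by the user in response to the follow-up information is detected, perform voice recognition on the reply voice information to obtain reply text information.
It should be understood that the speaker's voice signal is acquired through a microphone; digital signal processing first detects whether speech is present in the audio signal captured by the microphone, and then analyzes that audio signal to predict the text it contains, thereby obtaining the reply text information.
Step S405: Determine, according to the reply text information, whether the closest question matches the question text information.
It should be noted that the difference content between the closest question and the question text information is obtained and the reply text information is matched against it. The reply text information and the difference content can be represented as vectors, using TF or TF-IDF for the conversion. The cosine distance between the vectorized reply text information and the difference content is calculated and used as their similarity; when this similarity exceeds a preset similarity threshold, the reply text information is deemed to match the difference content. The preset similarity threshold may be set to, for example, 80%, according to empirical values.
Step S406: If the closest question matches the question text information, obtain the answer corresponding to the closest question and display it.
In a specific implementation, if the matching succeeds, the closest question is deemed to match the question text information, and the answer corresponding to the closest question can be obtained and displayed; the answer may be played by voice. If the matching fails, the closest question is deemed not to match the question text information; the unmatched question text information can then be recorded so that the stored questions and corresponding answers in the preset graphed knowledge base can be supplemented and updated.
In this embodiment, when the question elements of the question text information fail to match the knowledge elements of every stored question, follow-up information is generated, and the user's reply voice information given in response to the follow-up information is used to determine whether the closest question matches the question text information, so that the corresponding question-and-answer pair is found more accurately in the preset graphed knowledge base and the user experience is improved.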
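Referring to FIG. 4, FIG. 4 is a schematic flowchart of the third embodiment of the question answering method based on a graphed knowledge base of this application. The third embodiment is proposed on the basis of the first or second embodiment above; it is described here on the basis of the first embodiment.
In the third embodiment, before step S30, the method further includes:
Step S201: Obtain existing question-and-answer pairs and classify them according to a preset classification algorithm to obtain type question-and-answer pairs of preset types.
It should be understood that the preset types comprise four categories: specific questions, yes-no questions, why-type sentences, and alternative questions. A large number of existing question-and-answer pairs raised by users are summarized and analyzed, and the existing question-and-answer pairs are classified according to a preset classification algorithm, which may be an algorithm that classifies text with a convolutional neural network (Text Convolutional Neural Networks, abbreviated TextCNN).
Further, step S201 includes:
obtaining existing question-and-answer pairs and performing fixed-length processing on them to obtain fixed-length question-and-answer pairs;
performing one-hot encoding on the fixed-length question-and-answer pairs to obtain vectorized question-and-answer pairs;
passing the vectorized question-and-answer pairs in sequence through the input layer, convolution layer, pooling layer, and fully connected layer of the convolutional neural network text classification algorithm to obtain type question-and-answer pairs of preset types.
It can be understood that fixed-length processing is first performed on the existing question-and-answer pairs. For example, with a fixed length n, where n is a positive integer, anything beyond n is truncated and anything shorter than n is padded with 0, yielding the fixed-length question-and-answer pairs. The padded zeros do not affect the later result, because the subsequent max-pooling layer outputs only the maximum value, so the padded items are filtered out. Through a hidden layer, the one-hot encoded fixed-length question-and-answer pairs are projected into a low-dimensional space and semantic features are encoded in the specified dimensions, yielding vectorized question-and-answer pairs. A convolution layer is built for each filter size, so there are multiple feature maps; an image is two-dimensional data composed of pixels, and its convolution kernels are at least two-dimensional. Max-pooling outputs only the maximum value and thus filters out the zero padding in the input. Finally, a fully connected layer with a normalized exponential (softmax) function outputs the probability of each category, thereby obtaining the existing question-and-answer pairs corresponding to the preset types.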
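The fixed-length and one-hot steps, and the reason zero padding does not disturb max-pooling, can be sketched as follows. This is an illustration only: token ids, the vocabulary size, the choice to map the padding id 0 to an all-zero vector, and the feature-map values are all assumptions, and the convolution itself is not implemented.

```python
def to_fixed_length(token_ids, n, pad_id=0):
    """Truncate to n tokens, or pad with pad_id up to n tokens."""
    return (token_ids + [pad_id] * n)[:n]


def one_hot(token_id, vocab_size):
    """Ids 1..vocab_size map to unit vectors; the pad id 0 maps to all zeros."""
    vec = [0] * vocab_size
    if token_id > 0:
        vec[token_id - 1] = 1
    return vec


def max_pool(feature_map):
    """Max-over-time pooling: keep only the largest activation."""
    return max(feature_map)


short = to_fixed_length([3, 1, 2], n=5)
long = to_fixed_length([1, 2, 3, 4, 3, 2, 1], n=5)
print(short)  # [3, 1, 2, 0, 0]
print(long)   # [1, 2, 3, 4, 3]

matrix = [one_hot(t, vocab_size=4) for t in short]
print(matrix[0])   # [0, 0, 1, 0]
print(matrix[-1])  # [0, 0, 0, 0]

# Assumed non-negative activations: positions produced by padding contribute
# 0.0, so they never change the pooled maximum.
print(max_pool([0.7, 0.2, 0.9, 0.0, 0.0]))  # 0.9
```

The padding argument relies on the real activations being at least zero, for example after a ReLU; with negative activations a padded zero could become the maximum.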
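Step S202: Perform element recognition on each type of question-and-answer pair through the preset element recognition model to obtain the sample elements of each type of question-and-answer pair, and establish the preset graphed knowledge base according to the sample elements.
It should be noted that LSTM is used to extract sequence features, and under the LSTM-CRF model the output is the best tag sequence. The preset element recognition model labels those words in each type of question-and-answer pair that satisfy the characteristics of each element, thereby obtaining the QEAC sample elements corresponding to each type of question-and-answer pair. The QEAC element system is designed for question sentences, that is, the question elements include at least one of the QEAC elements. Elements are extracted to build the graphed knowledge base; element extraction follows the principle of conciseness, extracting the elements that convey the meaning of the sentence, and the preset graphed knowledge base is established according to the sample elements.
In this embodiment, step S20 includes:
representing the question text information as word vectors and character vectors through the representation layer of the preset element recognition model;
passing the word vectors and character vectors through the bidirectional long short-term memory layer of the preset element recognition model to obtain the score of every tag for each word of the question text information;
passing the scores of every tag for each word of the question text information through the conditional random field layer of the preset element recognition model to obtain the probabilities of tag sequences, and taking the tag with the highest probability as the corresponding question element, thereby obtaining the question elements corresponding to the question text information.
In a specific implementation, each element of the QEAC element system designed for question sentences can be used as a tag, so that the set of all tags is the set of all elements. The question elements include at least one of the QEAC elements, where element Q represents the main question word, element C represents the direct target of the question word, element E represents the center of the main question sentence, and element A represents a modifying or attributive component, which can modify E or another A. Processing by the LSTM network yields a fairly good representation of the question text information: the vector finally output by the LSTM units can be regarded as a representation of the question text information. In the final tagging stage a softmax function is generally used, but this approach has limited effect on data whose output tags are strongly related to one another. In practical sequence labeling tasks in particular, neural network structures depend heavily on data, and the amount and quality of the data seriously affect the result of model training. Methods that combine existing linear statistical models with neural network structures have therefore emerged, among which the combination of LSTM and CRF works well. That is, the softmax function is combined with CRF at the output end: LSTM extracts the sequence features, and CRF makes effective use of sentence-level tag information.
It should be understood that the first layer is the representation layer. Each sentence is represented as word vectors and character vectors by converting words into vectors (word2vector).
The second layer is the bidirectional long short-term memory (Bi-directional Long Short-Term Memory, abbreviated BiLSTM) layer. The word vectors and character vectors are fed into the BiLSTM layer of the model, whose output is the score of every tag for each word of the sentence. These scores serve as the unnormalized emission probabilities of the CRF model.
The third layer is the conditional random field (Conditional Random Fields, abbreviated CRF) layer. This layer uses the output of the BiLSTM layer, namely the scores of every tag for each word (the emission probability matrix), together with the transition probability matrix, as the parameters of the original CRF model, and finally obtains the probability of the tag sequence.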
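For an input X = (x_1, x_2, ..., x_n), the output probability matrix of the LSTM can be defined as P, of size n × k, where k is the number of output tags and P_{i,j} is the probability that the i-th character is labeled with the j-th tag. For a tag sequence to be predicted, y = (y_1, y_2, ..., y_n), the following can be defined:
s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=0}^{n} P_{i, y_i};
where A is the state transition matrix and A_{i,j} represents the probability of transitioning from the i-th tag to the j-th tag. The best output tag sequence is obtained by finding the maximum s(X, y). The CRF thus introduced models pairs of adjacent output tags and is computed with dynamic programming; labeling finally follows the optimal path obtained. That is, the words in the question text information that satisfy the characteristics of each element are labeled with the probability that each word belongs to each element, and the tag with the highest probability is taken as the corresponding question element, thereby obtaining the QEAC question elements corresponding to the question text information.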
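The dynamic-programming search for the tag sequence that maximizes s(X, y) is commonly done with the Viterbi algorithm, which the source does not name. The sketch below is a minimal illustration under assumptions: the emission and transition values are made up, the scores are treated as additive, and the boundary (start and end) transitions implied by the sum limits are left out for brevity.

```python
def viterbi(emissions, transitions):
    """Find the tag path maximizing summed transition and emission scores.

    emissions[i][j]  : score of tag j at position i (n x k)
    transitions[p][j]: score of moving from tag p to tag j (k x k)
    """
    n, k = len(emissions), len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag
    back = []                   # back-pointers for positions 1..n-1
    for i in range(1, n):
        new_score, pointers = [], []
        for j in range(k):
            best_prev = max(range(k), key=lambda p: score[p] + transitions[p][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[i][j])
        score = new_score
        back.append(pointers)
    last = max(range(k), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


tags = ["A", "E", "Q"]
# Hypothetical scores for a three-word question.
P = [[2.0, 0.5, 0.1],
     [0.3, 1.5, 0.2],
     [0.1, 0.4, 1.8]]
A = [[0.1, 0.8, 0.0],
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.1]]

best_path, best_score = viterbi(P, A)
print([tags[j] for j in best_path])  # ['A', 'E', 'Q']
```

Because each position only needs the best score per previous tag, the search costs on the order of n × k × k operations rather than enumerating all k^n tag sequences.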
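In this embodiment, existing question-and-answer pairs are obtained and classified according to a preset classification algorithm to obtain type question-and-answer pairs of preset types; element recognition is performed on each type of question-and-answer pair through the preset element recognition model to obtain the sample elements of each type of question-and-answer pair; and the preset graphed knowledge base is established according to the sample elements. Because the preset graphed knowledge base is built on the element system, the answer corresponding to the question text information can be found in it more accurately through element recognition, improving the user experience.
In addition, an embodiment of this application also proposes a storage medium, which may be volatile or non-volatile. The storage medium stores a question answering program based on a graphed knowledge base which, when executed by a processor, implements the steps of the question answering method based on a graphed knowledge base described above.
In addition, referring to FIG. 5, an embodiment of this application also proposes a question answering apparatus based on a graphed knowledge base, which includes: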
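a voice recognition module 10, used to perform voice detection and, when a user's question voice is detected, perform voice recognition on the question voice to obtain question text information.
It should be understood that the speaker's voice signal can be acquired through a microphone. Digital signal processing first detects whether speech is present in the audio signal captured by the microphone, and then analyzes that audio signal to predict the words spoken in it. Voice activity detection (Voice Activity Detection, abbreviated VAD) may be used for the voice detection.
The detected question voice is first preprocessed, its speech features are then extracted, and the pre-established speech recognition templates are obtained. During recognition, the computer compares the stored speech templates with the features of the input question voice according to the speech recognition model and, following a given search and matching strategy, finds a series of optimal templates that match the input speech. The recognition result is then obtained by table lookup according to the definition of these templates.
To analyze the question voice, it must be divided into frames, that is, cut into short segments, each of which is called a frame. After framing, the question voice becomes many short segments. However, the waveform has almost no descriptive power in the time domain, so it must be transformed. A common transformation is to extract MFCC features. The sound then becomes a matrix with 12 rows (assuming the acoustic features are 12-dimensional) and N columns, called the observation sequence, where N is the total number of frames. The acoustic model, the dictionary, and the language model then convert the feature-extracted audio data to text, yielding the question text information.
an element recognition module 20, used to perform element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information.
It can be understood that the long short-term memory network (Long short-term memory, abbreviated LSTM) is a special kind of RNN, designed mainly to address the vanishing-gradient and exploding-gradient problems that arise when training on long sequences. Conditional random fields (Conditional Random Fields, abbreviated CRF) make effective use of sentence-level tag information, and the CRF layer can add constraints to the final predicted tags to ensure that they are legal. A large number of sample question-and-answer pairs and their corresponding sample elements are first obtained, a basic LSTM-CRF model is built, and the basic LSTM-CRF model is trained on the sample question-and-answer pairs and the corresponding sample elements to obtain the preset element recognition model.
In a specific implementation, LSTM is used to extract sequence features, and under the LSTM-CRF model the output is the best tag sequence. The QEAC element system is designed for question sentences, that is, the question elements include at least one of the QEAC elements, where element Q represents the main question word, element C represents the direct target of the question word, element E represents the center of the main question sentence, and element A represents a modifying or attributive component, which can modify E or another A. For example, for the sentence "理财产品的购买渠道有哪些" ("What are the purchase channels for wealth management products"), the result of the element system marking is "理财产品#A#的购买#渠道#E#有#C#哪些#Q#". The preset element recognition model labels those words in the question text information that satisfy the characteristics of each element, thereby obtaining the QEAC question elements corresponding to the question text information.
an obtaining module 30, used to obtain the knowledge elements of multiple stored questions in the preset graphed knowledge base.
It should be noted that the preset graphed knowledge base is established in advance from a large number of existing question-and-answer pairs: the existing question-and-answer pairs are obtained and classified according to a preset classification algorithm to obtain type question-and-answer pairs of preset types; element recognition is performed on each type of question-and-answer pair through the preset element recognition model to obtain the sample elements of each type of question-and-answer pair; and the preset graphed knowledge base is established according to the sample elements. That is, the preset graphed knowledge base stores a large number of stored questions and their corresponding answers. The similarity between the question elements of the question text information and the knowledge elements of each stored question in the preset graphed knowledge base can be calculated, and the matching is deemed successful when the similarity exceeds a preset threshold.
a matching module 40, used to match the question elements of the question text information with the knowledge elements of each of the stored questions.
It should be understood that the question elements and the knowledge elements in the preset graphed knowledge base are first represented as vectors; term frequency (Term Frequency, abbreviated TF) or term frequency-inverse document frequency (Term Frequency-Inverse Document Frequency, abbreviated TF-IDF) may be used to convert them into vector form. The cosine distance between the vectorized question elements and each knowledge element in the preset graphed knowledge base is calculated and used as the similarity, and the matching is deemed successful when the similarity exceeds the preset threshold. The preset threshold may be set to, for example, 90%, according to empirical values.
a display module 50, used to display, if the matching succeeds, the answer corresponding to the successfully matched stored question.
It can be understood that if a knowledge element whose similarity exceeds the preset threshold is retrieved from the preset graphed knowledge base, that knowledge element is deemed successfully matched. The preset graphed knowledge base includes the answers corresponding to the questions composed of the elements, so the answer corresponding to the successfully matched knowledge element is obtained and displayed. The display method includes at least one of playing the answer by voice and showing it on the display interface of the question answering device based on a graphed knowledge base.
In this embodiment, voice detection is performed; when the user's question voice is detected, voice recognition is performed on it to obtain question text information; based on artificial intelligence, element recognition is performed on the question text information through the preset element recognition model to obtain the corresponding question elements; the knowledge elements of multiple stored questions in the preset graphed knowledge base are obtained; the question elements are matched with the knowledge elements of each stored question; and if the matching succeeds, the answer corresponding to the successfully matched stored question is displayed. Dividing the sentence into elements and matching element against element improves the accuracy of matching between sentences, so that the corresponding answer is found more accurately in the preset graphed knowledge base, improving the accuracy and efficiency of intelligent question answering.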
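In an embodiment, the question answering apparatus based on a graphed knowledge base further includes:
a calculation module, used to calculate, if the matching fails, the similarity between the question elements of the question text information and the knowledge elements of each stored question;
the obtaining module 30, further used to obtain the stored question with the highest similarity as the closest question;
a generating module, used to generate follow-up information according to the closest question and display the follow-up information.
In an embodiment, the voice recognition module 10 is further used to perform voice detection and, when reply voice information given by the user in response to the follow-up information is detected, perform voice recognition on the reply voice information to obtain reply text information;
the matching module 40 is further used to determine, according to the reply text information, whether the closest question matches the question text information;
the display module 50 is further used to obtain and display the answer corresponding to the closest question if the closest question matches the question text information.
In an embodiment, the matching module 40 is further used to match each element in the closest question with each element in the question text information according to element type, the elements that fail to match being taken as the difference content;
the generating module is further used to generate follow-up information according to the difference content.
In an embodiment, the question answering apparatus based on a graphed knowledge base further includes:
a classification module, used to obtain existing question-and-answer pairs and classify them according to a preset classification algorithm to obtain type question-and-answer pairs of preset types;
an establishment module, used to perform element recognition on each type of question-and-answer pair through the preset element recognition model to obtain the sample elements of each type of question-and-answer pair, and to establish the preset graphed knowledge base according to the sample elements.
In an embodiment, the question answering apparatus based on a graphed knowledge base further includes:
a fixed-length processing module, used to obtain existing question-and-answer pairs and perform fixed-length processing on them to obtain fixed-length question-and-answer pairs;
an encoding module, used to perform one-hot encoding on the fixed-length question-and-answer pairs to obtain vectorized question-and-answer pairs;
the classification module, further used to pass the vectorized question-and-answer pairs in sequence through the input layer, convolution layer, pooling layer, and fully connected layer of the convolutional neural network text classification algorithm to obtain type question-and-answer pairs of preset types.
In an embodiment, the element recognition module 20 is further used to represent the question text information as word vectors and character vectors through the representation layer of the preset element recognition model; to pass the word vectors and character vectors through the bidirectional long short-term memory layer of the preset element recognition model to obtain the score of every tag for each word of the question text information; and to pass these scores through the conditional random field layer of the preset element recognition model to obtain the probabilities of tag sequences, taking the tag with the highest probability as the corresponding question element and thereby obtaining the question elements corresponding to the question text information.
For other embodiments or specific implementations of the question answering apparatus based on a graphed knowledge base of this application, reference may be made to the method embodiments above, which are not repeated here.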
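It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of this application are for description only and do not represent the relative merits of the embodiments. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as labels.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory image (Read Only Memory image, ROM)/random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.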

Claims (20)

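  1. A question answering method based on a graphed knowledge base, wherein the question answering method based on a graphed knowledge base comprises the following steps:
    performing voice detection and, when a user's question voice is detected, performing voice recognition on the question voice to obtain question text information;
    performing element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information;
    obtaining the knowledge elements of multiple stored questions in a preset graphed knowledge base;
    matching the question elements of the question text information with the knowledge elements of each of the stored questions;
    if the matching succeeds, displaying the answer corresponding to the successfully matched stored question.
  2. The question answering method based on a graphed knowledge base according to claim 1, wherein, after the matching of the question elements of the question text information with the knowledge elements of each of the stored questions, the question answering method based on a graphed knowledge base further comprises:
    if the matching fails, calculating the similarity between the question elements of the question text information and the knowledge elements of each stored question;
    obtaining the stored question with the highest similarity as the closest question;
    generating follow-up information according to the closest question, and displaying the follow-up information.
  3. The question answering method based on a graphed knowledge base according to claim 2, wherein, after the generating of follow-up information according to the closest question and the displaying of the follow-up information, the question answering method based on a graphed knowledge base further comprises:
    performing voice detection and, when reply voice information given by the user in response to the follow-up information is detected, performing voice recognition on the reply voice information to obtain reply text information;
    determining, according to the reply text information, whether the closest question matches the question text information;
    if the closest question matches the question text information, obtaining the answer corresponding to the closest question and displaying it.
  4. The question answering method based on a graphed knowledge base according to claim 2, wherein the generating of follow-up information according to the closest question comprises:
    matching each element in the closest question with each element in the question text information according to element type, the elements that fail to match being taken as difference content;
    generating follow-up information according to the difference content.
  5. The question answering method based on a graphed knowledge base according to claim 1, wherein, before the obtaining of the knowledge elements of multiple stored questions in the preset graphed knowledge base, the question answering method based on a graphed knowledge base further comprises:
    obtaining existing question-and-answer pairs and classifying the existing question-and-answer pairs according to a preset classification algorithm to obtain type question-and-answer pairs of preset types;
    performing element recognition on each type of question-and-answer pair through the preset element recognition model to obtain sample elements of each type of question-and-answer pair, and establishing the preset graphed knowledge base according to the sample elements.
  6. The question answering method based on a graphed knowledge base according to claim 5, wherein the obtaining of existing question-and-answer pairs and the classifying of the existing question-and-answer pairs according to a preset classification algorithm to obtain type question-and-answer pairs of preset types comprises:
    obtaining existing question-and-answer pairs and performing fixed-length processing on the existing question-and-answer pairs to obtain fixed-length question-and-answer pairs;
    performing one-hot encoding on the fixed-length question-and-answer pairs to obtain vectorized question-and-answer pairs;
    passing the vectorized question-and-answer pairs in sequence through the input layer, convolution layer, pooling layer, and fully connected layer of a convolutional neural network text classification algorithm to obtain type question-and-answer pairs of preset types.
  7. The question answering method based on a graphed knowledge base according to any one of claims 1 to 6, wherein the performing of element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information comprises:
    representing the question text information as word vectors and character vectors through the representation layer of the preset element recognition model;
    passing the word vectors and character vectors through the bidirectional long short-term memory layer of the preset element recognition model to obtain the score of every tag for each word of the question text information;
    passing the scores of every tag for each word of the question text information through the conditional random field layer of the preset element recognition model to obtain the probabilities of tag sequences, and taking the tag with the highest probability as the corresponding question element, thereby obtaining the question elements corresponding to the question text information.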
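  8. A question answering device based on a graphed knowledge base, wherein the question answering device based on a graphed knowledge base comprises: a memory, a processor, and a question answering program based on a graphed knowledge base that is stored in the memory and executable on the processor, the question answering program based on a graphed knowledge base implementing the following steps when executed by the processor:
    performing voice detection and, when a user's question voice is detected, performing voice recognition on the question voice to obtain question text information;
    performing element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information;
    obtaining the knowledge elements of multiple stored questions in a preset graphed knowledge base;
    matching the question elements of the question text information with the knowledge elements of each of the stored questions;
    if the matching succeeds, displaying the answer corresponding to the successfully matched stored question.
  9. The question answering device based on a graphed knowledge base according to claim 8, wherein, after the matching of the question elements of the question text information with the knowledge elements of each of the stored questions, the question answering program based on a graphed knowledge base further implements the following steps when executed by the processor:
    if the matching fails, calculating the similarity between the question elements of the question text information and the knowledge elements of each stored question;
    obtaining the stored question with the highest similarity as the closest question;
    generating follow-up information according to the closest question, and displaying the follow-up information.
  10. The question answering device based on a graphed knowledge base according to claim 9, wherein, after the generating of follow-up information according to the closest question and the displaying of the follow-up information, the question answering program based on a graphed knowledge base further implements the following steps when executed by the processor:
    performing voice detection and, when reply voice information given by the user in response to the follow-up information is detected, performing voice recognition on the reply voice information to obtain reply text information;
    determining, according to the reply text information, whether the closest question matches the question text information;
    if the closest question matches the question text information, obtaining the answer corresponding to the closest question and displaying it.
  11. The question answering device based on a graphed knowledge base according to claim 9, wherein the generating of follow-up information according to the closest question comprises:
    matching each element in the closest question with each element in the question text information according to element type, the elements that fail to match being taken as difference content;
    generating follow-up information according to the difference content.
  12. The question answering device based on a graphed knowledge base according to claim 8, wherein, before the obtaining of the knowledge elements of multiple stored questions in the preset graphed knowledge base, the question answering program based on a graphed knowledge base further implements the following steps when executed by the processor:
    obtaining existing question-and-answer pairs and classifying the existing question-and-answer pairs according to a preset classification algorithm to obtain type question-and-answer pairs of preset types;
    performing element recognition on each type of question-and-answer pair through the preset element recognition model to obtain sample elements of each type of question-and-answer pair, and establishing the preset graphed knowledge base according to the sample elements.
  13. The question answering device based on a graphed knowledge base according to claim 12, wherein the obtaining of existing question-and-answer pairs and the classifying of the existing question-and-answer pairs according to a preset classification algorithm to obtain type question-and-answer pairs of preset types comprises:
    obtaining existing question-and-answer pairs and performing fixed-length processing on the existing question-and-answer pairs to obtain fixed-length question-and-answer pairs;
    performing one-hot encoding on the fixed-length question-and-answer pairs to obtain vectorized question-and-answer pairs;
    passing the vectorized question-and-answer pairs in sequence through the input layer, convolution layer, pooling layer, and fully connected layer of a convolutional neural network text classification algorithm to obtain type question-and-answer pairs of preset types.
  14. The question answering device based on a graphed knowledge base according to any one of claims 8 to 13, wherein the performing of element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information comprises:
    representing the question text information as word vectors and character vectors through the representation layer of the preset element recognition model;
    passing the word vectors and character vectors through the bidirectional long short-term memory layer of the preset element recognition model to obtain the score of every tag for each word of the question text information;
    passing the scores of every tag for each word of the question text information through the conditional random field layer of the preset element recognition model to obtain the probabilities of tag sequences, and taking the tag with the highest probability as the corresponding question element, thereby obtaining the question elements corresponding to the question text information.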
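  15. A storage medium, wherein the storage medium stores a question answering program based on a graphed knowledge base, the question answering program based on a graphed knowledge base implementing the following steps when executed by a processor:
    performing voice detection and, when a user's question voice is detected, performing voice recognition on the question voice to obtain question text information;
    performing element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information;
    obtaining the knowledge elements of multiple stored questions in a preset graphed knowledge base;
    matching the question elements of the question text information with the knowledge elements of each of the stored questions;
    if the matching succeeds, displaying the answer corresponding to the successfully matched stored question.
  16. The storage medium according to claim 15, wherein, after the matching of the question elements of the question text information with the knowledge elements of each of the stored questions, the question answering program based on a graphed knowledge base further implements the following steps when executed by the processor:
    if the matching fails, calculating the similarity between the question elements of the question text information and the knowledge elements of each stored question;
    obtaining the stored question with the highest similarity as the closest question;
    generating follow-up information according to the closest question, and displaying the follow-up information.
  17. The storage medium according to claim 16, wherein, after the generating of follow-up information according to the closest question and the displaying of the follow-up information, the question answering program based on a graphed knowledge base further implements the following steps when executed by the processor:
    performing voice detection and, when reply voice information given by the user in response to the follow-up information is detected, performing voice recognition on the reply voice information to obtain reply text information;
    determining, according to the reply text information, whether the closest question matches the question text information;
    if the closest question matches the question text information, obtaining the answer corresponding to the closest question and displaying it.
  18. The storage medium according to claim 16, wherein the generating of follow-up information according to the closest question comprises:
    matching each element in the closest question with each element in the question text information according to element type, the elements that fail to match being taken as difference content;
    generating follow-up information according to the difference content.
  19. The storage medium according to claim 15, wherein, before the obtaining of the knowledge elements of multiple stored questions in the preset graphed knowledge base, the question answering program based on a graphed knowledge base further implements the following steps when executed by the processor:
    obtaining existing question-and-answer pairs and classifying the existing question-and-answer pairs according to a preset classification algorithm to obtain type question-and-answer pairs of preset types;
    performing element recognition on each type of question-and-answer pair through the preset element recognition model to obtain sample elements of each type of question-and-answer pair, and establishing the preset graphed knowledge base according to the sample elements.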
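  20. A question answering apparatus based on a graphed knowledge base, wherein the question answering apparatus based on a graphed knowledge base comprises:
    a voice recognition module, used to perform voice detection and, when a user's question voice is detected, perform voice recognition on the question voice to obtain question text information;
    an element recognition module, used to perform element recognition on the question text information through a preset element recognition model to obtain the question elements corresponding to the question text information;
    an obtaining module, used to obtain the knowledge elements of multiple stored questions in a preset graphed knowledge base;
    a matching module, used to match the question elements of the question text information with the knowledge elements of each of the stored questions;
    a display module, used to display, if the matching succeeds, the answer corresponding to the successfully matched stored question.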
PCT/CN2020/122136 2019-10-29 2020-10-20 基于图谱化知识库的问答方法、设备、存储介质及装置 WO2021082982A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911041316.4 2019-10-29
CN201911041316.4A CN111046133B (zh) 2019-10-29 2019-10-29 基于图谱化知识库的问答方法、设备、存储介质及装置

Publications (1)

Publication Number Publication Date
WO2021082982A1 true WO2021082982A1 (zh) 2021-05-06

Family

ID=70232720

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122136 WO2021082982A1 (zh) 2019-10-29 2020-10-20 基于图谱化知识库的问答方法、设备、存储介质及装置

Country Status (2)

Country Link
CN (1) CN111046133B (zh)
WO (1) WO2021082982A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113299294A (zh) * 2021-05-26 2021-08-24 中国平安人寿保险股份有限公司 任务型对话机器人交互方法、装置、设备及存储介质
CN115292461A (zh) * 2022-08-01 2022-11-04 北京伽睿智能科技集团有限公司 基于语音识别的人机交互学习方法及系统
CN115658852A (zh) * 2022-12-28 2023-01-31 北京百车宝科技有限公司 基于知识库的汽车智能问答系统
CN116303919A (zh) * 2022-11-30 2023-06-23 荣耀终端有限公司 一种问答方法及系统
CN117271886A (zh) * 2023-08-25 2023-12-22 广东美亚旅游科技集团股份有限公司 基于机票订单管理的数据搜索方法、系统、设备及介质
CN117473071A (zh) * 2023-12-27 2024-01-30 珠海格力电器股份有限公司 数据检索方法、装置、设备及计算机可读介质
CN117591657A (zh) * 2023-12-22 2024-02-23 宿迁乐享知途网络科技有限公司 一种基于ai的智能对话管理系统及方法
CN117725190A (zh) * 2024-02-18 2024-03-19 粤港澳大湾区数字经济研究院(福田) 基于大语言模型的多轮问答方法、系统、终端及存储介质
CN117873909A (zh) * 2024-03-13 2024-04-12 上海爱可生信息技术股份有限公司 故障诊断执行方法、故障诊断执行系统、电子设备及存储介质

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046133B (zh) * 2019-10-29 2023-07-25 平安科技(深圳)有限公司 基于图谱化知识库的问答方法、设备、存储介质及装置
CN111506770B (zh) * 2020-04-22 2023-10-27 新华智云科技有限公司 一种采访视频集锦生成方法和系统
CN111858876B (zh) * 2020-05-14 2024-05-10 北京捷通华声科技股份有限公司 一种知识库的生成方法、文本查找方法和装置
CN111883140B (zh) * 2020-07-24 2023-07-21 中国平安人寿保险股份有限公司 基于知识图谱和声纹识别的认证方法、装置、设备及介质
CN112148853A (zh) * 2020-09-15 2020-12-29 上海风秩科技有限公司 查询结果的确定方法及装置、存储介质、电子装置
CN112182252B (zh) * 2020-11-09 2021-08-31 浙江大学 基于药品知识图谱的智能用药问答方法及其设备
CN112380848B (zh) * 2020-11-19 2022-04-26 平安科技(深圳)有限公司 文本生成方法、装置、设备及存储介质
CN112860873B (zh) * 2021-03-23 2024-03-05 北京小米移动软件有限公司 智能应答方法、装置及存储介质
CN113076409A (zh) * 2021-04-20 2021-07-06 上海景吾智能科技有限公司 应用于机器人的对话系统及方法、机器人、可读介质
CN112989785B (zh) * 2021-04-27 2021-09-07 支付宝(杭州)信息技术有限公司 文本向量的获取方法和装置、文本相似度计算方法和装置
CN113157944A (zh) * 2021-04-30 2021-07-23 携程旅游网络技术(上海)有限公司 基于互动的知识图谱拓展方法、系统、设备及存储介质
CN113780561A (zh) * 2021-09-07 2021-12-10 国网北京市电力公司 电网调控运行知识库的构建方法及装置
CN114021546A (zh) * 2021-09-08 2022-02-08 北京市农林科学院信息技术研究中心 迁移语境网络的大桃生产知识开放问答方法及装置
CN115617976B (zh) * 2022-12-21 2023-07-07 安徽淘云科技股份有限公司 问答方法、装置、电子设备和存储介质
CN116303981B (zh) * 2023-05-23 2023-08-01 山东森普信息技术有限公司 一种农业社区知识问答方法、装置及存储介质
CN117194647B (zh) * 2023-11-03 2024-02-20 深圳墨影科技有限公司 一种用于离线环境的智能问答系统、方法及装置
CN117609466A (zh) * 2023-12-04 2024-02-27 北方工业大学 一种基于大数据分析的语音智能问答系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818164A (zh) * 2017-11-02 2018-03-20 东北师范大学 一种智能问答方法及其系统
CN108153876A (zh) * 2017-12-26 2018-06-12 爱因互动科技发展(北京)有限公司 智能问答方法及系统
CN109145168A (zh) * 2018-07-11 2019-01-04 广州极天信息技术股份有限公司 一种专家服务机器人云平台
US20190122111A1 (en) * 2017-10-24 2019-04-25 Nec Laboratories America, Inc. Adaptive Convolutional Neural Knowledge Graph Learning System Leveraging Entity Descriptions
US20190279104A1 (en) * 2018-03-07 2019-09-12 International Business Machines Corporation Unit conversion in a synonym-sensitive framework for question answering
CN111046133A (zh) * 2019-10-29 2020-04-21 平安科技(深圳)有限公司 基于图谱化知识库的问答方法、设备、存储介质及装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909572A (zh) * 2015-12-23 2017-06-30 北京奇虎科技有限公司 一种问答知识库的构建方法和装置
CN109918650B (zh) * 2019-02-03 2020-10-23 北京大学 自动生成采访稿的采访智能机器人装置及智能采访方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190122111A1 (en) * 2017-10-24 2019-04-25 Nec Laboratories America, Inc. Adaptive Convolutional Neural Knowledge Graph Learning System Leveraging Entity Descriptions
CN107818164A (zh) * 2017-11-02 2018-03-20 东北师范大学 一种智能问答方法及其系统
CN108153876A (zh) * 2017-12-26 2018-06-12 爱因互动科技发展(北京)有限公司 智能问答方法及系统
US20190279104A1 (en) * 2018-03-07 2019-09-12 International Business Machines Corporation Unit conversion in a synonym-sensitive framework for question answering
CN109145168A (zh) * 2018-07-11 2019-01-04 广州极天信息技术股份有限公司 一种专家服务机器人云平台
CN111046133A (zh) * 2019-10-29 2020-04-21 平安科技(深圳)有限公司 基于图谱化知识库的问答方法、设备、存储介质及装置

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
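CN113299294B (zh) * 2021-05-26 2024-06-11 中国平安人寿保险股份有限公司 Task-oriented dialogue robot interaction method, apparatus, device and storage medium
CN113299294A (zh) * 2021-05-26 2021-08-24 中国平安人寿保险股份有限公司 Task-oriented dialogue robot interaction method, apparatus, device and storage medium
CN115292461B (zh) * 2022-08-01 2024-03-12 北京伽睿智能科技集团有限公司 Human-computer interaction learning method and system based on speech recognition
CN115292461A (zh) * 2022-08-01 2022-11-04 北京伽睿智能科技集团有限公司 Human-computer interaction learning method and system based on speech recognition
CN116303919A (zh) * 2022-11-30 2023-06-23 荣耀终端有限公司 Question answering method and system
CN115658852B (zh) * 2022-12-28 2023-10-24 北京百车宝科技有限公司 Knowledge base-based intelligent automotive question answering system
CN115658852A (zh) * 2022-12-28 2023-01-31 北京百车宝科技有限公司 Knowledge base-based intelligent automotive question answering system
CN117271886A (zh) * 2023-08-25 2023-12-22 广东美亚旅游科技集团股份有限公司 Data search method, system, device and medium based on air ticket order management
CN117591657A (zh) * 2023-12-22 2024-02-23 宿迁乐享知途网络科技有限公司 AI-based intelligent dialogue management system and method
CN117591657B (zh) * 2023-12-22 2024-05-07 宿迁乐享知途网络科技有限公司 AI-based intelligent dialogue management system and method
CN117473071A (zh) * 2023-12-27 2024-01-30 珠海格力电器股份有限公司 Data retrieval method, apparatus, device and computer-readable medium
CN117473071B (zh) * 2023-12-27 2024-04-05 珠海格力电器股份有限公司 Data retrieval method, apparatus, device and computer-readable medium
CN117725190A (zh) * 2024-02-18 2024-03-19 粤港澳大湾区数字经济研究院(福田) Multi-turn question answering method, system, terminal and storage medium based on a large language model
CN117725190B (zh) * 2024-02-18 2024-06-04 粤港澳大湾区数字经济研究院(福田) Multi-turn question answering method, system, terminal and storage medium based on a large language model
CN117873909A (zh) * 2024-03-13 2024-04-12 上海爱可生信息技术股份有限公司 Fault diagnosis execution method, fault diagnosis execution system, electronic device and storage medium
CN117873909B (zh) * 2024-03-13 2024-05-28 上海爱可生信息技术股份有限公司 Fault diagnosis execution method, fault diagnosis execution system, electronic device and storage medium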

Also Published As

Publication number Publication date
CN111046133A (zh) 2020-04-21
CN111046133B (zh) 2023-07-25

Similar Documents

Publication Publication Date Title
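WO2021082982A1 (zh) Question answering method, device, storage medium and apparatus based on a graphed knowledge base
CN109446430B (zh) Product recommendation method, apparatus, computer device and readable storage medium
CN113094578B (zh) Content recommendation method, apparatus, device and storage medium based on deep learning
WO2020216064A1 (zh) Speech emotion recognition method, semantic recognition method, question answering method, computer device and computer-readable storage medium
CN111177310A (zh) Intelligent scenario conversation method and apparatus for a power service robot
CN113033438B (zh) Data feature learning method for incompletely aligned modalities
CN112487139A (zh) Text-based automatic question generation method, apparatus and computer device
CN105956053A (zh) Search method and apparatus based on network information
CN112685550B (zh) Intelligent question answering method, apparatus, server and computer-readable storage medium
CN113990352B (zh) User emotion recognition and prediction method, apparatus, device and storage medium
CN113094478B (zh) Emoticon reply method, apparatus, device and storage medium
JP2019071089A (ja) Information presentation device and information presentation method
CN111563373A (zh) Attribute-level sentiment classification method focusing on attribute-related text
CN110992988A (zh) Domain-adversarial speech emotion recognition method and apparatus
CN110647613A (zh) Courseware construction method, apparatus, server and storage medium
CN112259078A (zh) Method and apparatus for training an audio recognition model and recognizing abnormal audio
CN114281948A (zh) Minutes determination method and related device
CN107734123A (zh) Contact sorting method and apparatus
CN111159377B (zh) Attribute recall model training method, apparatus, electronic device and storage medium
CN111462762B (zh) Speaker vector regularization method, apparatus, electronic device and storage medium
CN110969005A (zh) Method and apparatus for determining similarity between entity corpora
CN112489689A (zh) Cross-database speech emotion recognition method and apparatus based on multi-scale difference adversarial learning
CN112199958A (zh) Concept word sequence generation method, apparatus, computer device and storage medium
CN109190556B (zh) Method for identifying the authenticity of notarization intent
JP2021043530A (ja) Input support method, input support system, and program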

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20883495

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20883495

Country of ref document: EP

Kind code of ref document: A1