WO2021000676A1 - Question answering method, question answering device, computer equipment and storage medium - Google Patents

Question answering method, question answering device, computer equipment and storage medium Download PDF

Info

Publication number
WO2021000676A1
WO2021000676A1 (PCT/CN2020/093141)
Authority
WO
WIPO (PCT)
Prior art keywords
entity
candidate
input information
named entity
similarity
Prior art date
Application number
PCT/CN2020/093141
Other languages
English (en)
French (fr)
Inventor
朱威
梁欣
李春宇
丁佳佳
倪渊
谢国彤
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021000676A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3347 - Query execution using vector based model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 - Ontology

Definitions

  • This application relates to the field of natural language processing in artificial intelligence, and in particular to a question answering method, question answering device, computer equipment, and storage medium.
  • A question answering system is an advanced form of information retrieval system: it can answer questions that users pose in natural language with accurate and concise natural language.
  • A traditional question answering system is divided into two parts, question processing and answer retrieval. The basis of question processing is word segmentation.
  • Answer retrieval mostly uses a scoring mechanism: a series of candidate answers is selected from a large amount of text data, and a selection function is then constructed to pick the closest answer from the candidates.
  • Such traditional question answering devices make errors of varying degrees, depending on how long text nouns are handled and on the selection function that is constructed.
  • Against this background, question answering systems based on knowledge graphs came into being. Current research falls into three main categories.
  • The first category is rule-based: fixed rules determine whether a user question is asking about a certain fact in the knowledge base.
  • The second category is template learning: a large number of templates are collected, and the probability that a natural language question corresponds to a given template is learned from a large amount of data annotated with the corresponding knowledge-base facts.
  • The third category is semantic matching based on deep learning: a neural network model learns the semantic similarity between a question and a relation in the knowledge graph, where entity recognition has already been performed on the question and the entities in it have been replaced with special symbols.
  • In view of this, this application proposes a question answering method, question answering device, computer equipment, and storage medium that can produce an accurate answer even when question-and-answer corpus data is scarce.
  • First, to achieve the above objective, this application proposes a question answering method that includes the steps of: obtaining user input information; identifying the named entity in the input information and linking the named entity to the candidate entity corresponding to it in the Chinese knowledge graph to form an entity pair; matching the candidate relations of the candidate entity in the Chinese knowledge graph through a relation template; forming candidate triples from the entity pairs and candidate relations; obtaining the ranking result corresponding to each candidate triple based on a learning-to-rank model; and querying the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
  • This application also provides a question answering device based on learning-to-rank over a Chinese knowledge graph. The question answering device includes:
  • a first obtaining module, used to obtain user input information;
  • a recognition and linking module, used to identify the named entity in the input information and link the named entity to the candidate entity corresponding to it in the Chinese knowledge graph to form an entity pair, where the entity pair includes the named entity and the candidate entity;
  • a matching module, configured to match the candidate relations of the candidate entity in the Chinese knowledge graph through a relation model;
  • a forming module, used to form candidate triples from the entity pairs and the candidate relations, where each candidate triple includes the named entity, the candidate entity, and the candidate relation;
  • a second obtaining module, used to obtain the ranking result corresponding to each candidate triple based on the learning-to-rank model; and
  • a third obtaining module, configured to query the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
  • The present application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor.
  • The processor implements the steps of the foregoing method when executing the computer program.
  • The present application also provides a computer-readable storage medium on which a computer program is stored; the computer program implements the steps of the foregoing method when executed by a processor.
  • Compared with traditional techniques, the knowledge-graph-based question answering method, computer equipment, and storage medium proposed in this application can make effective use of external resources: a broad learning model exploits external resources such as synonyms or context words of relation facts.
  • These external resources can be obtained quickly through text mining or directly from Chinese lexical resources.
  • By combining the broad learning model with a deep learning model, the amount of data the model requires can be reduced, and good output can be obtained even when training data is scarce. This is of great significance when developing knowledge-graph question answering for a new vertical domain.
  • FIG. 1 is a schematic flowchart of the question answering method of the first embodiment of the present application;
  • FIG. 2 is a schematic flowchart of the question answering method of the second embodiment of the present application;
  • FIG. 3 is a schematic flowchart of the question answering method of the third embodiment of the present application;
  • FIG. 4 is a schematic flowchart of the question answering method of the fourth embodiment of the present application;
  • FIG. 5 is a schematic flowchart of the question answering method of the fifth embodiment of the present application;
  • FIG. 6 is a schematic block diagram of the question answering device of the sixth embodiment of the present application;
  • FIG. 7 is a schematic block diagram of the question answering device of the seventh embodiment of the present application;
  • FIG. 8 is a block diagram of the synonym collection unit in the question answering device of the eighth embodiment of the present application.
  • Referring to FIG. 1, the question answering method of the first embodiment includes:
  • Step S110: Obtain user input information.
  • The input information may be a natural-language query (such as a question); for example, the user enters the question "What medicine do I need to take for a cough?" on a search website. This embodiment does not limit the way the input information is obtained.
  • Step S120: Identify the named entity in the input information and link the named entity to the candidate entity in the Chinese knowledge graph to form an entity pair, where the entity pair includes the named entity and the candidate entity.
  • Specifically, the input information is sequence-labeled using an annotation-set method and a recurrent neural network model, and named entity recognition is then completed according to the sequence labeling result (the specific steps are detailed in the second embodiment). For example, for "What kind of medicine do I need to take for a cough?", the question is first labeled with the BIO annotation set, the vector information of the question is obtained from the labeling result, and this vector information is fed to the recurrent neural network model to recognize the named entity "cough". The named entity is then mapped to a Globally Unique Identifier (GUID) in the Chinese knowledge graph, which links it to the corresponding candidate entity in the knowledge graph, such as "cough".
  • Each candidate entity in the knowledge graph uniquely corresponds to a GUID, through which different candidate entities in the Chinese knowledge graph can be distinguished.
  • The Chinese knowledge graph is a new technology for storing complex structured information. A large amount of factual knowledge is stored in the Chinese knowledge graph, namely entities and the relation information between entities. Chinese knowledge graphs mostly store data in RDF (Resource Description Framework) format: a fact is represented as an (S, P, O) triple of the form (subject, predicate, object), where S and O are entities (O is sometimes an attribute value) and P is the relation between S and O.
  • Entity linking is an important method for resolving named entity ambiguity: ambiguous entity mentions are disambiguated by linking them to a given knowledge graph.
  • In addition, since named entities may have aliases or other variant names, alias information is collected for each candidate entity in the Chinese knowledge graph from its canonical name and aliases, and a reverse dictionary from alias to candidate entity is built for entity linking.
  • When building the dictionary, the alias strings need to be normalized, for example by converting to lowercase and deleting special characters, and the entities in the alias dictionary are ranked by popularity, taken as the frequency with which the entity appears in the knowledge graph.
  • After a named entity is recognized, it is looked up in the alias dictionary, and the top-ranked entities by popularity are selected as candidate entities.
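As an illustration of this alias-to-entity dictionary, the following Python sketch builds the reverse dictionary and ranks candidates by popularity. It is a minimal sketch, not the patent's implementation: the entry format, the `link` helper, and the example entries are hypothetical, and only lowercasing and special-character removal are taken from the text.

```python
from collections import defaultdict

def normalize(alias: str) -> str:
    # Unify alias strings: lowercase and drop special characters
    # (the only normalization steps the text names).
    return "".join(ch for ch in alias.lower() if ch.isalnum())

def build_alias_dict(entities):
    """entities: iterable of (guid, names, popularity) tuples, where `names`
    holds the canonical name plus aliases and `popularity` is the frequency
    with which the entity appears in the knowledge graph."""
    alias_dict = defaultdict(list)
    for guid, names, popularity in entities:
        for name in names:
            alias_dict[normalize(name)].append((popularity, guid))
    for key in alias_dict:                 # rank candidates by popularity
        alias_dict[key].sort(reverse=True)
    return alias_dict

def link(named_entity, alias_dict, top_k=10):
    # Return the top-k most popular candidate entities for a mention.
    return [guid for _, guid in alias_dict.get(normalize(named_entity), [])[:top_k]]

alias_dict = build_alias_dict([
    ("guid-001", ["咳嗽", "咳"], 120),      # hypothetical entries
    ("guid-002", ["咳嗽变异性哮喘"], 15),
])
print(link("咳嗽", alias_dict))            # ['guid-001']
```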
  • Step S130: Match the candidate relations of the candidate entity in the Chinese knowledge graph through a relation template.
  • Specifically, the relation template uses natural language understanding to grasp the semantics expressed by the user's input information (such as a question) and matches it against the relation P of the (S, P, O) triples in the Chinese knowledge graph, thereby determining which candidate relations in the Chinese knowledge graph correspond to the semantics of the input.
  • The relation template includes a first entity, a second entity, and the relation between the first entity and the second entity.
  • Relation templates are obtained by extracting triples from the Chinese knowledge graph, extracting relation information from those triples, and training templates that correspond to that relation information.
  • Step S140: Form candidate triples from the entity pairs and the candidate relations, where each candidate triple includes the named entity, the candidate entity, and the candidate relation.
  • Specifically, the named entity identified in the steps above, together with the candidate entities and candidate relations corresponding to it in the Chinese knowledge graph, forms the candidate triples.
  • Step S150: Obtain the ranking result corresponding to each candidate triple based on the learning-to-rank model.
  • Specifically, each candidate triple is converted into corresponding vector information and used as input to the learning-to-rank model, which, after a series of computations, outputs the ranking result corresponding to each candidate triple. The results may be ordered so that a higher rank means a more accurate answer and a lower rank a less accurate one, or in some other way; this embodiment does not limit the ordering.
  • The learning-to-rank model is computed with a learning-to-rank algorithm. Learning to rank (LTR) is a supervised learning (SL) approach to ranking. LTR generally comes in three flavors: the pointwise (single document) method, the pairwise (document pair) method, and the listwise (document list) method. In this embodiment the learning-to-rank algorithm uses the pairwise method.
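To make the pairwise method concrete, here is a minimal PyTorch sketch of a pairwise scorer trained with a margin ranking loss. The two-layer scoring network, the margin value, and the feature dimensionality are assumptions; the patent only specifies that a pairwise (document pair) learning-to-rank method is used.

```python
import torch
import torch.nn as nn

class PairwiseRanker(nn.Module):
    """Scores a candidate-triple feature vector; trained so that the positive
    member of a pair scores higher than the negative one (pairwise LTR)."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.mlp(x).squeeze(-1)

model = PairwiseRanker(dim=32)
loss_fn = nn.MarginRankingLoss(margin=1.0)
pos = torch.randn(8, 32)   # features of positive (standard-answer) triples
neg = torch.randn(8, 32)   # features of sampled negative triples
target = torch.ones(8)     # +1 means "first input should rank higher"
loss = loss_fn(model(pos), model(neg), target)
loss.backward()
```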
  • In one embodiment, the learning-to-rank model is obtained by training on second samples formed from a first sample and each candidate triple, where the first sample is the triple formed by the standard answer to the input information.
  • For example, given the standard-answer triple (named entity, candidate entity, candidate relation) of a question, 10 candidate entities are randomly drawn from the Chinese knowledge graph and candidate relations are collected for them, finally yielding 50 triples (named entity, candidate entity, candidate relation) as negative samples (N). The standard-answer triple (named entity, candidate entity, candidate relation) is the positive sample (P).
  • The positive sample (P) is combined with one of the negative samples (N) to generate two training pairs, a (P, N) sample and an (N, P) sample. The label of the (P, N) sample is 1 and the label of the (N, P) sample is 0. The learning-to-rank model can then be trained on these samples.
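The sample construction just described can be sketched as follows in Python. The helper names (`kg_entities`, `relations_of`) are hypothetical accessors into the knowledge graph; the counts (10 random entities, 50 negatives) follow the example in the text.

```python
import random

def make_pairwise_samples(gold_triple, kg_entities, relations_of,
                          n_entities=10, n_neg=50):
    """gold_triple: (named_entity, candidate_entity, relation) built from
    the standard answer of the question."""
    mention = gold_triple[0]
    negatives = []
    for ent in random.sample(kg_entities, n_entities):
        for rel in relations_of(ent):               # collect candidate relations
            negatives.append((mention, ent, rel))
    negatives = negatives[:n_neg]
    samples = []
    for neg in negatives:
        samples.append(((gold_triple, neg), 1))     # (P, N) sample, label 1
        samples.append(((neg, gold_triple), 0))     # (N, P) sample, label 0
    return samples
```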
  • Step S160: Query the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
  • Specifically, according to the ranking of the candidate triples, the triples ranked above a preset cutoff are selected and converted into query statements in the Chinese knowledge graph's query language; the query is executed against the Chinese knowledge graph and the answer corresponding to the input information is returned.
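For illustration, a selected candidate triple might be turned into a query roughly as below. SPARQL is assumed here only because the graph is said to store RDF data; the patent itself does not name a concrete query language, and the URIs are hypothetical.

```python
def triple_to_sparql(candidate_entity_uri, relation_uri):
    # Hypothetical mapping from a ranked candidate triple to a SPARQL
    # query over an RDF store: the answer is the object O of (S, P, O).
    return f"""
    SELECT ?answer WHERE {{
        <{candidate_entity_uri}> <{relation_uri}> ?answer .
    }}"""

print(triple_to_sparql("http://kg.example/entity/guid-001",
                       "http://kg.example/relation/indication"))
```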
  • In summary, this question answering method can make effective use of external resources through the learning-to-rank model and can return accurate answers to users' questions even when question-and-answer corpus data is scarce.
  • In the second embodiment, the step of identifying the named entity in the input information in step S120 of the first embodiment includes:
  • Step S210: Annotate the input information and obtain the annotation result.
  • Suppose the user's question is q: X = (x1, x2, ..., xn), where xi is each character of the question. Each character is labeled with the BIEO annotation scheme: "B" marks the beginning of a named entity, "I" the inside of a named entity, "E" the end of a named entity, and "O" a character that is not part of a named entity. Y = (y1, y2, ..., yn) denotes the labeling result, and the score of a labeling obtained with this method is:
  • S(X, y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i}
  • where the matrix P ∈ R^{n×K} is the state feature (emission) matrix of the conditional random field, P_{i,j} is the score of the i-th character of the sentence being labeled with the j-th tag, and A ∈ R^{(K+2)×(K+2)} is the state transition matrix, whose element A_{i,j} is the score of transitioning from the i-th tag to the j-th tag.
  • The labeling scheme may also be another scheme, such as BIO or BIOES, which this embodiment does not limit.
  • For example, the BIEO labeling of the question "钓鱼比赛在厦门市举行" ("The fishing competition was held in Xiamen") is: 钓(O) 鱼(O) 比(O) 赛(O) 在(O) 厦(B-LOC) 门(I-LOC) 市(E-LOC) 举(O) 行(O). An annotation set is used to reduce noise as much as possible, so that the recognized and extracted entities are more accurate.
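The scoring formula above can be written directly in Python. This is a minimal sketch of the CRF path score only (no training or decoding); the synthetic start and end tags padding the transition matrix follow the (K+2)×(K+2) shape given in the text, and the random inputs are placeholders.

```python
import numpy as np

def sequence_score(P, A, y):
    """CRF path score: S(X, y) = sum_i A[y_i, y_{i+1}] + sum_i P[i, y_i].

    P : (n, K) emission matrix; P[i, j] is the score of character i
        taking tag j (produced e.g. by a recurrent network).
    A : (K+2, K+2) transition matrix; the two extra rows/columns are
        the synthetic start and end tags that bracket the sequence.
    y : list of n gold tag indices.
    """
    n, K = P.shape
    start, end = K, K + 1
    path = [start] + list(y) + [end]
    trans = sum(A[path[i], path[i + 1]] for i in range(len(path) - 1))
    emit = sum(P[i, y[i]] for i in range(n))
    return trans + emit

rng = np.random.default_rng(0)
n, K = 6, 4                     # six characters, tags {B, I, E, O}
score = sequence_score(rng.normal(size=(n, K)),
                       rng.normal(size=(K + 2, K + 2)),
                       y=[3, 3, 0, 1, 2, 3])
print(score)
```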
  • Step S220: According to the labeling result, identify the named entity in the input information through the recurrent neural network model.
  • Specifically, the labeling result of each character from the step above is used to obtain the vector information of each character. For example, the labeling result of each character is converted into a one-hot vector, the one-hot vector is mapped to a low-dimensional dense character vector, and the character vectors of the sentence are arranged in order to obtain the vector information of the entire sentence. The vector information of the entire sentence is then fed into the recurrent neural network model to recognize the named entity in the question. The recurrent neural network model computes the probability of the tag corresponding to each character of the input and obtains the optimal tag sequence, which identifies the named entity.
  • The recurrent neural network model may be a bidirectional long short-term memory (BiLSTM) recurrent neural network model or a conditional random field model, among others; this embodiment does not limit it.
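As one possible realization of the pipeline just described (one-hot ids, dense embeddings, a recurrent model producing per-tag scores), here is a minimal BiLSTM tagger sketch in PyTorch. The layer sizes and vocabulary size are assumptions; the model emits the per-character tag scores that a CRF layer such as the one sketched above would consume.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """One-hot character ids -> dense embeddings -> BiLSTM -> per-tag
    scores, i.e. the emission matrix P of the CRF score."""
    def __init__(self, vocab_size, embed_dim=64, hidden=64, n_tags=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, char_ids):              # (batch, seq_len)
        h, _ = self.lstm(self.embed(char_ids))
        return self.out(h)                    # (batch, seq_len, n_tags)

tagger = BiLSTMTagger(vocab_size=5000)
scores = tagger(torch.randint(0, 5000, (1, 9)))   # e.g. a 9-character question
print(scores.shape)                               # torch.Size([1, 9, 4])
```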
  • In the third embodiment, after step S130 of the first embodiment, the question answering method further includes:
  • Step S310: Calculate the similarity between the named entity and the candidate entity in each entity pair, where the similarity is derived from Chinese-character similarity, pinyin-character similarity, word-vector similarity, and entity attention degree.
  • Specifically, the Chinese-character similarity, pinyin-character similarity, word-vector similarity, and attention degree between the named entity and the candidate entity of each entity pair are calculated and combined to obtain the similarity corresponding to each entity pair. The higher the similarity, the more alike the named entity and the candidate entity.
  • Methods of calculating similarity include bag-of-words approaches, in which the named entity and the candidate entity are vectorized and similarity becomes a distance in that space (the smaller the distance, the higher the similarity), and computing the cosine of the angle between the two vectors, whose size directly reflects the similarity (the smaller the angle, and thus the larger its cosine, the higher the similarity). This embodiment does not limit the method used to calculate similarity.
  • By calculating similarity separately over Chinese characters, pinyin characters, word vectors, and attention degree and then combining the results, the similarity between the named entity and the candidate entity can be judged more accurately, which helps find the best candidate entity.
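A hedged sketch of combining the four similarity facets might look like this in Python. The weights, the use of `difflib` for character similarity, and `pypinyin` (a third-party hanzi-to-pinyin converter) are all assumptions; the patent only names the four facets and says they are synthesized.

```python
from difflib import SequenceMatcher
import numpy as np
from pypinyin import lazy_pinyin   # third-party; converts hanzi to pinyin

def string_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_similarity(mention, candidate, vec, popularity,
                    weights=(0.3, 0.2, 0.3, 0.2)):
    """vec: dict of word vectors; popularity: dict of attention degrees.
    The weighting scheme is an assumption."""
    feats = [
        string_sim(mention, candidate),                       # hanzi characters
        string_sim("".join(lazy_pinyin(mention)),
                   "".join(lazy_pinyin(candidate))),          # pinyin characters
        cosine(vec[mention], vec[candidate]),                 # word vectors
        popularity.get(candidate, 0.0),                       # attention degree
    ]
    return sum(w * f for w, f in zip(weights, feats))
```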
  • Step S320: Sort the entity pairs by similarity to obtain the rank corresponding to each entity pair.
  • Specifically, the entity pairs are sorted by the similarity computed above, giving each entity pair its rank among all entity pairs. The higher the similarity, the better the candidate entity matches the named entity; the lower the similarity, the worse the match.
  • Step S330: Select the corresponding entity pairs according to the rank.
  • Specifically, the entity pairs ranked above a preset rank are selected. The preset rank can be set according to the actual situation. In this embodiment the preset rank is 10th, so the top ten entity pairs are selected; the candidate entities in the selected pairs are the ones closest to the named entity in the input information.
  • In the fourth embodiment, step S150 of the first embodiment includes:
  • Step S410: Calculate the feature vector corresponding to each triple.
  • Specifically, the named entity, candidate entity, and candidate relation of each triple are converted into one-hot vectors, mapped to low-dimensional dense character vectors, and the character vectors are then arranged to obtain the feature vector of each triple.
  • Step S420: Input each feature vector into the learning-to-rank model to obtain the ranking result corresponding to each candidate triple.
  • Specifically, each feature vector is used as input to the learning-to-rank model, which after its computations outputs the ranking result corresponding to each triple.
  • In the fifth embodiment, step S410 of the fourth embodiment includes:
  • Step S510: Calculate the first similarity feature between the named entity and the candidate entity according to the triple.
  • Specifically, for a triple (named entity, candidate entity, candidate relation), the first similarity feature between the named entity and the candidate entity is calculated. The first similarity feature may be a similarity value.
  • Step S520: Remove the named entity from the input information to obtain the remaining words, and calculate the second similarity feature between the remaining words and synonyms and context words.
  • Specifically, the named-entity words are removed from the user's input, leaving some remaining characters or words. The similarity features between these remaining words and the context words of adjacent phrases are calculated, as are the similarity features between these words and their synonyms; the two parts are combined to obtain the second similarity feature.
  • Step S530: Generate a high-dimensional vector from the input information, where the high-dimensional vector is generated according to whether preset vocabulary words occur in the input information.
  • Specifically, for the user's natural language question, a high-dimensional vector is generated according to whether the words of the question appear in the preset vocabulary. Each position in the high-dimensional vector represents one word: if the word occurs in the natural language question, the value at that position is 1, otherwise 0. For example, if the user enters "阿司匹林是哪些病人吃的" ("Which patients take aspirin?") and only 阿司匹林 (aspirin) is in the preset vocabulary, the vector is 1 at the positions where those characters appear and 0 everywhere else; the dimensionality of the vector can be set as needed.
  • Step S540: Generate the feature vector from the first similarity feature, the second similarity feature, and the high-dimensional vector.
  • Specifically, the first similarity feature, the second similarity feature, and the high-dimensional vector are concatenated to obtain the final feature vector.
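The feature assembly of steps S510-S540 can be sketched as follows; the example vocabulary and similarity values are hypothetical, and the text leaves open exactly how the pieces are spliced, so simple concatenation is assumed.

```python
import numpy as np

def bag_vector(question: str, vocab: list[str]) -> np.ndarray:
    # One position per preset vocabulary word; 1 if it occurs in the question.
    return np.array([1.0 if w in question else 0.0 for w in vocab])

def triple_features(sim1: float, sim2: float, question: str, vocab: list[str]):
    # Concatenate the first similarity feature, the second similarity
    # feature and the high-dimensional vocabulary vector.
    return np.concatenate([[sim1], [sim2], bag_vector(question, vocab)])

vocab = ["阿司匹林", "咳嗽", "治疗"]           # hypothetical preset vocabulary
x = triple_features(0.8, 0.4, "阿司匹林是哪些病人吃的", vocab)
print(x)                                       # [0.8 0.4 1.  0.  0. ]
```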
  • In the sixth embodiment, a question answering device 600 based on learning-to-rank over a Chinese knowledge graph is provided.
  • The question answering device 600 includes:
  • The first obtaining module 610 is used to obtain user input information.
  • The input information may be a natural-language query (such as a question); for example, the user enters the question "What medicine do I need to take for a cough?" on a search website. This embodiment does not limit the way the input information is obtained.
  • The recognition and linking module 620 is configured to recognize the named entity in the input information and link the named entity to the candidate entity corresponding to it in the Chinese knowledge graph to form an entity pair, where the entity pair includes the named entity and the candidate entity.
  • Specifically, the input information is sequence-labeled using an annotation-set method and a recurrent neural network model, and named entity recognition is completed according to the sequence labeling result (the specific steps are detailed in the second embodiment). For example, for "What kind of medicine do I need to take for a cough?", the question is first labeled with the BIO annotation set, the vector information of the question is obtained from the labeling result, and this vector information is fed to the recurrent neural network model to recognize the named entity "cough". The named entity is then mapped to a Globally Unique Identifier (GUID) in the Chinese knowledge graph, which links it to the corresponding candidate entity in the knowledge graph.
  • Each candidate entity in the knowledge graph uniquely corresponds to a GUID, through which different candidate entities in the Chinese knowledge graph can be distinguished.
  • The Chinese knowledge graph is a new technology for storing complex structured information. A large amount of factual knowledge is stored in the Chinese knowledge graph, namely entities and the relation information between entities. Chinese knowledge graphs mostly store data in RDF (Resource Description Framework) format: a fact is represented as an (S, P, O) triple of the form (subject, predicate, object), where S and O are entities (O is sometimes an attribute value) and P is the relation between S and O.
  • Entity linking is an important method for resolving named entity ambiguity: ambiguous entity mentions are disambiguated by linking them to a given knowledge graph.
  • The matching module 630 is configured to match the candidate relations of the candidate entity in the Chinese knowledge graph through a relation model.
  • Specifically, the relation template uses natural language understanding to grasp the semantics expressed by the user's input information (such as a question) and matches it against the relation P of the (S, P, O) triples in the Chinese knowledge graph, thereby determining which candidate relations in the Chinese knowledge graph correspond to the semantics of the input.
  • Relation templates are obtained by extracting triples from the Chinese knowledge graph, extracting relation information from those triples, and training templates that correspond to that relation information.
  • The forming module 640 is configured to form candidate triples from the entity pairs and the candidate relations, where each candidate triple includes the named entity, the candidate entity, and the candidate relation.
  • Specifically, the named entity identified above, together with the candidate entities and candidate relations corresponding to it in the Chinese knowledge graph, forms the candidate triples.
  • The second obtaining module 650 is configured to obtain the ranking result corresponding to each candidate triple based on the learning-to-rank model.
  • Specifically, each candidate triple is used as input to the learning-to-rank model, which after a series of computations outputs the ranking result corresponding to each candidate triple. The results may be ordered so that a higher rank means a more accurate answer and a lower rank a less accurate one, or in some other way; this embodiment does not limit the ordering.
  • The learning-to-rank model is computed with a learning-to-rank algorithm. Learning to rank (LTR) is a supervised learning (SL) approach to ranking. LTR generally comes in three flavors: the pointwise (single document) method, the pairwise (document pair) method, and the listwise (document list) method. In this embodiment the learning-to-rank algorithm uses the pairwise method.
  • In one embodiment, the learning-to-rank model is obtained by training on second samples formed from a first sample and each candidate triple, where the first sample is the triple formed by the standard answer to the input information. For example, given the standard-answer triple (named entity, candidate entity, candidate relation) of a question, 10 candidate entities are randomly drawn from the Chinese knowledge graph and candidate relations are collected for them, finally yielding 50 triples (named entity, candidate entity, candidate relation) as negative samples (N). The standard-answer triple is the positive sample (P).
  • The positive sample (P) is combined with one of the negative samples (N) to generate two training pairs, a (P, N) sample and an (N, P) sample; the label of the (P, N) sample is 1 and the label of the (N, P) sample is 0. The learning-to-rank model can then be trained on these samples.
  • The third obtaining module 660 is configured to query the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
  • Specifically, according to the ranking of the candidate triples, the triples ranked above a preset cutoff are selected and converted into query statements in the Chinese knowledge graph's query language; the query is executed against the Chinese knowledge graph and the answer corresponding to the input information is returned.
  • The question answering device 600 based on learning-to-rank over a Chinese knowledge graph further includes an offline module 700 that prepares for the operation of the question answering device.
  • The offline module 700 includes an entity mention rate unit 710, a synonym collection unit 720, a context mining unit 730, a question template unit 740, and a learning-to-rank unit 750.
  • The entity mention rate unit 710 is used to score how often the candidate entities in the Chinese knowledge graph are mentioned. Specifically, each candidate entity in the Chinese knowledge graph is given a mention-rate score, where the mention rate indicates how much attention users pay to that candidate entity. This part can use existing mention-rate rankings (for example, a list of the drugs patients care most about) or compute the frequency with which entities are mentioned by crawling users' questions online.
  • The synonym collection unit 720 is used to collect the relation name of each candidate relation in the Chinese knowledge graph, where the relation name includes the standard name and synonyms of the standard name.
  • Specifically, each candidate relation in the Chinese knowledge graph has a standard name. For the relation "drug xx treats disease xx", for example, the standard name is "...indications...", but because of the diversity of natural Chinese, users may ask "What does drug xx treat?", "What is the function of drug xx?", and so on. Synonyms of the relation name (the relation predicate) therefore need to be collected.
  • The synonym collection unit 720 collects the relation name of each candidate relation in the Chinese knowledge graph, including the standard name and its synonyms, which ensures the accuracy of subsequent question answering.
  • The context mining unit 730 is used to find the connection relations between two candidate entities in the Chinese knowledge graph using text mining. Specifically, the context mining unit is entirely based on distantly supervised text mining. There may be several connection relations between two candidate entities (considering triple facts of at most 2 hops). In a text collection of the professional domain, a sentence in which both candidate entities appear is found and a dependency parse tree of that sentence is built; if the minimum path length between the two entities on the dependency tree is at most 4, the words on this shortest path serve as context words for the relations (there may be more than one) between the two candidate entities, provided the word is not already a synonym of the relation. General text material in a professional domain (such as professional literature) is usually plentiful, while question-answer corpora may be relatively scarce; text mining therefore supplies the question answering device with a large amount of contextual information and makes effective use of external resources.
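A minimal sketch of this shortest-dependency-path rule follows, using `networkx` over a pre-computed parse (token list plus head indices). The parse, the token indices, and the example sentence are assumptions; the length-4 threshold and the synonym exclusion come from the text.

```python
import networkx as nx

def context_words(tokens, heads, ent1_idx, ent2_idx, synonyms, max_len=4):
    """tokens: words of a sentence where both candidate entities co-occur;
    heads[i]: index of token i's head in the dependency parse (root: -1).
    Returns the words on the shortest dependency path between the two
    entities if the path length is <= max_len, excluding relation synonyms."""
    g = nx.Graph((i, h) for i, h in enumerate(heads) if h >= 0)
    path = nx.shortest_path(g, ent1_idx, ent2_idx)
    if len(path) - 1 > max_len:
        return []
    return [tokens[i] for i in path[1:-1] if tokens[i] not in synonyms]

# hypothetical parse of "阿司匹林 可 治疗 头痛" (aspirin can treat headache)
tokens = ["阿司匹林", "可", "治疗", "头痛"]
heads = [2, 2, -1, 2]                      # 治疗 (treat) is the root
print(context_words(tokens, heads, 0, 3, synonyms={"适应症"}))   # ['治疗']
```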
  • The question template unit 740 is used to divide questions into predefined question forms. Specifically, questions are divided according to predefined question forms, which makes searching the Chinese knowledge graph more convenient and efficient. This step can restrict the relation space being compared to within two or three hops of the subject entity.
  • The learning-to-rank unit 750 is used to obtain training data from questions. Specifically, the learning-to-rank unit obtains training data from the questions and is based on a pairwise learning-to-rank algorithm. Although question-answer corpus data may be scarce, the training data can be expanded by generating negative samples, yielding a question answering model with good performance.
  • The synonym collection unit 720 includes a labeling subunit 721, a frequency recording subunit 722, and a manual review subunit 723.
  • The labeling subunit 721 is used to label the relation between the entity in a question and the candidate entity in the knowledge graph.
  • The frequency recording subunit 722 is used to remove the entity names, stop words, and punctuation from the question to obtain the remaining words, score the remaining words with the term frequency-inverse document frequency method, and record the remaining words whose scores exceed a preset value.
  • Specifically, after removing the entity names, stop words, and punctuation from the question, the frequency recording subunit 722 obtains the remaining words, scores them with term frequency-inverse document frequency, and collects the higher-scoring words, for example the top 15.
  • TF-IDF is short for Term Frequency-Inverse Document Frequency. It consists of two parts, TF and IDF.
  • TF is the term frequency: the vectorization above counts how often each word occurs in the text and uses the counts as text features.
  • IDF, the inverse document frequency, corrects for words whose frequency is high but whose importance is low; it reflects how important a word is beyond the raw term-frequency feature.
  • Broadly, IDF reflects how often a word appears across all texts. If a word appears in many texts, its IDF should be low, as for "I"; conversely, if a word appears in relatively few texts, its IDF should be high, as for specialized terms such as "machine learning". In the extreme case where a word appears in every text, its IDF should be 0.
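For illustration, the TF-IDF scoring of the remaining words could be sketched with scikit-learn as below. The example questions are hypothetical and pre-segmented into space-separated tokens; the custom token pattern keeps single-character Chinese words that the default analyzer would drop.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical questions with entity names, stop words and punctuation
# already removed; tokens are whitespace-separated after segmentation.
remaining = ["治 什么 病", "功能 是 什么", "主治 什么"]

vec = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
tfidf = vec.fit_transform(remaining)

# Score each remaining word by its highest TF-IDF value and keep the top
# ones (the text collects e.g. the 15 highest-scoring words).
scores = tfidf.max(axis=0).toarray().ravel()
ranked = sorted(zip(vec.get_feature_names_out(), scores),
                key=lambda t: t[1], reverse=True)
print(ranked[:15])
```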
  • The manual review subunit 723 is used to manually filter out unreasonable synonyms under each relation. Specifically, if the synonym set should be more accurate, a degree of manual review can be done, that is, manually filtering out unreasonable synonyms under each type of relation.
  • This application also provides a computer device capable of executing programs, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a server cluster composed of multiple servers).
  • The computer device of this embodiment at least includes, but is not limited to, a memory and a processor that can be communicatively connected to each other through a device bus.
  • This embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, or an app store, on which a computer program is stored that realizes the corresponding function when executed by a processor.
  • The computer-readable storage medium of this embodiment is used to store the electronic device 20 and, when executed by a processor, implements the question answering method of this application.
  • The computer-readable storage medium may be non-volatile or volatile.
  • The methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the better implementation.
  • The technical solution of this application, in essence or in the part that contributes over the existing technology, can be embodied as a software product stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) that includes several instructions enabling a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A question answering method, device, computer equipment, and storage medium, including: obtaining user input information; identifying the named entity in the input information and linking the named entity to the candidate entity corresponding to it in a Chinese knowledge graph to form an entity pair, where the entity pair includes the named entity and the candidate entity; matching the candidate relations of the candidate entity in the Chinese knowledge graph through a relation model; forming candidate triples from the entity pairs and candidate relations, where each candidate triple includes the named entity, the candidate entity, and the candidate relation; obtaining the ranking result corresponding to each candidate triple based on a learning-to-rank model; and querying the Chinese knowledge graph according to the ranking result to obtain the answer to the input information. The method can make effective use of external resources, provide a large amount of contextual information through text mining, and, based on the learning-to-rank model, obtain good answers even when question-and-answer corpus data is scarce.

Description

Question answering method, question answering device, computer equipment and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 3, 2019, with application number 201910593110.6 and invention title "Question answering method, question answering device, computer equipment and storage medium", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the field of natural language processing in artificial intelligence, and in particular to a question answering method, question answering device, computer equipment, and storage medium.
Background
A question answering system is an advanced form of information retrieval system: it can answer questions that users pose in natural language with accurate and concise natural language. A traditional question answering system is divided into two parts, question processing and answer retrieval. The basis of question processing is word segmentation. Answer retrieval mostly uses a scoring mechanism: a series of candidate answers is selected from massive text data, and a selection function is then constructed to pick the closest answer from the candidates. Such traditional question answering devices make errors of varying degrees, depending on how long text nouns are handled and on the selection function that is constructed.
Against this background, question answering systems based on knowledge graphs came into being. Current research on knowledge-graph question answering falls into three main categories. The first is rule-based: fixed rules determine whether a user question is asking about a certain fact in the knowledge base. The second is template learning: a large number of templates are collected, and the probability that a natural language question corresponds to a given template is learned from a large amount of data annotated with the corresponding knowledge-base facts. The third is semantic matching based on deep learning: a neural network model learns the semantic similarity between a question and a relation in the knowledge graph, where entity recognition has been performed on the question and its entities have been replaced with special symbols.
The inventors realized that rule-based knowledge-base question answering systems are highly precise but inflexible, since a rule must be written for every class of question, while template learning and deep learning methods usually must learn from large-scale question-answer corpora and are therefore hard to apply, early in development, to a vertical domain where question-answer data is scarce.
Summary
In view of this, this application proposes a question answering method, question answering device, computer equipment, and storage medium that can produce an accurate answer even when question-and-answer corpus data is scarce.
First, to achieve the above objective, this application proposes a question answering method, which includes the steps of:
obtaining user input information;
identifying the named entity in the input information, and linking the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, where the entity pair includes the named entity and the candidate entity;
matching the candidate relation of the candidate entity in the Chinese knowledge graph through a relation model;
forming a candidate triple from the entity pair and the candidate relation, where the candidate triple includes the named entity, the candidate entity, and the candidate relation;
obtaining the ranking result corresponding to each candidate triple based on a learning-to-rank model; and
querying the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
To achieve the above objective, this application also provides a question answering device based on learning-to-rank over a Chinese knowledge graph. The question answering device includes:
a first obtaining module, used to obtain user input information;
a recognition and linking module, used to identify the named entity in the input information and link the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, where the entity pair includes the named entity and the candidate entity;
a matching module, used to match the candidate relation of the candidate entity in the Chinese knowledge graph through a relation model;
a forming module, used to form a candidate triple from the entity pair and the candidate relation, where the candidate triple includes the named entity, the candidate entity, and the candidate relation;
a second obtaining module, used to obtain the ranking result corresponding to each candidate triple based on the learning-to-rank model; and
a third obtaining module, used to query the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
To achieve the above objective, this application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; the processor implements the steps of the above method when executing the computer program.
To achieve the above objective, this application also provides a computer-readable storage medium on which a computer program is stored; the computer program implements the steps of the above method when executed by a processor.
Compared with traditional techniques, the knowledge-graph-based question answering method, computer equipment, and storage medium proposed in this application can make effective use of external resources: a broad learning model exploits external resources such as synonyms or context words of relation facts, and these external resources can be obtained quickly through text mining or directly from Chinese lexical resources. Combining the broad learning model with a deep learning model also reduces the amount of data the model requires, so good output can be obtained even with little training data, which is of great significance when developing knowledge-graph question answering for a new vertical domain.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the question answering method of the first embodiment of the present application;
FIG. 2 is a schematic flowchart of the question answering method of the second embodiment of the present application;
FIG. 3 is a schematic flowchart of the question answering method of the third embodiment of the present application;
FIG. 4 is a schematic flowchart of the question answering method of the fourth embodiment of the present application;
FIG. 5 is a schematic flowchart of the question answering method of the fifth embodiment of the present application;
FIG. 6 is a schematic block diagram of the question answering device of the sixth embodiment of the present application;
FIG. 7 is a schematic block diagram of the question answering device of the seventh embodiment of the present application; and
FIG. 8 is a block diagram of the synonym collection unit in the question answering device of the eighth embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of this application.
It should be noted that descriptions involving "first", "second", and the like in this application are for description only and cannot be understood as indicating or implying relative importance or the number of the indicated technical features; a feature qualified by "first" or "second" may thus explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with each other, but only on the basis that those of ordinary skill in the art can realize them; when a combination of technical solutions is contradictory or unrealizable, that combination should be considered not to exist and not within the scope of protection claimed by this application.
Referring to FIG. 1, the first embodiment provides a question answering method. The question answering method includes:
Step S110: Obtain user input information.
The input information may be a natural-language query (such as a question); for example, the user enters the question "What medicine do I need to take for a cough?" on a search website. This embodiment does not limit the way the input information is obtained.
Step S120: Identify the named entity in the input information and link the named entity to the candidate entity in the Chinese knowledge graph to form an entity pair, where the entity pair includes the named entity and the candidate entity.
Specifically, the input information is sequence-labeled using an annotation-set method and a recurrent neural network model, and named entity recognition is completed according to the sequence labeling result (the specific steps are detailed in the second embodiment). For example, for "What medicine do I need to take for a cough?", the question is first labeled with the BIO annotation set, the vector information of the question is obtained from the labeling result, and this vector information is fed to the recurrent neural network model to recognize the named entity "cough". The named entity is then mapped to a Globally Unique Identifier (GUID) in the Chinese knowledge graph, which links it to the corresponding candidate entity in the knowledge graph, such as "cough". Each candidate entity in the knowledge graph uniquely corresponds to a GUID, through which different candidate entities in the Chinese knowledge graph can be distinguished.
The Chinese knowledge graph is a new technology for storing complex structured information. A large amount of factual knowledge is stored in the Chinese knowledge graph, namely entities and the relation information between entities. Chinese knowledge graphs mostly store data in RDF (Resource Description Framework) format: a fact is represented as an (S, P, O) triple of the form (subject, predicate, object), where S and O are entities (O is sometimes an attribute value) and P is the relation between S and O. Entity linking is an important method for resolving named entity ambiguity: ambiguous entity mentions are disambiguated by linking them to a given knowledge graph.
In addition, since named entities may have aliases or other variant names, alias information is collected for each candidate entity in the Chinese knowledge graph from its name and aliases, and a reverse dictionary from alias to candidate entity is built for entity linking. When building the dictionary, the alias strings need to be normalized, for example by converting to lowercase and deleting special characters, and the entities in the alias dictionary are ranked by popularity, taken as the frequency with which the entity appears in the knowledge graph. After a named entity is recognized, it is looked up in the alias dictionary, and the top-ranked entities by popularity are selected as candidate entities.
Step S130: Match the candidate relations of the candidate entity in the Chinese knowledge graph through a relation template.
Specifically, the relation template uses natural language understanding to grasp the semantics expressed by the user's input information (such as a question) and matches it against the relation P of the (S, P, O) triples in the Chinese knowledge graph, thereby determining which candidate relations in the Chinese knowledge graph correspond to the semantics of the input. The relation template includes a first entity, a second entity, and the relation between the first entity and the second entity. Relation templates are obtained by extracting triples from the Chinese knowledge graph, extracting relation information from those triples, and training templates that correspond to that relation information.
Step S140: Form candidate triples from the entity pairs and the candidate relations, where each candidate triple includes the named entity, the candidate entity, and the candidate relation.
Specifically, the named entity identified in the steps above, together with the candidate entities and candidate relations corresponding to it in the Chinese knowledge graph, forms the candidate triples.
Step S150: Obtain the ranking result corresponding to each candidate triple based on the learning-to-rank model.
Specifically, each candidate triple is converted into corresponding vector information and used as input to the learning-to-rank model, which after a series of computations outputs the ranking result corresponding to each candidate triple. The results may be ordered so that a higher rank means a more accurate answer and a lower rank a less accurate one, or in some other way; this embodiment does not limit the ordering.
The learning-to-rank model is computed with a learning-to-rank algorithm. Learning to rank (LTR) is a supervised learning (SL) approach to ranking. LTR generally comes in three flavors: the pointwise (single document) method, the pairwise (document pair) method, and the listwise (document list) method. In this embodiment the learning-to-rank algorithm uses the pairwise method.
In one embodiment, the learning-to-rank model is obtained by training on second samples formed from a first sample and each candidate triple, where the first sample is the triple formed by the standard answer to the input information. For example, given the standard-answer triple (named entity, candidate entity, candidate relation) of a question, 10 candidate entities are randomly drawn from the Chinese knowledge graph and candidate relations are collected for them, finally yielding 50 triples (named entity, candidate entity, candidate relation) as negative samples (N). The standard-answer triple (named entity, candidate entity, candidate relation) is the positive sample (P). The positive sample (P) is combined with one of the negative samples (N) to generate two training pairs, a (P, N) sample and an (N, P) sample; the label of the (P, N) sample is 1 and the label of the (N, P) sample is 0. The learning-to-rank model can then be trained on these samples.
Step S160: Query the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
Specifically, according to the ranking of the candidate triples, the triples ranked above a preset cutoff are selected and converted into query statements in the Chinese knowledge graph's query language; the query is executed against the Chinese knowledge graph and the answer corresponding to the input information is returned.
In summary, this question answering method can use the learning-to-rank model to make effective use of external resources and can return accurate answers to users' questions even when question-and-answer corpus data is scarce.
In the second embodiment, referring to FIG. 2, the step of identifying the named entity in the input information in step S120 of the first embodiment includes:
Step S210: Annotate the input information and obtain the annotation result.
Suppose the user's question is q: X = (x1, x2, ..., xn), where xi is each character of the question. Each character is labeled with the BIEO scheme: "B" marks the beginning of a named entity, "I" the inside of a named entity, "E" the end of a named entity, and "O" a character that is not part of a named entity. Y = (y1, y2, ..., yn) denotes the labeling result, and the score of a labeling obtained with this method is:
S(X, y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i}
where the matrix P ∈ R^{n×K} is the state feature (emission) matrix of the conditional random field, P_{i,j} is the score of the i-th character of the sentence being labeled with the j-th tag, and A ∈ R^{(K+2)×(K+2)} is the state transition matrix, whose element A_{i,j} is the score of transitioning from the i-th tag to the j-th tag. The labeling scheme may also be another scheme, such as BIO or BIOES, which this embodiment does not limit. For example, the BIEO labeling of the question "钓鱼比赛在厦门市举行" ("The fishing competition was held in Xiamen") is: 钓(O) 鱼(O) 比(O) 赛(O) 在(O) 厦(B-LOC) 门(I-LOC) 市(E-LOC) 举(O) 行(O). An annotation set is used to reduce noise as much as possible, so that the recognized and extracted entities are more accurate.
Step S220: According to the labeling result, identify the named entity in the input information through the recurrent neural network model.
Specifically, the labeling result of each character from the step above is used to obtain the vector information of each character. For example, the labeling result of each character is converted into a one-hot vector, the one-hot vector is mapped to a low-dimensional dense character vector, and the character vectors of the sentence are arranged in order to obtain the vector information of the entire sentence. The vector information of the entire sentence is then fed into the recurrent neural network model to recognize the named entity in the question. The recurrent neural network model computes the probability of the tag for each character of the input and obtains the optimal tag sequence, which identifies the named entity. The recurrent neural network model may be a bidirectional long short-term memory (BiLSTM) recurrent neural network model or a conditional random field model, among others; this embodiment does not limit it.
In the third embodiment, referring to FIG. 3, after step S130 of the first embodiment, the question answering method further includes:
Step S310: Calculate the similarity between the named entity and the candidate entity in each entity pair, where the similarity is derived from Chinese-character similarity, pinyin-character similarity, word-vector similarity, and entity attention degree.
Specifically, the Chinese-character similarity, pinyin-character similarity, word-vector similarity, and attention degree between the named entity and the candidate entity of each entity pair are calculated and combined to obtain the similarity corresponding to each entity pair. The higher the similarity, the more alike the named entity and the candidate entity. Methods of calculating similarity include bag-of-words approaches, in which the named entity and the candidate entity are vectorized and similarity becomes a distance in that space (the smaller the distance, the higher the similarity), and computing the cosine of the angle between the two vectors, whose size directly reflects the similarity (the smaller the angle, and thus the larger its cosine, the higher the similarity); this embodiment does not limit the method used to calculate similarity. By calculating similarity separately over Chinese characters, pinyin characters, word vectors, and attention degree and then combining the results, the similarity between the named entity and the candidate entity can be judged more accurately, which helps find the best candidate entity.
Step S320: Sort the entity pairs by similarity to obtain the rank corresponding to each entity pair.
Specifically, the entity pairs are sorted by the similarity computed above, giving each entity pair its rank among all entity pairs. The higher the similarity, the better the candidate entity matches the named entity; the lower the similarity, the worse the match.
Step S330: Select the corresponding entity pairs according to the rank.
Specifically, the entity pairs ranked above a preset rank are selected. The preset rank can be set according to the actual situation. In this embodiment the preset rank is 10th, so the top ten entity pairs are selected; the candidate entities in the selected pairs are the ones closest to the named entity in the input information.
In the fourth embodiment, referring to FIG. 4, step S150 of the first embodiment includes:
Step S410: Calculate the feature vector corresponding to each triple.
Specifically, the named entity, candidate entity, and candidate relation of each triple are converted into one-hot vectors, mapped to low-dimensional dense character vectors, and the character vectors are then arranged to obtain the feature vector of each triple.
Step S420: Input each feature vector into the learning-to-rank model to obtain the ranking result corresponding to each candidate triple.
Specifically, each feature vector is used as input to the learning-to-rank model, which after its computations outputs the ranking result corresponding to each triple.
In the fifth embodiment, referring to FIG. 5, step S410 of the fourth embodiment includes:
Step S510: Calculate the first similarity feature between the named entity and the candidate entity according to the triple.
Specifically, for a triple (named entity, candidate entity, candidate relation), the first similarity feature between the named entity and the candidate entity is calculated. The first similarity feature may be a similarity value.
Step S520: Remove the named entity from the input information to obtain the remaining words, and calculate the second similarity feature between the remaining words and synonyms and context words.
Specifically, the named-entity words are removed from the user's input, leaving some remaining characters or words. The similarity features between these remaining words and the words of adjacent phrases are calculated, as are the similarity features between these words and their synonyms; the two parts are combined to obtain the second similarity feature.
Step S530: Generate a high-dimensional vector from the input information, where the high-dimensional vector is generated according to whether preset vocabulary words occur in the input information.
Specifically, for the user's natural language question, a high-dimensional vector is generated according to whether the characters of the question appear in the preset vocabulary. Each position in the high-dimensional vector represents one character: if the character occurs in the natural language question, the value at that position is 1, otherwise 0. For example, if the user enters "阿司匹林是哪些病人吃的" ("Which patients take aspirin?") and only the four characters of 阿司匹林 (aspirin) are in the preset vocabulary, the high-dimensional vector of the question is 1 at the positions where those characters appear and 0 everywhere else; the dimensionality of the vector can be set as needed.
Step S540: Generate the feature vector from the first similarity feature, the second similarity feature, and the high-dimensional vector.
Specifically, the first similarity feature, the second similarity feature, and the high-dimensional vector are concatenated to obtain the final feature vector.
In the sixth embodiment, referring to FIG. 6, a question answering device 600 based on learning-to-rank over a Chinese knowledge graph is provided. The question answering device 600 includes:
The first obtaining module 610 is used to obtain user input information. The input information may be a natural-language query (such as a question); for example, the user enters the question "What medicine do I need to take for a cough?" on a search website. This embodiment does not limit the way the input information is obtained.
The recognition and linking module 620 is used to identify the named entity in the input information and link the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, where the entity pair includes the named entity and the candidate entity.
Specifically, the input information is sequence-labeled using an annotation-set method and a recurrent neural network model, and named entity recognition is completed according to the sequence labeling result (the specific steps are detailed in the second embodiment). For example, for "What medicine do I need to take for a cough?", the question is first labeled with the BIO annotation set, the vector information of the question is obtained from the labeling result, and this vector information is fed to the recurrent neural network model to recognize the named entity "cough". The named entity is then mapped to a Globally Unique Identifier (GUID) in the Chinese knowledge graph, which links it to the corresponding candidate entity in the knowledge graph. Each candidate entity in the knowledge graph uniquely corresponds to a GUID, through which different candidate entities in the Chinese knowledge graph can be distinguished.
The Chinese knowledge graph is a new technology for storing complex structured information. A large amount of factual knowledge is stored in the Chinese knowledge graph, namely entities and the relation information between entities. Chinese knowledge graphs mostly store data in RDF (Resource Description Framework) format: a fact is represented as an (S, P, O) triple of the form (subject, predicate, object), where S and O are entities (O is sometimes an attribute value) and P is the relation between S and O. Entity linking is an important method for resolving named entity ambiguity: ambiguous entity mentions are disambiguated by linking them to a given knowledge graph.
The matching module 630 is used to match the candidate relations of the candidate entity in the Chinese knowledge graph through a relation model.
Specifically, the relation template uses natural language understanding to grasp the semantics expressed by the user's input information (such as a question) and matches it against the relation P of the (S, P, O) triples in the Chinese knowledge graph, thereby determining which candidate relations in the Chinese knowledge graph correspond to the semantics of the input. Relation templates are obtained by extracting triples from the Chinese knowledge graph, extracting relation information from those triples, and training templates that correspond to that relation information.
The forming module 640 is used to form candidate triples from the entity pairs and the candidate relations, where each candidate triple includes the named entity, the candidate entity, and the candidate relation.
Specifically, the named entity identified above, together with the candidate entities and candidate relations corresponding to it in the Chinese knowledge graph, forms the candidate triples.
The second obtaining module 650 is used to obtain the ranking result corresponding to each candidate triple based on the learning-to-rank model.
Specifically, each candidate triple is used as input to the learning-to-rank model, which after a series of computations outputs the ranking result corresponding to each candidate triple. The results may be ordered so that a higher rank means a more accurate answer and a lower rank a less accurate one, or in some other way; this embodiment does not limit the ordering.
The learning-to-rank model is computed with a learning-to-rank algorithm. Learning to rank (LTR) is a supervised learning (SL) approach to ranking. LTR generally comes in three flavors: the pointwise (single document) method, the pairwise (document pair) method, and the listwise (document list) method. In this embodiment the learning-to-rank algorithm uses the pairwise method.
In one embodiment, the learning-to-rank model is obtained by training on second samples formed from a first sample and each candidate triple, where the first sample is the triple formed by the standard answer to the input information. For example, given the standard-answer triple (named entity, candidate entity, candidate relation) of a question, 10 candidate entities are randomly drawn from the Chinese knowledge graph and candidate relations are collected for them, finally yielding 50 triples (named entity, candidate entity, candidate relation) as negative samples (N). The standard-answer triple is the positive sample (P). The positive sample (P) is combined with one of the negative samples (N) to generate two training pairs, a (P, N) sample and an (N, P) sample; the label of the (P, N) sample is 1 and the label of the (N, P) sample is 0. The learning-to-rank model can then be trained on these samples.
The third obtaining module 660 is used to query the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
Specifically, according to the ranking of the candidate triples, the triples ranked above a preset cutoff are selected and converted into query statements in the Chinese knowledge graph's query language; the query is executed against the Chinese knowledge graph and the answer corresponding to the input information is returned.
In addition, referring to FIG. 7, the question answering device 600 based on learning-to-rank over a Chinese knowledge graph further includes an offline module 700 that prepares for the operation of the question answering device.
The offline module 700 includes an entity mention rate unit 710, a synonym collection unit 720, a context mining unit 730, a question template unit 740, and a learning-to-rank unit 750.
The entity mention rate unit 710 is used to score how often the candidate entities in the Chinese knowledge graph are mentioned. Specifically, each candidate entity in the Chinese knowledge graph is given a mention-rate score, where the mention rate indicates how much attention users pay to that candidate entity. This part can use existing mention-rate rankings (for example, a list of the drugs patients care most about) or compute the frequency with which entities are mentioned by crawling users' questions online.
The synonym collection unit 720 is used to collect the relation name of each candidate relation in the Chinese knowledge graph, where the relation name includes the standard name and synonyms of the standard name.
Specifically, each candidate relation in the Chinese knowledge graph has a standard name. For the relation "drug xx treats disease xx", for example, the standard name is "...indications...", but because of the diversity of natural Chinese, users may ask "What does drug xx treat?", "What is the function of drug xx?", and so on. Synonyms of the relation name (the relation predicate) therefore need to be collected. The synonym collection unit 720 collects the relation name of each candidate relation in the Chinese knowledge graph, including the standard name and its synonyms, which ensures the accuracy of subsequent question answering.
The context mining unit 730 is used to find the connection relations between two candidate entities in the Chinese knowledge graph using text mining. Specifically, the context mining unit is entirely based on distantly supervised text mining. There may be several connection relations between two candidate entities (considering triple facts of at most 2 hops). In a text collection of the professional domain, a sentence in which both candidate entities appear is found and a dependency parse tree of that sentence is built; if the minimum path length between the two entities on the dependency tree is at most 4, the words on this shortest path serve as context words for the relations (there may be more than one) between the two candidate entities, provided the word is not already a synonym of the relation. General text material in a professional domain (such as professional literature) is usually plentiful, while question-answer corpora (especially ones suited to the current knowledge graph) may be relatively scarce; text mining therefore supplies the question answering device with a large amount of contextual information and makes effective use of external resources.
The question template unit 740 is used to divide questions into predefined question forms. Specifically, questions are divided according to predefined question forms, which makes searching the Chinese knowledge graph more convenient and efficient. This step can restrict the relation space being compared to within two or three hops of the subject entity.
The learning-to-rank unit 750 is used to obtain training data from questions. Specifically, the learning-to-rank unit obtains training data from the questions and is based on a pairwise learning-to-rank algorithm. Although question-answer corpus data may be scarce, the training data can be expanded by generating negative samples, yielding a question answering model with good performance.
Referring to FIG. 8, the synonym collection unit 720 includes a labeling subunit 721, a frequency recording subunit 722, and a manual review subunit 723.
The labeling subunit 721 is used to label the relation between the entity in a question and the candidate entity in the knowledge graph. The frequency recording subunit 722 is used to remove the entity names, stop words, and punctuation from the question to obtain the remaining words, score the remaining words with the term frequency-inverse document frequency method, and record the remaining words whose scores exceed a preset value.
Specifically, after removing the entity names, stop words, and punctuation from the question, the frequency recording subunit 722 obtains the remaining words, scores them with term frequency-inverse document frequency, and collects the higher-scoring words, for example the top 15.
TF-IDF is short for Term Frequency-Inverse Document Frequency. It consists of two parts, TF and IDF. TF is the term frequency: the vectorization above counts how often each word occurs in the text and uses the counts as text features. IDF, the inverse document frequency, corrects for words whose frequency is high but whose importance is low, reflecting how important a word is beyond the raw term-frequency feature.
Broadly, IDF reflects how often a word appears across all texts. If a word appears in many texts, its IDF should be low, as for "I"; conversely, if a word appears in relatively few texts, its IDF should be high, as for specialized terms such as "machine learning". In the extreme case where a word appears in every text, its IDF should be 0.
The manual review subunit 723 is used to manually filter out unreasonable synonyms under each relation. Specifically, if the synonym set should be more accurate, a degree of manual review can be done, that is, manually filtering out unreasonable synonyms under each type of relation.
This application also provides a computer device capable of executing programs, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a server cluster composed of multiple servers). The computer device of this embodiment at least includes, but is not limited to, a memory and a processor that can be communicatively connected to each other through a device bus.
This embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, or an app store, on which a computer program is stored that realizes the corresponding function when executed by a processor. The computer-readable storage medium of this embodiment is used to store the electronic device 20 and, when executed by a processor, implements the question answering method of this application. The computer-readable storage medium may be non-volatile or volatile.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the embodiments above, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the existing technology, can be embodied as a software product stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) that includes several instructions enabling a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope; any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. A question answering method, wherein the question answering method comprises:
    obtaining user input information;
    identifying the named entity in the input information, and linking the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity;
    matching the candidate relation of the candidate entity in the Chinese knowledge graph through a relation template;
    forming a candidate triple from the entity pair and the candidate relation, wherein the candidate triple comprises the named entity, the candidate entity, and the candidate relation;
    obtaining the ranking result corresponding to each candidate triple based on a learning-to-rank model; and
    querying the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
  2. The question answering method of claim 1, wherein identifying the named entity in the input information specifically comprises:
    annotating the input information and obtaining the annotation result; and identifying the named entity in the input information through a recurrent neural network model according to the annotation result.
  3. The question answering method of claim 1, wherein after the step of identifying the named entity in the input information and linking the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, the question answering method further comprises:
    calculating the similarity between the named entity and the candidate entity in each entity pair, wherein the similarity is obtained from Chinese-character similarity, pinyin-character similarity, word-vector similarity, and entity attention degree;
    sorting the entity pairs according to the similarities to obtain the rank corresponding to each entity pair; and
    selecting the corresponding entity pairs according to the rank.
  4. The question answering method of claim 1, wherein the relation template comprises a first entity, a second entity, and the relation between the first entity and the second entity.
  5. The question answering method of claim 1, wherein obtaining the ranking result corresponding to each candidate triple based on the learning-to-rank model specifically comprises:
    calculating the feature vector corresponding to each triple; and
    inputting each feature vector into the learning-to-rank model to obtain the ranking result corresponding to each candidate triple.
  6. The question answering method of claim 5, wherein the step of calculating the feature vector of each triple comprises:
    calculating the first similarity feature between the named entity and the candidate entity according to the triple;
    removing the named entity from the input information to obtain remaining words, and calculating the second similarity feature between the remaining words and synonyms and context words;
    generating a high-dimensional vector from the input information, wherein the high-dimensional vector is generated according to whether preset vocabulary words occur in the input information; and
    generating the feature vector from the first similarity feature, the second similarity feature, and the high-dimensional vector.
  7. The question answering method of claim 1, wherein the learning-to-rank model is obtained by training second samples formed from a first sample and each candidate triple, wherein the first sample is a triple formed by the standard answer to the input information.
  8. An electronic device, wherein the device comprises:
    a first obtaining module, used to obtain user input information;
    a recognition and linking module, used to identify the named entity in the input information and link the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity;
    a matching module, used to match the candidate relation of the candidate entity in the Chinese knowledge graph through a relation model;
    a forming module, used to form a candidate triple from the entity pair and the candidate relation, wherein the candidate triple comprises the named entity, the candidate entity, and the candidate relation;
    a second obtaining module, used to obtain the ranking result corresponding to each candidate triple based on a learning-to-rank model; and
    a third obtaining module, used to query the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
  9. A device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements a question answering method when executing the computer program, the question answering method specifically comprising the following steps:
    obtaining user input information;
    identifying the named entity in the input information, and linking the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity;
    matching the candidate relation of the candidate entity in the Chinese knowledge graph through a relation template;
    forming a candidate triple from the entity pair and the candidate relation, wherein the candidate triple comprises the named entity, the candidate entity, and the candidate relation;
    obtaining the ranking result corresponding to each candidate triple based on a learning-to-rank model; and
    querying the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
  10. The device of claim 9, wherein identifying the named entity in the input information specifically comprises:
    annotating the input information and obtaining the annotation result; and identifying the named entity in the input information through a recurrent neural network model according to the annotation result.
  11. The device of claim 9, wherein after the step of identifying the named entity in the input information and linking the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, the question answering method further comprises:
    calculating the similarity between the named entity and the candidate entity in each entity pair, wherein the similarity is obtained from Chinese-character similarity, pinyin-character similarity, word-vector similarity, and entity attention degree;
    sorting the entity pairs according to the similarities to obtain the rank corresponding to each entity pair; and
    selecting the corresponding entity pairs according to the rank.
  12. The device of claim 9, wherein the relation template comprises a first entity, a second entity, and the relation between the first entity and the second entity.
  13. The device of claim 9, wherein obtaining the ranking result corresponding to each candidate triple based on the learning-to-rank model specifically comprises:
    calculating the feature vector corresponding to each triple; and
    inputting each feature vector into the learning-to-rank model to obtain the ranking result corresponding to each candidate triple.
  14. The device of claim 13, wherein the step of calculating the feature vector of each triple comprises:
    calculating the first similarity feature between the named entity and the candidate entity according to the triple;
    removing the named entity from the input information to obtain remaining words, and calculating the second similarity feature between the remaining words and synonyms and context words;
    generating a high-dimensional vector from the input information, wherein the high-dimensional vector is generated according to whether preset vocabulary words occur in the input information; and
    generating the feature vector from the first similarity feature, the second similarity feature, and the high-dimensional vector.
  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program implements a question answering method when executed by a processor, and wherein the question answering method specifically comprises the following steps:
    obtaining user input information;
    identifying the named entity in the input information, and linking the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, wherein the entity pair comprises the named entity and the candidate entity;
    matching the candidate relation of the candidate entity in the Chinese knowledge graph through a relation template;
    forming a candidate triple from the entity pair and the candidate relation, wherein the candidate triple comprises the named entity, the candidate entity, and the candidate relation;
    obtaining the ranking result corresponding to each candidate triple based on a learning-to-rank model; and
    querying the Chinese knowledge graph according to the ranking result to obtain the answer to the input information.
  16. The computer-readable storage medium of claim 15, wherein identifying the named entity in the input information specifically comprises:
    annotating the input information and obtaining the annotation result; and identifying the named entity in the input information through a recurrent neural network model according to the annotation result.
  17. The computer-readable storage medium of claim 15, wherein after the step of identifying the named entity in the input information and linking the named entity to the candidate entity corresponding to the named entity in the Chinese knowledge graph to form an entity pair, the question answering method further comprises:
    calculating the similarity between the named entity and the candidate entity in each entity pair, wherein the similarity is obtained from Chinese-character similarity, pinyin-character similarity, word-vector similarity, and entity attention degree;
    sorting the entity pairs according to the similarities to obtain the rank corresponding to each entity pair; and
    selecting the corresponding entity pairs according to the rank.
  18. The computer-readable storage medium of claim 15, wherein the relation template comprises a first entity, a second entity, and the relation between the first entity and the second entity.
  19. The computer-readable storage medium of claim 15, wherein obtaining the ranking result corresponding to each candidate triple based on the learning-to-rank model specifically comprises:
    calculating the feature vector corresponding to each triple; and
    inputting each feature vector into the learning-to-rank model to obtain the ranking result corresponding to each candidate triple.
  20. The computer-readable storage medium of claim 19, wherein the step of calculating the feature vector of each triple comprises:
    calculating the first similarity feature between the named entity and the candidate entity according to the triple;
    removing the named entity from the input information to obtain remaining words, and calculating the second similarity feature between the remaining words and synonyms and context words;
    generating a high-dimensional vector from the input information, wherein the high-dimensional vector is generated according to whether preset vocabulary words occur in the input information; and
    generating the feature vector from the first similarity feature, the second similarity feature, and the high-dimensional vector.
PCT/CN2020/093141 2019-07-03 2020-05-29 Question answering method, question answering device, computer equipment and storage medium WO2021000676A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910593110.6 2019-07-03
CN201910593110.6A CN110502621B (zh) 2019-07-03 Question answering method, question answering device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021000676A1 (zh)

Family

ID=68585335

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093141 WO2021000676A1 (zh) 2019-07-03 2020-05-29 Question answering method, question answering device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110502621B (zh)
WO (1) WO2021000676A1 (zh)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749268A (zh) * 2021-01-30 2021-05-04 云知声智能科技股份有限公司 FAQ system ranking method, apparatus and system based on a hybrid strategy
CN112818031A (zh) * 2021-01-26 2021-05-18 国网江苏省电力有限公司营销服务中心 Method, system and storage medium for mining potential high-energy-consumption enterprises based on NLP Chinese word segmentation
CN113128231A (zh) * 2021-04-25 2021-07-16 深圳市慧择时代科技有限公司 Data quality inspection method, apparatus, storage medium and electronic device
CN113127626A (zh) * 2021-04-22 2021-07-16 广联达科技股份有限公司 Knowledge-graph-based recommendation method, apparatus, device and readable storage medium
CN113157935A (zh) * 2021-03-16 2021-07-23 中国科学技术大学 Graph neural network model and method for entity alignment based on relation context
CN113377923A (zh) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Semantic retrieval method, apparatus, device, storage medium and computer program product
CN113449119A (zh) * 2021-06-30 2021-09-28 珠海金山办公软件有限公司 Method, apparatus, electronic device and storage medium for constructing a knowledge graph
CN113505586A (zh) * 2021-06-07 2021-10-15 中电鸿信信息科技有限公司 Agent-assisted question answering method and system fusing semantic classification and a knowledge graph
CN113515630A (zh) * 2021-06-10 2021-10-19 深圳数联天下智能科技有限公司 Triple generation and verification method, apparatus, electronic device and storage medium
CN113590783A (zh) * 2021-07-28 2021-11-02 复旦大学 Intelligent question answering system for traditional Chinese medicine health preservation based on NLP natural language processing
CN113704494A (zh) * 2021-08-27 2021-11-26 北京百度网讯科技有限公司 Knowledge-graph-based entity retrieval method, apparatus, device and storage medium
CN113761167A (zh) * 2021-09-09 2021-12-07 上海明略人工智能(集团)有限公司 Session information extraction method, system, electronic device and storage medium
CN113946651A (zh) * 2021-09-27 2022-01-18 盛景智能科技(嘉兴)有限公司 Maintenance knowledge recommendation method, apparatus, electronic device, medium and product
US11526688B2 (en) * 2020-04-16 2022-12-13 International Business Machines Corporation Discovering ranked domain relevant terms using knowledge
CN116089587A (zh) * 2023-02-20 2023-05-09 星环信息科技(上海)股份有限公司 Answer generation method, apparatus, device and storage medium
CN116955592A (zh) * 2023-07-21 2023-10-27 广州拓尔思大数据有限公司 Data processing method and system based on visualized reasoning results

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502621B (zh) * 2019-07-03 2023-06-13 平安科技(深圳)有限公司 Question answering method, question answering device, computer equipment and storage medium
CN112925887A (zh) * 2019-12-05 2021-06-08 北京四维图新科技股份有限公司 Interaction method and apparatus, electronic device, storage medium, and text recognition method
CN112948569A (zh) * 2019-12-10 2021-06-11 中国石油天然气股份有限公司 Method and apparatus for pushing scientific workflow layouts based on an activity knowledge graph
CN111883230B (zh) * 2019-12-18 2024-05-07 深圳数字生命研究院 Diet data generation method and apparatus, storage medium, and electronic apparatus
CN111259653B (zh) * 2020-01-15 2022-06-24 重庆邮电大学 Knowledge graph question answering method, system and terminal based on entity-relation disambiguation
CN111368042A (zh) * 2020-02-13 2020-07-03 平安科技(深圳)有限公司 Intelligent question answering method, apparatus, computer device and computer storage medium
CN111353298A (zh) * 2020-02-17 2020-06-30 杭州网易再顾科技有限公司 Character sequence generation method, apparatus, device and computer-readable storage medium
CN111339269B (zh) * 2020-02-20 2023-09-26 来康科技有限责任公司 Knowledge graph question answering training and application service system with automatically generated templates
CN111368048A (zh) * 2020-02-26 2020-07-03 京东方科技集团股份有限公司 Information acquisition method, apparatus, electronic device and computer-readable storage medium
CN111753055B (zh) * 2020-06-28 2024-01-26 中国银行股份有限公司 Automatic prompting method and apparatus for customer question answering
CN112100356A (zh) * 2020-09-17 2020-12-18 武汉纺织大学 Similarity-based entity linking method and system for knowledge base question answering
CN112182178A (zh) * 2020-09-25 2021-01-05 北京字节跳动网络技术有限公司 Intelligent question answering method, apparatus, device and readable storage medium
CN111950303B (zh) * 2020-10-19 2021-01-08 平安科技(深圳)有限公司 Medical text translation method, apparatus and storage medium
CN112328759A (zh) * 2020-10-29 2021-02-05 平安科技(深圳)有限公司 Automatic question answering method, apparatus, device and storage medium
CN114444505A (zh) * 2020-10-30 2022-05-06 北京金山数字娱乐科技有限公司 Text processing method and apparatus
CN112579752A (zh) * 2020-12-10 2021-03-30 上海明略人工智能(集团)有限公司 Entity relation extraction method and apparatus, storage medium, electronic device
CN112733508B (zh) * 2021-03-30 2021-06-18 中国电子技术标准化研究院 Standard text annotation and standard graph construction method and apparatus
CN113495964B (zh) * 2021-04-28 2024-02-23 中国科学技术大学 Triple screening method, apparatus, device and readable storage medium
CN113361269B (zh) * 2021-06-11 2023-07-18 南京信息工程大学 Method for text sentiment classification
CN113420160A (zh) * 2021-06-24 2021-09-21 竹间智能科技(上海)有限公司 Data processing method and device
CN113312854B (zh) * 2021-07-19 2021-11-02 成都数之联科技有限公司 Model selection recommendation method, apparatus, electronic device and readable storage medium
CN114510558A (zh) * 2022-01-26 2022-05-17 北京博瑞彤芸科技股份有限公司 Question answering method and system based on a traditional Chinese medicine knowledge graph
CN114781387B (zh) * 2022-06-20 2022-09-02 北京惠每云科技有限公司 Medical named entity recognition method, apparatus, electronic device and storage medium
CN116127053B (zh) * 2023-02-14 2024-01-02 北京百度网讯科技有限公司 Entity word disambiguation, knowledge graph generation and knowledge recommendation methods and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748757A (zh) * 2017-09-21 2018-03-02 北京航空航天大学 Knowledge-graph-based question answering method
CN108427707A (zh) * 2018-01-23 2018-08-21 深圳市阿西莫夫科技有限公司 Human-machine question answering method, apparatus, computer device and storage medium
CN109241294A (zh) * 2018-08-29 2019-01-18 国信优易数据有限公司 Entity linking method and apparatus
CN110502621A (zh) * 2019-07-03 2019-11-26 平安科技(深圳)有限公司 Question answering method, question answering device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9985982B1 (en) * 2015-12-21 2018-05-29 Cisco Technology, Inc. Method and apparatus for aggregating indicators of compromise for use in network security
CN107402954B (zh) * 2017-05-26 2020-07-10 百度在线网络技术(北京)有限公司 Method for building a ranking model, and application method and apparatus based on the model
CN107832400B (zh) * 2017-11-01 2019-04-16 山东大学 Relation classification method using a location-based joint LSTM and CNN model
CN108345702A (zh) * 2018-04-10 2018-07-31 北京百度网讯科技有限公司 Entity recommendation method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748757A (zh) * 2017-09-21 2018-03-02 北京航空航天大学 Knowledge-graph-based question answering method
CN108427707A (zh) * 2018-01-23 2018-08-21 深圳市阿西莫夫科技有限公司 Human-machine question answering method, apparatus, computer device and storage medium
CN109241294A (zh) * 2018-08-29 2019-01-18 国信优易数据有限公司 Entity linking method and apparatus
CN110502621A (zh) * 2019-07-03 2019-11-26 平安科技(深圳)有限公司 Question answering method, question answering device, computer equipment and storage medium

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11526688B2 (en) * 2020-04-16 2022-12-13 International Business Machines Corporation Discovering ranked domain relevant terms using knowledge
CN112818031A (zh) * 2021-01-26 2021-05-18 国网江苏省电力有限公司营销服务中心 Method, system and storage medium for mining potential high-energy-consumption enterprises based on NLP Chinese word segmentation
CN112818031B (zh) * 2021-01-26 2023-10-27 国网江苏省电力有限公司营销服务中心 Method, system and storage medium for mining potential high-energy-consumption enterprises based on NLP Chinese word segmentation
CN112749268A (zh) * 2021-01-30 2021-05-04 云知声智能科技股份有限公司 FAQ system ranking method, apparatus and system based on a hybrid strategy
CN113157935A (zh) * 2021-03-16 2021-07-23 中国科学技术大学 Graph neural network model and method for entity alignment based on relation context
CN113157935B (zh) * 2021-03-16 2024-02-27 中国科学技术大学 Graph neural network system and method for entity alignment based on relation context
CN113127626A (zh) * 2021-04-22 2021-07-16 广联达科技股份有限公司 Knowledge-graph-based recommendation method, apparatus, device and readable storage medium
CN113127626B (zh) * 2021-04-22 2024-04-30 广联达科技股份有限公司 Knowledge-graph-based recommendation method, apparatus, device and readable storage medium
CN113128231A (zh) * 2021-04-25 2021-07-16 深圳市慧择时代科技有限公司 Data quality inspection method, apparatus, storage medium and electronic device
CN113505586A (zh) * 2021-06-07 2021-10-15 中电鸿信信息科技有限公司 Agent-assisted question answering method and system fusing semantic classification and a knowledge graph
CN113515630A (zh) * 2021-06-10 2021-10-19 深圳数联天下智能科技有限公司 Triple generation and verification method, apparatus, electronic device and storage medium
CN113515630B (zh) * 2021-06-10 2024-04-09 深圳数联天下智能科技有限公司 Triple generation and verification method, apparatus, electronic device and storage medium
CN113377923A (zh) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Semantic retrieval method, apparatus, device, storage medium and computer program product
CN113377923B (zh) * 2021-06-25 2024-01-09 北京百度网讯科技有限公司 Semantic retrieval method, apparatus, device, storage medium and computer program product
CN113449119A (zh) * 2021-06-30 2021-09-28 珠海金山办公软件有限公司 Method, apparatus, electronic device and storage medium for constructing a knowledge graph
CN113590783B (zh) * 2021-07-28 2023-10-03 复旦大学 Intelligent question answering system for traditional Chinese medicine health preservation based on NLP natural language processing
CN113590783A (zh) * 2021-07-28 2021-11-02 复旦大学 Intelligent question answering system for traditional Chinese medicine health preservation based on NLP natural language processing
CN113704494B (zh) * 2021-08-27 2024-04-05 北京百度网讯科技有限公司 Knowledge-graph-based entity retrieval method, apparatus, device and storage medium
CN113704494A (zh) * 2021-08-27 2021-11-26 北京百度网讯科技有限公司 Knowledge-graph-based entity retrieval method, apparatus, device and storage medium
CN113761167B (zh) * 2021-09-09 2023-10-20 上海明略人工智能(集团)有限公司 Session information extraction method, system, electronic device and storage medium
CN113761167A (zh) * 2021-09-09 2021-12-07 上海明略人工智能(集团)有限公司 Session information extraction method, system, electronic device and storage medium
CN113946651A (zh) * 2021-09-27 2022-01-18 盛景智能科技(嘉兴)有限公司 Maintenance knowledge recommendation method, apparatus, electronic device, medium and product
CN113946651B (zh) * 2021-09-27 2024-05-10 盛景智能科技(嘉兴)有限公司 Maintenance knowledge recommendation method, apparatus, electronic device, medium and product
CN116089587A (zh) * 2023-02-20 2023-05-09 星环信息科技(上海)股份有限公司 Answer generation method, apparatus, device and storage medium
CN116089587B (zh) * 2023-02-20 2024-03-01 星环信息科技(上海)股份有限公司 Answer generation method, apparatus, device and storage medium
CN116955592A (zh) * 2023-07-21 2023-10-27 广州拓尔思大数据有限公司 Data processing method and system based on visualized reasoning results
CN116955592B (zh) * 2023-07-21 2024-02-09 广州拓尔思大数据有限公司 Data processing method and system based on visualized reasoning results

Also Published As

Publication number Publication date
CN110502621B (zh) 2023-06-13
CN110502621A (zh) 2019-11-26

Similar Documents

Publication Publication Date Title
WO2021000676A1 (zh) Question answering method, question answering device, computer equipment and storage medium
US10698977B1 (en) System and methods for processing fuzzy expressions in search engines and for information extraction
CN108875051B (zh) 面向海量非结构化文本的知识图谱自动构建方法及系统
CN110059160B (zh) 一种端到端的基于上下文的知识库问答方法及装置
CN109472033B (zh) 文本中的实体关系抽取方法及系统、存储介质、电子设备
CN111950285B (zh) 多模态数据融合的医疗知识图谱智能自动构建系统和方法
JP5936698B2 (ja) 単語意味関係抽出装置
US11080295B2 (en) Collecting, organizing, and searching knowledge about a dataset
Bordes et al. Open question answering with weakly supervised embedding models
CN112035730B (zh) Semantic retrieval method, apparatus and electronic device
Zubrinic et al. The automatic creation of concept maps from documents written using morphologically rich languages
US9514098B1 (en) Iteratively learning coreference embeddings of noun phrases using feature representations that include distributed word representations of the noun phrases
US20150081277A1 (en) System and Method for Automatically Classifying Text using Discourse Analysis
US20160275073A1 (en) Semantic parsing for complex knowledge extraction
US20210117625A1 (en) Semantic parsing of natural language query
US20220277005A1 (en) Semantic parsing of natural language query
WO2021146831A1 (zh) Entity recognition method and apparatus, dictionary building method, device, and medium
US9720962B2 (en) Answering superlative questions with a question and answer system
US20220405484A1 (en) Methods for Reinforcement Document Transformer for Multimodal Conversations and Devices Thereof
CN112328800A (zh) System and method for automatically generating answers to programming-specification questions
CN111400584A (zh) Method, apparatus, computer device and storage medium for recommending associated words
CN114153994A (zh) Medical insurance information question answering method and apparatus
Orellana et al. A text mining methodology to discover syllabi similarities among higher education institutions
CN116562280A (zh) Document analysis system and method based on universal information extraction
Rousseau Graph-of-words: mining and retrieving text with networks of features

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20835406

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20835406

Country of ref document: EP

Kind code of ref document: A1