WO2021169718A1 - Information acquisition method, apparatus, electronic device, and computer-readable storage medium - Google Patents

Information acquisition method, apparatus, electronic device, and computer-readable storage medium

Info

Publication number
WO2021169718A1
WO2021169718A1 PCT/CN2021/074046
Authority
WO
WIPO (PCT)
Prior art keywords
question
answered
text
entity
answer
Prior art date
Application number
PCT/CN2021/074046
Other languages
English (en)
French (fr)
Inventor
王炳乾
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 filed Critical 京东方科技集团股份有限公司
Priority to US17/425,045 priority Critical patent/US20230169100A1/en
Publication of WO2021169718A1 publication Critical patent/WO2021169718A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3325Reformulation based on results of preceding query
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9032Query formulation
    • G06F16/90332Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Definitions

  • the present disclosure relates to the field of natural language processing technology, and in particular to an information acquisition method, device, electronic equipment, and computer-readable storage medium.
  • The question answering system is one of the current research hotspots in natural language processing.
  • An important step in a question answering system is entity linking for the question; the linking result directly affects the performance of the question answering system.
  • The traditional question entity linking method is mainly completed in two steps, namely named entity recognition and entity linking.
  • Entity recognition is currently mainly based on methods such as Conditional Random Field (CRF) or Bidirectional Long Short-term Memory CRF (BLSTM-CRF).
  • Entity linking mainly uses classification methods, similarity calculation, and other approaches.
  • The classification method needs to select candidate entities first, and then uses classic machine learning methods or neural network methods for classification.
  • The present disclosure provides an information acquisition method, device, electronic equipment, and computer-readable storage medium to solve the problems in related technologies of requiring a large number of manual templates, being time-consuming and labor-intensive, lacking flexibility, and having poor scalability.
  • an information acquisition method including:
  • the determining the target answer of the question to be answered according to the retrieval text in the form of the target subgraph includes:
  • a target answer of the question to be answered is determined from the at least one candidate answer.
  • the identifying at least one entity search term in the question to be answered includes:
  • the at least one entity search term is determined.
  • the performing information retrieval based on the at least one entity search term to obtain the retrieval text in the form of a subgraph corresponding to the at least one entity search term includes:
  • the at least one entity search term is associated with the plurality of initial search texts in the form of subgraphs to obtain the search text in the form of subgraphs.
  • the matching the search text in the form of a subgraph with the question to be answered to determine the search text in the form of a target subgraph includes:
  • the determining at least one candidate answer corresponding to the question to be answered according to the retrieval text in the form of the target subgraph includes:
  • the retrieval text in the form of the target subgraph is disassembled to obtain the at least one candidate answer.
  • the obtaining the similarity corresponding to the at least one candidate answer and the question to be answered includes:
  • the obtaining the similarity corresponding to the at least one candidate answer and the question to be answered includes:
  • the similarity matching between the at least one candidate answer and the question to be answered is performed through the cosine similarity calculation model, and the similarity between the at least one candidate answer and the question to be answered is determined.
  • the determining the target answer of the question to be answered from the at least one candidate answer according to the similarity includes:
  • an electronic device including:
  • the determining the target answer of the question to be answered according to the retrieval text in the form of the target subgraph includes:
  • a target answer of the question to be answered is determined from the at least one candidate answer.
  • the identifying at least one entity search term in the question to be answered includes:
  • the at least one entity search term is determined.
  • the performing information retrieval based on the at least one entity search term to obtain the retrieval text in the form of a subgraph corresponding to the at least one entity search term includes:
  • the at least one entity search term is associated with the plurality of initial search texts in the form of subgraphs to obtain the search text in the form of subgraphs.
  • the matching the search text in the form of a subgraph with the question to be answered to determine the search text in the form of a target subgraph includes:
  • the determining at least one candidate answer corresponding to the question to be answered according to the retrieval text in the form of the target subgraph includes:
  • the retrieval text in the form of the target subgraph is disassembled to obtain the at least one candidate answer.
  • the obtaining the similarity corresponding to the at least one candidate answer and the question to be answered includes:
  • the present disclosure provides a non-volatile computer-readable storage medium.
  • When the instructions in the storage medium are executed by the processor of the electronic device, the electronic device can perform the following operations:
  • the determining the target answer of the question to be answered according to the retrieval text in the form of the target subgraph includes:
  • a target answer of the question to be answered is determined from the at least one candidate answer.
  • the identifying at least one entity search term in the question to be answered includes:
  • the at least one entity search term is determined.
  • the performing information retrieval based on the at least one entity search term to obtain the retrieval text in the form of a subgraph corresponding to the at least one entity search term includes:
  • the at least one entity search term is associated with the plurality of initial search texts in the form of subgraphs to obtain the search text in the form of subgraphs.
  • the present disclosure provides a computer program product, including computer-readable code, which when the computer-readable code runs on an electronic device, causes the electronic device to perform any of the above-mentioned information acquisition method.
  • Fig. 1 shows a flow chart of the steps of an information acquisition method provided by an embodiment of the present disclosure
  • Figure 2 shows a flowchart of another method for obtaining information provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of a question answering system provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of an entity labeling example provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of an entity recognition model provided by an embodiment of the present disclosure
  • FIG. 6 shows a schematic diagram of entity subgraph information provided by an embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of a BERT-based subgraph matching algorithm provided by an embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of disassembling a subgraph provided by an embodiment of the present disclosure
  • FIG. 9 shows a schematic diagram of text similarity matching provided by an embodiment of the present disclosure.
  • FIG. 10 shows a schematic diagram of a joint learning model provided by an embodiment of the present disclosure
  • FIG. 11 shows a schematic structural diagram of an information acquisition device provided by an embodiment of the present disclosure
  • FIG. 12 shows a schematic structural diagram of another information acquisition device provided by an embodiment of the present disclosure.
  • FIG. 13 schematically shows a block diagram of an electronic device for executing the method according to the present disclosure.
  • Fig. 14 schematically shows a storage unit for holding or carrying program codes for implementing the method according to the present disclosure.
  • the information acquisition method may specifically include the following steps:
  • Step 101 Identify at least one entity search term in the question to be answered.
  • the embodiments of the present disclosure can be applied to a question and answer system to obtain the answer corresponding to the question to be answered.
  • the question answering system can be described in conjunction with Figure 3 as follows.
  • FIG. 3 a schematic diagram of a question and answer system provided by an embodiment of the present disclosure is shown.
  • the question to be answered "Q: In which year Xu Beihong's eight horses were created"
  • the information in the knowledge graph is in the form of subgraphs)
  • entity disambiguation is carried out through subgraph matching, and the non-retrieved information is removed to obtain the eight horses graph ( Xu Beihong) corresponds to the sub-picture information, and matches the entity information with the text similarity of the question to be answered to obtain the final answer.
  • the question to be answered refers to the question used to obtain the corresponding answer from the knowledge graph.
  • The question to be answered may be a question input by a user. For example, when user A needs to obtain the answer to a certain question, the corresponding question can be input into the knowledge graph, thereby obtaining the corresponding question to be answered.
  • The question to be answered may also be a question obtained from the Internet. For example, the questions that users are interested in may be collected, and a question of relatively high user interest may be taken as the question to be answered.
  • the entity search term refers to the entity term used for information retrieval in the question to be answered.
  • In the present disclosure, the entity search terms in the question to be answered can be obtained by means of pointer labeling; the specific method for obtaining the entity search terms will be described in detail in the following embodiments and is not repeated here.
  • the question to be answered can be identified, so as to obtain at least one entity search term contained in the question to be answered.
  • For example, if the question to be answered is "In which year was Xu Beihong's Eight Horses created?", the entities it contains are Xu Beihong and Eight Horses.
  • After identifying at least one entity search term in the question to be answered, step 102 is executed.
  • Step 102 Perform information retrieval according to the at least one entity search term to obtain a search text in the form of a subgraph corresponding to the at least one entity search term.
  • the search text in the form of subgraph refers to the search result text obtained by using at least one entity search term to perform information search on the knowledge graph.
  • After identifying at least one entity search term in the question to be answered, the entity search terms can be used for information retrieval in the knowledge graph; further, a retrieval text in the form of a subgraph corresponding to each entity search term can be obtained.
  • After information retrieval is performed according to the at least one entity search term and the retrieval text in the form of a subgraph corresponding to the at least one entity search term is obtained, step 103 is executed.
  • Step 103 Match the retrieval text in the form of a subgraph with the question to be answered, and determine the retrieval text in the form of the target subgraph.
  • The retrieval text in the form of the target subgraph refers to the retrieval text in subgraph form, selected from those of the at least one entity search term, that matches the question to be answered. That is, this step implements entity disambiguation: the retrieval text in subgraph form that does not match the question to be answered is removed, so that the final retrieval text matching the question to be answered can be obtained, which is the retrieval text in the form of the target subgraph.
  • After the retrieval text in the form of a subgraph is obtained, it can be matched with the question to be answered. According to the matching result, the retrieval text in the form of the target subgraph that matches the question to be answered can be determined from among those of the at least one entity search term.
  • The process of matching and determining the retrieval text in the form of the target subgraph will be described in detail in the following embodiments, and is not repeated here in the embodiments of the present disclosure.
  • After matching the retrieval text in the form of a subgraph with the question to be answered and determining the retrieval text in the form of the target subgraph, step 104 is executed.
  • Step 104 Determine the target answer of the question to be answered according to the retrieval text in the form of the target subgraph.
  • This step 104 may include the following steps 104a, 104b, and 104c:
  • Step 104a Determine at least one candidate answer corresponding to the question to be answered according to the retrieval text in the form of the target subgraph.
  • Candidate answer refers to the candidate selected as the answer to the question to be answered from the retrieval text in the form of the target subgraph.
  • At least one candidate answer to the question to be answered can be determined according to the search text in the form of the target subgraph.
  • Specifically, the retrieval text in the form of the target subgraph can be disassembled to obtain at least one candidate answer.
  • For example, referring to FIG. 8, a schematic diagram of subgraph disassembly provided by an embodiment of the present disclosure is shown. As shown in FIG. 8, after the left half of FIG. 8 is disassembled, the multiple candidates shown in the right half of FIG. 8 can be obtained.
  • After determining at least one candidate answer corresponding to the question to be answered according to the retrieval text in the form of the target subgraph, step 104b is executed.
  • Step 104b Obtain the similarity between the at least one candidate answer and the question to be answered.
  • Similarity refers to the degree of similarity between at least one candidate answer and the question to be answered.
  • the similarity can reflect which candidate answers are closer to the question to be answered, and can be used as the standard answer to the question to be answered.
  • the similarity between the at least one candidate answer and the question to be answered can be obtained.
  • Specifically, the at least one candidate answer and the question to be answered can be respectively input into a preset network model, and the similarity between the at least one candidate answer and the question to be answered is identified through the preset network model.
  • After obtaining the similarity between each candidate answer and the question to be answered, step 104c is executed.
  • Step 104c Determine the target answer of the question to be answered from the at least one candidate answer according to the similarity.
  • the target answer refers to the standard answer to the question to be answered selected from at least one candidate answer, that is, the final selected target answer is used as the accurate answer to the question to be answered.
  • After obtaining the similarity between the at least one candidate answer and the question to be answered, the target answer of the question to be answered can be selected from the at least one candidate answer in combination with the similarity of the at least one candidate answer.
  • Specifically, the candidate answer with the greatest similarity may be selected as the target answer of the question to be answered, or at least one candidate answer whose similarity is greater than a set similarity threshold may be selected from the at least one candidate answer as the target answer of the question to be answered.
  • Specifically, this may be determined according to business requirements, which is not limited in the embodiments of the present disclosure.
  • entity disambiguation is performed by adopting a subgraph matching manner, without the need to construct a template, and the information retrieval efficiency of the question answering system is improved.
  • the information acquisition method identifies at least one entity search term in the question to be answered, performs information retrieval based on the at least one entity search term, and obtains the search text in the form of a subgraph corresponding to the at least one entity search term.
  • the search text in the form of a subgraph is matched with the question to be answered, the search text in the form of the target subgraph is determined, and the target answer of the question to be answered is determined according to the search text in the form of the target subgraph.
  • The embodiment of the present disclosure uses subgraph matching to perform entity disambiguation, and simultaneously realizes the three key tasks of entity recognition, entity disambiguation, and text matching. This method does not require the introduction of an external corpus or the construction of templates, which improves the flexibility and efficiency of the question answering system.
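  • The overall flow of steps 101 to 104 can be illustrated with a minimal, hypothetical orchestration sketch; every helper function and the knowledge-base interface below are assumptions made for illustration, not components disclosed in this document.

```python
# Minimal sketch of steps 101-104; all helpers are injected placeholders for
# the models described in this disclosure (entity recognition, subgraph
# matching, similarity scoring).

def answer_question(question, knowledge_base,
                    recognize_entities, retrieve_subgraph,
                    match_subgraph, score_similarity):
    # Step 101: identify entity search terms in the question to be answered.
    entities = recognize_entities(question)

    # Step 102: retrieve a subgraph-form text for each entity search term.
    subgraphs = {e: retrieve_subgraph(knowledge_base, e) for e in entities}

    # Step 103: entity disambiguation -- keep the subgraph-form text that
    # best matches the question (the "target subgraph").
    target = max(subgraphs.values(),
                 key=lambda sg: match_subgraph(question, sg))

    # Step 104: disassemble the target subgraph into candidate answers and
    # return the candidate most similar to the question.
    candidates = target.split("__")  # placeholder disassembly
    return max(candidates, key=lambda c: score_similarity(question, c))
```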
  • the information acquisition method may specifically include the following steps:
  • Step 201 Obtain the question to be answered.
  • the embodiments of the present disclosure can be applied to a question and answer system to obtain the answer corresponding to the question to be answered.
  • the question answering system can be described in conjunction with Figure 3 as follows.
  • FIG. 3 a schematic diagram of a question and answer system provided by an embodiment of the present disclosure is shown.
  • the question to be answered "Q: In which year Xu Beihong's eight horses were created"
  • the information in the knowledge graph is in the form of subgraphs)
  • entity disambiguation is carried out through subgraph matching, and the non-retrieved information is removed to obtain the eight horses graph ( Xu Beihong) corresponds to the sub-picture information, and matches the entity information with the text similarity of the question to be answered to obtain the final answer.
  • the question to be answered refers to the question used to obtain the corresponding answer from the knowledge graph.
  • The question to be answered may be a question input by a user. For example, when user A needs to obtain the answer to a certain question, the corresponding question can be input into the knowledge graph, thereby obtaining the corresponding question to be answered.
  • The question to be answered may also be a question obtained from the Internet. For example, the questions that users are interested in may be collected, and a question of relatively high user interest may be taken as the question to be answered.
  • After the question to be answered is obtained, step 202 is executed.
  • Step 202 Input the question to be answered into the first network model for text recognition.
  • the first network model refers to a model used for text recognition of the question to be answered.
  • In the present disclosure, the first network model may be a BERT model or the like.
  • the question to be answered can be input to the first network model, and the first network model performs text recognition of the question to be answered.
  • pointer annotation can be used to implement text recognition.
  • For example, referring to FIG. 4, a schematic diagram of an entity annotation example provided by an embodiment of the present disclosure is shown.
  • As shown in FIG. 4, two sequence annotations can be used to mark the start and end positions of the entities in the data respectively.
  • FIG. 4 shows the labeling of "Xu Beihong" and "Eight Horses" in the question "In which year was Xu Beihong's Eight Horses created?".
  • Specifically, the question to be answered can be input into the first network model in a single-input manner.
  • As shown in FIG. 5, after the question to be answered is input into the BERT model, the sentence can be encoded as "[CLS] In which year was Xu Beihong's Eight Horses created? [SEP]". The encoding output by BERT is passed through a fully connected layer with a sigmoid activation function, and the loss function is the binary cross-entropy loss.
  • The value at each position of the final output sequence is the confidence of the entity's start and end positions; here, positions with a confidence greater than 0.5 are taken as the start and end positions of the entity, and the entity can be obtained by extracting the corresponding span from the original input text.
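  • As a rough illustration of the pointer-labeling scheme described above, the following Python sketch builds a BERT encoder with a single fully connected layer and sigmoid outputs for the start/end confidences; the pretrained checkpoint name, tokenizer, and threshold handling are assumptions made for the sketch, not the exact configuration of the disclosure.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class PointerEntityRecognizer(nn.Module):
    """BERT encoder plus one fully connected layer that outputs, for every
    token, two sigmoid confidences: 'is entity start' and 'is entity end'."""

    def __init__(self, pretrained="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        self.head = nn.Linear(self.bert.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        return torch.sigmoid(self.head(hidden))          # (batch, seq_len, 2)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = PointerEntityRecognizer()
loss_fn = nn.BCELoss()   # binary cross-entropy, used during training

question = "徐悲鸿的八骏图创作于哪一年?"
enc = tokenizer(question, return_tensors="pt")
probs = model(enc["input_ids"], enc["attention_mask"])

# Positions whose confidence exceeds 0.5 are taken as entity start/end
# positions; the entity is the span of the original text between them.
starts = (probs[0, :, 0] > 0.5).nonzero(as_tuple=True)[0].tolist()
ends = (probs[0, :, 1] > 0.5).nonzero(as_tuple=True)[0].tolist()
```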
  • After inputting the question to be answered into the first network model for text recognition, step 203 is executed.
  • Step 203 Determine the starting and ending positions of the question to be answered according to the text recognition result.
  • the starting and ending positions refer to the starting and ending positions of labeling in the question to be answered, and the labelled entity words can be determined by the labeling starting and ending positions.
  • After the question to be answered is input into the first network model for text recognition, the labeled start and end positions in the text of the question to be answered can be obtained according to the text recognition result. As shown in FIG. 4, entity recognition can be performed by means of pointer labeling; the specific method is to use two sequence annotations to mark the start and end positions of the entities in the data respectively.
  • FIG. 4 shows the labeling of "Xu Beihong" and "Eight Horses" in the question "In which year was Xu Beihong's Eight Horses created?".
  • After determining the start and end positions in the question to be answered according to the text recognition result, step 204 is executed.
  • Step 204 Determine the at least one entity search term according to the start and end positions.
  • the entity search term refers to the entity term used for information retrieval in the question to be answered.
  • After determining the start and end positions in the question to be answered, the entity words in the question to be answered can be identified according to the start and end positions. As shown in FIG. 4, according to the labeling result, the entity words obtained are "Xu Beihong" and "Eight Horses".
  • After obtaining at least one entity search term according to the text recognition result, step 205 is executed.
  • Step 205 Use the at least one entity search term to perform a search in a preset knowledge base to obtain multiple initial search texts associated with the at least one entity search term.
  • the preset knowledge base refers to a pre-generated database corresponding to the knowledge graph.
  • all the information of the knowledge graph can be stored in the database in an associated form to obtain the preset knowledge base.
  • the form of a database list can be used, with a certain entity word as an index, and its associated information can be arranged in sequence, so as to form associated information in the form of a subgraph with numerous association relationships.
  • the initial search text refers to the search text obtained by using entity search terms to search in the preset knowledge base.
  • At least one entity search term can be used to search in the preset knowledge base, so that multiple initial search texts associated with each entity search term can be obtained.
  • After at least one entity search term is used to perform a search in the preset knowledge base and multiple initial search texts associated with the at least one entity search term are obtained, step 206 is executed.
  • Step 206 Associate the at least one entity search term with the plurality of initial search texts in the form of subgraphs to obtain the search text in the form of subgraphs.
  • The identified entities are used as search terms to search the knowledge graph. For example, when searching for "Eight Horses", there are two entities named Eight Horses in the knowledge base.
  • The attributes and relationships of each entity can be obtained from the knowledge graph; they exist in the knowledge graph in the form of subgraphs, as shown in FIG. 6. In order to distinguish which Eight Horses the question refers to, the attributes and relationships of the entity are spliced together as the description information of the entity.
  • For example, the information corresponding to Eight Horses (Xu Beihong) and Eight Horses (Lang Shining) can be associated, and the retrieval texts in subgraph form corresponding to the two entities can be obtained. The entity descriptions of the two Eight Horses are as follows: Author Xu Beihong __ Creation time Modern __ Creation category Ink painting __ Genre Romanticism __ Collection location Unknown; Author Lang Shining __ Creation time Qing Dynasty __ Creation category Silk coloring __ Genre Court painting __ Collection location Palace Museum.
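  • A small sketch of how an entity's attribute and relation pairs might be spliced into a single description string is shown below; the dictionary layout and the "__" separator follow the example above, while the helper name and field set are assumptions made for illustration.

```python
# Hypothetical subgraph records for the two "Eight Horses" entities,
# using the attribute fields quoted in the example above.
eight_horses = {
    "八骏图(徐悲鸿)": {"作者": "徐悲鸿", "创作时间": "近代", "创作类别": "水墨画",
                       "流派": "浪漫主义", "收藏地": "不详"},
    "八骏图(郎世宁)": {"作者": "郎世宁", "创作时间": "清代", "创作类别": "绢本设色",
                       "流派": "宫廷绘画", "收藏地": "故宫博物院"},
}

def describe(subgraph: dict) -> str:
    """Splice an entity's attributes and relations into one description string."""
    return "__".join(f"{attr}{value}" for attr, value in subgraph.items())

for name, sg in eight_horses.items():
    print(name, "->", describe(sg))
# 八骏图(徐悲鸿) -> 作者徐悲鸿__创作时间近代__创作类别水墨画__流派浪漫主义__收藏地不详
# 八骏图(郎世宁) -> 作者郎世宁__创作时间清代__创作类别绢本设色__流派宫廷绘画__收藏地故宫博物院
```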
  • After associating at least one entity search term with a plurality of initial search texts in the form of subgraphs to obtain the search text in the form of subgraphs, step 207 is executed.
  • Step 207 Combine the search text in the form of a subgraph and the question to be answered into a sentence pair text.
  • a sentence pair refers to a pair of sentence texts composed of two texts.
  • For example, if the two texts are "Xu Beihong" and "Zhang Daqian", the sentence pair composed of these two texts is "Xu Beihong - Zhang Daqian"; for another example, if the two texts are "landscape painting" and "landscape painting", the sentence pair composed of these two texts is "landscape painting - landscape painting".
  • Sentence pair text refers to the sentence pair composed of the retrieval text in subgraph form and the question to be answered. That is, after the retrieval text in subgraph form corresponding to each entity search term is obtained, each retrieval text in subgraph form is combined with the question to be answered to form a sentence pair, so that the sentence pair text can be obtained.
  • After the sentence pair text is composed, step 208 is executed.
  • Step 208 Input the sentence pair text into the second network model.
  • the second network model refers to a pre-set network model for entity disambiguation of search texts in the form of subgraphs.
  • In the present disclosure, the second network model may be a BERT model or the like; specifically, it can be determined according to business requirements, which is not limited in the embodiments of the present disclosure.
  • each sentence-pair text can be input to the second network model.
  • For example, the sentence pair input to BERT is encoded as: "[CLS] In which year was Xu Beihong's Eight Horses created? [SEP] Author Xu Beihong __ Creation time Modern __ Creation category Ink painting __ Genre Romanticism __ Collection location Unknown [SEP]". The encoded sentence pair is input into the BERT model, and a dense layer and a sigmoid layer are used to process the input.
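  • The sentence-pair scoring described above can be sketched as follows; the head dimensions, checkpoint name, and the use of BERT's pooled [CLS] output are assumptions for this illustration rather than details fixed by the disclosure.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class SubgraphMatcher(nn.Module):
    """Scores how well a subgraph-form retrieval text matches the question:
    [CLS] question [SEP] entity description [SEP] -> dense -> sigmoid."""

    def __init__(self, pretrained="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        self.dense = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, **encoded):
        pooled = self.bert(**encoded).pooler_output      # [CLS] representation
        return torch.sigmoid(self.dense(pooled))          # match probability

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
matcher = SubgraphMatcher()

question = "徐悲鸿的八骏图创作于哪一年?"
description = "作者徐悲鸿__创作时间近代__创作类别水墨画__流派浪漫主义__收藏地不详"

# The tokenizer builds the [CLS] ... [SEP] ... [SEP] sentence-pair encoding.
encoded = tokenizer(question, description, return_tensors="pt", truncation=True)
score = matcher(input_ids=encoded["input_ids"],
                attention_mask=encoded["attention_mask"],
                token_type_ids=encoded["token_type_ids"])
```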
  • After the sentence pair text is input into the second network model, step 209 is executed.
  • Step 209 Perform entity disambiguation processing on the sentence pair text through the second network model, and determine the retrieval text in the form of the target subgraph.
  • The retrieval text in the form of the target subgraph refers to the retrieval text in subgraph form, selected from those of the at least one entity search term, that matches the question to be answered. That is, this step implements entity disambiguation: the retrieval text in subgraph form that does not match the question to be answered is removed, so that the final retrieval text matching the question to be answered can be obtained, which is the retrieval text in the form of the target subgraph.
  • The entity disambiguation process can be performed on the sentence pair text through the second network model.
  • Specifically, the retrieval text in subgraph form and the question to be answered can be semantically analyzed and recognized, thereby identifying the retrieval text in the form of the target subgraph that matches the question to be answered.
  • For example, referring to FIG. 7, a schematic diagram of a BERT-based subgraph matching algorithm provided by an embodiment of the present disclosure is shown.
  • After performing entity disambiguation processing on the sentence pair text through the second network model and determining the retrieval text in the form of the target subgraph, step 210 is executed.
  • Step 210 Disassemble the retrieval text in the form of the target subgraph to obtain the at least one candidate answer.
  • Candidate answer refers to the candidate selected as the answer to the question to be answered from the retrieval text in the form of the target subgraph.
  • After determining the subgraph of the core entity in the question (that is, the retrieval text in the form of the target subgraph), in order to further determine the answer, the subgraph of the core entity needs to be disassembled according to its relationships and attributes, so that at least one candidate answer can be obtained.
  • For example, referring to FIG. 8, a schematic diagram of disassembling a subgraph provided by an embodiment of the present disclosure is shown. As shown in FIG. 8, after the left half of FIG. 8 is disassembled, the multiple candidates shown in the right half of FIG. 8 can be obtained: the author of the Eight Horses is Xu Beihong, the creation time of the Eight Horses is modern, the collection location of the Eight Horses is unknown, the genre of the Eight Horses is Romanticism, the creation category of the Eight Horses is ink painting, and so on.
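  • The disassembly step can be illustrated with a short sketch that splits a spliced description back into one candidate per relation or attribute; the candidate format and helper name are assumptions made for illustration.

```python
def disassemble(entity_name: str, description: str) -> list:
    """Split a spliced subgraph description ('attr value__attr value__...')
    into one candidate answer per relation/attribute."""
    return [f"{entity_name} {field}" for field in description.split("__") if field]

candidates = disassemble(
    "八骏图",
    "作者徐悲鸿__创作时间近代__创作类别水墨画__流派浪漫主义__收藏地不详",
)
print(candidates)
# ['八骏图 作者徐悲鸿', '八骏图 创作时间近代', '八骏图 创作类别水墨画',
#  '八骏图 流派浪漫主义', '八骏图 收藏地不详']
```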
  • After the at least one candidate answer is obtained, step 211 or step 213 is executed.
  • Step 211 Input the at least one candidate answer and the question to be answered into a third network model respectively.
  • the third network model refers to a model used to calculate the similarity between the candidate answer and the question to be answered.
  • In the present disclosure, the third network model may be a BERT model, etc.; specifically, it may be determined according to business requirements, which is not limited in the embodiment of the present disclosure.
  • the at least one candidate answer and the question to be answered can be input into the third network model respectively.
  • After inputting the at least one candidate answer and the question to be answered into the third network model, step 212 is executed.
  • Step 212 Perform similarity matching between the at least one candidate answer and the question to be answered through the third network model, and determine the similarity between the at least one candidate answer and the question to be answered.
  • Similarity refers to the degree of similarity between at least one candidate answer and the question to be answered.
  • the similarity can reflect which candidate answers are closer to the question to be answered, and can be used as the standard answer to the question to be answered.
  • the similarity calculation can be performed on the at least one candidate answer through the third network model.
  • For example, referring to FIG. 9, a schematic diagram of text similarity matching provided by an embodiment of the present disclosure is shown.
  • As shown in FIG. 9, similarity matching is performed between the question sentence (i.e., the question to be answered) and the relationship/attribute description (i.e., the candidate answer), so as to obtain the similarity between at least one candidate answer and the question to be answered.
  • Step 213 Input the at least one candidate answer and the question to be answered into the cosine similarity calculation model.
  • After the at least one candidate answer and the question to be answered are input into the cosine similarity calculation model, step 214 is executed.
  • Step 214 Perform similarity matching between the at least one candidate answer and the question to be answered through the cosine similarity calculation model, and determine the similarity between the at least one candidate answer and the question to be answered.
  • the method of calculating the cosine similarity may also be used to calculate the similarity between each candidate answer and the question to be answered, and the embodiment of the present disclosure does not specifically limit the method of calculating the similarity.
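  • A minimal sketch of the cosine-similarity alternative is shown below; the disclosure does not fix the text representation, so mean-pooled BERT token embeddings are used here purely as an assumed example.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def sentence_vector(text: str) -> torch.Tensor:
    """Mean-pooled BERT token embeddings as a simple sentence vector."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state        # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)                  # (768,)

question = "徐悲鸿的八骏图创作于哪一年?"
candidates = ["八骏图 作者徐悲鸿", "八骏图 创作时间近代", "八骏图 收藏地不详"]

# Cosine similarity between the question and every candidate answer.
scores = {
    c: torch.cosine_similarity(sentence_vector(question),
                               sentence_vector(c), dim=0).item()
    for c in candidates
}
best = max(scores, key=scores.get)   # candidate with the greatest similarity
```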
  • In the embodiments of the present disclosure, the three models mentioned in the above steps can be obtained by means of joint learning; that is, the three tasks mentioned above all use Google's pre-trained BERT model as the feature extractor, so a joint learning scheme is considered to achieve the three tasks.
  • the entity recognition task is referred to as Task A
  • the subgraph matching task is referred to as Task B
  • the text similarity matching task is referred to as Task C.
  • The cosine similarity objective function in Task C can be changed to a binary cross-entropy loss function.
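  • A joint-learning arrangement of this kind might look like the following sketch, where one shared BERT feature extractor feeds three task-specific heads (Task A: entity start/end pointers; Task B: subgraph matching; Task C: text similarity recast as binary classification); the head sizes and checkpoint are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class JointQAModel(nn.Module):
    """One shared BERT feature extractor with three task-specific heads."""

    def __init__(self, pretrained="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        self.pointer_head = nn.Linear(hidden, 2)   # Task A: per-token start/end
        self.match_head = nn.Linear(hidden, 1)     # Task B: sentence-pair match
        self.sim_head = nn.Linear(hidden, 1)       # Task C: binary similarity

    def forward(self, task, **encoded):
        out = self.bert(**encoded)
        if task == "A":
            return torch.sigmoid(self.pointer_head(out.last_hidden_state))
        if task == "B":
            return torch.sigmoid(self.match_head(out.pooler_output))
        return torch.sigmoid(self.sim_head(out.pooler_output))

# With Task C recast as binary classification, all three heads can share the
# same binary cross-entropy loss during joint training.
bce = nn.BCELoss()
```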
  • the present disclosure realizes the three key tasks of entity recognition, entity disambiguation, and text matching by simultaneously using the method of joint learning. The method does not need to introduce external corpus and does not need to construct a template, thereby improving the flexibility and efficiency of the question answering system.
  • After performing similarity matching between at least one candidate answer and the question to be answered through the third network model and determining the similarity between each candidate answer and the question to be answered, step 104c is executed.
  • Step 104c Determine the target answer of the question to be answered from the at least one candidate answer according to the similarity.
  • the target answer refers to the standard answer to the question to be answered selected from at least one candidate answer, that is, the final selected target answer is used as the accurate answer to the question to be answered.
  • This step 104c may include the following steps 104d and 104f:
  • Step 104d Compare the similarity with a preset similarity threshold.
  • Step 104f Obtain an answer whose similarity is greater than the similarity threshold from the at least one candidate answer, and use the answer as the target answer.
  • a similarity threshold for comparison with the similarity of at least one candidate answer may be preset by the business personnel.
  • the specific value of the similarity threshold may be determined according to business requirements, which is not limited in the embodiment of the present disclosure.
  • entity disambiguation is performed by adopting a subgraph matching manner, without the need to construct a template, and the information retrieval efficiency of the question answering system is improved.
  • the information acquisition method identifies at least one entity search term in the question to be answered, performs information retrieval based on the at least one entity search term, and obtains the search text in the form of a subgraph corresponding to the at least one entity search term.
  • the search text in the form of a subgraph is matched with the question to be answered, and the search text in the form of the target subgraph is determined.
  • According to the retrieval text in the form of the target subgraph, at least one candidate answer corresponding to the question to be answered is determined, and the similarity between the at least one candidate answer and the question to be answered is obtained.
  • The target answer of the question to be answered is determined from the at least one candidate answer according to the similarity.
  • The embodiment of the present disclosure uses subgraph matching to perform entity disambiguation, and simultaneously realizes the three key tasks of entity recognition, entity disambiguation, and text matching. This method does not require the introduction of an external corpus or the construction of templates, which improves the flexibility and efficiency of the question answering system.
  • the information acquisition device may specifically include the following modules:
  • the entity search term recognition module 310 is used to identify at least one entity search term in the question to be answered;
  • the subgraph retrieval text acquisition module 320 is configured to perform information retrieval according to the at least one entity retrieval term to obtain retrieval text in the form of a subgraph corresponding to the at least one entity retrieval term;
  • the target subgraph text determination module 330 is configured to match the retrieval text in subgraph form with the question to be answered, and determine the retrieval text in target subgraph form;
  • the target answer determining module 340 is configured to determine the target answer of the question to be answered according to the retrieval text in the form of the target subgraph.
  • the information acquisition device identifies at least one entity search term in the question to be answered, performs information retrieval based on the at least one entity search term, and obtains the search text in the form of a subgraph corresponding to the at least one entity search term.
  • the search text in the form of a subgraph is matched with the question to be answered, the search text in the form of the target subgraph is determined, and the target answer of the question to be answered is determined according to the search text in the form of the target subgraph.
  • the embodiment of the present disclosure uses subgraph matching to perform entity disambiguation, and simultaneously realizes the three key tasks of entity recognition, entity disambiguation, and text matching.
  • the method does not require the introduction of an external corpus or the construction of templates, thereby improving the flexibility and efficiency of the question answering system.
  • the information acquisition device may specifically include the following modules:
  • the entity search term recognition module 410 is used to identify at least one entity search term in the question to be answered;
  • the subgraph retrieval text obtaining module 420 is configured to perform information retrieval according to the at least one entity retrieval term to obtain retrieval text in the form of a subgraph corresponding to the at least one entity retrieval term;
  • the target subgraph text determining module 430 is configured to match the retrieval text in subgraph form with the question to be answered, and determine the retrieval text in target subgraph form;
  • the target answer determining module 440 is configured to determine the target answer of the question to be answered according to the retrieval text in the form of the target subgraph.
  • the target answer determination module 440 includes:
  • the candidate answer determining unit 441 is configured to determine at least one candidate answer corresponding to the question to be answered according to the retrieval text in the form of the target subgraph;
  • the similarity obtaining unit 442 is configured to obtain the similarity corresponding to the at least one candidate answer and the question to be answered;
  • the target answer determining unit 443 is configured to determine the target answer of the question to be answered from the at least one candidate answer according to the similarity.
  • the entity search term recognition module 410 includes:
  • the question to be answered obtaining unit 411 is configured to obtain the question to be answered
  • the text recognition unit 412 is configured to input the question to be answered into the first network model for text recognition
  • the start and end position determining unit 413 is configured to determine the start and end positions in the question to be answered according to the text recognition result;
  • the entity search term determining unit 414 is configured to determine the at least one entity search term according to the start and end positions.
  • the subgraph retrieval text obtaining module 420 includes:
  • the initial search text acquisition unit 421 is configured to use the at least one entity search term to search in a preset knowledge base to obtain multiple initial search texts associated with the at least one entity search term;
  • the subgraph retrieval text acquisition unit 422 is configured to associate the at least one entity search term with the plurality of initial retrieval texts in subgraph form to obtain the retrieval text in subgraph form.
  • the target subgraph text determining module 430 includes:
  • the sentence pair text composing unit 431 is configured to compose the retrieval text in subgraph form and the question to be answered into a sentence pair text;
  • the sentence pair text input unit 432 is configured to input the sentence pair text into the second network model;
  • the target subgraph text determining unit 433 is configured to perform entity disambiguation processing on the sentence pair text through the second network model to determine the retrieval text in the form of the target subgraph.
  • the candidate answer determining unit 441 includes:
  • the candidate answer obtaining subunit 4411 is configured to disassemble the retrieval text in the form of the target subgraph to obtain the at least one candidate answer.
  • the similarity acquisition unit 442 includes:
  • the first candidate answer input subunit 4421 is configured to input the at least one candidate answer and the question to be answered into the third network model respectively;
  • the first similarity determination subunit 4422 is configured to perform similarity matching between the at least one candidate answer and the question to be answered through the third network model, and determine the similarity between the at least one candidate answer and the question to be answered.
  • the similarity acquisition unit 442 includes:
  • the second candidate answer input subunit 4423 is configured to input the at least one candidate answer and the question to be answered into the cosine similarity calculation model
  • the second similarity determination subunit 4424 is configured to perform similarity matching between the at least one candidate answer and the question to be answered through the cosine similarity calculation model, and determine the similarity between the at least one candidate answer and the question to be answered.
  • the target answer determining unit 443 includes:
  • the similarity comparison subunit 4431 is configured to compare the similarity with a preset similarity threshold;
  • the target answer obtaining subunit 4432 is configured to obtain an answer whose similarity is greater than the similarity threshold from the at least one candidate answer, and use the answer as the target answer.
  • the information acquisition device identifies at least one entity search term in the question to be answered, performs information search based on the at least one entity search term, and obtains the search text in the form of a subgraph corresponding to the at least one entity search term.
  • the search text in the form of a subgraph is matched with the question to be answered, and the search text in the form of the target subgraph is determined.
  • According to the retrieval text in the form of the target subgraph, at least one candidate answer corresponding to the question to be answered is determined, and the similarity between the at least one candidate answer and the question to be answered is obtained.
  • The target answer of the question to be answered is determined from the at least one candidate answer according to the similarity.
  • The embodiment of the present disclosure uses subgraph matching to perform entity disambiguation, and simultaneously realizes the three key tasks of entity recognition, entity disambiguation, and text matching. This method does not require the introduction of an external corpus or the construction of templates, which improves the flexibility and efficiency of the question answering system.
  • an embodiment of the present disclosure also provides an electronic device, including: a processor, a memory, and a computer program stored on the memory and capable of running on the processor.
  • When the processor executes the program, any of the above-mentioned information acquisition methods is realized.
  • the embodiments of the present disclosure also provide a non-volatile computer-readable storage medium.
  • When the instructions in the storage medium are executed by the processor of the electronic device, the electronic device can execute the information acquisition method described in any one of the above.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or may be distributed across multiple network units.
  • Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement it without creative work.
  • the various component embodiments of the present disclosure may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the electronic device according to the embodiments of the present disclosure.
  • the present disclosure can also be implemented as a device or device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein.
  • Such a program for realizing the present disclosure may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
  • FIG. 13 shows an electronic device that can implement the method according to the present disclosure.
  • the electronic device traditionally includes a processor 1010 and a computer program product in the form of a memory 1020 or a computer-readable medium.
  • the memory 1020 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 1020 has a storage space 1030 for executing program codes 1031 of any method steps in the above methods.
  • the storage space 1030 for program codes may include various program codes 1031 respectively used to implement various steps in the above method. These program codes can be read from or written into one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks.
  • Such a computer program product is usually a portable or fixed storage unit as described with reference to FIG. 14.
  • the storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 1020 in the electronic device of FIG. 13.
  • For example, the program code can be compressed in an appropriate form.
  • the storage unit includes computer-readable code 1031', that is, code that can be read by, for example, a processor such as 1010; when run by an electronic device, these codes cause the electronic device to execute each step of the methods described above.

Abstract

An information acquisition method and apparatus, an electronic device, and a computer-readable storage medium, relating to the technical field of natural language processing. The method comprises: identifying at least one entity search term in a question to be answered (101); performing information retrieval according to the at least one entity search term to obtain retrieval text in subgraph form corresponding to the at least one entity search term (102); matching the retrieval text in subgraph form with the question to be answered to determine retrieval text in target subgraph form (103); and determining a target answer to the question to be answered according to the retrieval text in target subgraph form (104).

Description

Information acquisition method, apparatus, electronic device, and computer-readable storage medium
Cross-Reference to Related Applications
The present disclosure claims priority to the Chinese patent application filed with the China Patent Office on February 26, 2020, with application number 202010121474.7 and the invention title "Information acquisition method, apparatus, electronic device, and computer-readable storage medium", the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to the technical field of natural language processing, and in particular to an information acquisition method and apparatus, an electronic device, and a computer-readable storage medium.
Background
The question answering system is one of the current research hotspots in natural language processing. An important step in a question answering system is entity linking for the question; the linking result directly affects the performance of the question answering system.
The traditional question entity linking method is mainly completed in two steps, namely named entity recognition and entity linking. Entity recognition is currently mainly based on methods such as Conditional Random Field (CRF) or Bidirectional Long Short-term Memory CRF (BLSTM-CRF), while entity linking mainly uses classification methods, similarity calculation, and other approaches. The classification method needs to select candidate entities first, and then uses classic machine learning methods or neural network methods for classification.
In terms of similarity calculation, there are methods such as probabilistic topic models, graph-based methods, and ranking methods. In common technical solutions, some use word-embedding-based methods for entity linking, while other literature uses template construction methods for question understanding.
Overview
The present disclosure provides an information acquisition method and apparatus, an electronic device, and a computer-readable storage medium, to solve the problems in related technologies of requiring a large number of manual templates, being time-consuming and labor-intensive, lacking flexibility, and having poor scalability.
In order to solve the above problems, the present disclosure discloses an information acquisition method, comprising:
identifying at least one entity search term in a question to be answered;
performing information retrieval according to the at least one entity search term to obtain retrieval text in subgraph form corresponding to the at least one entity search term;
matching the retrieval text in subgraph form with the question to be answered to determine retrieval text in target subgraph form;
determining a target answer to the question to be answered according to the retrieval text in target subgraph form.
Optionally, the determining a target answer to the question to be answered according to the retrieval text in target subgraph form includes:
determining at least one candidate answer corresponding to the question to be answered according to the retrieval text in target subgraph form;
obtaining a similarity between the at least one candidate answer and the question to be answered;
determining the target answer to the question to be answered from the at least one candidate answer according to the similarity.
Optionally, the identifying at least one entity search term in a question to be answered includes:
obtaining the question to be answered;
inputting the question to be answered into a first network model for text recognition;
determining start and end positions in the question to be answered according to a text recognition result;
determining the at least one entity search term according to the start and end positions.
Optionally, the performing information retrieval according to the at least one entity search term to obtain retrieval text in subgraph form corresponding to the at least one entity search term includes:
searching in a preset knowledge base using the at least one entity search term to obtain a plurality of initial retrieval texts associated with the at least one entity search term;
associating the at least one entity search term with the plurality of initial retrieval texts in subgraph form to obtain the retrieval text in subgraph form.
Optionally, the matching the retrieval text in subgraph form with the question to be answered to determine retrieval text in target subgraph form includes:
composing the retrieval text in subgraph form and the question to be answered into a sentence pair text;
inputting the sentence pair text into a second network model;
performing entity disambiguation processing on the sentence pair text through the second network model to determine the retrieval text in target subgraph form.
Optionally, the determining at least one candidate answer corresponding to the question to be answered according to the retrieval text in target subgraph form includes:
disassembling the retrieval text in target subgraph form to obtain the at least one candidate answer.
Optionally, the obtaining a similarity between the at least one candidate answer and the question to be answered includes:
inputting the at least one candidate answer and the question to be answered respectively into a third network model;
performing similarity matching between the at least one candidate answer and the question to be answered through the third network model to determine the similarity between the at least one candidate answer and the question to be answered.
Optionally, the obtaining a similarity between the at least one candidate answer and the question to be answered includes:
inputting the at least one candidate answer and the question to be answered respectively into a cosine similarity calculation model;
performing similarity matching between the at least one candidate answer and the question to be answered through the cosine similarity calculation model to determine the similarity between the at least one candidate answer and the question to be answered.
Optionally, the determining the target answer to the question to be answered from the at least one candidate answer according to the similarity includes:
comparing the similarity with a preset similarity threshold;
obtaining, from the at least one candidate answer, an answer whose similarity is greater than the similarity threshold, and taking the answer as the target answer.
In order to solve the above problems, the present disclosure provides an electronic device, comprising:
a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor performs the following operations:
identifying at least one entity search term in a question to be answered;
performing information retrieval according to the at least one entity search term to obtain retrieval text in subgraph form corresponding to the at least one entity search term;
matching the retrieval text in subgraph form with the question to be answered to determine retrieval text in target subgraph form; and
determining a target answer to the question to be answered according to the retrieval text in target subgraph form.
Optionally, the determining a target answer to the question to be answered according to the retrieval text in target subgraph form includes:
determining at least one candidate answer corresponding to the question to be answered according to the retrieval text in target subgraph form;
obtaining a similarity between the at least one candidate answer and the question to be answered; and
determining the target answer to the question to be answered from the at least one candidate answer according to the similarity.
Optionally, the identifying at least one entity search term in a question to be answered includes:
obtaining the question to be answered;
inputting the question to be answered into a first network model for text recognition;
determining start and end positions in the question to be answered according to a text recognition result; and
determining the at least one entity search term according to the start and end positions.
Optionally, the performing information retrieval according to the at least one entity search term to obtain retrieval text in subgraph form corresponding to the at least one entity search term includes:
searching in a preset knowledge base using the at least one entity search term to obtain a plurality of initial retrieval texts associated with the at least one entity search term; and
associating the at least one entity search term with the plurality of initial retrieval texts in subgraph form to obtain the retrieval text in subgraph form.
Optionally, the matching the retrieval text in subgraph form with the question to be answered to determine retrieval text in target subgraph form includes:
composing the retrieval text in subgraph form and the question to be answered into a sentence pair text;
inputting the sentence pair text into a second network model; and
performing entity disambiguation processing on each sentence pair text through the second network model to determine the retrieval text in target subgraph form.
Optionally, the determining at least one candidate answer corresponding to the question to be answered according to the retrieval text in target subgraph form includes:
disassembling the retrieval text in target subgraph form to obtain the at least one candidate answer.
Optionally, the obtaining a similarity between the at least one candidate answer and the question to be answered includes:
inputting the at least one candidate answer and the question to be answered respectively into a third network model; and
performing similarity matching between the at least one candidate answer and the question to be answered through the third network model to determine the similarity between the at least one candidate answer and the question to be answered.
In order to solve the above problems, the present disclosure provides a non-volatile computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the following operations:
identifying at least one entity search term in a question to be answered;
performing information retrieval according to the at least one entity search term to obtain retrieval text in subgraph form corresponding to the at least one entity search term;
matching the retrieval text in subgraph form with the question to be answered to determine retrieval text in target subgraph form; and
determining a target answer to the question to be answered according to the retrieval text in target subgraph form.
Optionally, the determining a target answer to the question to be answered according to the retrieval text in target subgraph form includes:
determining at least one candidate answer corresponding to the question to be answered according to the retrieval text in target subgraph form;
obtaining a similarity between the at least one candidate answer and the question to be answered; and
determining the target answer to the question to be answered from the at least one candidate answer according to the similarity.
Optionally, the identifying at least one entity search term in a question to be answered includes:
obtaining the question to be answered;
inputting the question to be answered into a first network model for text recognition;
determining start and end positions in the question to be answered according to a text recognition result; and
determining the at least one entity search term according to the start and end positions.
Optionally, the performing information retrieval according to the at least one entity search term to obtain retrieval text in subgraph form corresponding to the at least one entity search term includes:
searching in a preset knowledge base using the at least one entity search term to obtain a plurality of initial retrieval texts associated with the at least one entity search term; and
associating the at least one entity search term with the plurality of initial retrieval texts in subgraph form to obtain the retrieval text in subgraph form.
In order to solve the above problems, the present disclosure provides a computer program product, comprising computer-readable code, which, when run on an electronic device, causes the electronic device to perform any one of the information acquisition methods described above.
The above description is only an overview of the technical solutions of the present disclosure. In order to understand the technical means of the present disclosure more clearly, it can be implemented in accordance with the contents of the specification; and in order to make the above and other objects, features, and advantages of the present disclosure more obvious and understandable, specific embodiments of the present disclosure are set forth below.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present disclosure or in the related technology more clearly, the drawings needed in the description of the embodiments or the related technology are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present disclosure, and for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.
FIG. 1 shows a flow chart of the steps of an information acquisition method provided by an embodiment of the present disclosure;
FIG. 2 shows a flow chart of the steps of another information acquisition method provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a question answering system provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of an entity annotation example provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an entity recognition model provided by an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of entity subgraph information provided by an embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of a BERT-based subgraph matching algorithm provided by an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of subgraph disassembly provided by an embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of text similarity matching provided by an embodiment of the present disclosure;
FIG. 10 shows a schematic diagram of a joint learning model provided by an embodiment of the present disclosure;
FIG. 11 shows a schematic structural diagram of an information acquisition apparatus provided by an embodiment of the present disclosure;
FIG. 12 shows a schematic structural diagram of another information acquisition apparatus provided by an embodiment of the present disclosure;
FIG. 13 schematically shows a block diagram of an electronic device for executing the method according to the present disclosure; and
FIG. 14 schematically shows a storage unit for holding or carrying program code for implementing the method according to the present disclosure.
Detailed Description
In order to make the above objects, features, and advantages of the present disclosure more obvious and understandable, the present disclosure is further described in detail below in conjunction with the drawings and specific embodiments.
Referring to FIG. 1, a flow chart of the steps of an information acquisition method provided by an embodiment of the present disclosure is shown. The information acquisition method may specifically include the following steps:
Step 101: Identify at least one entity search term in the question to be answered.
The embodiments of the present disclosure can be applied to a question answering system, in a scenario of obtaining the answer corresponding to the question to be answered.
The question answering system can be described as follows in conjunction with FIG. 3.
Referring to FIG. 3, a schematic diagram of a question answering system provided by an embodiment of the present disclosure is shown. As shown in FIG. 3, for the question to be answered, "Q: In which year was Xu Beihong's Eight Horses created?", entity recognition is first performed on the question to obtain the identified entity search terms "Xu Beihong" and "Eight Horses". Information retrieval is then performed according to the entity search terms, and two retrieval results in subgraph form can be obtained: Eight Horses (Lang Shining) and Eight Horses (Xu Beihong) (understandably, the information in the knowledge graph exists in the form of subgraphs). Then, entity disambiguation is carried out through subgraph matching, the non-matching information is removed to obtain the subgraph information corresponding to Eight Horses (Xu Beihong), and the entity information is matched against the question to be answered by text similarity, so as to obtain the final answer.
Next, the solution of the embodiments of the present disclosure is described in detail in conjunction with specific steps.
The question to be answered refers to the question used to obtain the corresponding answer from the knowledge graph.
In some examples, the question to be answered may be a question input by a user. For example, when user A needs to obtain the answer to a certain question, the corresponding question can be input into the knowledge graph, thereby obtaining the corresponding question to be answered.
In some examples, the question to be answered may also be a question obtained from the Internet. For example, the questions that users are interested in may be collected, and a question of relatively high user interest may be taken as the question to be answered.
Understandably, the above examples are only listed for a better understanding of the technical solutions of the embodiments of the present disclosure. In specific implementations, other methods may also be used to obtain the question to be answered, and the embodiments of the present disclosure do not limit the method of obtaining the question to be answered.
The entity search term refers to the entity word used for information retrieval in the question to be answered. In the present disclosure, the entity search terms in the question to be answered can be obtained by means of pointer labeling; the specific method for obtaining the entity search terms will be described in detail in the following embodiments and is not repeated here.
After the question to be answered is obtained, the question to be answered can be recognized, so as to obtain at least one entity search term contained in the question to be answered. For example, if the question to be answered is "In which year was Xu Beihong's Eight Horses created?", the entities it contains are Xu Beihong and Eight Horses.
Understandably, the above example is only listed for a better understanding of the technical solutions of the embodiments of the present disclosure and is not the only limitation on the embodiments of the present disclosure.
After identifying at least one entity search term in the question to be answered, step 102 is executed.
Step 102: Perform information retrieval according to the at least one entity search term to obtain retrieval text in subgraph form corresponding to the at least one entity search term.
The retrieval text in subgraph form refers to the retrieval result text obtained by performing information retrieval on the knowledge graph with at least one entity search term.
Understandably, in the knowledge graph, various kinds of information usually exist in subgraph form. The subgraph form can be described in conjunction with FIG. 6. Referring to FIG. 6, a schematic diagram of entity subgraph information provided by an embodiment of the present disclosure is shown. As shown in FIG. 6, the information related to the Eight Horses can be connected with "—", thereby forming corresponding associated information in subgraph form.
After identifying at least one entity search term in the question to be answered, the entity search terms can be used for information retrieval in the knowledge graph; further, retrieval text in subgraph form corresponding to each entity search term can be obtained.
After information retrieval is performed according to the at least one entity search term and the retrieval text in subgraph form corresponding to the at least one entity search term is obtained, step 103 is executed.
Step 103: Match the retrieval text in subgraph form with the question to be answered, and determine the retrieval text in target subgraph form.
The retrieval text in target subgraph form refers to the retrieval text in subgraph form, selected from those of at least one entity search term, that matches the question to be answered. That is, this step implements entity disambiguation: the retrieval text in subgraph form that does not match the question to be answered is removed, so that the final retrieval text matching the question to be answered can be obtained, which is the retrieval text in target subgraph form.
After the retrieval text in subgraph form corresponding to the at least one entity search term is obtained, the retrieval text in subgraph form can be matched with the question to be answered; according to the matching result, the retrieval text in target subgraph form matching the question to be answered can be determined from among those of the at least one entity search term. The process of matching and determining the retrieval text in target subgraph form will be described in detail in the following embodiments and is not repeated here.
After matching the retrieval text in subgraph form with the question to be answered and determining the retrieval text in target subgraph form, step 104 is executed.
步骤104:根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案。
这一步骤104可以包括如下步骤104a、104b以及104c:
步骤104a,根据所述目标子图形式的检索文本,确定所述待解答问题对应的至少一个候选答案。
候选答案是指从目标子图形式的检索文本中选取作为待解答问题的答案的候选项。
在获取到与待解答问题匹配的目标子图形式的检索文本之后，则可以根据目标子图形式的检索文本确定待解答问题的至少一个候选答案，具体地，可以对目标子图形式的检索文本进行拆解，可以得到至少一个候选答案，例如，参照图8，示出了本公开实施例提供的一种子图拆解的示意图，如图8所示，在将图8左半图拆解后，可以得到如图8右半图所示的多个候选项：八骏图作者徐悲鸿，八骏图创作时间近代，八骏图收藏地不详，八骏图流派浪漫主义，八骏图创作类别水墨画等。
可以理解地,上述示例仅是为了更好地理解本公开实施例的技术方案而列举的示例,不作为对本公开实施例的唯一限制。
在根据目标子图形式的检索文本,确定出待解答问题对应的至少一个候选答案之后,执行步骤104b。
步骤104b：获取所述至少一个候选答案与所述待解答问题对应的相似度。
相似度是指至少一个候选答案与待解答问题之间的相似程度，相似度可以反映出哪些候选答案与待解答问题比较接近，能够作为待解答问题的标准答案。
在根据目标子图形式的检索文本，确定出待解答问题对应的至少一个候选答案之后，可以获取至少一个候选答案与待解答问题之间的相似度，具体地，可以将至少一个候选答案分别与待解答问题输入至预置网络模型，通过预置网络模型识别出至少一个候选答案与待解答问题之间的相似度，具体地，将在下述实施例中进行详细描述，本公开实施例在此不再加以赘述。
在获取各候选答案与待解答问题之间的相似度之后,执行步骤104c。
步骤104c:根据所述相似度,从所述至少一个候选答案中确定所述待解答问题的目标答案。
目标答案是指从至少一个候选答案中选择出的待解答问题的标准答案,即将最终选择的目标答案作为待解答问题的准确答案。
在获取至少一个候选答案与待解答问题之间的相似度之后,可以结合至少一个候选答案的相似度从至少一个候选答案中选择出待解答问题的目标答案,具体地,可以从至少一个候选答案中选择相似度最大的候选答案作为待解答问题的目标答案,或者,从至少一个候选答案中选择相似度大于设定相似度阈值的至少一个候选答案作为待解答问题的目标答案。具体地,可以根据业务需求而定,本公开实施例对此不加以限制。
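上述两种选取方式可以示意如下（纯Python，其中的相似度数值为虚构示例，阈值0.8仅为示例取值，并非本公开限定）：

similarities = {"八骏图作者徐悲鸿": 0.32, "八骏图创作时间近代": 0.91, "八骏图收藏地不详": 0.15}

# 方式一：取相似度最大的候选答案作为目标答案
best = max(similarities, key=similarities.get)            # '八骏图创作时间近代'

# 方式二：取相似度大于设定阈值的候选答案作为目标答案（可能不止一个）
above = [a for a, s in similarities.items() if s > 0.8]   # ['八骏图创作时间近代']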
本公开实施例通过采用子图匹配的方式进行实体消歧,无需构建模板,提高了问答系统的信息检索效率。
本公开实施例提供的信息获取方法,通过识别出待解答问题中的至少一个实体检索词,根据至少一个实体检索词进行信息检索,得到至少一个实体检索词对应的子图形式的检索文本,对子图形式的检索文本与待解答问题进行匹配,确定出目标子图形式的检索文本,根据目标子图形式的检索文本,确定待解答问题的目标答案。本公开实施例通过采用子图匹配的方式进行实体消歧,同时实现实体识别、实体消歧义以及文本匹配三个关键任务,该方法不需要引入外部语料也无需构建模板,提高问答系统的灵活性和效率。
参照图2,示出了本公开实施例提供的另一种信息获取方法的步骤流程图,该信息获取方法具体可以包括如下步骤:
步骤201:获取所述待解答问题。
本公开实施例可以应用于问答系统中,以获取待解答问题对应的答案的场景中。
对于问答系统可以结合图3进行如下描述。
参照图3，示出了本公开实施例提供的一种问答系统的示意图，如图3所示，对于待解答问题“Q：徐悲鸿的八骏图创作于哪一年”，首先，可以对待解答问题进行实体识别，得到识别的实体检索词：“徐悲鸿”、“八骏图”，然后根据实体检索词进行信息检索，可以得到两个子图形式的检索结果：八骏图(郎世宁)和八骏图(徐悲鸿)(可以理解地，在知识图谱中信息都是以子图的形式存在的)，然后，再通过子图匹配的方式进行实体消歧，去除非检索的信息，得到八骏图(徐悲鸿)对应的子图信息，并通过实体信息与待解答问题的文本相似度匹配，从而得到最终的答案。
接下来，结合具体的步骤，对本公开实施例的方案进行详细描述。
待解答问题是指用于从知识图谱中获取到相应答案的问题。
在某些示例中,待解答问题可以是由用户输入的问题,例如,在用户A需要获取某个问题的答案时,可以在知识图谱中输入相应的问题,从而可以得到对应的待解答问题。
在某些示例中,待解答问题还可以是从互联网上获取的问题,例如,可以获取用户针对哪些问题感兴趣,将用户比较感兴趣的问题作为待解答问题等。
可以理解地,上述示例仅是为了更好地理解本公开实施例的技术方案而列举的示例,在具体实现中,还可以采用其它方式获取待解答问题,本公开实施例对获取待解答问题的方式不加以限制。
在获取到待解答问题之后,执行步骤202。
步骤202:将所述待解答问题输入至第一网络模型进行文本识别。
第一网络模型是指用于对待解答问题进行文本识别的模型,在本公开中,第一网络模型可以为bert模型等。
在获取到待解答问题之后,可以将待解答问题输入至第一网络模型,由第一网络模型对待解答问题进行文本识别。
在本公开中可以采用指针标注的方式实现文本识别，例如，参照图4，示出了本公开实施例提供的一种实体标注样例的示意图，如图4所示，可以采用两个序列标注分别标注实体在数据中的起止位置，即问句“徐悲鸿的八骏图创作于哪一年？”中“徐悲鸿”和“八骏图”的标注方式。
具体地，可以将待解答问题以单输入的方式输入至第一网络模型，如图5所示，先将句子编码为[CLS]徐悲鸿的八骏图创作于哪一年？[SEP]并输入至bert模型，再将BERT输出的编码通过一个全连接层，采用Sigmoid激活函数，loss函数采用二进制交叉熵损失函数，最终输出序列每个位置上的值即为实体起止位置的置信度，这里取置信度大于0.5的位置为实体的起止位置，截取原始输入文本的相应位置便可以得到实体。
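作为示意，下面给出该指针标注头的一种可能实现草图，基于PyTorch与HuggingFace transformers库（其中模型名"bert-base-chinese"、类名EntityPointerTagger等均为本文示意所用的假设，并非本公开限定的实现）：

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class EntityPointerTagger(nn.Module):
    """BERT编码 + 全连接层 + Sigmoid，逐位置输出实体起/止位置的置信度。"""
    def __init__(self, name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(name)
        self.head = nn.Linear(self.bert.config.hidden_size, 2)  # 两路输出：起始/结束

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return torch.sigmoid(self.head(hidden))  # 形状[batch, seq_len, 2]，即逐位置的起止置信度

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = EntityPointerTagger()
enc = tokenizer("徐悲鸿的八骏图创作于哪一年？", return_tensors="pt")
probs = model(enc["input_ids"], enc["attention_mask"])

# 训练时loss采用二进制交叉熵，标签为起止位置的0/1序列（此处仅用全零张量占位示意）
labels = torch.zeros_like(probs)
loss = nn.functional.binary_cross_entropy(probs, labels)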
在将待解答问题输入至第一网络模型进行文本识别之后，执行步骤203。
步骤203:根据文本识别结果,确定出待解答问题中的起止位置。
起止位置是指在待解答问题中进行标注的起始和结束位置,通过标注的起止位置可以确定标注的实体词。
在将待解答问题输入至第一网络模型进行文本识别之后，可以根据文本识别结果得到在待解答问题的文本中标注的起止位置，如图4所示，可以采用指针标注的方式进行实体识别，具体方法为：用两个序列标注分别标注实体在数据中的起止位置，图4便是问句“徐悲鸿的八骏图创作于哪一年？”中“徐悲鸿”和“八骏图”的标注方式。
在根据文本识别结果确定出待解答问题中的起止位置之后,执行步骤204。
步骤204:根据所述起止位置,确定所述至少一个实体检索词。
实体检索词是指待解答问题中用于进行信息检索的实体词。
在确定出待解答问题中的起止位置之后,可以根据起止位置识别出待解答问题中的实体词,如图4所示,根据标注结果,可以得到其中的实体词为:“徐悲鸿”和“八骏图”。
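相应地，从起止位置置信度解码出实体检索词的过程可以示意如下（纯Python，置信度数值为虚构示例，仅演示“取置信度大于0.5的位置并截取原文”这一做法）：

def decode_entities(text, start_probs, end_probs, threshold=0.5):
    """按指针标注结果截取实体：每个起始位置匹配其后最近的结束位置。"""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    entities = []
    for s in starts:
        for e in ends:
            if e >= s:
                entities.append(text[s:e + 1])
                break
    return entities

text = "徐悲鸿的八骏图创作于哪一年？"
start_probs = [0.9, 0.1, 0.1, 0.0, 0.8, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
end_probs   = [0.0, 0.1, 0.9, 0.0, 0.0, 0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(decode_entities(text, start_probs, end_probs))  # ['徐悲鸿', '八骏图']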
在根据文本识别结果得到至少一个实体检索词之后,执行步骤205。
步骤205:采用所述至少一个实体检索词在预置知识库中进行检索,得到与所述至少一个实体检索词关联的多个初始检索文本。
在本公开中,预置知识库是指预先生成的对应于知识图谱的数据库,在预置知识库中,可以将知识图谱的信息全部以关联形式存储于数据库中,以得到预置知识库,具体地,可以采用数据库列表的形式,以某个实体词作为索引,将其关联的信息依次排布,从而可以形成具有众多关联关系的子图形式的关联信息。
初始检索文本是指采用实体检索词在预置知识库中检索得到的检索文本。
在获取至少一个实体检索词之后,则可以采用至少一个实体检索词在预置知识库中进行检索,从而,可以得到与每个实体检索词关联的多个初始检索文本。
在采用至少一个实体检索词在预置知识库中进行检索,得到与至少一个实体检索词关联的多个初始检索文本之后,执行步骤206。
步骤206:将所述至少一个实体检索词与所述多个初始检索文本以子图形式进行关联,得到所述子图形式的检索文本。
将识别的实体作为检索词进行知识图谱检索,例如,当检索八骏图时,知识库中存在两个八骏图,可以从知识图谱中获取该实体的属性和关系,它们是以子图的形式存在知识图谱中,如图6所示。为了区别问句中的八骏图是图6中的哪一个,将实体的属性与关系用“—”拼接起来,作为该实体的描述信息。如图6所示,可以将八骏图(徐悲鸿)和八骏图(郎世宁)分别对应的信息相关联,能够得到这两个实体分别对应的子图形式的检索文本,如,两个八骏图的实体描述分别为:作者徐悲鸿__创作时间近代__创作类别水墨画__流派浪漫主义__收藏地不详;作者郎世宁__创作时间清代__创作类别绢本设色__流派宫廷绘画__收藏地故宫博物院。
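步骤205至步骤206中“按实体检索词检索并将属性、关系拼接为实体描述”的过程可以用如下简化的Python片段示意（知识库以字典模拟，属性、关系及取值取自上文示例，并非实际的预置知识库实现）：

# 预置知识库的简化表示：以实体为索引，关联其属性与关系
KB = {
    "八骏图(徐悲鸿)": [("作者", "徐悲鸿"), ("创作时间", "近代"), ("创作类别", "水墨画"),
                  ("流派", "浪漫主义"), ("收藏地", "不详")],
    "八骏图(郎世宁)": [("作者", "郎世宁"), ("创作时间", "清代"), ("创作类别", "绢本设色"),
                  ("流派", "宫廷绘画"), ("收藏地", "故宫博物院")],
}

def retrieve_descriptions(mention, kb=KB):
    """步骤205：按实体检索词检索；步骤206：将属性与关系拼接为子图形式的检索文本。"""
    return {name: "__".join(k + v for k, v in triples)
            for name, triples in kb.items() if mention in name}

for name, desc in retrieve_descriptions("八骏图").items():
    print(name, "->", desc)
# 八骏图(徐悲鸿) -> 作者徐悲鸿__创作时间近代__创作类别水墨画__流派浪漫主义__收藏地不详
# 八骏图(郎世宁) -> 作者郎世宁__创作时间清代__创作类别绢本设色__流派宫廷绘画__收藏地故宫博物院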
在将至少一个实体检索词与多个初始检索文本以子图形式进行关联,得到子图形式的检索文本之后,执行步骤207。
步骤207:将所述子图形式的检索文本与所述待解答问题组成句子对文本。
句子对是指两个文本组成的一对句子文本,例如,两个文本分别为“徐悲鸿”、“张大千”,这两个文本组成句子对即为“徐悲鸿—张大千”;再例如,两个文本为“山水画”、“风景画”,这两个文本组成的句子对即为“山水画—风景画”。
句子对文本是指子图形式的检索文本与待解答问题所组成的句子对,也即在得到每个实体检索词对应的子图形式的检索文本之后,则将每个子图形式的检索文本与待解答问题组成一个句子对,从而可以得到句子对文本。
在将各子图形式的检索文本与待解答问题组成句子对文本之后,执行步骤208。
步骤208:将所述句子对文本输入至所述第二网络模型。
第二网络模型是指预先设置的用于对子图形式的检索文本进行实体消歧的网络模型,第二网络模型可以为bert模型等,具体地,可以根据业务需求而定,本公开实施例对此不加以限制。
在将各子图形式的检索文本与待解答问题组成句子对文本之后，可以将各句子对文本输入至第二网络模型，例如，承接上述示例，输入BERT的句子对编码为：[CLS]徐悲鸿的八骏图创作于哪一年？[SEP]作者徐悲鸿__创作时间近代__创作类别水墨画__流派浪漫主义__收藏地不详[SEP]，将该句子对编码输入至bert模型后，再采用dense层和sigmoid层对BERT输出的编码进行处理。
可以理解地,上述示例仅是为了更好地理解本公开实施例的技术方案而列举的示例,不作为对本公开实施例的唯一限制。
在将句子对文本输入至第二网络模型之后,执行步骤209。
步骤209:通过所述第二网络模型对所述句子对文本进行实体消歧处理,确定所述目标子图形式的检索文本。
目标子图形式的检索文本是指从至少一个实体检索词中选取的与待解答问题匹配的子图形式的检索文本。即本步骤中实现实体消歧,去除与待解答问题不匹配的子图形式的检索文本,从而可以得到最终的与待解答问题匹配的检索文本,即为目标子图形式的检索文本。
在将句子对文本输入至第二网络模型之后,可以通过第二网络模型对句子对文本进行实体消歧处理,具体地,可以将子图形式的检索文本与待解答问题进行语义分析识别,从而识别出与待解答问题匹配的目标子图形式的检索文本。例如,参照图7,示出了本公开实施例提供的一种基于bert的子图匹配算法的示意图,如图7所示,在组成每个子图形式的检索文本与待解答问题之间的句子对文本之后,则可以输入至第二网络模型,由第二网络模型根据待解答问题和实体描述,确定出与待解答问题匹配的目标子图形式的检索文本。
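步骤207至步骤209所述的句子对匹配可以草拟为如下形式，基于PyTorch与transformers库（将[CLS]位置向量接全连接层与Sigmoid输出匹配概率，是对图7思路的一种常见实现方式，并非唯一方式；其中的类名SubgraphMatcher与模型名均为本文假设）：

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class SubgraphMatcher(nn.Module):
    """句子对（问句，实体描述）-> 匹配概率，用于实体消歧。"""
    def __init__(self, name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(name)
        self.dense = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, token_type_ids):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        cls_vec = out.last_hidden_state[:, 0]              # [CLS]位置的向量
        return torch.sigmoid(self.dense(cls_vec)).squeeze(-1)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
matcher = SubgraphMatcher()

question = "徐悲鸿的八骏图创作于哪一年？"
descs = ["作者徐悲鸿__创作时间近代__创作类别水墨画__流派浪漫主义__收藏地不详",
         "作者郎世宁__创作时间清代__创作类别绢本设色__流派宫廷绘画__收藏地故宫博物院"]

# 编码为 [CLS] 问句 [SEP] 实体描述 [SEP] 的句子对
enc = tokenizer([question] * len(descs), descs, padding=True, return_tensors="pt")
with torch.no_grad():
    scores = matcher(enc["input_ids"], enc["attention_mask"], enc["token_type_ids"])
best_desc = descs[int(scores.argmax())]   # 得分最高者对应目标子图形式的检索文本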
在通过第二网络模型对句子对文本进行实体消歧处理,确定目标子图形式的检索文本之后,执行步骤210。
步骤210:对所述目标子图形式的检索文本进行拆解,得到所述至少一个候选答案。
候选答案是指从目标子图形式的检索文本中选取作为待解答问题的答案的候选项。
在确定了问句中核心实体的子图（即目标子图形式的检索文本）之后，为进一步确定答案，需要将核心实体的子图按照关系和属性进行拆解，从而可以得到至少一个候选答案，例如，参照图8，示出了本公开实施例提供的一种子图拆解的示意图，如图8所示，在将图8左半图拆解后，可以得到如图8右半图所示的多个候选项：八骏图作者徐悲鸿，八骏图创作时间近代，八骏图收藏地不详，八骏图流派浪漫主义，八骏图创作类别水墨画等。
可以理解地,上述示例仅是为了更好地理解本公开实施例的技术方案而列举的示例,不作为对本公开实施例的唯一限制。
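与图8对应的子图拆解可以示意如下（纯Python：将目标实体描述按“__”切分，再与实体名组合为候选答案文本，函数名为本文假设）：

def split_subgraph(entity_name, description):
    """将目标子图形式的检索文本按关系/属性拆解为候选答案。"""
    return [entity_name + item for item in description.split("__")]

candidates = split_subgraph(
    "八骏图",
    "作者徐悲鸿__创作时间近代__创作类别水墨画__流派浪漫主义__收藏地不详")
print(candidates)
# ['八骏图作者徐悲鸿', '八骏图创作时间近代', '八骏图创作类别水墨画',
#  '八骏图流派浪漫主义', '八骏图收藏地不详']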
在对目标子图形式的检索文本进行拆解,得到至少一个候选答案之后,执行步骤211或步骤213。
步骤211:将所述至少一个候选答案分别与所述待解答问题输入至第三网络模型。
第三网络模型是指用于计算候选答案与待解答问题之间的相似度的模型。第三网络模型可以为bert模型等,具体地,可以根据业务需求而定,本公开实施例对此不加以限制。
在获取至少一个候选答案之后,则可以将至少一个候选答案分别与待解答问题输入至第三网络模型。
在将至少一个候选答案分别与待解答问题输入至第三网络模型之后,执行步骤212。
步骤212:通过所述第三网络模型对所述至少一个候选答案和所述待解答问题进行相似度匹配,确定所述至少一个候选答案与所述待解答问题的相似度。
相似度是指至少一个候选答案与待解答问题之间的相似程度，相似度可以反映出哪些候选答案与待解答问题比较接近，能够作为待解答问题的标准答案。
在将至少一个候选答案分别与待解答问题输入至第三网络模型之后，可以通过第三网络模型对至少一个候选答案进行相似度计算，例如，参照图9，示出了本公开实施例提供的一种文本相似度匹配的示意图，如图9所示，可以将问句(即待解答问题)与关系/属性描述(即候选答案)输入至BERT，通过BERT模型对至少一个候选答案和待解答问题进行相似度匹配，从而获取到至少一个候选答案与待解答问题之间的相似度。
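步骤211至步骤212的相似度匹配与上文子图匹配的模型结构相同（句子对输入BERT，[CLS]向量接全连接层与Sigmoid），此处仅示意其调用方式，沿用上文假设的tokenizer、matcher与question等变量：

candidates = ["八骏图作者徐悲鸿", "八骏图创作时间近代", "八骏图收藏地不详"]
enc = tokenizer([question] * len(candidates), candidates, padding=True, return_tensors="pt")
with torch.no_grad():
    sims = matcher(enc["input_ids"], enc["attention_mask"], enc["token_type_ids"])
similarities = dict(zip(candidates, sims.tolist()))  # 各候选答案与待解答问题的相似度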
步骤213,将所述至少一个候选答案分别与所述待解答问题输入至余弦相似度计算模型。
在将至少一个候选答案分别与待解答问题输入至余弦相似度计算模型之后,执行步骤214。
步骤214,通过所述余弦相似度计算模型对所述至少一个候选答案和所述待解答问题进行相似度匹配,确定所述至少一个候选答案与所述待解答问题的相似度。
可以理解的是,在具体实现中,也可以采用计算余弦相似度的方式计算各候选答案与待解答问题之间的相似度,本公开实施例对于计算相似度的方式不做具体限定。
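余弦相似度的计算方式可以示意如下（纯Python：为便于演示，这里用字符频次向量代替句向量，实际中可先将文本编码为向量再计算余弦相似度）：

import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """以字符频次向量近似文本向量，计算两段文本的余弦相似度。"""
    va, vb = Counter(text_a), Counter(text_b)
    dot = sum(va[c] * vb[c] for c in set(va) & set(vb))
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

question = "徐悲鸿的八骏图创作于哪一年？"
for cand in ["八骏图创作时间近代", "八骏图收藏地不详"]:
    print(cand, round(cosine_similarity(question, cand), 3))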
本公开实施例上述步骤提及的三种模型可以是采用联合学习的方式得到的，即上述三个任务均采用google的预训练BERT模型作为特征提取器，因此可以采用联合学习的方案实现这三个任务。这里称实体识别任务为Task A，子图匹配任务为Task B，文本相似度匹配任务为Task C。为统一loss函数，可将Task C中的余弦相似度目标函数改成二分类交叉熵损失函数。联合学习的目标函数是最小化loss=loss_TaskA+loss_TaskB+loss_TaskC。本公开通过联合学习的方法同时实现实体识别、实体消歧义以及文本匹配三个关键任务，该方法不需要引入外部语料也无需构建模板，提高问答系统的灵活性和效率。
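联合学习的loss组合可以草拟为如下形式（PyTorch示意：三个任务共享同一个BERT编码器，各任务头的输出经二进制交叉熵得到子loss后相加；这里用binary_cross_entropy_with_logits等价地实现Sigmoid加二进制交叉熵，模型结构与示例数据均为本文假设，并非本公开限定的实现）：

import torch
import torch.nn as nn
from transformers import BertModel

class JointKBQAModel(nn.Module):
    """Task A：实体识别；Task B：子图匹配；Task C：文本相似度匹配，共享BERT编码器。"""
    def __init__(self, name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(name)
        h = self.bert.config.hidden_size
        self.head_a = nn.Linear(h, 2)   # 逐位置的实体起/止
        self.head_b = nn.Linear(h, 1)   # 句子对是否匹配
        self.head_c = nn.Linear(h, 1)   # 候选答案与问句是否相似（已改为二分类）

    def encode(self, input_ids, attention_mask):
        return self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state

def joint_loss(model, batch_a, batch_b, batch_c):
    bce = nn.functional.binary_cross_entropy_with_logits
    h_a = model.encode(*batch_a["inputs"])
    loss_a = bce(model.head_a(h_a), batch_a["labels"])                # Task A
    h_b = model.encode(*batch_b["inputs"])[:, 0]
    loss_b = bce(model.head_b(h_b).squeeze(-1), batch_b["labels"])    # Task B
    h_c = model.encode(*batch_c["inputs"])[:, 0]
    loss_c = bce(model.head_c(h_c).squeeze(-1), batch_c["labels"])    # Task C
    return loss_a + loss_b + loss_c   # loss = loss_TaskA + loss_TaskB + loss_TaskC

# 占位数据的用法示例
model = JointKBQAModel()
ids = torch.randint(1, 100, (2, 16)); mask = torch.ones_like(ids)
batch_a = {"inputs": (ids, mask), "labels": torch.zeros(2, 16, 2)}
batch_pair = {"inputs": (ids, mask), "labels": torch.zeros(2)}
joint_loss(model, batch_a, batch_pair, batch_pair).backward()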
在通过第三网络模型对至少一个候选答案和待解答问题进行相似度匹配,确定各候选答案与待解答问题的相似度之后,执行步骤104c。
步骤104c:根据所述相似度,从所述至少一个候选答案中确定所述待解答问题的目标答案。
目标答案是指从至少一个候选答案中选择出的待解答问题的标准答案,即将最终选择的目标答案作为待解答问题的准确答案。
这一步骤104c可以包括如下步骤104d和104f:
步骤104d,将所述相似度与预设的相似度阈值进行比较。
步骤104f,从所述至少一个候选答案中获取所述相似度大于所述相似度阈值的答案,将所述答案作为所述目标答案。
具体地,可以由业务人员预先设置一个与至少一个候选答案的相似度进行比较的相似度阈值,对于相似度阈值的具体数值可以根据业务需求而定,本公开实施例对此不加以限制。在计算得到至少一个候选答案与待解答问题的相似度之后,可以结合至少一个候选答案的相似度从至少一个候选答案中选择出待解答问题的目标答案,即从至少一个候选答案中获取相似度大于相似度阈值的候选答案,并将相似度大于相似度阈值的候选答案作为目标答案。
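步骤104d与步骤104f可以示意如下（纯Python，相似度数值与阈值均为虚构示例）：

def select_answers(similarities, threshold=0.8):
    """将相似度与阈值比较，保留相似度大于阈值的候选答案作为目标答案。"""
    return [ans for ans, score in sorted(similarities.items(), key=lambda x: -x[1])
            if score > threshold]

similarities = {"八骏图作者徐悲鸿": 0.32, "八骏图创作时间近代": 0.91, "八骏图收藏地不详": 0.15}
print(select_answers(similarities))  # ['八骏图创作时间近代']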
本公开实施例通过采用子图匹配的方式进行实体消歧,无需构建模板,提高了问答系统的信息检索效率。
本公开实施例提供的信息获取方法,通过识别出待解答问题中的至少一个实体检索词,根据至少一个实体检索词进行信息检索,得到至少一个实体检索词对应的子图形式的检索文本,对子图形式的检索文本与待解答问题进行匹配,确定出目标子图形式的检索文本,根据目标子图形式的检索文本,确定待解答问题对应的至少一个候选答案,获取至少一个候选答案与待解答问题对应的相似度,根据相似度,从至少一个候选答案中确定待解答问题的目标答案。本公开实施例通过采用子图匹配的方式进行实体消歧,同时实现实体识别、实体消歧义以及文本匹配三个关键任务,该方法不需要引入外部语料也无需构建模板,提高问答系统的灵活性和效率。
参照图11,示出了本公开实施例提供的一种信息获取装置的结构示意图,该信息获取装置具体可以包括如下模块:
实体检索词识别模块310,用于识别出待解答问题中的至少一个实体检索词;
子图检索文本获取模块320,用于根据所述至少一个实体检索词进行信息检索,得到所述至少一个实体检索词对应的子图形式的检索文本;
目标子图文本确定模块330,用于对所述子图形式的检索文本与所述待解答问题进行匹配,确定出目标子图形式的检索文本;
目标答案确定模块340,用于根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案。
本公开实施例提供的信息获取装置,通过识别出待解答问题中的至少一个实体检索词,根据至少一个实体检索词进行信息检索,得到至少一个实体检索词对应的子图形式的检索文本,对子图形式的检索文本与待解答问题进行匹配,确定出目标子图形式的检索文本,根据目标子图形式的检索文本,确定待解答问题的目标答案。本公开实施例通过采用子图匹配的方式进行实体消歧,同时实现实体识别、实体消歧义以及文本匹配三个关键任务,该方法不需要引入外部语料也无需构建模板,提高问答系统的灵活性和效率。
参照图12,示出了本公开实施例提供的一种信息获取装置的结构示意图,该信息获取装置具体可以包括如下模块:
实体检索词识别模块410,用于识别出待解答问题中的至少一个实体检索词;
子图检索文本获取模块420,用于根据所述至少一个实体检索词进行信息检索,得到所述至少一个实体检索词对应的子图形式的检索文本;
目标子图文本确定模块430,用于对所述子图形式的检索文本与所述待解答问题进行匹配,确定出目标子图形式的检索文本;
目标答案确定模块440,用于根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案。
可选地,目标答案确定模块440包括:
候选答案确定单元441,用于根据所述目标子图形式的检索文本,确定所述待解答问题对应的至少一个候选答案;
相似度获取单元442,用于获取所述至少一个候选答案与所述待解答问题对应的相似度;
目标答案确定单元443,用于根据所述相似度,从所述至少一个候选答案中确定所述待解答问题的目标答案。
可选地,所述实体检索词识别模块410包括:
待解答问题获取单元411,用于获取所述待解答问题;
文本识别单元412,用于将所述待解答问题输入至第一网络模型进行文本识别;
起止位置确定单元413,用于根据文本识别结果,确定出所述待解答问题中的起止位置;
实体检索词确定单元414,用于根据所述起止位置,确定所述至少一个实体检索词。
可选地,所述子图检索文本获取模块420包括:
初始检索文本获取单元421,用于采用所述至少一个实体检索词在预置知识库中进行检索,得到与所述至少一个实体检索词关联的多个初始检索文本;
子图检索文本获取单元422,用于将所述至少一个实体检索词与所述多个初始检索文本以子图形式进行关联,得到所述子图形式的检索文本。
可选地,所述目标子图文本确定模块430包括:
句子对文本组成单元431,用于将所述子图形式的检索文本与所述待解答问题组成句子对文本;
句子对文本输入单元432,用于将所述句子对文本输入至所述第二网络模型;
目标子图文本确定单元433,用于通过所述第二网络模型对所述句子对文本进行实体消歧处理,确定所述目标子图形式的检索文本。
可选地,所述候选答案确定单元441包括:
候选答案获取子单元4411,用于对所述目标子图形式的检索文本进行拆解,得到所述至少一个候选答案。
可选地,所述相似度获取单元442包括:
第一候选答案输入子单元4421,用于将所述至少一个候选答案分别与所述待解答问题输入至第三网络模型;
第一相似度确定子单元4422,用于通过所述第三网络模型对所述至少一个候选答案和所述待解答问题进行相似度匹配,确定所述至少一个候选答案与所述待解答问题的相似度。
可选地,所述相似度获取单元442包括:
第二候选答案输入子单元4423,用于将所述至少一个候选答案分别与所述待解答问题输入至余弦相似度计算模型;
第二相似度确定子单元4424,用于通过所述余弦相似度计算模型对所述至少一个候选答案和所述待解答问题进行相似度匹配,确定所述至少一个候选答案与所述待解答问题的相似度。
可选地,所述目标答案确定单元443包括:
相似度比较子单元4431,用于将所述相似度与预设的相似度阈值进行比较;
目标答案获取子单元4432,用于从所述至少一个候选答案中获取所述相似度大于所述相似度阈值的答案,将所述答案作为所述目标答案。
本公开实施例提供的信息获取装置,通过识别出待解答问题中的至少一个实体检索词,根据至少一个实体检索词进行信息检索,得到至少一个实体检索词对应的子图形式的检索文本,对子图形式的检索文本与待解答问题进行匹配,确定出目标子图形式的检索文本,根据目标子图形式的检索文本,确定待解答问题对应的至少一个候选答案,获取至少一个候选答案与待解答问题对应的相似度,根据相似度,从至少一个候选答案中确定待解答问题的目标答案。本公开实施例通过采用子图匹配的方式进行实体消歧,同时实现实体识别、实体消歧义以及文本匹配三个关键任务,该方法不需要引入外部语料也无需构建模板,提高问答系统的灵活性和效率。
对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制,因为依据本公开,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本公开所必须的。
另外地,本公开实施例还提供了一种电子设备,包括:处理器、存储器以及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述处理器执行所述程序时实现上述任一项所述的信息获取方法。
本公开实施例还提供了一种非易失性计算机可读存储介质,当所述存储介质中的指令由电子设备的处理器执行时,使得电子设备能够执行上述任一项所述的信息获取方法。
以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。
本公开的各个部件实施例可以以硬件实现，或者以在一个或者多个处理器上运行的软件模块实现，或者以它们的组合实现。本领域的技术人员应当理解，可以在实践中使用微处理器或者数字信号处理器(DSP)来实现根据本公开实施例的电子设备中的一些或者全部部件的一些或者全部功能。本公开还可以实现为用于执行这里所描述的方法的一部分或者全部的设备或者装置程序(例如，计算机程序和计算机程序产品)。这样的实现本公开的程序可以存储在计算机可读介质上，或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到，或者在载体信号上提供，或者以任何其他形式提供。
例如,图13示出了可以实现根据本公开的方法的电子设备。该电子设备传统上包括处理器1010和以存储器1020形式的计算机程序产品或者计算机可读介质。存储器1020可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。存储器1020具有用于执行上述方法中的任何方法步骤的程序代码1031的存储空间1030。例如,用于程序代码的存储空间1030可以包括分别用于实现上面的方法中的各种步骤的各个程序代码1031。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。这些计算机程序产品包括诸如硬盘,紧致盘(CD)、存储卡或者软盘之类的程序代码载体。这样的计算机程序产品通常为如参考图14所述的便携式或者固定存储单元。该存储单元可以具有与图13的电子设备中的存储器1020类似布置的存储段、存储空间等。程序代码可以例如以适当形式进行压缩。通常,存储单元包括计算机可读代码1031’,即可以由例如诸如1010之类的处理器读取的代码,这些代码当由电子设备运行时,导致该电子设备执行上面所描述的方法中的各个步骤。
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。
最后，还需要说明的是，在本文中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。
以上对本公开所提供的一种信息获取方法、一种信息获取装置、一种电子设备和一种非易失性计算机可读存储介质,进行了详细介绍,本文中应用了具体个例对本公开的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本公开的方法及其核心思想;同时,对于本领域的一般技术人员,依据本公开的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本公开的限制。

Claims (21)

  1. 一种信息获取方法,其中,包括:
    识别出待解答问题中的至少一个实体检索词;
    根据所述至少一个实体检索词进行信息检索,得到所述至少一个实体检索词对应的子图形式的检索文本;
    对所述子图形式的检索文本与所述待解答问题进行匹配,确定出目标子图形式的检索文本;并且
    根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案。
  2. 根据权利要求1所述的方法,其中,所述根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案,包括:
    根据所述目标子图形式的检索文本,确定所述待解答问题对应的至少一个候选答案;
    获取所述至少一个候选答案与所述待解答问题对应的相似度;并且
    根据所述相似度,从所述至少一个候选答案中确定所述待解答问题的目标答案。
  3. 根据权利要求1所述的方法,其中,所述识别出待解答问题中的至少一个实体检索词,包括:
    获取所述待解答问题;
    将所述待解答问题输入至第一网络模型进行文本识别;
    根据文本识别结果,确定出所述待解答问题中的起止位置;并且
    根据所述起止位置,确定所述至少一个实体检索词。
  4. 根据权利要求1所述的方法,其中,所述根据所述至少一个实体检索词进行信息检索,得到所述至少一个实体检索词对应的子图形式的检索文本,包括:
    采用所述至少一个实体检索词在预置知识库中进行检索,得到与所述至少一个实体检索词关联的多个初始检索文本;并且
    将所述至少一个实体检索词与所述多个初始检索文本以子图形式进行关联,得到所述子图形式的检索文本。
  5. 根据权利要求1所述的方法,其中,所述对所述子图形式的检索文本与所述待解答问题进行匹配,确定出目标子图形式的检索文本,包括:
    将所述子图形式的检索文本与所述待解答问题组成句子对文本;
    将所述句子对文本输入至所述第二网络模型;并且
    通过所述第二网络模型对各所述句子对文本进行实体消歧处理，确定所述目标子图形式的检索文本。
  6. 根据权利要求2所述的方法,其中,所述根据所述目标子图形式的检索文本,确定所述待解答问题对应的至少一个候选答案,包括:
    对所述目标子图形式的检索文本进行拆解,得到所述至少一个候选答案。
  7. 根据权利要求2所述的方法,其中,所述获取所述至少一个候选答案与所述待解答问题对应的相似度,包括:
    将所述至少一个候选答案分别与所述待解答问题输入至第三网络模型;并且
    通过所述第三网络模型对所述至少一个候选答案和所述待解答问题进行相似度匹配,确定所述至少一个候选答案与所述待解答问题的相似度。
  8. 根据权利要求2所述的方法,其中,所述获取所述至少一个候选答案与所述待解答问题对应的相似度,包括:
    将所述至少一个候选答案分别与所述待解答问题输入至余弦相似度计算模型;并且
    通过所述余弦相似度计算模型对所述至少一个候选答案和所述待解答问题进行相似度匹配,确定所述至少一个候选答案与所述待解答问题的相似度。
  9. 根据权利要求2所述的方法,所述根据所述相似度,从所述至少一个候选答案中确定所述待解答问题的目标答案,包括:
    将所述相似度与预设的相似度阈值进行比较;并且
    从所述至少一个候选答案中获取所述相似度大于所述相似度阈值的答案,将所述答案作为所述目标答案。
  10. 一种电子设备,其中,包括:
    处理器、存储器以及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述处理器执行如下操作:
    识别出待解答问题中的至少一个实体检索词;
    根据所述至少一个实体检索词进行信息检索,得到所述至少一个实体检索词对应的子图形式的检索文本;
    对所述子图形式的检索文本与所述待解答问题进行匹配,确定出目标子图形式的检索文本;并且
    根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案。
  11. 根据权利要求10所述的电子设备,其中,所述根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案,包括:
    根据所述目标子图形式的检索文本,确定所述待解答问题对应的至少一个候选答案;
    获取所述至少一个候选答案与所述待解答问题对应的相似度;并且
    根据所述相似度,从所述至少一个候选答案中确定所述待解答问题的目标答案。
  12. 根据权利要求10所述的电子设备,其中,所述识别出待解答问题中的至少一个实体检索词,包括:
    获取所述待解答问题;
    将所述待解答问题输入至第一网络模型进行文本识别;
    根据文本识别结果,确定出所述待解答问题中的起止位置;并且
    根据所述起止位置,确定所述至少一个实体检索词。
  13. 根据权利要求10所述的电子设备,其中,所述根据所述至少一个实体检索词进行信息检索,得到所述至少一个实体检索词对应的子图形式的检索文本,包括:
    采用所述至少一个实体检索词在预置知识库中进行检索,得到与所述至少一个实体检索词关联的多个初始检索文本;并且
    将所述至少一个实体检索词与所述多个初始检索文本以子图形式进行关联,得到所述子图形式的检索文本。
  14. 根据权利要求10所述的电子设备,其中,所述对所述子图形式的检索文本与所述待解答问题进行匹配,确定出目标子图形式的检索文本,包括:
    将所述子图形式的检索文本与所述待解答问题组成句子对文本;
    将所述句子对文本输入至所述第二网络模型;并且
    通过所述第二网络模型对各所述句子对文本进行实体消歧处理,确定所述目标子图形式的检索文本。
  15. 根据权利要求11所述的电子设备,其中,所述根据所述目标子图形式的检索文本,确定所述待解答问题对应的至少一个候选答案,包括:
    对所述目标子图形式的检索文本进行拆解,得到所述至少一个候选答案。
  16. 根据权利要求11所述的电子设备,其中,所述获取所述至少一个候选答案与所述待解答问题对应的相似度,包括:
    将所述至少一个候选答案分别与所述待解答问题输入至第三网络模型;并且
    通过所述第三网络模型对所述至少一个候选答案和所述待解答问题进行相似度匹配，确定所述至少一个候选答案与所述待解答问题的相似度。
  17. 一种非易失性计算机可读存储介质,其中,当所述存储介质中的指令由电子设备的处理器执行时,使得电子设备能够执行如下操作:
    识别出待解答问题中的至少一个实体检索词;
    根据所述至少一个实体检索词进行信息检索,得到所述至少一个实体检索词对应的子图形式的检索文本;
    对所述子图形式的检索文本与所述待解答问题进行匹配,确定出目标子图形式的检索文本;并且
    根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案。
  18. 根据权利要求17所述的存储介质,其中,所述根据所述目标子图形式的检索文本,确定所述待解答问题的目标答案,包括:
    根据所述目标子图形式的检索文本,确定所述待解答问题对应的至少一个候选答案;
    获取所述至少一个候选答案与所述待解答问题对应的相似度;并且
    根据所述相似度,从所述至少一个候选答案中确定所述待解答问题的目标答案。
  19. 根据权利要求17所述的存储介质,其中,所述识别出待解答问题中的至少一个实体检索词,包括:
    获取所述待解答问题;
    将所述待解答问题输入至第一网络模型进行文本识别;
    根据文本识别结果,确定出所述待解答问题中的起止位置;并且
    根据所述起止位置,确定所述至少一个实体检索词。
  20. 根据权利要求17所述的存储介质,其中,所述根据所述至少一个实体检索词进行信息检索,得到所述至少一个实体检索词对应的子图形式的检索文本,包括:
    采用所述至少一个实体检索词在预置知识库中进行检索,得到与所述至少一个实体检索词关联的多个初始检索文本;并且
    将所述至少一个实体检索词与所述多个初始检索文本以子图形式进行关联,得到所述子图形式的检索文本。
  21. 一种计算机程序产品,包括计算机可读代码,当所述计算机可读代码在电子设备上运行时,导致所述电子设备执行根据权利要求1-9中的任一个所述的信息获取方法。
PCT/CN2021/074046 2020-02-26 2021-01-28 信息获取方法、装置、电子设备及计算机可读存储介质 WO2021169718A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/425,045 US20230169100A1 (en) 2020-02-26 2021-01-28 Method and apparatus for information acquisition, electronic device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010121474.7 2020-02-26
CN202010121474.7A CN111368048A (zh) 2020-02-26 2020-02-26 信息获取方法、装置、电子设备及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2021169718A1 true WO2021169718A1 (zh) 2021-09-02

Family

ID=71206363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074046 WO2021169718A1 (zh) 2020-02-26 2021-01-28 信息获取方法、装置、电子设备及计算机可读存储介质

Country Status (3)

Country Link
US (1) US20230169100A1 (zh)
CN (1) CN111368048A (zh)
WO (1) WO2021169718A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385827A (zh) * 2021-12-29 2022-04-22 上海云思智慧信息技术有限公司 面向会议知识图谱的检索方法

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368048A (zh) * 2020-02-26 2020-07-03 京东方科技集团股份有限公司 信息获取方法、装置、电子设备及计算机可读存储介质
CN112052680B (zh) * 2020-10-14 2023-01-10 腾讯科技(深圳)有限公司 问题生成方法、装置、设备及存储介质
CN112579750A (zh) * 2020-11-30 2021-03-30 百度健康(北京)科技有限公司 相似病案的检索方法、装置、设备及存储介质
CN112860866B (zh) * 2021-02-09 2023-09-19 北京百度网讯科技有限公司 语义检索方法、装置、设备以及存储介质
CN113139037B (zh) * 2021-03-18 2023-04-14 北京三快在线科技有限公司 文本处理方法、装置、设备以及存储介质
CN116401340B (zh) * 2023-06-08 2023-08-11 中国标准化研究院 一种标准文献的查询比对方法及系统
CN116775947B (zh) * 2023-06-16 2024-04-19 北京枫清科技有限公司 一种图数据语义检索方法、装置、电子设备及存储介质
CN117421416B (zh) * 2023-12-19 2024-03-26 数据空间研究院 交互检索方法、装置和电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748757A (zh) * 2017-09-21 2018-03-02 北京航空航天大学 一种基于知识图谱的问答方法
CN110502621A (zh) * 2019-07-03 2019-11-26 平安科技(深圳)有限公司 问答方法、问答装置、计算机设备及存储介质
CN110659366A (zh) * 2019-09-24 2020-01-07 Oppo广东移动通信有限公司 语义解析方法、装置、电子设备以及存储介质
CN110837550A (zh) * 2019-11-11 2020-02-25 中山大学 基于知识图谱的问答方法、装置、电子设备及存储介质
CN111368048A (zh) * 2020-02-26 2020-07-03 京东方科技集团股份有限公司 信息获取方法、装置、电子设备及计算机可读存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915340B (zh) * 2014-03-10 2019-09-10 北京大学 自然语言问答方法及装置
US10867256B2 (en) * 2015-07-17 2020-12-15 Knoema Corporation Method and system to provide related data
US11520813B2 (en) * 2016-01-04 2022-12-06 International Business Machines Corporation Entailment knowledge base in natural language processing systems
US10509860B2 (en) * 2016-02-10 2019-12-17 Weber State University Research Foundation Electronic message information retrieval system
CN109284363B (zh) * 2018-12-03 2023-03-14 北京羽扇智信息科技有限公司 一种问答方法、装置、电子设备及存储介质
CN109858528A (zh) * 2019-01-10 2019-06-07 平安科技(深圳)有限公司 推荐系统训练方法、装置、计算机设备及存储介质
CN109885660B (zh) * 2019-02-22 2020-10-02 上海乐言信息科技有限公司 一种知识图谱赋能的基于信息检索的问答系统和方法


Also Published As

Publication number Publication date
US20230169100A1 (en) 2023-06-01
CN111368048A (zh) 2020-07-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21759918

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21759918

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 030423)
