CN109710732B - Information query method, device, storage medium and electronic equipment - Google Patents

Information query method, device, storage medium and electronic equipment Download PDF

Info

Publication number
CN109710732B
CN109710732B
Authority
CN
China
Prior art keywords
record
knowledge base
preset
vocabulary set
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811379175.2A
Other languages
Chinese (zh)
Other versions
CN109710732A (en)
Inventor
刘嘉伟
董超
崔朝辉
赵立军
张霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201811379175.2A priority Critical patent/CN109710732B/en
Publication of CN109710732A publication Critical patent/CN109710732A/en
Application granted granted Critical
Publication of CN109710732B publication Critical patent/CN109710732B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to an information query method, an information query device, a storage medium and electronic equipment, and belongs to the technical field of information. The method includes: performing word segmentation on an obtained target question to obtain a first vocabulary set, where the first vocabulary set includes the word segmentation result of the target question; performing synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, where the word vector is obtained by training a preset corpus with a preset model; obtaining, according to the second vocabulary set and a preset knowledge base, a matching score between the target question and each record in the knowledge base according to a preset algorithm, where the knowledge base includes at least one record and each record includes a question and the answer corresponding to that question; and determining the answer matched with the target question according to the matching score of each record. The method can effectively utilize an existing knowledge base to provide information query services at the semantic level, and improves the accuracy and coverage of information query.

Description

Information query method, device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to an information query method, an information query device, a storage medium, and an electronic device.
Background
With the rapid development of information technologies such as the internet, cloud computing and language processing, artificial intelligence increasingly affects people's daily life. In particular, intelligent question-answering systems can query an existing knowledge base for questions raised by users and provide the corresponding answers. In the prior art, a search engine typically performs single-keyword retrieval over the existing knowledge base and feeds back records that match the keywords as answers, so the query accuracy is not high. Moreover, in many technical fields the existing knowledge base has accumulated a large amount of unstructured historical data that cannot be retrieved directly, so the query coverage is low.
Disclosure of Invention
The present disclosure aims to provide an information query method, an information query device, a storage medium and an electronic device, so as to solve the problems of low information query accuracy and low coverage in the prior art.
In order to achieve the above object, according to a first aspect of embodiments of the present disclosure, there is provided an information query method, including:
the method comprises the steps of obtaining a first vocabulary set by segmenting words of an obtained target problem, wherein the first vocabulary set comprises word segmentation results of the target problem;
performing synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, wherein the word vector is obtained by training a preset corpus by using a preset model;
according to the second vocabulary set and a preset knowledge base, obtaining a matching score of the target question and each record in the knowledge base according to a preset algorithm, wherein the knowledge base comprises at least one record, and each record comprises a question and an answer corresponding to the question;
and determining answers matched with the target questions according to the matching scores of each record.
Optionally, the performing synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set includes:
training the corpus by using a preset word vector generation model to obtain the word vectors;
and performing synonym expansion on the first vocabulary set according to the word vector, preset stop words and professional words in the target field to which the target problem belongs to obtain the second vocabulary set.
Optionally, the obtaining, according to the second vocabulary set and a preset knowledge base, a matching score between the target problem and each record in the knowledge base according to a preset algorithm includes:
training the corpus by using a preset word vector generation model to obtain the word vectors;
performing synonym expansion on each record in the knowledge base according to the word vector, preset stop words and professional words of a target field to which the target problem belongs;
carrying out synonymy sentence expansion on each record in the knowledge base by utilizing a Neural Machine Translation (NMT) algorithm;
and according to the second vocabulary set and the knowledge base, obtaining the matching score of the target problem and each record in the knowledge base according to a preset algorithm.
Optionally, the performing synonymous sentence expansion on each record in the knowledge base by using a neural machine translation (NMT) algorithm includes:
translating a first record in a first language in the knowledge base into an intermediate record in a second language using the NMT algorithm;
translating the intermediate record into a synonymous record expressed in the first language using the NMT algorithm;
and storing the synonymous records into the knowledge base, wherein the first record is any one record in the knowledge base.
Optionally, the determining an answer matched with the target question according to the matching score of each record includes:
arranging the matching scores of each record in descending order from high to low to obtain a score ordering;
selecting answers in the top n records with the highest ranking in the score ranking as answers matched with the target question; or,
when the ratio of the first-ranked matching score to the second-ranked matching score in the score ranking is larger than a preset threshold value, taking an answer in a record corresponding to the first-ranked matching score as an answer matched with the target question;
and when the ratio of the first-ranked matching score to the second-ranked matching score is less than or equal to a preset threshold value, selecting the answer in the top m records with the highest ranking in the score ranking as the answer matched with the target question.
Optionally, the obtaining, according to the second vocabulary set and a preset knowledge base, a matching score between the target problem and each record in the knowledge base according to a preset algorithm includes:
calculating the matching score of the target problem and each record in a knowledge base by using a first calculation formula according to the second vocabulary set and a preset knowledge base;
the first calculation formula includes:
Figure BDA0001871483450000031
wherein d_j is the j-th record in the knowledge base, Score_j is the matching score of d_j, s is the number of words shared by the second vocabulary set and the word segmentation result of d_j, Q is the number of words in the first vocabulary set, t_i is the i-th word in the second vocabulary set, num(d_j) is the number of words in the word segmentation result of d_j, num(t_i) is the number of times t_i appears in d_j, D is the number of records in the knowledge base, and N_i is the number of records in the knowledge base that contain t_i.
According to a second aspect of the embodiments of the present disclosure, there is provided an information query apparatus, the apparatus including:
the word segmentation module is used for segmenting the acquired target problem to acquire a first word set, wherein the first word set comprises a word segmentation result of the target problem;
the expansion module is used for carrying out synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, wherein the word vector is obtained by utilizing a preset model to train a preset corpus;
the scoring module is used for acquiring a matching score between the target question and each record in a knowledge base according to a preset algorithm and the second vocabulary set and the preset knowledge base, wherein the knowledge base comprises at least one record, and each record comprises a question and an answer corresponding to the question;
and the determining module is used for determining answers matched with the target questions according to the matching scores of each record.
Optionally, the expansion module includes:
the first training submodule is used for training the corpus by utilizing a preset word vector generation model so as to obtain the word vectors;
and the first expansion submodule is used for carrying out synonym expansion on the first vocabulary set according to the word vector, preset stop words and professional words in the target field to which the target problem belongs so as to obtain the second vocabulary set.
Optionally, the scoring module includes:
the second training submodule is used for training the corpus by utilizing a preset word vector generation model so as to obtain the word vectors;
a second expansion submodule, configured to perform synonym expansion on each record in the knowledge base according to the word vector, a preset stop word, and a professional word in a target field to which the target problem belongs;
the synonymous sentence expansion submodule is used for performing synonymous sentence expansion on each record in the knowledge base by using a neural machine translation (NMT) algorithm;
and the scoring submodule is used for acquiring the matching score of the target problem and each record in the knowledge base according to a preset algorithm according to the second vocabulary set and the knowledge base.
Optionally, the synonymous sentence expansion submodule is configured to:
translating a first record in a first language in the knowledge base into an intermediate record in a second language using the NMT algorithm;
translating the intermediate record into a synonymous record expressed in the first language using the NMT algorithm;
and storing the synonymous records into the knowledge base, wherein the first record is any one record in the knowledge base.
Optionally, the determining module includes:
the sorting submodule is used for sorting the matching scores of each record in a descending order from high to low so as to obtain a score sorting;
a determining submodule, configured to select answers in the top n records in the ranking order as answers matching the target question; or,
the determining submodule is used for taking an answer in a record corresponding to the first ranked matching score as an answer matched with the target question when the ratio of the first ranked matching score to the second ranked matching score in the score ranking is larger than a preset threshold value;
the determining sub-module is further configured to select answers in the top m records with the highest ranking in the ranking order of scores as answers matching the target question when a ratio of the first ranked matching score to the second ranked matching score is less than or equal to a preset threshold.
Optionally, the scoring module is configured to:
calculating the matching score of the target problem and each record in a knowledge base by using a first calculation formula according to the second vocabulary set and a preset knowledge base;
the first calculation formula includes:
Figure BDA0001871483450000051
wherein d_j is the j-th record in the knowledge base, Score_j is the matching score of d_j, s is the number of words shared by the second vocabulary set and the word segmentation result of d_j, Q is the number of words in the first vocabulary set, t_i is the i-th word in the second vocabulary set, num(d_j) is the number of words in the word segmentation result of d_j, num(t_i) is the number of times t_i appears in d_j, D is the number of records in the knowledge base, and N_i is the number of records in the knowledge base that contain t_i.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the steps of the information query method provided by the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the information query method provided by the first aspect.
According to the technical scheme, word segmentation is first performed on the obtained target question to obtain a first vocabulary set containing the word segmentation result of the target question. Synonym expansion is then performed on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, where the word vector is obtained by training a preset corpus with a preset model. Next, according to the second vocabulary set and a preset knowledge base, a matching score between the target question and each record in the knowledge base is determined according to a preset algorithm, where each record contains a question and the answer corresponding to that question. Finally, the answer matched with the target question is determined according to the matching score of each record in the knowledge base.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow diagram illustrating a method of querying information in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating another method of information querying, according to an example embodiment;
FIG. 3 is a flow diagram illustrating another method of information querying, according to an example embodiment;
FIG. 4 is a flow diagram illustrating another method of information querying, according to an example embodiment;
FIG. 5 is a block diagram illustrating an information query device in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating another information querying device, according to an example embodiment;
FIG. 7 is a block diagram illustrating another information querying device, according to an example embodiment;
FIG. 8 is a block diagram illustrating another information querying device, according to an example embodiment;
FIG. 9 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Before introducing the information query method, the information query device, the storage medium and the electronic device provided by the present disclosure, an application scenario related to the embodiments of the present disclosure is first introduced. The application scenario may be a human-computer interactive intelligent question-answering system, through which a user may input a target question to be queried so as to obtain the corresponding answer. The intelligent question-answering system may be any kind of terminal, for example, a mobile terminal such as a smart phone, a tablet computer, a smart television, a smart watch, a PDA (Personal Digital Assistant) or a portable computer, or a fixed terminal such as a desktop computer.
Fig. 1 is a flow chart illustrating a method of querying information, as shown in fig. 1, according to an exemplary embodiment, the method comprising the steps of:
step 101, performing word segmentation on the obtained target problem to obtain a first word set, wherein the first word set comprises word segmentation results of the target problem.
For example, a target question input by a user is first obtained, the target question is segmented according to a preset word segmentation method, and the word segmentation result of the target question is stored in a first vocabulary set. The word segmentation method may be, for example, a Maximum Matching (MM) algorithm, a semantics-based word segmentation method or a statistics-based word segmentation method. For instance, the words contained in the target question may be identified from left to right according to a pre-stored dictionary, and words that do not conform to language habits may be removed through disambiguation to obtain the word segmentation result of the target question. Taking the target question "Where can I get the social insurance card?" as an example, word segmentation of the target question yields the first vocabulary set: {where, get, social security card}.
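As an illustration of step 101, the following sketch segments a question into a first vocabulary set. The patent does not prescribe a particular segmenter; the jieba library is used here purely as a stand-in for "a preset word segmentation method", so the exact segmentation output is an assumption.

```python
# Illustrative sketch of step 101. jieba is only a stand-in for
# "a preset word segmentation method"; the patent does not name a segmenter.
import jieba

def build_first_vocabulary_set(target_question: str) -> list[str]:
    # jieba.lcut returns the word segmentation result as a list of words
    return jieba.lcut(target_question)

first_set = build_first_vocabulary_set("社保卡在哪里领")  # "Where can I get the social security card?"
print(first_set)  # output depends on jieba's dictionary, e.g. ['社保卡', '在', '哪里', '领']
```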
And 102, performing synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, wherein the word vector is obtained by training a preset corpus by using a preset model.
For example, a preset model is first used to train a preset corpus to obtain word vectors (word embeddings). Word vectors have good semantic characteristics and can effectively express semantic and grammatical features. The preset model may be a word vector generation model such as Word2vec (Word to Vector). The corpus may be a large amount of existing semantic data, and an information acquisition tool (e.g., a web crawler) may also be used to collect semantic data of various technical fields from the internet, such as news, microblogs and forums. Synonym expansion is then performed on the first vocabulary set according to the word vectors, and the synonym-expanded first vocabulary set is used as the second vocabulary set. Taking the word "social security card" in the first vocabulary set as an example, synonyms of "social security card" (for example, "social insurance card") can be put into the second vocabulary set after synonym expansion. When synonym expansion is performed on the first vocabulary set by using the word vectors, an administrator may additionally confirm the words in the second vocabulary set, so as to improve the accuracy of synonym expansion.
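A minimal sketch of step 102 follows, assuming gensim's Word2Vec as the "preset word vector generation model" and a cosine-similarity cutoff for deciding which neighbours count as synonyms; both the library choice and the threshold are assumptions, not requirements of the method.

```python
# Minimal sketch of step 102, assuming gensim's Word2Vec as the preset word
# vector generation model; the similarity threshold is an assumed parameter.
from gensim.models import Word2Vec

def train_word_vectors(tokenized_corpus: list[list[str]]) -> Word2Vec:
    # tokenized_corpus: sentences from the preset corpus, each already segmented into words
    return Word2Vec(sentences=tokenized_corpus, vector_size=100, window=5, min_count=1)

def expand_synonyms(model: Word2Vec, first_set: list[str], threshold: float = 0.7) -> set[str]:
    # Put each word of the first vocabulary set, plus its close neighbours in
    # the word-vector space, into the second vocabulary set.
    second_set = set(first_set)
    for word in first_set:
        if word in model.wv:
            for candidate, similarity in model.wv.most_similar(word, topn=5):
                if similarity >= threshold:
                    second_set.add(candidate)
    return second_set
```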
And 103, acquiring a matching score of the target question and each record in the knowledge base according to a preset algorithm according to the second vocabulary set and a preset knowledge base, wherein the knowledge base comprises at least one record, and each record comprises a question and an answer corresponding to the question.
For example, the predetermined knowledge base may include a plurality of records, where each record includes a question and an answer corresponding to the question, and may be understood as a question-answer pair, where the question in each record is not repeated. And according to the words and the knowledge base in the second word set, sequentially acquiring the matching scores of the target problem and each record in the knowledge base, wherein the matching scores can reflect the matching degree of each record and the second word set corresponding to the target problem. For example, the matching degree of each vocabulary in the second vocabulary set with each record in the knowledge base may be calculated respectively, and then the matching degree of each vocabulary is summed according to different weights to obtain the matching score of the target problem with each record in the knowledge base, wherein the weight corresponding to each vocabulary may be determined according to the number of times that each vocabulary appears in a record, for example, the more times that a vocabulary appears in a record, the higher the matching degree of the vocabulary with the record is, the higher the corresponding weight is.
And step 104, determining answers matched with the target questions according to the matching scores of each record.
For example, after the matching score between the target question and each record is determined, the record matching the target question is determined according to the size of the matching scores, and the answer in that record is used as the answer matched with the target question. For example, a preset number (e.g., 3) of records with the highest matching scores may be determined, and the answers in these records may be recommended to the user as answers matched with the target question, so that the user can select the answer most needed. Alternatively, the answer in the record with the highest matching score may be used directly as the answer matched with the target question, or the answer in a record whose matching score satisfies a preset condition may be used as the answer matched with the target question.
In summary, the present disclosure first performs word segmentation on an obtained target question to obtain a first vocabulary set containing the word segmentation result of the target question, and then performs synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, where the word vector is obtained by training a preset corpus with a preset model. Next, according to the second vocabulary set and a preset knowledge base, a matching score between the target question and each record in the knowledge base is determined according to a preset algorithm, where each record contains a question and the answer corresponding to that question. Finally, an answer matched with the target question is determined according to the matching score of each record in the knowledge base. In this way, the existing knowledge base can be effectively utilized to provide information query services at the semantic level, and the accuracy and coverage of information query are improved.
Fig. 2 is a flow chart illustrating another information query method according to an exemplary embodiment, and as shown in fig. 2, step 102 may be implemented by:
step 1021, training the corpus by using a preset word vector generation model to obtain a word vector.
And 1022, performing synonym expansion on the first vocabulary set according to the word vector, the preset stop word and the professional word in the target field to which the target problem belongs to obtain a second vocabulary set.
For example, the corpus is trained by using a preset word vector generation model, where the word vector generation model is a neural network model that can extract semantic and grammatical features from the semantic data in the corpus and be trained to obtain word vectors. Synonym expansion is performed on the first vocabulary set according to the word vectors, preset stop words and the professional words of the target field to which the target question belongs, and the synonym-expanded first vocabulary set is used as the second vocabulary set. Stop words are words that can be filtered out when processing natural language data (such as text) during retrieval, for example some prepositions, conjunctions and adverbs in Chinese, such as "of", "and", "to", "at" and the like. The professional words of the target field to which the target question belongs may be determined first; for example, the user may select the target field in advance when inputting the target question, or semantic analysis may be performed on the target question to determine the target field, and the corresponding professional words are then obtained according to the target field. Taking the target question "Where can I get the social insurance card?" as an example, word segmentation of the target question yields the first vocabulary set {where, get, social security card}, and synonym expansion of the first vocabulary set yields the second vocabulary set {where, receive, get, social security card}. When synonym expansion is performed on the first vocabulary set, an administrator may additionally confirm the words in the second vocabulary set, so as to improve the accuracy of synonym expansion.
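The stop-word filtering and domain-vocabulary handling described above can be sketched as follows. The stop-word list, the professional-word list and the rule of taking domain-term synonyms from a curated dictionary are all placeholders for whatever resources the target field provides; none of them is specified by the patent.

```python
# Sketch of step 1022. STOP_WORDS and DOMAIN_TERMS are placeholder sets; in a
# real deployment they would be loaded from resources maintained for the
# target field, and the treatment of domain terms is only one possible choice.
STOP_WORDS = {"的", "和", "在", "了"}        # assumed Chinese stop words
DOMAIN_TERMS = {"社保卡", "医保", "公积金"}   # assumed professional words of the target field

def build_second_vocabulary_set(model, first_set, threshold=0.7):
    filtered = [w for w in first_set if w not in STOP_WORDS]   # drop stop words
    second_set = set(filtered)
    for word in filtered:
        if word in DOMAIN_TERMS:
            # One possible treatment: keep domain terms as-is and take their
            # synonyms from a curated domain dictionary instead of the word vectors.
            continue
        if word in model.wv:
            second_set.update(
                w for w, s in model.wv.most_similar(word, topn=3) if s >= threshold
            )
    return second_set
```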
Fig. 3 is a flowchart illustrating another information query method according to an exemplary embodiment, and as shown in fig. 3, step 103 may include:
and step 1031, training the corpus by using a preset word vector generation model to obtain word vectors.
And 1032, performing synonym expansion on each record in the knowledge base according to the word vector, the preset stop word and the professional word of the target field to which the target problem belongs.
For example, the corpus is trained by using a preset word vector generation model to obtain word vectors, and synonym expansion is performed on each record in the knowledge base according to the word vectors, preset stop words and the professional words of the target field to which the target question belongs. For example, if a record in the knowledge base is "How to receive the social security card", segmenting the record yields the vocabulary set {social security card, how, receive}, and synonym expansion of this vocabulary set expands the word segmentation result corresponding to the record to {social security card, how, receive, get}.
And 1033, performing synonymous sentence expansion on each record in the knowledge base by using a neural machine translation (NMT) algorithm.
Illustratively, an NMT (Neural Machine Translation) algorithm is used to perform synonymous sentence expansion on each record in the knowledge base, and the synonymous sentence of each record can be stored in the knowledge base as an expansion of that record. For example, the NMT algorithm may be used to translate any Chinese record in the knowledge base into English, and the English translation result is then translated back into Chinese; the resulting Chinese sentence is used as a synonymous sentence of the original Chinese record, and the two translation passes expand the vocabulary and sentence patterns of the record. It should be noted that, when synonymous sentence expansion is performed on each record in the knowledge base, an administrator may additionally confirm the expansion, so as to improve the accuracy of the synonymous sentence expansion.
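A sketch of this back-translation step follows. The `translate` callable is a placeholder signature for any NMT system (the patent only requires an NMT algorithm, not a particular model or API), so the interface shown is an assumption rather than a real library call.

```python
# Sketch of step 1033: synonymous-sentence expansion by back-translation.
# `translate(text, source_lang, target_lang)` is an assumed interface for
# whatever NMT system is available; it is not a specific library's API.
from typing import Callable

Translate = Callable[[str, str, str], str]

def expand_synonymous_sentences(records: list[dict], translate: Translate) -> list[dict]:
    expanded = list(records)
    for record in records:
        # First record (first language, e.g. Chinese) -> intermediate record (second language)
        intermediate = translate(record["question"], "zh", "en")
        # Intermediate record -> synonymous record expressed in the first language again
        synonymous_question = translate(intermediate, "en", "zh")
        if synonymous_question != record["question"]:
            # Store the synonymous sentence in the knowledge base with the same answer
            expanded.append({"question": synonymous_question, "answer": record["answer"]})
    return expanded
```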
And 1034, acquiring a matching score between the target problem and each record in the knowledge base according to a preset algorithm according to the second vocabulary set and the knowledge base.
Illustratively, the matching scores between the target question and each record in the knowledge base are obtained in turn according to the words in the second vocabulary set and the knowledge base that has undergone synonym expansion and synonymous sentence expansion.
Alternatively, step 1033 may be implemented by:
1) a first record in a knowledge base expressed in a first language is translated into an intermediate record in a second language using an NMT algorithm.
2) The intermediate records are translated into synonymous records expressed in the first language using the NMT algorithm.
3) And storing the synonymous records into a knowledge base, wherein the first record is any record in the knowledge base.
Fig. 4 is a flowchart illustrating another information query method according to an exemplary embodiment, and as shown in fig. 4, step 104 includes:
step 1041, arranging the matching scores of each record in descending order from high to low to obtain a score ranking.
Step 1042, the answer in the top n records with the highest ranking in the ranking order of scores is selected as the answer matching the target question.
Or, in step 1043, when the ratio of the first ranked matching score to the second ranked matching score in the score ranking is greater than the preset threshold, the answer in the record corresponding to the first ranked matching score is taken as the answer matched with the target question.
Step 1044 is to select the answer in the top m records with the highest ranking in the ranking order as the answer matched with the target question when the ratio of the first ranked matching score to the second ranked matching score is less than or equal to a preset threshold.
For example, the matching scores of the records in the knowledge base are arranged in descending order to obtain a score ranking. According to the score ranking, the answers in the records satisfying a preset condition are used as answers matched with the target question. The preset condition may be, for example: the answers in the top n (e.g., 3) records of the score ranking are selected as answers matched with the target question, and these n answers are recommended to the user so that the user can select the answer most needed. The answer in the highest-ranked record of the score ranking may also be recommended directly as the answer matched with the target question. Alternatively, the ratio of the first-ranked matching score to the second-ranked matching score may first be calculated; when the ratio is greater than a preset threshold (indicating that the first-ranked matching score is much larger than the following ones), the answer in the record corresponding to the first-ranked matching score is used as the answer matched with the target question, and when the ratio is less than or equal to the preset threshold, the answers in the top m (e.g., 5) records of the score ranking are selected as answers matched with the target question.
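The answer-selection logic of steps 1041 to 1044 can be sketched as below. The values of m and the ratio threshold are assumptions; the simpler top-n alternative of step 1042 is just the slice `ranking[:n]` of the same ranking.

```python
# Sketch of steps 1041-1044. The values of m and the ratio threshold are
# assumptions; the simpler top-n strategy of step 1042 is just ranking[:n].
def select_answers(scored_records, m=5, ratio_threshold=2.0):
    # scored_records: iterable of (matching_score, answer) pairs, one per record
    ranking = sorted(scored_records, key=lambda pair: pair[0], reverse=True)
    if len(ranking) < 2:
        return [answer for _, answer in ranking]
    top_score, second_score = ranking[0][0], ranking[1][0]
    if second_score > 0 and top_score / second_score > ratio_threshold:
        # The top score clearly dominates: return only its answer
        return [ranking[0][1]]
    # Otherwise recommend the answers of the top-m records
    return [answer for _, answer in ranking[:m]]
```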
Optionally, step 103 may be implemented by:
and calculating the matching score of the target problem and each record in the knowledge base by using a first calculation formula according to the second vocabulary set and a preset knowledge base.
The first calculation formula includes:
Figure BDA0001871483450000131
wherein d_j is the j-th record in the knowledge base, Score_j is the matching score of d_j, s is the number of words shared by the second vocabulary set and the word segmentation result of d_j, Q is the number of words in the first vocabulary set, t_i is the i-th word in the second vocabulary set, num(d_j) is the number of words in the word segmentation result of d_j, num(t_i) is the number of times t_i appears in d_j, D is the number of records in the knowledge base, and N_i is the number of records in the knowledge base that contain t_i.
It should be noted that the preset knowledge base (containing D records, the j-th record being d_j) may be the original knowledge base, in which the question of each record is not repeated, or may be a knowledge base that has undergone synonym expansion and synonymous sentence expansion, that is, the original knowledge base expanded by performing steps 1031 to 1033; in the latter case, the questions of the records in the knowledge base may be repeated (for example, as synonymous sentences). For example, taking the original knowledge base containing 20 records as an example, d_j is the j-th record of the 20 records, and if the word segmentation result of the j-th record contains 3 words, then num(d_j)=3. Alternatively, the knowledge base may be the expanded knowledge base: assuming that synonym expansion is performed on d_j by performing step 1032 and synonymous sentence expansion is performed on each of the 20 records by performing step 1033, so that the knowledge base is expanded to 30 records, then d_j is the j-th record of the 30 records, and if the word segmentation result of d_j contains 5 words, then num(d_j)=5. Since the second vocabulary set is obtained by expanding the first vocabulary set, a situation where s is larger than Q may occur; in this case, s/Q may be set to 1, that is, s/Q is a positive number less than or equal to 1.
Taking the target question "Where can I get the social security card?" as an example, suppose the knowledge base contains 30 records (i.e. D=30) and d_j in the knowledge base is "How to receive the social security card". Word segmentation of the target question yields the first vocabulary set {where, get, social security card}, and synonym expansion of the first vocabulary set yields the second vocabulary set {where, receive, get, social security card} (corresponding to {t_1, t_2, t_3, t_4}). Word segmentation of d_j yields {social security card, how, receive}, so s is 2 (the shared words being "receive" and "social security card") and Q is 3. The terms corresponding to t_1 and t_3 are therefore 0, the terms corresponding to t_2 and t_4 are non-zero, and Score_j is obtained by combining these terms according to the first calculation formula (the specific expressions appear as formula images in the original publication).
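For reference, the following sketch implements one scoring function that is consistent with the variable definitions above: an s/Q coverage factor (capped at 1) multiplying a TF-IDF-style sum over the words of the second vocabulary set. The patent's first calculation formula itself is given only as an image, so the exact expression, in particular the log(D / N_i) weighting, is an assumption rather than a reproduction of the patented formula.

```python
# One scoring function consistent with the variable definitions above. The
# exact patented formula is given only as an image; the log(D / N_i) weighting
# and the multiplicative s/Q coverage factor are assumptions.
import math

def match_score(second_set, first_set_size, record_words, segmented_knowledge_base):
    # record_words: word segmentation result of record d_j
    # segmented_knowledge_base: list of segmented records, used to derive D and N_i
    D = len(segmented_knowledge_base)
    num_dj = len(record_words)
    s = len(set(second_set) & set(record_words))
    coverage = min(s / first_set_size, 1.0)      # s/Q, capped at 1 as noted above
    score = 0.0
    for t in set(second_set):
        num_t = record_words.count(t)            # occurrences of t_i in d_j
        if num_t == 0:
            continue                             # words absent from d_j contribute nothing
        N_i = sum(1 for rec in segmented_knowledge_base if t in rec)
        score += (num_t / num_dj) * math.log(D / N_i)   # TF-IDF-style term (assumed form)
    return coverage * score
```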
In summary, the present disclosure first performs word segmentation on an obtained target question to obtain a first vocabulary set containing the word segmentation result of the target question, and then performs synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, where the word vector is obtained by training a preset corpus with a preset model. Next, according to the second vocabulary set and a preset knowledge base, a matching score between the target question and each record in the knowledge base is determined according to a preset algorithm, where each record contains a question and the answer corresponding to that question. Finally, an answer matched with the target question is determined according to the matching score of each record in the knowledge base. In this way, the existing knowledge base can be effectively utilized to provide information query services at the semantic level, and the accuracy and coverage of information query are improved.
Fig. 5 is a block diagram illustrating an information inquiry apparatus according to an exemplary embodiment, and as shown in fig. 5, the apparatus 200 includes:
the word segmentation module 201 is configured to obtain a first word set by performing word segmentation on the obtained target problem, where the first word set includes a word segmentation result of the target problem.
The expansion module 202 is configured to perform synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, where the word vector is obtained by training a preset corpus using a preset model.
And the scoring module 203 is configured to obtain a matching score between the target question and each record in the knowledge base according to the second vocabulary set and a preset knowledge base according to a preset algorithm, where the knowledge base includes at least one record, and each record includes a question and an answer corresponding to the question.
A determining module 204, configured to determine an answer matching the target question according to the matching score of each record.
Fig. 6 is a block diagram illustrating another information query apparatus according to an exemplary embodiment, and as shown in fig. 6, the expansion module 202 includes:
the first training sub-module 2021 is configured to train the corpus using a preset word vector generation model to obtain a word vector.
The first expansion sub-module 2022 is configured to perform synonym expansion on the first vocabulary set according to the word vector, the preset stop word, and the professional word in the target field to which the target problem belongs, so as to obtain a second vocabulary set.
Fig. 7 is a block diagram illustrating another information query apparatus according to an exemplary embodiment, and as shown in fig. 7, the scoring module 203 includes:
the second training submodule 2031 is configured to train the corpus using a preset word vector generation model to obtain a word vector.
The second expansion sub-module 2032 is configured to expand synonyms for each record in the knowledge base according to the word vector, the preset stop word, and the professional word in the target field to which the target question belongs.
The synonymous sentence expansion submodule 2033 is configured to perform synonymous sentence expansion on each record in the knowledge base by using a neural machine translation (NMT) algorithm.
And the scoring submodule 2034 is configured to obtain, according to the second vocabulary set and the knowledge base, a matching score between the target problem and each record in the knowledge base according to a preset algorithm.
Optionally, the synonymous sentence expansion submodule 2033 may be implemented by:
1) a first record in a knowledge base expressed in a first language is translated into an intermediate record in a second language using an NMT algorithm.
2) The intermediate records are translated into synonymous records expressed in the first language using the NMT algorithm.
3) And storing the synonymous records into a knowledge base, wherein the first record is any record in the knowledge base.
Fig. 8 is a block diagram illustrating another information querying device according to an exemplary embodiment, and as shown in fig. 8, the determining module 204 includes:
the sorting sub-module 2041 is configured to sort the matching scores of each record in descending order from high to low, so as to obtain a score sorting.
The determining submodule 2042 is configured to select answers in the top n records in the ranking order of scores as answers matching the target question. Or,
the determining sub-module 2042 is configured to, when a ratio of a first ranked matching score to a second ranked matching score in the score ranking is greater than a preset threshold, take an answer in a record corresponding to the first ranked matching score as an answer matched with the target question.
The determining sub-module 2042 is further configured to select answers in the top m records with the highest ranking in the ranking order of scores as answers matching the target question when the ratio of the first-ranked matching score to the second-ranked matching score is less than or equal to a preset threshold.
Optionally, the scoring module 203 may be implemented by:
and calculating the matching score of the target problem and each record in the knowledge base by using a first calculation formula according to the second vocabulary set and a preset knowledge base.
The first calculation formula includes:
Figure BDA0001871483450000171
wherein d_j is the j-th record in the knowledge base, Score_j is the matching score of d_j, s is the number of words shared by the second vocabulary set and the word segmentation result of d_j, Q is the number of words in the first vocabulary set, t_i is the i-th word in the second vocabulary set, num(d_j) is the number of words in the word segmentation result of d_j, num(t_i) is the number of times t_i appears in d_j, D is the number of records in the knowledge base, and N_i is the number of records in the knowledge base that contain t_i.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In summary, the present disclosure first performs word segmentation on an obtained target question to obtain a first vocabulary set containing the word segmentation result of the target question, and then performs synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, where the word vector is obtained by training a preset corpus with a preset model. Next, according to the second vocabulary set and a preset knowledge base, a matching score between the target question and each record in the knowledge base is determined according to a preset algorithm, where each record contains a question and the answer corresponding to that question. Finally, an answer matched with the target question is determined according to the matching score of each record in the knowledge base. In this way, the existing knowledge base can be effectively utilized to provide information query services at the semantic level, and the accuracy and coverage of information query are improved.
Fig. 9 is a block diagram illustrating an electronic device 300 in accordance with an example embodiment. As shown in fig. 9, the electronic device 300 may include: a processor 301 and a memory 302. The electronic device 300 may also include one or more of a multimedia component 303, an input/output (I/O) interface 304, and a communication component 305.
The processor 301 is configured to control the overall operation of the electronic device 300, so as to complete all or part of the steps in the information query method. The memory 302 is used to store various types of data to support operation at the electronic device 300, such as instructions for any application or method operating on the electronic device 300 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and the like. The Memory 302 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk or optical disk. The multimedia components 303 may include a screen and an audio component. Wherein the screen may be, for example, a touch screen and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 302 or transmitted through the communication component 305. The audio assembly also includes at least one speaker for outputting audio signals. The I/O interface 304 provides an interface between the processor 301 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 305 is used for wired or wireless communication between the electronic device 300 and other devices. Wireless Communication, such as Wi-Fi, bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so that the corresponding Communication component 305 may include: Wi-Fi module, bluetooth module, NFC module.
In an exemplary embodiment, the electronic Device 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described information query method.
In another exemplary embodiment, there is also provided a computer readable storage medium including program instructions, which when executed by a processor, implement the steps of the information query method described above. For example, the computer readable storage medium may be the memory 302 including program instructions executable by the processor 301 of the electronic device 300 to perform the information query method described above.
In summary, the present disclosure first performs word segmentation on an obtained target question to obtain a first vocabulary set containing the word segmentation result of the target question, and then performs synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, where the word vector is obtained by training a preset corpus with a preset model. Next, according to the second vocabulary set and a preset knowledge base, a matching score between the target question and each record in the knowledge base is determined according to a preset algorithm, where each record contains a question and the answer corresponding to that question. Finally, an answer matched with the target question is determined according to the matching score of each record in the knowledge base. In this way, the existing knowledge base can be effectively utilized to provide information query services at the semantic level, and the accuracy and coverage of information query are improved.
Preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and other embodiments of the present disclosure may be easily conceived by those skilled in the art within the technical spirit of the present disclosure after considering the description and practicing the present disclosure, and all fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. Meanwhile, arbitrary combinations may also be made between the different embodiments of the present disclosure, and such combinations should likewise be regarded as content disclosed by the present disclosure as long as they do not depart from the idea of the present disclosure. The present disclosure is not limited to the precise structures that have been described above, and the scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. An information query method, the method comprising:
the method comprises the steps of obtaining a first vocabulary set by segmenting words of an obtained target problem, wherein the first vocabulary set comprises word segmentation results of the target problem;
performing synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, wherein the word vector is obtained by training a preset corpus by using a preset model;
according to the second vocabulary set and a preset knowledge base, obtaining a matching score of the target question and each record in the knowledge base according to a preset algorithm, wherein the knowledge base comprises at least one record, and each record comprises a question and an answer corresponding to the question;
determining answers matched with the target questions according to the matching scores of each record;
the obtaining of the matching score between the target problem and each record in the knowledge base according to the second vocabulary set and a preset knowledge base and a preset algorithm includes:
training the corpus by using a preset word vector generation model to obtain the word vectors;
performing synonym expansion on each record in the knowledge base according to the word vector, preset stop words and professional words of a target field to which the target problem belongs;
carrying out synonymy sentence expansion on each record in the knowledge base by utilizing a Neural Machine Translation (NMT) algorithm;
according to the second vocabulary set and the knowledge base, obtaining a matching score of the target problem and each record in the knowledge base according to a preset algorithm;
the obtaining of the matching score between the target problem and each record in the knowledge base according to the second vocabulary set and a preset knowledge base and a preset algorithm includes:
calculating the matching score of the target problem and each record in a knowledge base by using a first calculation formula according to the second vocabulary set and a preset knowledge base;
the first calculation formula includes:
Figure FDA0002903786180000011
wherein d_j is the j-th record in the knowledge base, Score_j is the matching score of d_j, s is the number of words shared by the second vocabulary set and the word segmentation result of d_j, Q is the number of words in the first vocabulary set, t_i is the i-th word in the second vocabulary set, num(d_j) is the number of words in the word segmentation result of d_j, num(t_i) is the number of times t_i appears in d_j, D is the number of records in the knowledge base, and N_i is the number of records in the knowledge base that contain t_i.
2. The method of claim 1, wherein synonymously expanding the first vocabulary set according to a predetermined word vector to obtain a second vocabulary set comprises:
training the corpus by using a preset word vector generation model to obtain the word vectors;
and performing synonym expansion on the first vocabulary set according to the word vector, preset stop words and professional words in the target field to which the target problem belongs to obtain the second vocabulary set.
3. The method according to claim 1, wherein said performing synonymous sentence expansion on each of said records in said knowledge base using a neural machine translation (NMT) algorithm comprises:
translating a first record in a first language in the knowledge base into an intermediate record in a second language using the NMT algorithm;
translating the intermediate record into a synonymous record expressed in the first language using the NMT algorithm;
and storing the synonymous records into the knowledge base, wherein the first record is any one record in the knowledge base.
4. The method of claim 1, wherein determining an answer that matches the target question based on the match score for each record comprises:
arranging the matching scores of each record in descending order from high to low to obtain a score ordering;
selecting answers in the top n records with the highest ranking in the score ranking as answers matched with the target question; or,
when the ratio of the first-ranked matching score to the second-ranked matching score in the score ranking is larger than a preset threshold value, taking an answer in a record corresponding to the first-ranked matching score as an answer matched with the target question;
and when the ratio of the first-ranked matching score to the second-ranked matching score is less than or equal to a preset threshold value, selecting the answer in the top m records with the highest ranking in the score ranking as the answer matched with the target question.
5. An information query apparatus, comprising:
the word segmentation module is used for segmenting the acquired target problem to acquire a first word set, wherein the first word set comprises a word segmentation result of the target problem;
the expansion module is used for carrying out synonym expansion on the first vocabulary set according to a preset word vector to obtain a second vocabulary set, wherein the word vector is obtained by utilizing a preset model to train a preset corpus;
the scoring module is used for acquiring a matching score between the target question and each record in a knowledge base according to a preset algorithm and the second vocabulary set and the preset knowledge base, wherein the knowledge base comprises at least one record, and each record comprises a question and an answer corresponding to the question;
the determining module is used for determining answers matched with the target questions according to the matching scores of each record;
the scoring module comprises:
the second training submodule is used for training the corpus by utilizing a preset word vector generation model so as to obtain the word vectors;
the second expansion submodule is used for carrying out synonym expansion on each record in the knowledge base according to the word vector, preset stop words and professional words of the target field to which the target problem belongs;
the synonymous sentence expansion submodule is used for performing synonymous sentence expansion on each record in the knowledge base by using a neural machine translation (NMT) algorithm;
the scoring submodule is used for acquiring the matching score of the target problem and each record in the knowledge base according to a preset algorithm according to the second vocabulary set and the knowledge base;
the scoring module is configured to:
calculating the matching score of the target problem and each record in a knowledge base by using a first calculation formula according to the second vocabulary set and a preset knowledge base;
the first calculation formula includes:
Figure FDA0002903786180000041
wherein d_j is the j-th record in the knowledge base, Score_j is the matching score of d_j, s is the number of words shared by the second vocabulary set and the word segmentation result of d_j, Q is the number of words in the first vocabulary set, t_i is the i-th word in the second vocabulary set, num(d_j) is the number of words in the word segmentation result of d_j, num(t_i) is the number of times t_i appears in d_j, D is the number of records in the knowledge base, and N_i is the number of records in the knowledge base that contain t_i.
6. The apparatus of claim 5, wherein the expansion module comprises:
the first training submodule is used for training the corpus by utilizing a preset word vector generation model so as to obtain the word vectors;
and the first expansion submodule is used for carrying out synonym expansion on the first vocabulary set according to the word vector, preset stop words and professional words in the target field to which the target problem belongs so as to obtain the second vocabulary set.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
8. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 4.
CN201811379175.2A 2018-11-19 2018-11-19 Information query method, device, storage medium and electronic equipment Active CN109710732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811379175.2A CN109710732B (en) 2018-11-19 2018-11-19 Information query method, device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811379175.2A CN109710732B (en) 2018-11-19 2018-11-19 Information query method, device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109710732A CN109710732A (en) 2019-05-03
CN109710732B true CN109710732B (en) 2021-03-05

Family

ID=66254959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811379175.2A Active CN109710732B (en) 2018-11-19 2018-11-19 Information query method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109710732B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210952B (en) * 2019-06-13 2022-06-10 讯飞智元信息科技有限公司 Bidding evaluation method and device
CN110750632B (en) * 2019-10-21 2022-09-09 闽江学院 Improved Chinese ALICE intelligent question-answering method and system
CN111488735B (en) * 2020-04-09 2023-10-27 中国银行股份有限公司 Test corpus generation method and device and electronic equipment
CN111858851A (en) * 2020-06-30 2020-10-30 银盛支付服务股份有限公司 Intelligent customer service knowledge base multidimensional training method and device
CN113032677A (en) * 2021-04-01 2021-06-25 李旻达 Query information processing method and device based on artificial intelligence
CN113780561B (en) * 2021-09-07 2024-07-30 国网北京市电力公司 Construction method and device of power grid regulation operation knowledge base

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049447A (en) * 2011-10-12 2013-04-17 英业达股份有限公司 System for memorizing bilingual synonymy words in assisting mode and method thereof
CN105843897A (en) * 2016-03-23 2016-08-10 青岛海尔软件有限公司 Vertical domain-oriented intelligent question and answer system
CN105955976A (en) * 2016-04-15 2016-09-21 中国工商银行股份有限公司 Automatic answering system and method
CN106202372A (en) * 2016-07-08 2016-12-07 中国电子科技网络信息安全有限公司 A kind of method of network text information emotional semantic classification
CN107220380A (en) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 Question and answer based on artificial intelligence recommend method, device and computer equipment
CN107391614A (en) * 2017-07-04 2017-11-24 重庆智慧思特大数据有限公司 A kind of Chinese question and answer matching process based on WMD
CN107832291A (en) * 2017-10-26 2018-03-23 平安科技(深圳)有限公司 Client service method, electronic installation and the storage medium of man-machine collaboration
CN108536708A (en) * 2017-03-03 2018-09-14 腾讯科技(深圳)有限公司 A kind of automatic question answering processing method and automatically request-answering system
CN108595619A (en) * 2018-04-23 2018-09-28 海信集团有限公司 A kind of answering method and equipment
WO2018195875A1 (en) * 2017-04-27 2018-11-01 Microsoft Technology Licensing, Llc Generating question-answer pairs for automated chatting

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10528612B2 (en) * 2017-02-21 2020-01-07 International Business Machines Corporation Processing request documents

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049447A (en) * 2011-10-12 2013-04-17 英业达股份有限公司 System for memorizing bilingual synonymy words in assisting mode and method thereof
CN105843897A (en) * 2016-03-23 2016-08-10 青岛海尔软件有限公司 Vertical domain-oriented intelligent question and answer system
CN105955976A (en) * 2016-04-15 2016-09-21 中国工商银行股份有限公司 Automatic answering system and method
CN106202372A (en) * 2016-07-08 2016-12-07 中国电子科技网络信息安全有限公司 A kind of method of network text information emotional semantic classification
CN108536708A (en) * 2017-03-03 2018-09-14 腾讯科技(深圳)有限公司 A kind of automatic question answering processing method and automatically request-answering system
WO2018195875A1 (en) * 2017-04-27 2018-11-01 Microsoft Technology Licensing, Llc Generating question-answer pairs for automated chatting
CN107220380A (en) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 Question and answer based on artificial intelligence recommend method, device and computer equipment
CN107391614A (en) * 2017-07-04 2017-11-24 重庆智慧思特大数据有限公司 A kind of Chinese question and answer matching process based on WMD
CN107832291A (en) * 2017-10-26 2018-03-23 平安科技(深圳)有限公司 Client service method, electronic installation and the storage medium of man-machine collaboration
CN108595619A (en) * 2018-04-23 2018-09-28 海信集团有限公司 A kind of answering method and equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of a Question Answering System for the IT Field; Li Jianan; China Master's Theses Full-text Database, Information Science and Technology; 2016-04-15 (No. 4); Chapters 2 to 4 *
Research on an Intelligent Question Answering System Based on Semantic Relevance Computation of Chinese Questions in a Restricted Domain; Wang Xinlei; China Master's Theses Full-text Database, Information Science and Technology; 2014-08-15 (No. 8); full text *
Design of an Automatic Question Answering System in a Restricted Domain for Distance Education; Liu Xudong et al.; Journal of Taishan University; 2017-11-30 (No. 6); full text *

Also Published As

Publication number Publication date
CN109710732A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109710732B (en) Information query method, device, storage medium and electronic equipment
CN106649818B (en) Application search intention identification method and device, application search method and server
WO2021159632A1 (en) Intelligent questioning and answering method and apparatus, computer device, and computer storage medium
CN107608532B (en) Association input method and device and electronic equipment
US20180341871A1 (en) Utilizing deep learning with an information retrieval mechanism to provide question answering in restricted domains
US8019748B1 (en) Web search refinement
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
WO2020233380A1 (en) Missing semantic completion method and apparatus
CN108345612B (en) Problem processing method and device for problem processing
US9805120B2 (en) Query selection and results merging
CN110162768B (en) Method and device for acquiring entity relationship, computer readable medium and electronic equipment
CN110147494B (en) Information searching method and device, storage medium and electronic equipment
US11954097B2 (en) Intelligent knowledge-learning and question-answering
US11379527B2 (en) Sibling search queries
CN114880447A (en) Information retrieval method, device, equipment and storage medium
CN116882372A (en) Text generation method, device, electronic equipment and storage medium
CN111813993A (en) Video content expanding method and device, terminal equipment and storage medium
CN116685966A (en) Adjusting query generation patterns
De Boni et al. An analysis of clarification dialogue for question answering
CN111444321B (en) Question answering method, device, electronic equipment and storage medium
CN113505196B (en) Text retrieval method and device based on parts of speech, electronic equipment and storage medium
CN113822038B (en) Abstract generation method and related device
CN113342944B (en) Corpus generalization method, apparatus, device and storage medium
CN108268443B (en) Method and device for determining topic point transfer and acquiring reply text
KR101955920B1 (en) Search method and apparatus using property language

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant