CN109710732A - Information query method, device, storage medium and electronic equipment - Google Patents

Information query method, device, storage medium and electronic equipment

Publication number
CN109710732A
Authority
CN
China
Prior art keywords
record
preset
knowledge base
lexical set
target problem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811379175.2A
Other languages
Chinese (zh)
Other versions
CN109710732B (en
Inventor
刘嘉伟
董超
崔朝辉
赵立军
张霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201811379175.2A
Publication of CN109710732A
Application granted
Publication of CN109710732B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to an information query method, device, storage medium, and electronic equipment in the field of information technology. The method includes: segmenting an acquired target question into words to obtain a first lexical set containing the word segmentation result of the target question; performing synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, where the word vectors are obtained by training a preset model on a preset corpus; obtaining, according to the second lexical set and a preset knowledge base and using a preset algorithm, a matching score between the target question and each record in the knowledge base, where the knowledge base contains at least one record and each record includes a question and the answer corresponding to that question; and determining the answer that matches the target question according to the matching score of each record. The method makes effective use of an existing knowledge base, provides an information query service at the semantic level, and improves the accuracy and coverage of information queries.

Description

Information query method, device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of information technology, and in particular to an information query method, device, storage medium, and electronic equipment.
Background
With the rapid development of information technologies such as the Internet, cloud computing, and language processing, artificial intelligence increasingly affects daily life. An intelligent question answering system can use an existing knowledge base to look up a question posed by a user and provide the corresponding answer. In the prior art, a search engine is usually used to perform a single keyword search over the existing knowledge base, and the records that match the keyword are fed back to the user as answers, so query accuracy is low. Moreover, in many technical fields the existing knowledge base has accumulated a large amount of unstructured historical data that cannot be retrieved directly, so query coverage is also low.
Summary
The purpose of the present disclosure is to provide an information query method, device, storage medium, and electronic equipment, so as to solve the problem of low accuracy and low coverage of information queries in the prior art.
To achieve the above purpose, according to a first aspect of the embodiments of the present disclosure, an information query method is provided, the method including:
segmenting an acquired target question into words to obtain a first lexical set, the first lexical set containing the word segmentation result of the target question;
performing synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, the word vectors being obtained by training a preset model on a preset corpus;
obtaining, according to the second lexical set and a preset knowledge base and using a preset algorithm, a matching score between the target question and each record in the knowledge base, the knowledge base containing at least one record, and each record including a question and an answer corresponding to the question; and
determining, according to the matching score of each record, the answer that matches the target question.
Optionally, performing synonym expansion on the first lexical set according to the preset word vectors to obtain the second lexical set includes:
training a preset word vector generation model on the corpus to obtain the word vectors; and
performing synonym expansion on the first lexical set according to the word vectors, preset stop words, and the professional terms of the target field to which the target question belongs, to obtain the second lexical set.
Optionally, obtaining the matching score between the target question and each record in the knowledge base according to the second lexical set and the preset knowledge base and using the preset algorithm includes:
training a preset word vector generation model on the corpus to obtain the word vectors;
performing synonym expansion on each record in the knowledge base according to the word vectors, the preset stop words, and the professional terms of the target field to which the target question belongs;
performing synonymous-sentence expansion on each record in the knowledge base using a neural machine translation (NMT) algorithm; and
obtaining, according to the second lexical set and the knowledge base and using the preset algorithm, the matching score between the target question and each record in the knowledge base.
Optionally, performing synonymous-sentence expansion on each record in the knowledge base using the NMT algorithm includes:
translating, using the NMT algorithm, a first record expressed in a first language in the knowledge base into an intermediate record expressed in a second language;
translating, using the NMT algorithm, the intermediate record into a synonymous record expressed in the first language; and
storing the synonymous record in the knowledge base, the first record being any record in the knowledge base.
Optionally, determining the answer that matches the target question according to the matching score of each record includes:
sorting the matching scores of the records in descending order to obtain a score ranking; and
selecting the answers in the top n records of the score ranking as the answers that match the target question; or,
when the ratio of the first-ranked matching score to the second-ranked matching score in the score ranking is greater than a preset threshold, taking the answer in the record corresponding to the first-ranked matching score as the answer that matches the target question; and
when the ratio of the first-ranked matching score to the second-ranked matching score is less than or equal to the preset threshold, selecting the answers in the top m records of the score ranking as the answers that match the target question.
Optionally, obtaining the matching score between the target question and each record in the knowledge base according to the second lexical set and the preset knowledge base and using the preset algorithm includes:
calculating, according to the second lexical set and the preset knowledge base and using a first calculation formula, the matching score between the target question and each record in the knowledge base.
The first calculation formula includes:
where d_j is the j-th record in the knowledge base; Score_j is the matching score of d_j; s is the number of words shared by the second lexical set and the word segmentation result of d_j; Q is the number of words in the first lexical set; t_i is the i-th word in the second lexical set; num(d_j) is the number of words in the word segmentation result of d_j; num(t_i) is the number of times t_i occurs in d_j; D is the number of records in the knowledge base; and N_i is the number of records in the knowledge base that contain t_i.
According to a second aspect of the embodiments of the present disclosure, an information query device is provided, the device including:
a word segmentation module, configured to segment an acquired target question into words to obtain a first lexical set, the first lexical set containing the word segmentation result of the target question;
an expansion module, configured to perform synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, the word vectors being obtained by training a preset model on a preset corpus;
a scoring module, configured to obtain, according to the second lexical set and a preset knowledge base and using a preset algorithm, a matching score between the target question and each record in the knowledge base, the knowledge base containing at least one record, and each record including a question and an answer corresponding to the question; and
a determining module, configured to determine, according to the matching score of each record, the answer that matches the target question.
Optionally, the expansion module includes:
a first training submodule, configured to train a preset word vector generation model on the corpus to obtain the word vectors; and
a first expansion submodule, configured to perform synonym expansion on the first lexical set according to the word vectors, preset stop words, and the professional terms of the target field to which the target question belongs, to obtain the second lexical set.
Optionally, the scoring module includes:
a second training submodule, configured to train a preset word vector generation model on the corpus to obtain the word vectors;
a second expansion submodule, configured to perform synonym expansion on each record in the knowledge base according to the word vectors, the preset stop words, and the professional terms of the target field to which the target question belongs;
a synonymous-sentence expansion submodule, configured to perform synonymous-sentence expansion on each record in the knowledge base using a neural machine translation (NMT) algorithm; and
a scoring submodule, configured to obtain, according to the second lexical set and the knowledge base and using the preset algorithm, the matching score between the target question and each record in the knowledge base.
Optionally, the synonymous-sentence expansion submodule is configured to:
translate, using the NMT algorithm, a first record expressed in a first language in the knowledge base into an intermediate record expressed in a second language;
translate, using the NMT algorithm, the intermediate record into a synonymous record expressed in the first language; and
store the synonymous record in the knowledge base, the first record being any record in the knowledge base.
Optionally, the determining module includes:
a sorting submodule, configured to sort the matching scores of the records in descending order to obtain a score ranking; and
a determining submodule, configured to select the answers in the top n records of the score ranking as the answers that match the target question; or,
the determining submodule is configured to, when the ratio of the first-ranked matching score to the second-ranked matching score in the score ranking is greater than a preset threshold, take the answer in the record corresponding to the first-ranked matching score as the answer that matches the target question; and
the determining submodule is further configured to, when the ratio of the first-ranked matching score to the second-ranked matching score is less than or equal to the preset threshold, select the answers in the top m records of the score ranking as the answers that match the target question.
Optionally, the scoring module is configured to:
calculate, according to the second lexical set and the preset knowledge base and using a first calculation formula, the matching score between the target question and each record in the knowledge base.
The first calculation formula includes:
where d_j is the j-th record in the knowledge base; Score_j is the matching score of d_j; s is the number of words shared by the second lexical set and the word segmentation result of d_j; Q is the number of words in the first lexical set; t_i is the i-th word in the second lexical set; num(d_j) is the number of words in the word segmentation result of d_j; num(t_i) is the number of times t_i occurs in d_j; D is the number of records in the knowledge base; and N_i is the number of records in the knowledge base that contain t_i.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the information query method provided in the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, electronic equipment is provided, including:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of the information query method provided in the first aspect.
With the above technical solution, the present disclosure first segments the acquired target question into words to obtain a first lexical set containing the word segmentation result of the target question. It then performs synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, where the word vectors are obtained by training a preset model on a preset corpus. Next, according to the second lexical set and a preset knowledge base, it determines a matching score between the target question and each record in the knowledge base using a preset algorithm, where each record contains a question and its corresponding answer. Finally, it determines the answer that matches the target question according to the matching score of each record. This makes effective use of the existing knowledge base, provides an information query service at the semantic level, and improves the accuracy and coverage of information queries.
Other features and advantages of the present disclosure are described in detail in the Detailed Description section below.
Brief Description of the Drawings
The accompanying drawings are provided for further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of an information query method according to an exemplary embodiment;
Fig. 2 is a flowchart of another information query method according to an exemplary embodiment;
Fig. 3 is a flowchart of another information query method according to an exemplary embodiment;
Fig. 4 is a flowchart of another information query method according to an exemplary embodiment;
Fig. 5 is a block diagram of an information query device according to an exemplary embodiment;
Fig. 6 is a block diagram of another information query device according to an exemplary embodiment;
Fig. 7 is a block diagram of another information query device according to an exemplary embodiment;
Fig. 8 is a block diagram of another information query device according to an exemplary embodiment;
Fig. 9 is a block diagram of electronic equipment according to an exemplary embodiment.
Detailed Description
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Before introducing the information query method, device, storage medium, and electronic equipment provided by the present disclosure, the application scenario involved in the embodiments is introduced first. The application scenario may be an intelligent question answering system for human-computer interaction, through which a user can input a target question to be queried and obtain the corresponding answer. The intelligent question answering system may run on any terminal, for example a mobile terminal such as a smartphone, tablet computer, smart TV, smartwatch, personal digital assistant (PDA), or portable computer, or a fixed terminal such as a desktop computer.
Fig. 1 is a flowchart of an information query method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps:
Step 101: segment the acquired target question into words to obtain a first lexical set, the first lexical set containing the word segmentation result of the target question.
For example, the target question input by the user is acquired first and segmented according to a preset word segmentation method, and the word segmentation result of the target question is stored in the first lexical set. The word segmentation method may be the maximum matching (MM) algorithm, a semantics-based segmentation method, a statistics-based segmentation method, and so on. For instance, the words contained in the target question can be identified from left to right against a pre-stored dictionary, and disambiguation rules can then be applied to remove words that do not conform to language habits, yielding the word segmentation result of the target question. Taking the target question "where to get a social insurance card" as an example, segmenting the target question yields the first lexical set {where, get, social insurance card}.
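The left-to-right dictionary lookup described for step 101 can be sketched as a forward maximum matching routine. This is only a minimal illustration: the function name `forward_max_match`, the toy dictionary, and the Chinese rendering of the example question are assumptions, and the disambiguation rules mentioned above are not modelled.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: at each position, take the longest
    dictionary entry; fall back to a single character if nothing matches."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary and an assumed Chinese rendering of the
# example question "where to get a social insurance card".
DICTIONARY = {"哪里", "领取", "社会保险卡", "社会", "保险"}
first_lexical_set = forward_max_match("哪里领取社会保险卡", DICTIONARY)
print(first_lexical_set)
```

Because the longest entry is preferred, the full term for "social insurance card" is kept as one word even though its parts are also in the dictionary.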
Step 102: perform synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, the word vectors being obtained by training a preset model on a preset corpus.
For example, the preset model is first trained on the preset corpus to obtain word vectors (word embeddings). Word vectors capture meaning well and can effectively express semantic and grammatical features. The preset model may be a word vector generation model (Word2vec). The corpus may be a large amount of existing semantic data, or semantic data from various technical fields collected from the Internet with an information acquisition tool (for example, a web crawler), such as news, microblogs, and forums. Synonym expansion is then performed on the first lexical set according to the word vectors, and the first lexical set after synonym expansion is taken as the second lexical set. Taking the word "social insurance card" in the first lexical set as an example, synonym expansion can add synonyms of "social insurance card", such as "social security card", to the second lexical set. When the word vectors are used to expand the first lexical set, an administrator may also review the words in the second lexical set to improve the accuracy of the synonym expansion.
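One way to picture synonym expansion with word vectors is to add every vocabulary word whose vector is close enough to a word in the first lexical set. The sketch below assumes cosine similarity with a fixed threshold; the tiny hand-written vectors stand in for vectors that would come from a trained model, and the function names and threshold value are illustrative, not taken from the disclosure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_synonyms(words, vectors, threshold=0.9):
    """Return the input words plus every vocabulary word whose vector is
    at least `threshold` similar to one of them (assumed criterion)."""
    expanded = list(words)
    for word in words:
        if word not in vectors:
            continue
        for other, vec in vectors.items():
            if other not in expanded and cosine(vectors[word], vec) >= threshold:
                expanded.append(other)
    return expanded


# Hypothetical 3-dimensional vectors; real word vectors would be learned.
TOY_VECTORS = {
    "social insurance card": (0.9, 0.1, 0.0),
    "social security card": (0.88, 0.12, 0.02),
    "get": (0.0, 1.0, 0.1),
    "obtain": (0.05, 0.95, 0.12),
    "where": (0.1, 0.0, 1.0),
    "weather": (0.6, 0.6, 0.6),
}
second_lexical_set = expand_synonyms(
    ["where", "get", "social insurance card"], TOY_VECTORS
)
print(second_lexical_set)
```

A threshold that is too low would pull in loosely related words, which is one reason the text suggests an administrator may review the expanded set.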
Step 103: according to the second lexical set and a preset knowledge base, obtain a matching score between the target question and each record in the knowledge base using a preset algorithm, the knowledge base containing at least one record, and each record including a question and an answer corresponding to the question.
For example, the preset knowledge base may contain multiple records, each of which contains a question and the answer corresponding to that question. Each record can be understood as a (question, answer) pair, and the questions in different records do not repeat. According to the words in the second lexical set and the knowledge base, the matching score between the target question and each record in the knowledge base is obtained in turn. The matching score reflects how well each record matches the second lexical set corresponding to the target question. For instance, the matching degree between each word in the second lexical set and each record in the knowledge base can be calculated separately, and the matching degrees of the words can then be summed with different weights to obtain the matching score between the target question and each record. The weight of each word can be determined by the number of times the word occurs in a record: the more often a word occurs in a record, the higher the matching degree between the word and the record, and the larger the corresponding weight.
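The weighted sum described for step 103 can be sketched as follows. The text does not fix how the matching degree or the weight is computed, so this sketch assumes a binary matching degree (the word occurs in the record or not) and a weight equal to the word's relative frequency in the record; the record contents and answers are hypothetical.

```python
from collections import Counter


def weighted_match_score(second_lexical_set, record_tokens):
    """Sum per-word matching degrees, weighting each word by how often
    it occurs in the record (both choices are assumptions)."""
    if not record_tokens:
        return 0.0
    counts = Counter(record_tokens)
    score = 0.0
    for word in set(second_lexical_set):
        match_degree = 1.0 if counts[word] else 0.0   # assumed: binary match
        weight = counts[word] / len(record_tokens)     # assumed: relative frequency
        score += weight * match_degree
    return score


# Hypothetical knowledge base: (segmented question, answer) pairs.
KNOWLEDGE_BASE = [
    (["how", "get", "social security card"], "Apply at the local service hall."),
    (["how", "renew", "passport"], "Submit the renewal form online."),
]
second_lexical_set = ["where", "get", "obtain", "social insurance card", "social security card"]
scores = [weighted_match_score(second_lexical_set, tokens) for tokens, _ in KNOWLEDGE_BASE]
print(scores)
```

The first record shares two words with the expanded set and therefore scores higher than the second, which shares none.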
Step 104: determine the answer that matches the target question according to the matching score of each record.
For example, after the matching score between the target question and each record is determined, the record that matches the target question is determined according to the size of the matching score, and the answer in that record is taken as the answer that matches the target question. For instance, a preset number (for example, 3) of records with the highest matching scores can be determined, and the answers in those records can be recommended to the user as answers matching the target question, so that the user selects the one that best meets the need. Alternatively, the answer in the record with the highest matching score can be taken directly as the answer matching the target question, or the answer in a record whose matching score satisfies a preset condition can be taken as the answer matching the target question.
In summary, the present disclosure first segments the acquired target question into words to obtain a first lexical set containing the word segmentation result of the target question. It then performs synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, where the word vectors are obtained by training a preset model on a preset corpus. Next, according to the second lexical set and a preset knowledge base, it determines a matching score between the target question and each record in the knowledge base using a preset algorithm, where each record contains a question and its corresponding answer. Finally, it determines the answer that matches the target question according to the matching score of each record. This makes effective use of the existing knowledge base, provides an information query service at the semantic level, and improves the accuracy and coverage of information queries.
Fig. 2 is a flowchart of another information query method according to an exemplary embodiment. As shown in Fig. 2, step 102 can be implemented through the following steps:
Step 1021: train a preset word vector generation model on the corpus to obtain the word vectors.
Step 1022: perform synonym expansion on the first lexical set according to the word vectors, preset stop words, and the professional terms of the target field to which the target question belongs, to obtain the second lexical set.
For example, a preset word vector generation model is trained on the corpus. The word vector generation model is a neural network model that extracts semantic and grammatical features of natural language from the semantic data in the corpus, and training it yields the word vectors. Synonym expansion is then performed on the first lexical set according to the word vectors, the preset stop words, and the professional terms of the target field to which the target question belongs, and the first lexical set after synonym expansion is taken as the second lexical set. Stop words are characters or words that are filtered out during retrieval when natural language data (such as text) is processed, for example some prepositions, conjunctions, and adverbs in Chinese, such as the Chinese equivalents of "of", "and", "to", and "in". To obtain the professional terms of the target field to which the target question belongs, the target field (for example, public service, literature and history, social science, or entertainment) is determined first: the user may select the target field in advance when inputting the target question, or semantic analysis may be performed on the target question to determine it. The corresponding professional terms are then obtained according to the target field. Taking the target question "where to get a social insurance card" as an example, segmenting the target question yields the first lexical set {where, get, social insurance card}, and synonym expansion of the first lexical set can yield the second lexical set {where, at which place, in what place, get, collect, obtain, social insurance card, social security card}. Note that when synonym expansion is performed on the first lexical set, an administrator may also review the words in the second lexical set to improve the accuracy of the synonym expansion.
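The text names three inputs to step 1022 (word vectors, stop words, professional terms) without saying exactly how they are combined. The sketch below shows one plausible combination, stated as an assumption: stop words are dropped, professional-term synonyms are looked up in a field-specific table, and word-vector neighbours are supplied as a precomputed mapping. All tables and names here are hypothetical.

```python
def expand_lexical_set(words, similar_words, stop_words, domain_terms):
    """Expand `words` with domain-term synonyms and word-vector neighbours,
    skipping stop words on both the input and the output side."""
    expanded = []
    for word in words:
        if word in stop_words:
            continue
        candidates = [word] + domain_terms.get(word, []) + similar_words.get(word, [])
        for candidate in candidates:
            if candidate not in stop_words and candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical inputs: a stop-word list, neighbours derived from word
# vectors, and professional terms for a public-service field.
STOP_WORDS = {"the", "a", "to", "of"}
SIMILAR_WORDS = {"where": ["at which place"], "get": ["collect", "obtain", "to"]}
DOMAIN_TERMS = {"social insurance card": ["social security card"]}

second_lexical_set = expand_lexical_set(
    ["where", "to", "get", "social insurance card"],
    SIMILAR_WORDS, STOP_WORDS, DOMAIN_TERMS,
)
print(second_lexical_set)
```

Filtering stop words on the output side as well keeps a noisy neighbour list from reintroducing words that carry no retrieval value.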
Fig. 3 is a flowchart of another information query method according to an exemplary embodiment. As shown in Fig. 3, step 103 may include:
Step 1031: train a preset word vector generation model on the corpus to obtain the word vectors.
Step 1032: perform synonym expansion on each record in the knowledge base according to the word vectors, the preset stop words, and the professional terms of the target field to which the target question belongs.
For example, a preset word vector generation model is trained on the corpus to obtain the word vectors. Synonym expansion is then performed on each record in the knowledge base according to the word vectors, the preset stop words, and the professional terms of the target field to which the target question belongs. For instance, suppose one record in the knowledge base is "how to get a social security card". Segmenting this record yields the lexical set {social security card, how, get}, and synonym expansion of this lexical set can extend the word segmentation result of the record to: social insurance card, social security card, how, in what way, get, collect, obtain.
Step 1033: perform synonymous-sentence expansion on each record in the knowledge base using a neural machine translation (NMT) algorithm.
For example, an NMT (neural machine translation) algorithm is used to perform synonymous-sentence expansion on each record in the knowledge base, and the synonymous sentence of each record is stored in the knowledge base as a synonymous record of that record. For instance, the NMT algorithm can translate any Chinese record in the knowledge base into English and then translate the English result back into Chinese; the resulting Chinese sentence is a synonymous sentence of the original Chinese record. The two translation passes expand the vocabulary and sentence patterns of the record. Note that when synonymous-sentence expansion is performed on each record in the knowledge base, an administrator may also review the expansion results to improve the accuracy of the synonymous-sentence expansion.
Step 1034: according to the second lexical set and the knowledge base, obtain the matching score between the target question and each record in the knowledge base using the preset algorithm.
For example, according to the words in the second lexical set and the knowledge base after synonym expansion and synonymous-sentence expansion, the matching score between the target question and each record in the knowledge base is obtained in turn.
Optionally, step 1033 can be implemented through the following steps:
1) Using the NMT algorithm, translate a first record expressed in a first language in the knowledge base into an intermediate record expressed in a second language.
2) Using the NMT algorithm, translate the intermediate record into a synonymous record expressed in the first language.
3) Store the synonymous record in the knowledge base, where the first record is any record in the knowledge base.
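The three sub-steps above amount to back-translation. The sketch below keeps the translation engine abstract: `translate` is a caller-supplied function with an assumed `(text, source, target)` signature, and a lookup-table stub stands in for a real NMT system. Reusing the original answer for the synonymous record is also an assumption, as are the Chinese example strings.

```python
def back_translate(question, translate, first_lang="zh", second_lang="en"):
    """Sub-steps 1) and 2): first language -> second language -> first language."""
    intermediate = translate(question, first_lang, second_lang)
    return translate(intermediate, second_lang, first_lang)


def expand_knowledge_base(records, translate):
    """Sub-step 3): store each new synonymous record alongside the originals.
    Records are (question, answer) pairs; duplicates are skipped."""
    expanded = list(records)
    for question, answer in records:
        synonymous = back_translate(question, translate)
        if all(existing != synonymous for existing, _ in expanded):
            expanded.append((synonymous, answer))
    return expanded


# Stub translator for illustration only; a real system would call an NMT model.
TOY_TABLE = {
    ("zh->en", "社保卡怎么领取"): "how to get a social security card",
    ("en->zh", "how to get a social security card"): "如何领取社会保障卡",
}


def toy_translate(text, source, target):
    return TOY_TABLE.get((f"{source}->{target}", text), text)


kb = [("社保卡怎么领取", "Apply at the local service hall.")]
expanded_kb = expand_knowledge_base(kb, toy_translate)
print(expanded_kb)
```

Running the expansion a second time on `expanded_kb` adds nothing, because the duplicate check skips questions that are already stored.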
Fig. 4 is a flowchart of another information query method according to an exemplary embodiment. As shown in Fig. 4, step 104 includes:
Step 1041: sort the matching scores of the records in descending order to obtain a score ranking.
Step 1042: select the answers in the top n records of the score ranking as the answers that match the target question.
Alternatively, step 1043: when the ratio of the first-ranked matching score to the second-ranked matching score in the score ranking is greater than a preset threshold, take the answer in the record corresponding to the first-ranked matching score as the answer that matches the target question.
Step 1044: when the ratio of the first-ranked matching score to the second-ranked matching score is less than or equal to the preset threshold, select the answers in the top m records of the score ranking as the answers that match the target question.
For example, the matching scores of the records in the knowledge base are sorted in descending order to obtain a score ranking. Based on the score ranking, the answers in the records that satisfy a preset condition are taken as the answers matching the target question. The preset condition can take several forms. The answers in the top n (for example, 3) records of the score ranking may be selected as the answers matching the target question, and these n answers are recommended to the user, who selects the one that best meets the need. Alternatively, the answer in the top-ranked record may be recommended directly as the answer matching the target question. Alternatively, the ratio of the first-ranked matching score to the second-ranked matching score may be calculated first: when the ratio is greater than the preset threshold (indicating a large gap between the first-ranked matching score and the scores that follow), the answer in the record corresponding to the first-ranked matching score is taken as the answer matching the target question; when the ratio is less than or equal to the preset threshold, the answers in the top m (for example, 5) records of the score ranking are selected as the answers matching the target question.
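The ratio-based selection in steps 1041, 1043, and 1044 can be sketched as below. The function name, the default threshold, and the default m are illustrative assumptions. The comparison is written as `first > threshold * second` rather than as a division so that a second-ranked score of zero does not raise an error; for non-negative scores this is equivalent to comparing the ratio with the threshold.

```python
def select_answers(scored_records, threshold=1.5, m=5):
    """scored_records: list of (matching_score, answer) pairs with
    non-negative scores. Returns one answer when the top score clearly
    dominates, otherwise the answers of the top m records."""
    ranked = sorted(scored_records, key=lambda item: item[0], reverse=True)
    if not ranked:
        return []
    if len(ranked) == 1:
        return [ranked[0][1]]
    first_score, second_score = ranked[0][0], ranked[1][0]
    # Equivalent to first_score / second_score > threshold when second_score > 0.
    if first_score > threshold * second_score:
        return [ranked[0][1]]
    return [answer for _, answer in ranked[:m]]


clear_winner = select_answers([(0.3, "B"), (0.9, "A"), (0.2, "C")], threshold=2.0, m=2)
close_scores = select_answers([(0.5, "A"), (0.1, "C"), (0.4, "B")], threshold=2.0, m=2)
print(clear_winner, close_scores)
```

With a threshold of 2.0, the first call returns a single answer because 0.9 is more than twice 0.3, while the second call returns the top two answers because 0.5 is not more than twice 0.4.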
Optionally, step 103 may be implemented as follows:
According to the second lexical set and the preset knowledge base, the matching score between the target question and each record in the knowledge base is calculated using a first calculation formula.
The first calculation formula includes:
Here, d_j is the j-th record in the knowledge base; Score_j is the matching score of d_j; s is the number of words that the second lexical set shares with the word segmentation result of d_j; Q is the number of words in the first lexical set; t_i is the i-th word in the second lexical set; num(d_j) is the number of words in the word segmentation result of d_j; num(t_i) is the number of times t_i occurs in d_j; D is the number of records in the knowledge base; and N_i is the number of records in the knowledge base that contain t_i.
It should be noted that the preset knowledge base (containing D records, the j-th of which is d_j) may be the original knowledge base, in which the questions of the records do not repeat. It may also be a knowledge base that has undergone synonym expansion and synonymous-sentence expansion, that is, steps 1031 to 1033 have been performed on the original knowledge base, in which case the questions of different records may repeat (for example, as synonymous sentences). For example, when the knowledge base is the original knowledge base and contains 20 records, d_j is the j-th of these 20 records; if the word segmentation result of the j-th record contains 3 words, then num(d_j) = 3. Alternatively, the knowledge base may be the expanded knowledge base (obtained through steps 1031-1034). Suppose that synonym expansion is performed on d_j by executing step 1032, and that synonymous-sentence expansion is performed on each of the 20 records by executing step 1033, so that the knowledge base is expanded to 30 records. Then d_j is the j-th of the 30 records, and if the word segmentation result of d_j contains 5 words, num(d_j) = 5. Because the second lexical set is obtained by expanding the first lexical set, s may be larger than Q. In that case s/Q may be set to 1, so that s/Q is a positive number less than or equal to 1.
Take as an example the target question "where to get a social insurance card" and a knowledge base containing 30 records (that is, D = 30), in which d_j is "how to get a social security card". Segmenting the target question yields the first lexical set {where, get, social insurance card}. Synonym expansion of the first lexical set yields the second lexical set {where, get, social insurance card, social security card} (corresponding to {t_1, t_2, t_3, t_4}). Segmenting d_j yields {social security card, how, get}. Then s is 2 (the shared words being "get" and "social security card") and Q is 3. For t_1 and t_3, which do not occur in d_j, the corresponding term of the formula is 0; for t_2 and t_4, which do occur in d_j, the corresponding terms and the resulting Score_j are obtained from the first calculation formula.
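The quantities named in the symbol definitions can be computed as in the Python sketch below. It deliberately stops at the inputs (s, Q, num(d_j), num(t_i), D, N_i) and the capped ratio s/Q, and does not assume how the first calculation formula combines them. The function name `formula_inputs` and the second knowledge base record are hypothetical.

```python
from typing import Dict, List


def formula_inputs(
    first_set: List[str],
    second_set: List[str],
    records: List[List[str]],  # word segmentation result of each record's question
    j: int,
) -> Dict[str, object]:
    """Collect the quantities defined for record d_j, without combining them."""
    d_j = records[j]
    s = sum(1 for t in second_set if t in d_j)   # words shared by the second set and d_j
    q = len(first_set)                           # Q: size of the first lexical set
    ratio = min(s / q, 1.0) if q else 0.0        # s/Q, capped at 1 when s exceeds Q
    per_word = {
        t: {
            "num_t": d_j.count(t),                       # num(t_i): occurrences of t_i in d_j
            "N": sum(1 for rec in records if t in rec),  # N_i: records containing t_i
        }
        for t in second_set
    }
    return {"s": s, "Q": q, "ratio": ratio,
            "num_d": len(d_j), "D": len(records), "per_word": per_word}


first = ["where", "get", "social insurance card"]
second = first + ["social security card"]
kb = [
    ["social security card", "how", "get"],  # d_j from the example above
    ["where", "pay", "bill"],                # hypothetical second record
]
inputs = formula_inputs(first, second, kb, 0)
```

For the example record this gives s = 2 and Q = 3, and num(t_i) is 0 for "where" and "social insurance card", which is why their terms vanish.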
In conclusion the disclosure first segments the target problem got, to obtain containing target problem point First lexical set of word result carries out synonym expansion to the first lexical set further according to preset term vector, to obtain the Two lexical sets, wherein term vector is using preset model to be trained acquisition to preset corpus, later according to the Two lexical sets and preset knowledge base determine of each record in target problem and knowledge base according to preset algorithm With scoring, problem is all contained in each record and problem corresponds to answer, finally recorded according to each in knowledge base With scoring, the determining and matched answer of target problem can efficiently use existing knowledge base, realize the information inquiry of semantic level Service improves the accuracy and coverage of information inquiry.
Fig. 5 is a block diagram of an information query device according to an exemplary embodiment. As shown in Fig. 5, the device 200 includes:
A word segmentation module 201, configured to segment the acquired target question to obtain a first lexical set, the first lexical set containing the word segmentation result of the target question.
An expansion module 202, configured to perform synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, the word vectors being obtained by training on a preset corpus using a preset model.
A scoring module 203, configured to obtain, according to the second lexical set and a preset knowledge base and by a preset algorithm, the matching score between the target question and each record in the knowledge base, the knowledge base containing at least one record and each record containing a question and an answer corresponding to the question.
A determining module 204, configured to determine the answer matching the target question according to the matching score of each record.
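The way the four modules feed one another can be sketched in Python as a small pipeline. The class name `QueryDevice`, its field names, and the toy callables below are all illustrative assumptions; the disclosure does not prescribe this interface, and the toy scoring (a plain word-overlap count) is not the first calculation formula.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str]  # (question, answer)


@dataclass
class QueryDevice:
    segment: Callable[[str], List[str]]                       # module 201
    expand: Callable[[List[str]], List[str]]                  # module 202
    score: Callable[[List[str], Record], float]               # module 203
    determine: Callable[[List[Tuple[Record, float]]], List[str]]  # module 204

    def query(self, question: str, knowledge_base: List[Record]) -> List[str]:
        first_set = self.segment(question)       # first lexical set
        second_set = self.expand(first_set)      # second lexical set
        scored = [(rec, self.score(second_set, rec)) for rec in knowledge_base]
        return self.determine(scored)


# Toy stand-ins for each module (hypothetical, for illustration only).
SYNONYMS: Dict[str, str] = {"social-insurance-card": "social-security-card"}

device = QueryDevice(
    segment=lambda q: q.lower().split(),
    expand=lambda words: words + [SYNONYMS[w] for w in words if w in SYNONYMS],
    score=lambda words, rec: float(len(set(words) & set(rec[0].lower().split()))),
    determine=lambda scored: [max(scored, key=lambda p: p[1])[0][1]] if scored else [],
)

kb: List[Record] = [
    ("how to get social-security-card", "Visit the service hall."),
    ("how to pay bill", "Use the app."),
]
answers = device.query("where to get social-insurance-card", kb)
```

Keeping each module as a separate callable mirrors the block diagram: any one stage can be replaced (for example, a real word segmenter) without touching the others.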
Fig. 6 is a block diagram of another information query device according to an exemplary embodiment. As shown in Fig. 6, the expansion module 202 includes:
A first training submodule 2021, configured to train on the corpus using a preset word vector generation model to obtain the word vectors.
A first expansion submodule 2022, configured to perform synonym expansion on the first lexical set according to the word vectors, preset stop words, and domain-specific terms of the target domain to which the target question belongs, to obtain the second lexical set.
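A minimal Python sketch of word-vector-based synonym expansion follows. Several choices here are assumptions rather than details taken from the disclosure: cosine similarity as the closeness measure, the similarity threshold of 0.8, skipping stop words, and drawing candidate synonyms from the domain term list. The two-dimensional vectors are made-up toy values, not trained embeddings.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexical_set(
    first_set: List[str],
    vectors: Dict[str, Sequence[float]],
    stop_words: Set[str],
    domain_terms: List[str],
    min_similarity: float = 0.8,  # hypothetical threshold
) -> List[str]:
    """Return the first lexical set plus domain terms close to any of its words."""
    second_set = list(first_set)
    for word in first_set:
        # Assumption: stop words and out-of-vocabulary words are not expanded.
        if word in stop_words or word not in vectors:
            continue
        for candidate in domain_terms:
            if candidate in second_set or candidate not in vectors:
                continue
            if cosine(vectors[word], vectors[candidate]) >= min_similarity:
                second_set.append(candidate)
    return second_set


toy_vectors = {
    "where": [0.5, 0.5],
    "get": [0.1, 0.9],
    "social insurance card": [0.9, 0.1],
    "social security card": [0.88, 0.15],
    "pension": [-0.5, 0.2],
}
second_set = expand_lexical_set(
    ["where", "get", "social insurance card"],
    toy_vectors,
    stop_words={"where"},
    domain_terms=["social security card", "pension"],
)
```

With these toy vectors, only "social security card" is close enough to a word of the first lexical set to be added.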
Fig. 7 is a block diagram of another information query device according to an exemplary embodiment. As shown in Fig. 7, the scoring module 203 includes:
A second training submodule 2031, configured to train on the corpus using a preset word vector generation model to obtain the word vectors.
A second expansion submodule 2032, configured to perform synonym expansion on each record in the knowledge base according to the word vectors, preset stop words, and domain-specific terms of the target domain to which the target question belongs.
A synonymous-sentence expansion submodule 2033, configured to perform synonymous-sentence expansion on each record in the knowledge base using a neural machine translation (NMT) algorithm.
A scoring submodule 2034, configured to obtain, according to the second lexical set and the knowledge base and by a preset algorithm, the matching score between the target question and each record in the knowledge base.
Optionally, the synonymous-sentence expansion submodule 2033 may be implemented as follows:
1) Using the NMT algorithm, translate a first record in the knowledge base, expressed in a first language, into an intermediate record expressed in a second language.
2) Using the NMT algorithm, translate the intermediate record into a synonymous record expressed in the first language.
3) Store the synonymous record in the knowledge base, the first record being any record in the knowledge base.
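The three steps above amount to a round-trip translation, which can be sketched in Python as follows. The `translate` callable stands in for whatever NMT system is used and is purely hypothetical. The sketch also assumes that only the question of a record is paraphrased while its answer is carried over, and that exact duplicate questions are not stored twice; neither detail is stated in the disclosure.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]                    # (question, answer)
Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text


def expand_with_round_trip(
    knowledge_base: List[Record],
    translate: Translate,
    first_lang: str = "zh",
    second_lang: str = "en",
) -> List[Record]:
    """Add a round-trip paraphrase of each question to the knowledge base."""
    expanded = list(knowledge_base)
    seen = {question for question, _ in knowledge_base}
    for question, answer in knowledge_base:
        intermediate = translate(question, first_lang, second_lang)    # step 1
        synonymous = translate(intermediate, second_lang, first_lang)  # step 2
        if synonymous not in seen:                                     # step 3
            seen.add(synonymous)
            expanded.append((synonymous, answer))
    return expanded


# Stub translator used only to exercise the function; a real system would call an NMT model.
def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    table = {
        ("how do I get a social security card", "xx"): "INTERMEDIATE",
        ("INTERMEDIATE", "en"): "how can I obtain a social security card",
    }
    return table.get((text, target_lang), text)


kb = [("how do I get a social security card", "Visit the service hall.")]
expanded_kb = expand_with_round_trip(kb, fake_translate, first_lang="en", second_lang="xx")
```

When the round trip returns the original wording unchanged, nothing is added, so the knowledge base only grows with genuinely different phrasings.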
Fig. 8 is a block diagram of another information query device according to an exemplary embodiment. As shown in Fig. 8, the determining module 204 includes:
A sorting submodule 2041, configured to sort the matching scores of the records in descending order to obtain a score ranking.
A determining submodule 2042, configured to select the answers in the top n records of the score ranking as the answers matching the target question. Alternatively,
the determining submodule 2042 is configured to, when the ratio of the first-ranked matching score to the second-ranked matching score in the score ranking is greater than a preset threshold, take the answer in the record corresponding to the first-ranked matching score as the answer matching the target question.
The determining submodule 2042 is further configured to, when the ratio of the first-ranked matching score to the second-ranked matching score is less than or equal to the preset threshold, select the answers in the top m records of the score ranking as the answers matching the target question.
Optionally, the scoring module 203 may be implemented as follows:
According to the second lexical set and the preset knowledge base, the matching score between the target question and each record in the knowledge base is calculated using the first calculation formula.
The first calculation formula includes:
Here, d_j is the j-th record in the knowledge base; Score_j is the matching score of d_j; s is the number of words that the second lexical set shares with the word segmentation result of d_j; Q is the number of words in the first lexical set; t_i is the i-th word in the second lexical set; num(d_j) is the number of words in the word segmentation result of d_j; num(t_i) is the number of times t_i occurs in d_j; D is the number of records in the knowledge base; and N_i is the number of records in the knowledge base that contain t_i.
With regard to the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
In conclusion the disclosure first segments the target problem got, to obtain containing target problem point First lexical set of word result carries out synonym expansion to the first lexical set further according to preset term vector, to obtain the Two lexical sets, wherein term vector is using preset model to be trained acquisition to preset corpus, later according to the Two lexical sets and preset knowledge base determine of each record in target problem and knowledge base according to preset algorithm With scoring, problem is all contained in each record and problem corresponds to answer, finally recorded according to each in knowledge base With scoring, the determining and matched answer of target problem can efficiently use existing knowledge base, realize the information inquiry of semantic level Service improves the accuracy and coverage of information inquiry.
Fig. 9 is a block diagram of an electronic device 300 according to an exemplary embodiment. As shown in Fig. 9, the electronic device 300 may include a processor 301 and a memory 302. The electronic device 300 may further include one or more of a multimedia component 303, an input/output (I/O) interface 304, and a communication component 305.
The processor 301 controls the overall operation of the electronic device 300 so as to complete all or some of the steps of the information query method described above. The memory 302 stores various types of data to support operation of the electronic device 300. Such data may include, for example, instructions of any application program or method operated on the electronic device 300, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 302 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 303 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 302 or sent through the communication component 305. The audio component also includes at least one loudspeaker for outputting audio signals. The I/O interface 304 provides an interface between the processor 301 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual buttons or physical buttons. The communication component 305 is used for wired or wireless communication between the electronic device 300 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 305 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 300 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the information query method described above.
In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided, and the steps of the information query method described above are implemented when the program instructions are executed by a processor. For example, the computer-readable storage medium may be the above memory 302 including program instructions, and the program instructions may be executed by the processor 301 of the electronic device 300 to complete the information query method described above.
In conclusion the disclosure first segments the target problem got, to obtain containing target problem point First lexical set of word result carries out synonym expansion to the first lexical set further according to preset term vector, to obtain the Two lexical sets, wherein term vector is using preset model to be trained acquisition to preset corpus, later according to the Two lexical sets and preset knowledge base determine of each record in target problem and knowledge base according to preset algorithm With scoring, problem is all contained in each record and problem corresponds to answer, finally recorded according to each in knowledge base With scoring, the determining and matched answer of target problem can efficiently use existing knowledge base, realize the information inquiry of semantic level Service improves the accuracy and coverage of information inquiry.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, other embodiments that readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure fall within the scope of protection of the present disclosure.
It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner, provided there is no contradiction. The various embodiments of the present disclosure may likewise be combined in any manner, and such combinations should also be regarded as content disclosed herein as long as they do not depart from the idea of the present disclosure. The present disclosure is not limited to the precise structures described above, and its scope is limited only by the appended claims.

Claims (10)

1. An information query method, characterized in that the method comprises:
Segmenting an acquired target question to obtain a first lexical set, the first lexical set containing a word segmentation result of the target question;
Performing synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, the word vectors being obtained by training on a preset corpus using a preset model;
Obtaining, according to the second lexical set and a preset knowledge base and by a preset algorithm, a matching score between the target question and each record in the knowledge base, the knowledge base containing at least one record, and each record containing a question and an answer corresponding to the question;
Determining an answer matching the target question according to the matching score of each record.
2. The method according to claim 1, characterized in that performing synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set comprises:
Training on the corpus using a preset word vector generation model to obtain the word vectors;
Performing synonym expansion on the first lexical set according to the word vectors, preset stop words, and domain-specific terms of a target domain to which the target question belongs, to obtain the second lexical set.
3. The method according to claim 1, characterized in that obtaining, according to the second lexical set and a preset knowledge base and by a preset algorithm, a matching score between the target question and each record in the knowledge base comprises:
Training on the corpus using a preset word vector generation model to obtain the word vectors;
Performing synonym expansion on each record in the knowledge base according to the word vectors, preset stop words, and domain-specific terms of a target domain to which the target question belongs;
Performing synonymous-sentence expansion on each record in the knowledge base using a neural machine translation (NMT) algorithm;
Obtaining, according to the second lexical set and the knowledge base and by a preset algorithm, the matching score between the target question and each record in the knowledge base.
4. The method according to claim 3, characterized in that performing synonymous-sentence expansion on each record in the knowledge base using a neural machine translation (NMT) algorithm comprises:
Translating, using the NMT algorithm, a first record in the knowledge base expressed in a first language into an intermediate record expressed in a second language;
Translating, using the NMT algorithm, the intermediate record into a synonymous record expressed in the first language;
Storing the synonymous record in the knowledge base, the first record being any record in the knowledge base.
5. The method according to claim 1, characterized in that determining an answer matching the target question according to the matching score of each record comprises:
Sorting the matching scores of the records in descending order to obtain a score ranking;
Selecting the answers in the top n records of the score ranking as the answers matching the target question; or,
When the ratio of the first-ranked matching score to the second-ranked matching score in the score ranking is greater than a preset threshold, taking the answer in the record corresponding to the first-ranked matching score as the answer matching the target question;
When the ratio of the first-ranked matching score to the second-ranked matching score is less than or equal to the preset threshold, selecting the answers in the top m records of the score ranking as the answers matching the target question.
6. The method according to any one of claims 1-5, characterized in that obtaining, according to the second lexical set and a preset knowledge base and by a preset algorithm, a matching score between the target question and each record in the knowledge base comprises:
Calculating, according to the second lexical set and the preset knowledge base and using a first calculation formula, the matching score between the target question and each record in the knowledge base;
The first calculation formula includes:
Here, d_j is the j-th record in the knowledge base; Score_j is the matching score of d_j; s is the number of words that the second lexical set shares with the word segmentation result of d_j; Q is the number of words in the first lexical set; t_i is the i-th word in the second lexical set; num(d_j) is the number of words in the word segmentation result of d_j; num(t_i) is the number of times t_i occurs in d_j; D is the number of records in the knowledge base; and N_i is the number of records in the knowledge base that contain t_i.
7. An information query device, characterized in that the device comprises:
A word segmentation module, configured to segment an acquired target question to obtain a first lexical set, the first lexical set containing a word segmentation result of the target question;
An expansion module, configured to perform synonym expansion on the first lexical set according to preset word vectors to obtain a second lexical set, the word vectors being obtained by training on a preset corpus using a preset model;
A scoring module, configured to obtain, according to the second lexical set and a preset knowledge base and by a preset algorithm, a matching score between the target question and each record in the knowledge base, the knowledge base containing at least one record, and each record containing a question and an answer corresponding to the question;
A determining module, configured to determine an answer matching the target question according to the matching score of each record.
8. The device according to claim 7, characterized in that the expansion module comprises:
A first training submodule, configured to train on the corpus using a preset word vector generation model to obtain the word vectors;
A first expansion submodule, configured to perform synonym expansion on the first lexical set according to the word vectors, preset stop words, and domain-specific terms of a target domain to which the target question belongs, to obtain the second lexical set.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the steps of the method according to any one of claims 1-6 are implemented when the program is executed by a processor.
10. An electronic device, characterized by comprising:
A memory having a computer program stored thereon;
A processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1-6.
CN201811379175.2A 2018-11-19 2018-11-19 Information query method, device, storage medium and electronic equipment Active CN109710732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811379175.2A CN109710732B (en) 2018-11-19 2018-11-19 Information query method, device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811379175.2A CN109710732B (en) 2018-11-19 2018-11-19 Information query method, device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109710732A true CN109710732A (en) 2019-05-03
CN109710732B CN109710732B (en) 2021-03-05

Family

ID=66254959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811379175.2A Active CN109710732B (en) 2018-11-19 2018-11-19 Information query method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109710732B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049447A (en) * 2011-10-12 2013-04-17 英业达股份有限公司 System for memorizing bilingual synonymy words in assisting mode and method thereof
CN105843897A (en) * 2016-03-23 2016-08-10 青岛海尔软件有限公司 Vertical domain-oriented intelligent question and answer system
CN105955976A (en) * 2016-04-15 2016-09-21 中国工商银行股份有限公司 Automatic answering system and method
CN106202372A (en) * 2016-07-08 2016-12-07 中国电子科技网络信息安全有限公司 A kind of method of network text information emotional semantic classification
US20180239816A1 (en) * 2017-02-21 2018-08-23 International Business Machines Corporation Processing request documents
CN108536708A (en) * 2017-03-03 2018-09-14 腾讯科技(深圳)有限公司 A kind of automatic question answering processing method and automatically request-answering system
WO2018195875A1 (en) * 2017-04-27 2018-11-01 Microsoft Technology Licensing, Llc Generating question-answer pairs for automated chatting
CN107220380A (en) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 Question and answer based on artificial intelligence recommend method, device and computer equipment
CN107391614A (en) * 2017-07-04 2017-11-24 重庆智慧思特大数据有限公司 A kind of Chinese question and answer matching process based on WMD
CN107832291A (en) * 2017-10-26 2018-03-23 平安科技(深圳)有限公司 Client service method, electronic installation and the storage medium of man-machine collaboration
CN108595619A (en) * 2018-04-23 2018-09-28 海信集团有限公司 A kind of answering method and equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘旭东等: "面向远程教育的限定领域内自动问答系统设计", 《泰山学院学报》 *
李家南: "IT领域问答系统的研究与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
王新磊: "受限领域内基于中文问句语义相关度计算的智能问答系统研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210952A (en) * 2019-06-13 2019-09-06 讯飞智元信息科技有限公司 A kind of tender Evaluation Method and device
CN110750632A (en) * 2019-10-21 2020-02-04 闽江学院 Improved Chinese ALICE intelligent question-answering method and system
CN110750632B (en) * 2019-10-21 2022-09-09 闽江学院 Improved Chinese ALICE intelligent question-answering method and system
CN111488735A (en) * 2020-04-09 2020-08-04 中国银行股份有限公司 Test corpus generation method and device and electronic equipment
CN111488735B (en) * 2020-04-09 2023-10-27 中国银行股份有限公司 Test corpus generation method and device and electronic equipment
CN111858851A (en) * 2020-06-30 2020-10-30 银盛支付服务股份有限公司 Intelligent customer service knowledge base multidimensional training method and device
CN113032677A (en) * 2021-04-01 2021-06-25 李旻达 Query information processing method and device based on artificial intelligence
CN113780561A (en) * 2021-09-07 2021-12-10 国网北京市电力公司 Method and device for constructing power grid regulation and control operation knowledge base

Also Published As

Publication number Publication date
CN109710732B (en) 2021-03-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant