CN111177316A - An intelligent question answering method and system based on subject word filtering - Google Patents


Info

Publication number
CN111177316A
Authority
CN
China
Prior art keywords
question
word
answer
subject
thesaurus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911325753.9A
Other languages
Chinese (zh)
Inventor
潘建
汤绍雄
祝训醉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology (ZJUT)
Priority to CN201911325753.9A
Publication of CN111177316A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An intelligent question answering method based on subject word filtering comprises the following steps: step 1, obtain the question information $q_0$ posed by a user; step 2, load the subject thesaurus $T$ and obtain the initial word set $S_0$; step 3, obtain the subject word set $S_1$; step 4, obtain the vectorized representation $w$ of $q_0$ from the topic model $M$ and $S_1$; step 5, retain the candidate questions whose similarity is greater than the threshold $t$ and sort them in descending order of similarity to obtain the initial candidate question list $L_0$; step 6, initialize $L_1$ and $L_2$; step 7, take the first candidate question $q$ in $L_0$; step 8, if $q$ does not exist, go to step 9, otherwise perform string matching between $q$ and $q_0$; step 9, if $L_1$ is not empty, sort $L_1$ in reverse order and return it, otherwise sort $L_2$ in reverse order and return it, and end. An intelligent question answering system based on subject word filtering is also provided. The invention enables users' questions to be answered more accurately.

Description

Intelligent question and answer method and system based on subject word filtering
Technical Field
The invention relates to an intelligent question and answer method and system based on subject word filtering.
Background Art
Intelligent question answering aims to provide answers automatically to natural language questions posed by users. In recent years, with the rapid growth of internet data, increases in computing power, and progress in natural language processing, intelligent question-answering methods and systems have developed quickly and are widely used in daily life.
However, because questions are diverse and open-ended, existing intelligent question-answering algorithms have shortcomings. Some community-oriented algorithms produce inaccurate results because they span too many topics, while algorithms built for one specific topic do not extend to other topics and cannot serve several topics at once. As a result, users' questions do not receive useful or high-quality answers, and users must spend more time searching. Accurately retrieving high-quality answers to a user's question therefore has both theoretical and practical value.
Disclosure of Invention
In order to improve the accuracy of intelligent question answering and enable a user to quickly obtain high-quality answers, the invention provides an intelligent question answering method and system based on subject word filtering.
The technical scheme adopted by the invention is as follows:
An intelligent question-answering method based on subject word filtering comprises the following steps:
Step 1, obtain the question information $q_0$ posed by a user;
Step 2, load the subject thesaurus $T$, and perform word segmentation and stop-word removal on $q_0$ to obtain the initial word set $S_0$;
Step 3, filter $S_0$ with the subject thesaurus $T$ to obtain the subject word set $S_1$, i.e. $S_1 = S_0 \cap T$;
Step 4, load the topic model $M$ and, from $M$ and $S_1$, obtain the vector of $q_0$, $w = (\{S_{11}, w_1\}, \{S_{12}, w_2\}, \ldots, \{S_{1n}, w_n\})$, where $S_{1i}$ ($i = 1, 2, \ldots, n$; $n$ is the number of words in $S_1$) is a word in $S_1$ and $w_i$ is the weight of $S_{1i}$, given by
$w_i = M(S_{1i})$;
Step 5, compute one by one the similarity $p_j$ between $w$ and each question $c_j$ in the question set $C$ ($j = 1, 2, \ldots, m$; $m$ is the number of question-answer pairs in $C$), retain the candidate questions whose similarity is greater than the threshold $t$, and sort them in descending order of similarity to obtain the initial candidate question list $L_0$;
[The vector of $c_j$ and the similarity formula appear as equation images in the original and are not reproduced here.]
In these formulas, $c_{jk}$ ($k = 1, 2, \ldots$, up to the number of words in $c_j$) is a word in $S_1$, and the weight of $c_{jk}$ is obtained from the question vector set $C_w$;
Step 6, initialize $L_1 = \{\}$, $L_2 = \{\}$;
Step 7, take the first candidate question $q$ in $L_0$;
Step 8, if $q$ does not exist, go to step 9; otherwise perform string matching between $q$ and $q_0$ as follows:
(8.1) obtain the answer $r$ of $q$ from the question-answer pair set $C_p$;
(8.2) if $q = q_0$, return the answer $r$ of $q$ and end; otherwise go to step 8.3;
(8.3) if $q$ contains $q_0$ (the condition appears as an equation image in the original), then $L_1 = \{(q, r)\} \cup L_1$; otherwise $L_2 = \{(q, r)\} \cup L_2$;
(8.4) delete $q$ from $L_0$ and return to step 7;
Step 9, if $L_1$ is not empty, sort $L_1$ in reverse order and return it; otherwise sort $L_2$ in reverse order and return it; end.
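The loop in steps 6 to 9 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `match_candidates` and its arguments are hypothetical, the condition in (8.3) is read as plain substring containment, "sort in reverse order" is read as reversing the prepended list (which restores descending similarity), and an exact match is returned as a one-element list so that the return type stays uniform.

```python
def match_candidates(q0, candidates, answers):
    """Steps 6-9: split ranked candidates into L1 (contain q0) and L2 (others).

    candidates: question strings already sorted by descending similarity (L0).
    answers:    mapping from question string to its answer (stands in for Cp).
    """
    l1, l2 = [], []                    # step 6: L1 = {}, L2 = {}
    queue = list(candidates)           # working copy of L0
    while queue:                       # step 8: stop when no q exists
        q = queue[0]                   # step 7: first candidate in L0
        r = answers[q]                 # (8.1) answer of q
        if q == q0:                    # (8.2) exact match ends the search
            return [(q, r)]
        if q0 in q:                    # (8.3) assumed reading: q contains q0
            l1.insert(0, (q, r))       # {(q, r)} ∪ L1, modelled as a prepend
        else:
            l2.insert(0, (q, r))
        queue.pop(0)                   # (8.4) delete q from L0
    chosen = l1 if l1 else l2          # step 9: prefer L1 when it is not empty
    return chosen[::-1]                # assumed meaning of "reverse order"
```

Because each pair is prepended, reversing at the end yields the pairs in their original similarity order; a different reading of "reverse order" would only change the last line.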
Further, the question-answer library comprises three parts:
1. question set $C$: contains only the questions of the question-answer pair set, which makes training easier;
2. question-answer pair set $C_p$: stored in "question-answer" form;
3. question vector set $C_w$: stored in the form "word 1, word 2 … - weight 1, weight 2 …" and obtained by training the topic model on the question set;
The three parts are associated through the unique index number of each question, and the data of the corresponding part can be retrieved by that number.
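One way to picture the three index-linked parts is the small container below. It is a sketch under assumptions: the class name `QALibrary`, its field names, and the use of in-memory dictionaries are all hypothetical, since the text only says the parts are stored and linked by a unique question index.

```python
from dataclasses import dataclass, field


@dataclass
class QALibrary:
    """Three views of the same questions, all keyed by the question index."""
    questions: dict = field(default_factory=dict)  # C:  index -> question text
    pairs: dict = field(default_factory=dict)      # Cp: index -> (question, answer)
    vectors: dict = field(default_factory=dict)    # Cw: index -> {word: weight}

    def add(self, idx, question, answer, vector):
        # Register one question under the same index in all three parts.
        self.questions[idx] = question
        self.pairs[idx] = (question, answer)
        self.vectors[idx] = dict(vector)

    def answer_of(self, idx):
        # Look up the answer through the question index.
        return self.pairs[idx][1]
```

Keeping the question set separate from the pairs mirrors the stated purpose of $C$ (training needs only the questions), while the shared index lets a match found in $C_w$ be resolved to an answer in $C_p$.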
Further, in step 2, word segmentation uses the NLPIR word segmentation system with the subject thesaurus as the user-defined dictionary.
In step 3, the subject thesaurus is a pre-built lexicon composed of subject keywords and subject-related high-frequency words. The subject keywords are the key terms specific to the subject, such as the keywords of a programming language, and can be obtained from official documentation; the high-frequency words are extracted automatically from subject-related e-books or documents with the NLPIR keyword extraction tool of the Chinese Academy of Sciences.
In step 4, the topic model is a model trained in advance on the basis of the subject thesaurus, and the vectorized representation of the question information is obtained directly from it. The topic model is trained on the question set of the question-answer library as follows:
1. load the question set, perform word segmentation and stop-word removal on it, and filter out the words that are not in the subject thesaurus to obtain the corresponding word segmentation result set;
2. compute the weights of the words in the word segmentation result set with the TF-IDF algorithm;
3. output the topic model to a file in "word weight" form;
4. from the word segmentation result set and the corresponding weights, output the vectorized representation of the question set to a file in the form "word 1 word 2 … - weight 1 weight 2 …";
The topic model $M$ stores "word weight" key-value pairs, so when the vectorized representation of the question information is built from the topic model, the weight of each word can be looked up directly by the word, one word after another.
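The training procedure above can be sketched as follows. The text names TF-IDF but does not give the exact formula, so the corpus-level term frequency and the smoothed inverse document frequency used here are assumptions; `train_topic_model` and its parameters are hypothetical names, and the input is assumed to be already segmented (the NLPIR step is not reproduced). Writing the results to files (items 3 and 4) is left out.

```python
import math
from collections import Counter


def train_topic_model(segmented_questions, thesaurus, stop_words=frozenset()):
    """Return (model, vectors): word -> weight, and one {word: weight} per question."""
    # 1. drop stop words and words outside the subject thesaurus
    docs = [[w for w in q if w in thesaurus and w not in stop_words]
            for q in segmented_questions]
    n_docs = len(docs)
    total = sum(len(d) for d in docs)
    tf = Counter(w for d in docs for w in d)        # occurrences in the corpus
    df = Counter(w for d in docs for w in set(d))   # questions containing the word
    # 2. TF-IDF weight per word (assumed variant with smoothed IDF)
    model = {w: (tf[w] / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
             for w in tf}
    # 4. vectorized representation of each question, looked up from the model
    vectors = [{w: model[w] for w in dict.fromkeys(d)} for d in docs]
    return model, vectors
```

Storing one weight per word is what makes the later lookup $w_i = M(S_{1i})$ a plain dictionary access.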
In step 5, the question vector set $C_w$ holds the vector representation of every question, so the similarity can be computed directly from the vectors; the threshold $t$ is a predefined minimum similarity.
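A sketch of the thresholded ranking in step 5 is given below. The similarity formula itself is an equation image in the original, so cosine similarity over sparse word-weight vectors is used here purely as an assumed stand-in; `cosine` and `rank_candidates` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse {word: weight} vectors (assumed measure)."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(w, question_vectors, t):
    """Keep questions with similarity > t, sorted by descending similarity (L0)."""
    scored = [(idx, cosine(w, vec)) for idx, vec in question_vectors.items()]
    kept = [(idx, p) for idx, p in scored if p > t]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

The comparison is strict (`p > t`), matching the wording "greater than the threshold"; any other similarity measure could be swapped in for `cosine` without changing `rank_candidates`.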
Further, in step 8, the question-answer pair set $C_p$ stores the questions together with their answers, and the answer to a question can be obtained through the question index.
An intelligent question-answering system based on subject word filtering comprises the following modules:
a question information acquisition module, which obtains the user's question information;
a question-answer library module, which stores the subject thesaurus, the question-answer library, and the topic model for a subject;
a natural language processing module, which processes the user's question information to obtain its word set;
a question-answer library matching module, which matches the user's question information against the questions in the question set of the question-answer library to obtain a list of related candidate questions;
a string matching module, which processes the candidate question list obtained from the question-answer library matching module and matches it further against the question information;
and an answer returning module, which returns the final answer to the user.
Further, the question-answer library module comprises: 1) a subject thesaurus, which stores subject keywords and subject-related high-frequency words; 2) a question-answer library, which stores the question-answer library of the subject; 3) a topic model, which stores the trained question word sets and vector representations in the order of the question-answer pairs.
The natural language processing module comprises: a word segmentation unit, which splits the question information into a word list, with the subject thesaurus added as one basis for segmentation; and a stop-word and subject-word filtering unit, which, after segmentation, removes stop words and words that do not belong to the subject thesaurus.
Furthermore, in the string matching module, each question-answer pair is obtained from the question-answer library. If a question identical to the question information exists, its answer is returned directly. Otherwise the list $L_1$ of question-answer pairs that contain $q_0$ is looked up; if $L_1$ is not empty, it is sorted in reverse order and returned; otherwise the list $L_2$ of question-answer pairs that do not contain $q_0$ is sorted in reverse order and returned.
The technical concept of the invention is as follows: obtain the question information, apply natural language processing and subject word filtering to it, match it against the question-answer library to obtain a candidate question list, perform string matching, and finally return the result, thereby improving the accuracy of intelligent question answering.
While users are asking questions, the algorithm updates the question-answer library periodically, for example every hour. If a question that is not yet recorded in the library appears, the question-answer pair is added to the library after the question has been answered manually, so that effective answers can be provided to users.
The invention has the following beneficial effects: content unrelated to the subject is filtered out of the question information using a subject-specific thesaurus, so that the question information fits the subject more closely; in addition, string matching improves the match between the question information and the question-answer library, so that users' questions are answered more accurately.
Drawings
FIG. 1 is a flow chart of the intelligent question answering method of the present invention.
FIG. 2 is a schematic diagram of the system modules.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, an intelligent question-answering method based on subject word filtering obtains user question information, then performs natural language processing to obtain corresponding word vectors, then performs question-answer library similarity matching and character string matching in sequence based on the obtained word vectors, and returns answers or question-answer pair lists to users after obtaining the answers or question-answer pair lists. The method comprises the following steps:
Step 1, obtain the user's question information $q_0$ (e.g., "I want to know what the difference between the array and the pointer is?").
In this embodiment, the subject is the C/C++ programming language. Word segmentation uses the NLPIR word segmentation system with the subject thesaurus added as a user-defined dictionary. The subject thesaurus contains C/C++ keywords and operators, together with high-frequency words extracted from selected chapters of the book "C language confusion": pointers, arrays, functions, and multi-file programming. The stop-word list combines the common stop-word lists provided by Baidu and Haohang. TF-IDF values are used as word weights when training the topic model, and the similarity threshold is $t = 0.6$.
Step 2, load the subject thesaurus $T$, as shown in Table 1, and apply natural language processing, comprising word segmentation and stop-word removal, to $q_0$ to obtain the initial word set $S_0$ (<thinking, array, pointer, difference>):
array
pointer
string
output
auto
break
++
&&
TABLE 1
Step 3, filter $S_0$ with the subject thesaurus $T$ to obtain the subject word set $S_1$ (<array, pointer, difference>);
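The filtering $S_1 = S_0 \cap T$ in this step amounts to a membership test per word. In the sketch below, `filter_subject_words` is a hypothetical name, and the set `T` is an illustrative stand-in for the thesaurus: it contains the Table 1 entries plus "difference", which is assumed to be in the full thesaurus because it survives the filter in the example.

```python
def filter_subject_words(s0, thesaurus):
    """Keep the words of S0 that are in the thesaurus, preserving their order."""
    return [word for word in s0 if word in thesaurus]


# Illustrative stand-in for T (Table 1 entries plus the assumed "difference").
T = {"array", "pointer", "string", "output", "auto", "break", "++", "&&",
     "difference"}
S0 = ["thinking", "array", "pointer", "difference"]
S1 = filter_subject_words(S0, T)
```

A list comprehension is used instead of a bare set intersection so that the word order of the question is kept for the later vector lookup.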
Step 4, load the topic model $M$, shown in Table 2, and obtain from $S_1$ the vector of the question information $q_0$: $w$ = ({"array", 0.2002578}, {"pointer", 0.202271}, {"difference", 0.097653});
[Table 2 appears as an image in the original and is not reproduced here.]
TABLE 2
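The lookup $w_i = M(S_{1i})$ used to build $w$ can be sketched as a dictionary access. The name `vectorize` is hypothetical, and the dictionary `M` holds only the three weights quoted in the text for this example; the rest of Table 2 is not available. Skipping words that are missing from the model is also an assumption.

```python
def vectorize(s1, model):
    """Build w as (word, weight) pairs by looking each word of S1 up in M."""
    return [(word, model[word]) for word in s1 if word in model]


# Only the three weights quoted in the example; not the full topic model.
M = {"array": 0.2002578, "pointer": 0.202271, "difference": 0.097653}
w = vectorize(["array", "pointer", "difference"], M)
```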
Step 5, compute one by one the similarity $p_j$ between $w$ and each question $c_j$ in the question set $C$, as follows:
for $j = 1$: [the worked calculation appears as equation images in the original and is not reproduced here];
the remaining similarities are obtained in the same way [equation image in the original].
As shown in Table 3, the candidate questions whose similarity is greater than the threshold $t$ are retained and sorted in descending order of similarity to obtain the initial candidate question list $L_0$, shown in Table 4:
[Table 3 appears as an image in the original and is not reproduced here.]
TABLE 3
[Table 4 appears as an image in the original and is not reproduced here.]
TABLE 4
Step 6, initialize $L_1 = \{\}$, $L_2 = \{\}$;
Step 7, take the first candidate question in $L_0$: $q$ = "Actually, I want to know what the difference between the array and the pointer is, can you tell me?", $q_0$ = "I want to know what the difference between the array and the pointer is?";
Step 8, perform string matching between $q$ and $q_0$;
(8.1) obtain the answer of $q$ from the question-answer pair set $C_p$: $r$ = "arrays allocate space automatically, but …";
(8.2) $q \neq q_0$, go to step 8.3;
(8.3) [condition shown as an equation image in the original], so $L_1 = \{(q, r)\} \cup L_1$; $L_1$ is now as shown in Table 5:
[Table 5 appears as an image in the original and is not reproduced here.]
TABLE 5
(8.4) delete $q$ from $L_0$ and return to step 7;
Step 7, take the first candidate question in $L_0$: $q$ = "Then why can array and pointer declarations be interchanged as function parameters?", $q_0$ = "I want to know what the difference between the array and the pointer is?";
Step 8, perform string matching between $q$ and $q_0$;
(8.1) obtain the answer of $q$ from the question-answer pair set $C_p$: $r$ = "arrays allocate space automatically, but …";
(8.2) $q \neq q_0$, go to step 8.3;
(8.3) [condition shown as an equation image in the original], so $L_2 = \{(q, r)\} \cup L_2$; $L_2$ is now as shown in Table 6:
[Table 6 appears as an image in the original and is not reproduced here.]
TABLE 6
(8.4) delete $q$ from $L_0$ and return to step 7;
Step 7, take the first candidate question in $L_0$: $q$ = "What is a void pointer, can you tell me?", $q_0$ = "I want to know what the difference between the array and the pointer is?";
Step 8, perform string matching between $q$ and $q_0$;
(8.1) obtain the answer of $q$ from the question-answer pair set $C_p$: $r$ = "the meaning of 'void' …";
(8.2) $q \neq q_0$, go to step 8.3;
(8.3) [condition shown as an equation image in the original], so $L_2 = \{(q, r)\} \cup L_2$; $L_2$ is now as shown in Table 7:
[Table 7 appears as an image in the original and is not reproduced here.]
TABLE 7
(8.4) delete $q$ from $L_0$ and return to step 7;
Step 7, no candidate question remains;
Step 8, go to step 9;
Step 9, $L_1$ is not empty and holds only one record; return $L_1$ and end.
In this embodiment, a trailing question mark is the criterion for deciding whether the input is question information; the trailing question mark is excluded from string matching; and an ellipsis indicates that text too long to display has been omitted.
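The question-mark convention of this embodiment can be sketched as below. The helper names are hypothetical, treating both the ASCII "?" and the full-width "？" as question marks is an assumption, and substring containment is the assumed form of the matching condition.

```python
QUESTION_MARKS = ("?", "？")  # ASCII and full-width forms (assumed)


def is_question(text):
    """Treat the input as a question only if it ends with a question mark."""
    return text.rstrip().endswith(QUESTION_MARKS)


def strip_question_mark(text):
    """Drop one trailing question mark so it does not take part in matching."""
    text = text.rstrip()
    return text[:-1].rstrip() if text.endswith(QUESTION_MARKS) else text


def contains(q, q0):
    """Assumed containment test: q0, without its question mark, occurs in q."""
    return strip_question_mark(q0) in strip_question_mark(q)
```

With the example above, stripping the question mark from $q_0$ is what lets the first candidate, which continues with ", can you tell me?", count as containing $q_0$.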
It will be appreciated by persons skilled in the art that the foregoing is illustrative only and is not to be construed as limiting the invention, as variations and modifications of the foregoing examples are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An intelligent question answering method based on subject word filtering, characterized in that the method comprises the following steps:
Step 1, obtain the question information $q_0$ posed by a user;
Step 2, load the subject thesaurus $T$, and perform word segmentation and stop-word removal on the question information $q_0$ to obtain the initial word set $S_0$;
Step 3, filter $S_0$ with the subject thesaurus $T$ to obtain the subject word set $S_1$, i.e. $S_1 = S_0 \cap T$;
Step 4, load the topic model $M$ and, from $M$ and $S_1$, obtain the vector of the question information $q_0$, $w = (\{S_{11}, w_1\}, \{S_{12}, w_2\}, \ldots, \{S_{1n}, w_n\})$, where $S_{1i}$ ($i = 1, 2, \ldots, n$; $n$ is the number of words in $S_1$) is a word in $S_1$ and $w_i$ is the weight of the word $S_{1i}$, given by
$w_i = M(S_{1i})$;
Step 5, compute one by one the similarity $p_j$ between $w$ and each question $c_j$ in the question set $C$, $j = 1, 2, \ldots, m$, where $m$ is the number of question-answer pairs in $C$; retain the candidate questions whose similarity is greater than the threshold $t$, and sort them in descending order of similarity to obtain the initial candidate question list $L_0$;
[The vector of $c_j$ and the similarity formula appear as equation images in the original and are not reproduced here.]
In these formulas, $c_{jk}$ ($k = 1, 2, \ldots$, up to the number of words in $c_j$) is a word in $S_1$, and the weight of the word $c_{jk}$ is obtained from the question vector set $C_w$;
Step 6, initialize $L_1 = \{\}$, $L_2 = \{\}$;
Step 7, take the first candidate question $q$ in $L_0$;
Step 8, if $q$ does not exist, go to step 9; otherwise perform string matching between $q$ and $q_0$ as follows:
(8.1) obtain the answer $r$ of $q$ from the question-answer pair set $C_p$;
(8.2) if $q = q_0$, return the answer $r$ of $q$ and end; otherwise go to step 8.3;
(8.3) if [condition shown as an equation image in the original], then $L_1 = \{(q, r)\} \cup L_1$; otherwise $L_2 = \{(q, r)\} \cup L_2$;
(8.4) delete $q$ from $L_0$ and return to step 7;
Step 9, if $L_1$ is not empty, sort $L_1$ in reverse order and return it; otherwise sort $L_2$ in reverse order and return it; end.
2. The intelligent question answering method based on subject word filtering according to claim 1, characterized in that the question-answer library comprises three parts:
1) question set $C$: contains only the questions of the question-answer pair set, which makes training easier;
2) question-answer pair set $C_p$: stored in "question-answer" form;
3) question vector set $C_w$: stored in the form "word 1 word 2 … - weight 1 weight 2 …" and obtained by training the topic model on the question set;
the three parts are associated through the unique index number of each question, and the data of the corresponding part can be retrieved by that number.
3. The intelligent question answering method based on subject word filtering according to claim 1 or 2, characterized in that in step 2, word segmentation uses the NLPIR word segmentation system with the subject thesaurus as the user-defined dictionary.
4. The intelligent question answering method based on subject word filtering according to claim 1 or 2, characterized in that in step 3, the subject thesaurus is a pre-built lexicon composed of subject keywords and subject-related high-frequency words; the subject keywords are the key terms of the specific subject and can be obtained from official documentation; the high-frequency words are extracted automatically from subject-related e-books or documents with the NLPIR keyword extraction tool of the Chinese Academy of Sciences.
5. The intelligent question answering method based on subject word filtering according to claim 3, characterized in that in step 4, the topic model is a model trained in advance on the basis of the subject thesaurus, and the vectorized representation of the question information is obtained directly from the topic model; the topic model is trained on the question set of the question-answer library as follows:
1) load the question set, perform word segmentation and stop-word removal on it, and filter out the words outside the subject thesaurus to obtain the corresponding word segmentation result set;
2) compute the weights of the words in the word segmentation result set with the TF-IDF algorithm;
3) output the topic model to a file in "word weight" form;
4) from the word segmentation result set and the corresponding weights, output the vectorized representation of the question set to a file in the form "word 1 word 2 … - weight 1 weight 2 …";
the topic model $M$ stores "word weight" key-value pairs, so when the vectorized representation of the question information is built from the topic model, the weight of each word can be looked up directly by the word, one word after another.
6. The intelligent question answering method based on subject word filtering according to claim 4, characterized in that in step 5, the question vector set $C_w$ holds the vector representation of every question, the similarity can be computed directly from the vectors, and the threshold $t$ is a predefined minimum similarity.
7. The intelligent question answering method based on subject word filtering according to claim 3, characterized in that in step 8, the question-answer pair set $C_p$ stores the question-answer pairs, and the corresponding answer can be obtained through the question index.
8. An intelligent question answering system based on subject word filtering, characterized in that the system comprises:
a question information acquisition module, which obtains the user's question information;
a question-answer library module, which stores the subject thesaurus, the question-answer library, and the topic model for a subject;
a natural language processing module, which processes the user's question information to obtain its word set;
a question-answer library matching module, which matches the user's question information against the questions in the question set of the question-answer library to obtain a list of related candidate questions;
a string matching module, which processes the candidate question list obtained from the question-answer library matching module and matches it further against the question information;
an answer returning module, which returns the final answer to the user.
9. The intelligent question answering system based on subject word filtering according to claim 7, characterized in that the question-answer library module comprises:
1) a subject thesaurus, which stores subject keywords and subject-related high-frequency words;
2) a question-answer library, which stores the question-answer library of the subject;
3) a topic model, which stores the trained question word sets and vector representations in the order of the question-answer pairs.
10. The intelligent question answering system based on subject word filtering according to claim 7, characterized in that the natural language processing module comprises: a word segmentation unit, which splits the question information into a word list, with the subject thesaurus added as one basis for segmentation; and a stop-word and subject-word filtering unit, which, after segmentation, removes stop words and words that do not belong to the subject thesaurus;
in the string matching module, each question-answer pair is obtained from the question-answer library; if a question identical to the question information exists, its answer is returned directly; otherwise the list $L_1$ of question-answer pairs that contain $q_0$ is looked up; if $L_1$ is not empty, it is sorted in reverse order and returned; if $L_1$ is empty, the list $L_2$ of question-answer pairs that do not contain $q_0$ is sorted in reverse order and returned.
CN201911325753.9A 2019-12-20 2019-12-20 An intelligent question answering method and system based on subject word filtering Pending CN111177316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911325753.9A CN111177316A (en) 2019-12-20 2019-12-20 An intelligent question answering method and system based on subject word filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911325753.9A CN111177316A (en) 2019-12-20 2019-12-20 An intelligent question answering method and system based on subject word filtering

Publications (1)

Publication Number Publication Date
CN111177316A true CN111177316A (en) 2020-05-19

Family

ID=70655579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911325753.9A Pending CN111177316A (en) 2019-12-20 2019-12-20 An intelligent question answering method and system based on subject word filtering

Country Status (1)

Country Link
CN (1) CN111177316A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120101807A1 (en) * 2010-10-25 2012-04-26 Electronics And Telecommunications Research Institute Question type and domain identifying apparatus and method
CN107153639A (en) * 2016-03-04 2017-09-12 北大方正集团有限公司 Intelligent answer method and system
CN108256056A (en) * 2018-01-12 2018-07-06 广州杰赛科技股份有限公司 Intelligent answer method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ding Yixin: "Design and Implementation of an Intelligent Question Answering System for Postgraduate Admissions Consultation" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117371435A (en) * 2023-10-09 2024-01-09 北京睿企信息科技有限公司 Data processing system for acquiring hot words with fluctuation of heat
CN117371435B (en) * 2023-10-09 2024-04-05 北京睿企信息科技有限公司 Data processing system for acquiring hot words with fluctuation of heat

Similar Documents

Publication Publication Date Title
KR102194837B1 (en) Method and apparatus for answering knowledge-based question
CN112800170B (en) Question matching method and device, question answering method and device
CN109344236B (en) A problem similarity calculation method based on multiple features
CN106649818B (en) Application search intent identification method, device, application search method and server
CN107329949B (en) Semantic matching method and system
CN104615767B (en) Training method for a search ranking model, search processing method and device
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
JP3882048B2 (en) Question answering system and question answering processing method
Cohen et al. End to end long short term memory networks for non-factoid question answering
CN111125349A (en) Graph model text abstract generation method based on word frequency and semantics
CN112650840A (en) Intelligent medical question-answering processing method and system based on knowledge graph reasoning
CN108132927B (en) Keyword extraction method for combining graph structure and node association
CN110096567A (en) Multi-turn dialogue reply selection method and system based on QA knowledge base reasoning
CN111414763A (en) A semantic disambiguation method, device, device and storage device for sign language computing
CN106126619A (en) A video retrieval method and system based on video content
CN109271524B (en) Entity Linking Method in Knowledge Base Question Answering System
EP3726401A1 (en) Encoding textual information for text analysis
CN113569018A (en) Question and answer pair mining method and device
CN113468311B (en) Knowledge graph-based complex question and answer method, device and storage medium
CN112507097B (en) Method for improving generalization capability of question-answering system
CN112711944B (en) A word segmentation method, system, word segmenter generation method and system
Ye et al. A sentiment based non-factoid question-answering framework
CN111177316A (en) An intelligent question answering method and system based on subject word filtering
WO2012143839A1 (en) A computerized system and a method for processing and building search strings
CN112328757B (en) A Similar Text Retrieval Method for Business Robot Question Answering System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200519