CN109885672A - Question-answering intelligent retrieval system and method for online education - Google Patents
- Publication number
- CN109885672A (application CN201910159421.1A)
- Authority
- CN
- China
- Prior art keywords
- document
- answer
- module
- retrieval
- submodule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to an intelligent retrieval system and method for online education, comprising a student status information module, a problem analysis module, a document retrieval module, a passage retrieval module, and an answer extraction module. Based on a student status model, the invention uses the BM25 retrieval algorithm and the DR-ASF intelligent retrieval algorithm, which adds handcrafted similarity features, to provide accurate, personalized retrieval and return satisfactory answers to users. Question-answering passage retrieval reduces the system's time and space overhead and enables real-time interaction between user and system at the semantic level.
Description
Technical field
The invention relates to a question-answering intelligent retrieval system and method, and belongs to the field of computer and Internet information technology.
Background art
With the rapid development of the Internet, the deepening of informatization, and the steadily improving performance of mobile devices, online education (e-learning) has emerged. Because Internet technology spreads knowledge quickly, people can acquire knowledge in flexible and varied ways, and teaching and learning are no longer constrained by time or place. As online education and big data technology continue to develop, many of the problems that online education users encounter can be addressed with big data techniques. Intelligent customer service, a typical application of artificial intelligence, has opened up a new customer service model for online education. It can provide automatic question answering: given a question posed in natural language, it returns a specific answer. This is a research field that combines information retrieval with natural language processing.
In a question-answering intelligent retrieval system for online education, users can ask questions in natural language rather than as keyword combinations. After understanding the user's need, the system responds accurately based on the student's current state: it applies information retrieval techniques to a plain-text corpus and returns a precise, personalized answer rather than a list of documents. The main technical problem is therefore how to take full account of the characteristics of online education users and data, extract more feature information from educational data resources and from students' learning states, and at the same time meet the timeliness requirements of an online question answering system, so as to improve user satisfaction.
Summary of the invention
Technical problem solved by the invention: in view of the characteristics of online education users and data, the invention provides a question-answering intelligent retrieval system and method for online education. Using the BM25 algorithm and the DR-ASF intelligent retrieval algorithm with handcrafted similarity features, and based on a student status model, it provides accurate, personalized retrieval and satisfactory answers. Question-answering passage retrieval effectively reduces the system's time and space overhead and enables real-time interaction between user and system at the semantic level.
Technical solution of the invention: an intelligent retrieval system for online education, comprising a student status information module, a problem analysis module, a document retrieval module, a passage retrieval module, and an answer extraction module, wherein:
Student status information module: when a user asks a question through online question answering, this module accesses the user profile (user portrait) database and generates the user's status information from the user ID. In an online education system the user is a student, so the user status information is student status information. The module is called by the document retrieval module; its input parameter is the student ID and its output is the student status information.
The problem analysis module is divided into an offline submodule and an online submodule. The offline submodule uses a training corpus and the SVM classification algorithm to train the intent classification model offline and deploys the model to the online submodule. The online submodule is called by the document retrieval module; it parses the input question semantically, recognizes its intent with the trained intent classification model, and then expands word meanings through question paraphrasing to obtain a more accurate and more complete question feature word set. This feature word set is returned to the document retrieval module as the output of the problem analysis module.
The document retrieval module combines student status information with domain business rules to retrieve, from the document knowledge base, the document that best matches the question. It comprises a knowledge base management submodule and a retrieval submodule. The management submodule stores notice documents. These are usually issued to enrolled students by universities and colleges qualified to offer online education, and their titles and content follow a uniform format, so the submodule parses titles and content with regular expressions, extracts a keyword table, and saves it. The retrieval submodule takes the question feature word set produced by the problem analysis module and the student status information provided by the student status information module, and uses regular-expression matching and business rules to search the keyword tables and document content stored in the knowledge base and locate the target document. Introducing student status information improves recognition of the question's intent and the accuracy of target document location. Finally, the target document and the question feature word set are passed together to the passage retrieval module as input parameters.
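The keyword table extraction performed by the management submodule can be illustrated with a minimal sketch. The patent does not disclose the actual title format or the regular expressions, so the title convention, the pattern `TITLE_PATTERN`, the field names, and the function `extract_keyword_table` below are all assumptions made for illustration only.

```python
import re
from typing import Dict, Optional

# Hypothetical title convention, assumed only for this sketch:
#   "[<school>] Notice on <topic> (batch <YYYYMM>)"
TITLE_PATTERN = re.compile(
    r"^\[(?P<school>[^\]]+)\]\s*Notice on\s+(?P<topic>.+?)\s*"
    r"\(batch\s+(?P<batch>\d{6})\)$"
)


def extract_keyword_table(title: str) -> Optional[Dict[str, str]]:
    """Parse a notice title into keyword fields; return None if it does not match."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return None
    return {
        "school": match.group("school").strip(),
        "topic": match.group("topic").strip().lower(),
        "batch": match.group("batch"),
    }


# Illustrative title, not taken from the patent.
table = extract_keyword_table(
    "[Example University] Notice on Thesis Defense (batch 201903)"
)
print(table)
```

In a fuller version the returned fields, together with the title and the storage path, would become the columns of the keyword table row for that document.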
The passage retrieval module sits between the document retrieval module and the answer extraction module. It takes the question feature word set and the target document passed in by the document retrieval module, performs semantic retrieval on the target document based on the feature word set, and extracts the target paragraphs that are most relevant to the question and most likely to contain the answer. A retrieved target paragraph can be handled in two ways. First, it can be returned directly to the questioner as the answer; this path uses probability-based retrieval, which has low algorithmic complexity and low time and space overhead. Second, it can be passed to the answer extraction module as a parameter, giving that module a higher-quality, smaller input to process. Together these two paths balance the system's answering efficiency and accuracy.
Answer extraction module: it is divided into an offline submodule and an online submodule. The offline submodule uses a training corpus and the deep-learning DR-ASF algorithm to train the DR-ASF intelligent retrieval model offline and deploys the model to the online submodule. The online submodule takes the retrieval result passed in by the passage retrieval module, locates the start and end positions of the answer within the candidate target paragraph, and extracts a more precise answer sentence to return to the questioner. Because it adds handcrafted similarity features, DR-ASF extracts answers better than the plain DR model. In addition, using paragraphs as an intermediate layer between the document retrieval module and the answer extraction module substantially reduces the size of the input to be processed, which helps the online question answering system meet its timeliness requirements.
In the answer extraction module, the DR-ASF model builds on the DR model (Document Reader) in DrQA. When vectorizing the document it adds three handcrafted similarity features, namely Q-D exact match, Jaccard similarity, and relative edit distance, in order to understand the document semantically and capture matching information between the question and the document's sentences. The question and the document are encoded with a multilayer bidirectional long short-term memory (BiLSTM) network, and the degree of match between them is measured by bilinear similarity to predict the position range of the answer. This improves question answering accuracy and yields the DR-ASF intelligent retrieval model optimized with handcrafted similarity features.
In the answer extraction module, the deep-learning DR-ASF algorithm is implemented as follows:
(1) The offline submodule trains the intelligent retrieval model offline using the training corpus, in the following steps:
1) The offline submodule builds vector representations of the training corpus (questions and answer documents):
a) Segment the question and the answer document into words and remove stop words, obtaining a word sequence for each;
b) Using word vectors pre-trained on a large corpus with the word2vec tool, represent each word in the question's word sequence by its word vector, completing the vector representation of the question;
c) For the answer document, first obtain word vectors as in step b). Then obtain the document's other features: the POS (part-of-speech tagging) feature vector, the NER (named entity recognition) feature vector, and the handcrafted similarity feature vector. The POS and NER feature vectors are sized by the total number of part-of-speech tag types and the total number of named entity types, respectively. The handcrafted similarity features describe the relationship between the question and the answer document and have three parts: Q-D exact match, the Jaccard coefficient, and relative edit distance. The document's word vector features, POS features, NER features, and the three handcrafted similarity features are concatenated to give the vector representation of the answer document;
2) Once the question and the answer document are vectorized, a multilayer bidirectional long short-term memory network (stacked BiLSTM) serves as the encoder. It encodes the input question matrix and answer document matrix into fixed-length vectors, giving the question encoding vector and the answer document encoding vector. With the handcrafted similarity features added, the encoding of the answer document gives more weight to words in the document that are close to words in the question;
3) Taking the question encoding produced in step 2) as input, the self-attention mechanism applies a weighted transformation within the sentence to learn the dependencies among its words, producing a new vector representation of the question encoding;
4) Finally, the answer document encoding vector from step 2) and the new question encoding vector from step 3) are taken as input, the degree of match between question and answer document is measured with bilinear similarity, and the start and end positions of the answer in the answer document are predicted.
(2) The offline submodule builds the DR-ASF intelligent retrieval model through iterative training;
(3) The offline submodule deploys the DR-ASF intelligent retrieval model to the online submodule;
(4) The online submodule first preprocesses the question and the target paragraph. The feature word set of the user question passed in from the passage retrieval module is vectorized using the pre-trained word vectors. The target paragraph is treated as the answer document and vectorized in the same way the offline submodule processes answer documents, giving the vectorized matrix of the target paragraph. The question matrix and the target paragraph matrix are then fed to the DR-ASF intelligent retrieval model, whose output is the precise answer to the question, which is returned to the questioner.
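The bilinear matching and span prediction in step 4) can be sketched without a deep learning framework. The sketch assumes the DrQA-style formulation, in which the start (or end) score of paragraph token i is the softmax of p_i^T W q. The toy vectors, the weight matrices `w_start` and `w_end`, and the `max_len` limit are illustrative values, not parameters from the patent; in the real model they come from the trained BiLSTM encoders and learned weights.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def bilinear_probs(tokens: List[Vector], weight: Matrix, question: Vector) -> List[float]:
    """Softmax over the bilinear scores p_i^T W q of all paragraph tokens."""
    wq = [sum(w * q for w, q in zip(row, question)) for row in weight]
    logits = [sum(p * x for p, x in zip(token, wq)) for token in tokens]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start: List[float], end: List[float], max_len: int = 15) -> Tuple[int, int]:
    """Return (i, j) with i <= j < i + max_len that maximizes start[i] * end[j]."""
    best, best_score = (0, 0), -1.0
    for i, p_start in enumerate(start):
        for j in range(i, min(i + max_len, len(end))):
            score = p_start * end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Toy encodings: three paragraph tokens and one question vector (assumed values).
paragraph = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question_vec = [1.0, 0.0]
w_start = [[1.0, 0.0], [0.0, 1.0]]
w_end = [[0.0, 1.0], [1.0, 0.0]]

start_probs = bilinear_probs(paragraph, w_start, question_vec)
end_probs = bilinear_probs(paragraph, w_end, question_vec)
span = best_span(start_probs, end_probs)
```

The answer text would then be the paragraph tokens from `span[0]` through `span[1]` inclusive.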
The business rules of the document retrieval module work as follows:
(1) In the knowledge base management submodule, document titles and content are parsed with regular expressions and a keyword table is extracted for each document, in the following steps:
1) Establish document type labels according to the business rules;
2) When a new document is stored in the knowledge base, the system's built-in regular expressions automatically process the notice title and content, extract the keyword fields, and generate the keyword table;
3) Normalize the extracted keyword table, mainly by automatic document type matching and normalization of batch identifiers;
4) Store the document content as files under the directory of the corresponding document type;
5) Store the keyword table extracted from the document, the document title, and the document storage path as fields in a MySQL database table; the table automatically generates a document ID as the primary key;
(2) The knowledge base retrieval submodule takes the question feature word set produced by the problem analysis module and the student status information provided by the student status information module, uses regular-expression matching and business rules to search the document keyword tables and document content, locates the target document, and sends the answer document ID and the user question to the passage retrieval module, in the following steps:
1) Assign each type of user question to a notice type according to the business rules;
2) Use the relevant fields of the student status information, together with heuristic rules, to narrow the range of document types to be searched;
3) Finally, from the set of notice documents of the same type, return the one with the most recent issue date as the target document.
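The three retrieval steps above amount to a filter followed by a "newest wins" selection. The sketch below is one possible reading: the `Notice` fields, the `INTENT_TO_DOC_TYPE` mapping, and the function `select_target_document` are hypothetical, and the real business rules are not disclosed in this passage.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Notice:
    doc_id: int
    doc_type: str
    school: str
    batch: str
    issued: date


# Step 1: hypothetical business rule mapping question intents to notice types.
INTENT_TO_DOC_TYPE: Dict[str, str] = {
    "ask_thesis_deadline": "thesis",
    "ask_exam_schedule": "exam",
}


def select_target_document(notices: List[Notice], doc_type: str,
                           school: str, batch: str) -> Optional[Notice]:
    """Steps 2 and 3: narrow by student status, then return the newest notice."""
    candidates = [n for n in notices
                  if n.doc_type == doc_type and n.school == school and n.batch == batch]
    return max(candidates, key=lambda n: n.issued, default=None)


notices = [
    Notice(1, "thesis", "Example University", "201903", date(2019, 1, 10)),
    Notice(2, "thesis", "Example University", "201903", date(2019, 3, 5)),
    Notice(3, "exam", "Example University", "201903", date(2019, 4, 1)),
    Notice(4, "thesis", "Other College", "201903", date(2019, 5, 1)),
]
target = select_target_document(
    notices, INTENT_TO_DOC_TYPE["ask_thesis_deadline"], "Example University", "201903"
)
```

Here the school and batch come from the student status information, which is what keeps a newer notice from another school (document 4) out of the result.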
In the problem analysis module, intent recognition and question paraphrasing are implemented as follows:
(1) Question intent classification: the SVM classification algorithm classifies the question's intent using TF-IDF as the feature vector. Because a question contains little text, 1-gram and 2-gram models are used to increase the amount of information. The steps are as follows:
1) Select and extract question feature items by word frequency: segment the questions and remove stop words; count the keywords of each question class and their frequencies, and take the 5 most frequent keywords as the feature word set of that class; merge the feature word sets of the different classes into a total feature word set;
2) Use TF-IDF features as the language model representation of the question;
3) Apply linear function (min-max) normalization to limit the range of the data and reduce the gap between the frequencies of different words, obtaining the final feature set representation of the question;
4) Compare the question's feature set representation with the feature set of each question class for similarity, and take the class whose feature set is most similar as the type of the question;
5) The offline submodule of the problem analysis module trains the intent classification model offline and deploys the trained model to the online submodule;
(2) Question paraphrasing: compile a synonym table from users' frequently asked questions and the business norms, and use it to expand the keywords in the question. Each word in the keyword set is looked up in the synonym table; if synonyms exist, all of them are added to the keyword set;
(3) The online submodule of the problem analysis module is called by the document retrieval module and parses the user question semantically. It first recognizes the intent with the trained intent classification model and then expands word meanings through question paraphrasing. The resulting keyword set is the question feature word set, which is returned to the document retrieval module as the output of the problem analysis module.
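The synonym expansion used in question paraphrasing is a simple table lookup. The following sketch shows it under stated assumptions: the `SYNONYMS` table and its entries are invented for illustration, and `expand_keywords` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical synonym table; in the system described above it would be
# compiled from frequently asked questions and business norms.
SYNONYMS: Dict[str, List[str]] = {
    "thesis": ["dissertation", "paper"],
    "exam": ["examination", "test"],
}


def expand_keywords(keywords: Iterable[str],
                    synonyms: Dict[str, List[str]] = SYNONYMS) -> Set[str]:
    """Add every listed synonym of each keyword to the keyword set."""
    expanded = set(keywords)
    for word in list(expanded):  # iterate over a copy while the set grows
        expanded.update(synonyms.get(word, []))
    return expanded


feature_words = expand_keywords(["thesis", "deadline"])
```

Only the original keywords are looked up, so the expansion is one level deep and does not chase synonyms of synonyms.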
The intelligent retrieval method for online education of the invention comprises an online process and an offline process.
Online process:
(1) The student logs into the system under their real name and enters a question in natural language in the system's question answering interface;
(2) The system obtains the student ID from the user information and calls the document retrieval module with the student ID and the user question as input parameters;
(3) The retrieval submodule of the document retrieval module first calls the student status information module with the student ID as input. The student status information module accesses the user profile database, generates the student's status information from the student ID, and returns it to the retrieval submodule. The output comprises the school attended, enrollment batch, student registration batch, thesis batch, examination batch, graduation batch, and study progress;
(4) The retrieval submodule then calls the problem analysis module with the user question as input. The problem analysis module parses the question, identifies its intent and feature information, and returns the resulting question feature word set to the retrieval submodule;
(5) Based on the return values of these two modules, namely the student status information and the question feature word set, the retrieval submodule applies the business rules and uses regular-expression matching to search the keyword tables and document content in the knowledge base, locating the target document that can answer the user question;
(6) The retrieval submodule then passes the target document and the question feature word set returned by the problem analysis module to the passage retrieval module, which performs semantic retrieval on the target document with the BM25 algorithm and outputs the target paragraphs most relevant to the question;
(7) The system offers two response modes according to the user's needs:
1) If the user uses the default quick response mode, the retrieved target paragraph is returned directly to the questioner as the answer. This mode is easy to use, has low algorithmic complexity and low time and space overhead, and still achieves relatively high recall;
2) If the user selects the precise answer mode, the retrieved target paragraph is passed to the answer extraction module as input. The DR-ASF intelligent retrieval model trained with handcrafted similarity features predicts the start and end positions of the answer within the candidate target paragraph and extracts a more precise answer, which is returned to the questioner.
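The online process above can be summarized as one dispatch function that chains the modules and branches on the response mode. This is only a structural sketch: the function `answer_question`, its parameters, and the stub callables are hypothetical stand-ins for the modules, which are not implemented here.

```python
from typing import Callable, Dict, List


def answer_question(
    student_id: str,
    question: str,
    get_status: Callable[[str], Dict[str, str]],
    parse_question: Callable[[str], List[str]],
    retrieve_document: Callable[[List[str], Dict[str, str]], str],
    retrieve_passage: Callable[[List[str], str], str],
    extract_answer: Callable[[List[str], str], str],
    precise: bool = False,
) -> str:
    """Run steps (3) to (7): status, parsing, document, passage, optional extraction."""
    status = get_status(student_id)                  # student status information module
    features = parse_question(question)              # problem analysis module
    document = retrieve_document(features, status)   # document retrieval module
    passage = retrieve_passage(features, document)   # passage retrieval module (BM25)
    if not precise:
        return passage                               # default quick response mode
    return extract_answer(features, passage)         # precise answer mode (DR-ASF)


# Stub modules, for demonstration only.
stubs = dict(
    get_status=lambda sid: {"school": "Example University"},
    parse_question=lambda q: q.lower().rstrip("?").split(),
    retrieve_document=lambda feats, status: "full notice text",
    retrieve_passage=lambda feats, doc: "most relevant paragraph",
    extract_answer=lambda feats, passage: "answer span",
)
quick = answer_question("S001", "When is the thesis defense?", **stubs)
exact = answer_question("S001", "When is the thesis defense?", precise=True, **stubs)
```

Passing the modules in as callables keeps the control flow testable without any trained model or database.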
Offline process:
(1) The knowledge base management submodule of the document retrieval module stores and processes notice documents in the offline phase. When a new notice arrives, the submodule first saves its title and content, parses them with regular expressions, and stores the extracted keyword table in the knowledge base as well. The submodule is called only when a new notice arrives; once the notice has been parsed and its output, namely the title, content, and keyword table, has been saved to the knowledge base, its task ends.
(2) The offline submodule of the problem analysis module uses the training corpus and the SVM classification algorithm to train question intent recognition and deploys the trained intent classification model to the online submodule, which provides online question parsing. The submodule runs while the system is offline; its input is the training corpus and its output is the trained intent classification model.
(3) The offline submodule of the answer extraction module uses the training corpus and the DR-ASF algorithm with handcrafted similarity features to train the DR-ASF intelligent retrieval model and deploys the trained model to the online submodule, which provides online answer extraction. The submodule runs while the system is offline; its input is the training corpus and its output is the trained DR-ASF intelligent retrieval model.
Compared with the prior art, the invention has the following advantages:
(1) Existing automatic customer service question answering systems in online education cannot give personalized, precise answers. The invention integrates student status information into the online question answering retrieval system to improve recognition of the question's intent. Regular-expression matching based on business rules can precisely retrieve the notice documents that apply to a student of a given school and batch at a specific stage of study, instead of returning a generic answer such as "for details, please check the relevant school notice", giving users a more personalized question answering experience.
(2) The answer extraction technique combines deep learning and improves on the existing DrQA model, proposing the DR-ASF intelligent retrieval model based on handcrafted similarity features. Adding these features when the model vectorizes the document lets it better understand the document semantically and capture matching information between the question and the document's sentences. The question and the document are encoded with a multilayer bidirectional long short-term memory network, and the degree of match between them is measured by bilinear similarity to predict the position range of the answer, further improving question answering accuracy.
(3) In view of the characteristics of online education users and data, more feature information is extracted from educational data resources and from students' learning states. The accuracy and timeliness requirements of an online question answering system are also considered: paragraphs serve as an intermediate layer between the document retrieval module and the answer extraction module to reduce the size of the input to be processed. The system implements two answer generation modes: 1) quick response mode, which uses probability-based retrieval, is easy to use, has low algorithmic complexity and low time and space overhead, and can achieve relatively high recall; 2) precise answer mode, which uses deep-learning answer extraction, has higher algorithmic complexity, and can provide higher accuracy.
Brief description of the drawings
Fig. 1 is a flow diagram of the question-answering intelligent retrieval system of the invention;
Fig. 2 is a flow diagram of the document retrieval module of the invention;
Fig. 3 is a flow diagram of the passage retrieval module of the invention;
Fig. 4 is a schematic diagram of the answer extraction model of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and an implementation example.
As shown in Fig. 1, the question-answering intelligent retrieval system for online education consists of a hardware platform and a software system. On the hardware side, the offline submodules require separately deployed multi-GPU servers for model training; the performance and number of GPU servers can be scaled with the size of the training corpus, and the offline submodules can share servers during training. The online submodules are deployed on a server cluster, which acts as the system's server side and provides the online question answering service. The system's databases are deployed on a dedicated database server. The client is a Web browser with no special hardware requirements.
The software system consists of the student status information module, problem analysis module, document retrieval module, passage retrieval module, and answer extraction module. It is implemented as follows:
(1) The student status information module is a basic component of the system. When a user logs in and asks a question through the online question answering retrieval system, the module first accesses the user profile database and generates, from the student ID, the student status information relevant to question answering, for use by the document retrieval module.
(2) The document retrieval module calls the online submodule of the problem analysis module to parse the question semantically; intent recognition and question paraphrasing yield the question feature word set. The offline submodule of the problem analysis module trains the intent classification model offline on the training corpus and deploys the trained model to the online submodule.
(3) The system passes the question feature word set produced by the problem analysis module and the student status information provided by the student status information module to the document retrieval module as parameters. The document retrieval module is divided into a knowledge base management submodule and a retrieval submodule. The management submodule stores the notice documents that universities and colleges qualified to offer online education issue to enrolled students at each stage; these are usually indexed by notice category and notice date. For each document stored in the knowledge base, the submodule therefore saves the content and also parses the title and content with regular expressions to extract and save the document's keyword table. The retrieval submodule uses regular expressions and business rules to search the keyword tables and document content in the knowledge base, determines the target document, and passes the target document and the user question to the passage retrieval module as parameters.
(4) Using the result of question analysis, the passage retrieval module performs semantic retrieval on the content of the target document with the BM25 algorithm and extracts the paragraph most relevant to the question as the part most likely to contain the answer. The retrieved paragraph is handled in one of two ways: if the user uses the default quick response mode, the target paragraph with the highest similarity is returned directly to the questioner as the answer; if the user selects the precise answer mode, the process continues with step (5).
(5) The system passes the target paragraph returned by the previous step to the answer extraction module, which is divided into an online submodule and an offline submodule. The offline submodule trains the DR-ASF intelligent retrieval model offline on the training corpus and deploys the trained model to the online submodule. The online submodule calls the trained DR-ASF intelligent retrieval model to extract the precise answer from the candidate target paragraph and returns it to the questioner.
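The BM25 paragraph ranking in step (4) can be sketched in plain Python. The text names the algorithm but not its parameters, so `k1 = 1.5` and `b = 0.75` are common defaults assumed here, and the IDF uses the variant with 1 added inside the logarithm, which keeps it non-negative. The function name `bm25_scores` and the sample paragraphs are illustrative.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], paragraphs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every tokenized paragraph against the query terms."""
    if not paragraphs:
        return []
    n = len(paragraphs)
    avg_len = sum(len(p) for p in paragraphs) / n
    # Number of paragraphs that contain each term at least once.
    doc_freq = Counter(term for p in paragraphs for term in set(p))
    scores = []
    for paragraph in paragraphs:
        tf = Counter(paragraph)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(paragraph) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


paragraphs = [
    ["thesis", "defense", "is", "in", "june"],
    ["exam", "registration", "opens", "in", "may"],
    ["tuition", "is", "due", "in", "march"],
]
scores = bm25_scores(["thesis", "defense"], paragraphs)
best_index = max(range(len(scores)), key=scores.__getitem__)
```

In quick response mode the paragraph at `best_index` would be returned as the answer; in precise answer mode it would be handed to the answer extraction module.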
Each of the above modules is implemented as follows:
1. Student status information module
The student status information module is one of the system's infrastructure components; its main function is to obtain and save student status information. To provide better personalized service, online education platforms usually build user profiles of students and their learning states. A user profile describes a student's static and dynamic information comprehensively: static information such as the school, major, and batch attended, and dynamic information such as study progress, learning activity, and the geographic location from which the student attends class. The question answering retrieval system only needs the part of the student status information that is relevant to question answering retrieval.
When a user logs into the question answering retrieval system under their real name, the system uses the student's real-name information to access the user profile database with the student ID as the parameter and generates the student status information relevant to question retrieval: the school attended, enrollment batch, student registration batch, thesis batch, examination batch, graduation batch, and study progress. Of these, the school attended and the study progress are the most critical: the school attended narrows the range of documents to search during document matching, and the study progress allows matching the document that best fits the user's current needs.
Once obtained, the student status information is passed directly to the document retrieval module as a parameter.
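The module's contract (student ID in, a fixed subset of profile fields out) can be sketched as follows. The field names, the in-memory `PORTRAIT_DB` stand-in for the user profile database, and the sample record are assumptions for illustration; the real database schema is not given in the text.

```python
from dataclasses import dataclass, fields
from typing import Dict


@dataclass(frozen=True)
class StudentStatus:
    """The seven status fields listed above (names are hypothetical)."""
    school: str
    enrollment_batch: str
    registration_batch: str
    thesis_batch: str
    exam_batch: str
    graduation_batch: str
    progress: str


# In-memory stand-in for the user profile database (sample data only).
PORTRAIT_DB: Dict[str, Dict[str, str]] = {
    "S001": {
        "school": "Example University",
        "major": "Computer Science",   # profile field not needed for retrieval
        "enrollment_batch": "201709",
        "registration_batch": "201709",
        "thesis_batch": "201903",
        "exam_batch": "201906",
        "graduation_batch": "201907",
        "progress": "thesis",
        "activity": "high",            # profile field not needed for retrieval
    }
}


def get_student_status(student_id: str,
                       db: Dict[str, Dict[str, str]] = PORTRAIT_DB) -> StudentStatus:
    """Keep only the retrieval-relevant fields of the student's profile."""
    row = db[student_id]  # raises KeyError for an unknown student ID
    return StudentStatus(**{f.name: row[f.name] for f in fields(StudentStatus)})


status = get_student_status("S001")
```

Selecting fields through the dataclass definition keeps the rest of the profile, such as major or activity level, out of the retrieval path.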
2. problem analysis module
Case study module is responsible for carrying out intents to problem and problem repeats, and obtains the intent classifier and feature of problem
Set of words.It is divided into three steps:
(1) problem intent classifier
Using svm classifier algorithm, come to carry out intent classifier to problem using TF-IDF as feature vector.Due in problem
Text information amount is fewer, in order to increase information content, uses 1-gram and 2-gram model.The specific implementation steps are as follows:
1) Question feature extraction: the system selects features by word frequency. The feature items are extracted as follows:
a) Segment the questions in the training set with the open-source jieba word segmentation tool and remove stop words;
b) For each class of questions, count the keywords and their frequencies, sort the keywords by word frequency, and take the top K keywords as the feature word set of that class. K is a hyperparameter with a default of 5;
c) Remove the words that appear in the feature word sets of more than one class, then merge the feature word sets of the different classes into the overall feature word set.
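Steps b) and c) can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes the questions have already been segmented and stop-word filtered (step a), and the function name `select_feature_words` and the input layout are hypothetical.

```python
from collections import Counter

def select_feature_words(questions_by_class, k=5):
    """questions_by_class maps a class label to a list of token lists.
    Tokens are assumed to be segmented and stop-word filtered already."""
    # Step b): top-K keywords of each class by word frequency.
    top_k = {}
    for label, questions in questions_by_class.items():
        counts = Counter(tok for q in questions for tok in q)
        top_k[label] = [w for w, _ in counts.most_common(k)]
    # Step c): drop words that occur in the feature sets of several classes.
    seen = Counter(w for words in top_k.values() for w in words)
    per_class = {label: [w for w in words if seen[w] == 1]
                 for label, words in top_k.items()}
    # Merge the per-class sets into the overall feature word set.
    total = sorted(w for words in per_class.values() for w in words)
    return per_class, total
```

For example, if "time" is among the top K words of both an examination class and a tuition class, it is removed from both feature sets because it does not help tell the two classes apart.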
2) Question feature representation: TF-IDF features are used as the language-model representation of the question. TF (term frequency) is the frequency with which a word occurs in the question; the more often a word occurs, the more important it is. IDF (inverse document frequency) measures how distinctive a word is: a word that occurs in many documents is not representative, so its importance is lower. The TF-IDF formula therefore yields a weight for each word, and these weights form the language-model representation of the question.
3) Normalization: normalization restricts the range of the data. The system uses linear (min-max) normalization:
$$t' = \frac{t - t_{\min}}{t_{\max} - t_{\min}}$$
where $t'$ is the normalized value, $t$ is the word frequency of a keyword, $t_{\min}$ is the word frequency of the least frequent keyword across all questions, and $t_{\max}$ is the word frequency of the most frequent keyword across all questions. When word frequency is used as the index for comparison, the frequencies of different words can differ widely. Normalization reduces the gap between the frequencies of different words and so improves question classification. Once normalization is complete, the final feature set representation of the question is obtained.
4) The feature set representation of the question is compared for similarity with the feature set of each class of questions, and the class whose feature set has the highest similarity is selected as the type of the question.
5) The offline submodule of the problem analysis module trains the intent classification model offline and deploys the trained intent classifier to the online submodule.
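Steps 3) and 4) can be illustrated with a short sketch. The linear normalization follows the description above; the similarity measure of step 4) is not specified in the text, so the set-overlap (Jaccard) measure used in `classify` is purely an assumption of this sketch, as are the function names.

```python
def minmax_normalize(freqs):
    """Linear normalization of step 3): (t - t_min) / (t_max - t_min)."""
    t_min, t_max = min(freqs.values()), max(freqs.values())
    if t_max == t_min:
        # All frequencies are equal; avoid dividing by zero.
        return {w: 0.0 for w in freqs}
    return {w: (t - t_min) / (t_max - t_min) for w, t in freqs.items()}

def classify(question_tokens, class_features):
    """Step 4): pick the class whose feature word set is most similar to the
    question. Similarity is set overlap here (an assumption of this sketch)."""
    q = set(question_tokens)

    def similarity(features):
        f = set(features)
        union = q | f
        return len(q & f) / len(union) if union else 0.0

    return max(class_features, key=lambda label: similarity(class_features[label]))
```

In the described system this comparison would be carried out by the trained SVM model on TF-IDF vectors; the overlap measure only makes the "most similar feature set wins" rule concrete.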
(2) Question restatement
Question restatement means expressing the question again. In practice, questions with the same meaning may be worded differently, so the question is restated to improve the question-answering effect; here, restatement is used to obtain the feature words of the question. It is implemented in two steps:
1) Word segmentation and stop-word removal: as in question feature extraction, the jieba word segmentation tool is used to segment the questions in the training set and remove stop words. The set of words that remains after stop-word removal is called the keyword set.
2) Word-sense expansion: to deal with inconsistent wording in student questions, synonym expansion is applied to the segmented words. A synonym table is compiled from users' frequently asked questions and the business specifications and is used to expand the keywords of the question: each word in the keyword set is looked up in the synonym table, and if synonyms exist, all synonyms of the word are added to the keyword set.
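The word-sense expansion of step 2) amounts to a table lookup. A minimal sketch, assuming the synonym table is held as a dictionary from a word to its list of synonyms (the layout and the name `expand_keywords` are hypothetical):

```python
def expand_keywords(keywords, synonym_table):
    """Return the keyword set extended with every synonym found in the table."""
    expanded = set(keywords)
    for word in keywords:
        # Words without an entry in the table contribute nothing extra.
        expanded.update(synonym_table.get(word, ()))
    return expanded
```

With a table such as `{"tuition": ["fee", "fees"]}`, the keyword set `{"tuition", "deadline"}` becomes `{"tuition", "deadline", "fee", "fees"}`.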
(3) The online submodule of the problem analysis module is called by the document retrieval module and performs semantic parsing of the user's question. Specifically, it first recognizes the intent using the trained intent classification model and then performs word-sense expansion through question restatement. The resulting keyword set is the feature word set of the question, which is returned to the document retrieval module as the output of the problem analysis module.
3. Document retrieval module
The document retrieval module is divided into a document knowledge base management submodule and a retrieval submodule. The flow is shown in Figure 2:
(1) Document knowledge base management submodule
In online education, the documents used for question answering are usually published as notices for students to consult. Under the business rules, most document titles contain four kinds of fields: school name, time, batch and notice type; for some documents the batch information is contained in the document content instead. The specific steps are as follows:
1) Establish document type labels according to the business rules;
2) When a new document is stored in the document knowledge base, the system automatically generates its keyword table. Specifically, the regular expressions built into the system process the notice title and content automatically according to the business rules and extract the four kinds of fields above. Extraction based on business rules ensures a relatively high accuracy; for example, the "time" field can be extracted by regular-expression matching on keywords such as semester, spring, summer, autumn, winter, first half of the year and second half of the year.
3) Standardize the extracted keyword table. Because the school and time in the title of a formal notice document are stated formally, the extracted information is already fairly standardized and needs no extra processing. Standardization mainly concerns automatic document type matching and batch standardization.
a) Document type matching: for the document type keywords obtained from the document title, cosine similarity is used to compute the similarity between the keywords and the document type labels. The notice type label with the highest similarity is taken as the candidate type of the document and submitted to business staff for review, who then manually assign the document type label according to the classification.
b) Batch standardization: because batch information has to be extracted from both titles and content, the extracted formats vary widely; a rule-based approach converts them into a unified representation. In addition, in the student status information the batch is further subdivided into enrollment batch, student-registration batch, thesis batch, examination batch and graduation batch. The batch extracted from a document is therefore automatically mapped, based on the notice type of the document and the business rules, to the specific refined batch.
4) Store the document content as a file under the directory of the corresponding document type.
5) Store the keyword table extracted from the document, the document title and the document storage path as database fields in a MySQL database table; the table automatically generates a document ID as the primary key.
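The extraction of the "time" field in step 2) can be sketched with a single regular expression. The system's built-in expressions are not given in the text, so the pattern, the English keywords and the name `extract_time` below are all hypothetical stand-ins for the business rules.

```python
import re

# Hypothetical pattern for the "time" field: a four-digit year followed by a
# season or half-year keyword. The real built-in expressions are not published.
TIME_PATTERN = re.compile(
    r"(?P<year>\d{4})\s*(?P<term>spring|summer|autumn|winter|first half|second half)",
    re.IGNORECASE,
)

def extract_time(title):
    """Return a normalized 'YYYY term' string, or None if no time field is found."""
    match = TIME_PATTERN.search(title)
    if match is None:
        return None
    return f"{match.group('year')} {match.group('term').lower()}"
```

The school name, batch and notice type fields would be extracted with further expressions of the same kind and stored together as the document's keyword table.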
(2) Document retrieval submodule
This submodule retrieves from the document knowledge base by combining the student status information with domain business rules, and obtains the target document that best matches the question. The steps are as follows:
1) Each type of user question is assigned to a notice type according to the business rules;
2) On the basis of the retrieval result, heuristic rules use the relevant items of the student status information to narrow the range of documents to be searched;
3) The most recently published document in the set of notice documents of the same type is returned as the target document.
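The three retrieval steps can be sketched as a filter followed by a maximum over the publication date. The record keys (`type`, `school`, `batch`, `published`, `path`) and the choice of school and batch as the narrowing fields are assumptions made for illustration; the text only says that heuristic rules use the student status information.

```python
from datetime import date

def find_target_document(documents, notice_type, student_status):
    """documents: list of dicts with the hypothetical keys
    'type', 'school', 'batch', 'published' (a date) and 'path'."""
    candidates = [
        d for d in documents
        if d["type"] == notice_type                   # step 1: question type -> notice type
        and d["school"] == student_status["school"]   # step 2: narrow by student status
        and d["batch"] == student_status["batch"]
    ]
    if not candidates:
        return None
    # Step 3: the most recently published notice of that type is the target.
    return max(candidates, key=lambda d: d["published"])
```

In the described system the same filtering would run as a query against the MySQL keyword table rather than over an in-memory list.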
4. Passage retrieval module
The passage retrieval module uses the question analysis result to perform semantic retrieval over the content of the target document and extracts the paragraphs that are most relevant to the question and most likely to contain the answer. The implementation flow is shown in Figure 3:
(1) Preprocess the document by dividing it into a finer-grained set of paragraphs. A simple division by natural paragraphs is used here:
1) If the document is in HTML format, the HTML file is parsed into a DOM tree, the text of each paragraph is obtained from the <p> tags, and the texts are combined into the document's paragraph text set.
2) If the document is in Word format, paragraphs are split directly at carriage returns (paragraph marks) to obtain the document's paragraph text set.
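For the HTML case, the paragraph split can be sketched with Python's standard `html.parser` module. Note that this parser is event-based rather than a DOM tree, so it is a stand-in for the DOM parsing described above; the class and function names are hypothetical, and the sketch assumes well-formed, non-nested <p> elements.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._buffer = None

def split_html_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Text inside inline tags such as <b> is kept, because the buffer stays open until the closing </p> is reached.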
(2) The similarity between each paragraph of the document and the question is computed with the BM25 algorithm, and the three paragraphs with the highest similarity are returned to the user as the answer.
BM25 is a classic algorithm for evaluating the relevance between search terms and documents. The algorithm segments the question into words, computes the relevance of each word to the document, and takes the weighted sum as the relevance of the question to the document. The relevance of a word to a document is measured mainly by two parts: the weight of the word, and the relevance between the word and the document.
1) Preprocess the paragraph set of the target document: segment all paragraphs with the open-source jieba word segmentation tool and remove stop words, in the same way as above;
2) Build a BM25 model with the Python gensim library;
3) Input the full feature word set of the user's question, obtained by the problem analysis module, as the search terms;
4) Compute the relevance between each paragraph and the search terms with the BM25 model;
5) The retrieved paragraphs are handled in one of two ways. If the user uses the default quick-response mode, the retrieved target paragraphs are returned directly to the questioner as the answer and the whole question-answering retrieval process ends. Because the algorithm used in this mode is simple, the response is fast; the returned paragraphs are also those most relevant to the question words, which is intuitive and highly interpretable.
6) If the user selects the accurate-answer mode, the system passes the retrieved paragraphs to the answer extraction module as candidate target paragraphs. This mode reduces the object to be processed from the document level to the paragraph level, which greatly reduces the size of the input and provides the answer extraction module with higher-quality content, balancing the efficiency and accuracy of the system's answers.
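The ranking in steps 2) to 4) can be sketched without any library. The code below is a plain-Python Okapi BM25, not the gensim model named above; the parameters `k1=1.5` and `b=0.75` and the non-negative IDF variant are common choices assumed here, since the text does not state them. Paragraphs are assumed to be non-empty token lists.

```python
import math
from collections import Counter

def bm25_top_paragraphs(paragraphs, query_terms, top_n=3, k1=1.5, b=0.75):
    """paragraphs: list of token lists. Returns (indices of top_n, all scores)."""
    n_docs = len(paragraphs)
    avg_len = sum(len(p) for p in paragraphs) / n_docs
    # Document frequency: number of paragraphs that contain each term.
    df = Counter(term for p in paragraphs for term in set(p))
    scores = []
    for p in paragraphs:
        tf = Counter(p)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Word weight (IDF) times the word-paragraph relevance term.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(p) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n], scores
```

The two factors in the inner loop correspond to the two parts named in the description: `idf` is the weight of the word, and the fraction is the relevance between the word and the paragraph, damped by paragraph length.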
5. Answer extraction module
The answer extraction module is optional. If the user keeps the default quick-response mode when asking a question, this module is not called. If the user wants a more accurate answer and selects the accurate-answer mode, the flow jumps to this module. The model is based on the DR (Document Reader) model in DrQA. When the document is vectorized, handcrafted similarity features (Q-D exact match, Jaccard similarity and relative edit distance) are added, which allows better semantic understanding of the document and extracts the matching information between the question and the document sentences. The question and the document are encoded with multi-layer bidirectional long short-term memory networks, and bilinear similarity measures the degree of match between them to predict the position range of the answer. This further improves question-answering accuracy and yields the DR-ASF intelligent retrieval model optimized with handcrafted similarity features.
The answer extraction module is divided into an online submodule and an offline submodule.
(1) The offline submodule uses the training corpus to train the intelligent retrieval model offline and build the DR-ASF intelligent retrieval model. The model is shown schematically in Figure 4:
1) The offline submodule vectorizes the training corpus (questions and answer documents):
a) First, the question is segmented into words to obtain the word sequence $Q=\{x_1,x_2,\dots,x_n\}$ of the question, and the answer document is segmented to obtain the word sequence $D=\{y_1,y_2,\dots,y_m\}$ of the answer document, where $x_i$ and $y_i$ are individual words, and n and m are the numbers of words in the question and the paragraph respectively.
b) The question is then vectorized. Because questions are generally short and contain relatively little semantic information, only word vectors are used. The system directly uses 300-dimensional word vectors pre-trained on a large-scale corpus with the word2vec tool. Each word $x_i$ in the question sequence $Q=\{x_1,x_2,\dots,x_n\}$ is represented by its word vector, which completes the vectorization of question q; q then corresponds to a two-dimensional matrix $Q_{n\times v}$, where n is the number of words in q and v is the dimension of the word vectors.
c) The answer document is vectorized next. Because answer documents are relatively long and contain richer semantic information, several feature values are used when vectorizing a paragraph. Besides the word vectors above, these include POS (part-of-speech tagging), NER (named entity recognition) and handcrafted similarity features. The handcrafted similarity features express the relationship between the question and the answer document; adding them makes the encoding of the answer document give more weight to words in the document that are similar to the question, which improves the quality of the answers. The handcrafted similarity features consist of three parts: Q-D exact match, the Jaccard coefficient and the relative edit distance.
Q-D exact match determines whether a word in the answer document also occurs in the question: the feature is 1 if it does and 0 if it does not. The idea of this feature is to tell the model directly where the words of the question occur in the document, since the answer is likely to be near those positions.
Let the Q-D exact match feature vector be $T=[t_1,t_2,\dots,t_m]$, where the feature of the i-th word is $t_i=f_{Q\text{-}D}(y_i)$:
$$f_{Q\text{-}D}(y_i)=\begin{cases}1, & y_i\in Q\\ 0, & y_i\notin Q\end{cases}$$
The Jaccard coefficient measures the similarity and difference between sample sets: the larger the Jaccard coefficient, the more similar the samples. It is used here to measure the similarity between the question and a local region of the document, so that similar regions receive higher scores. Definition and calculation: given sets A and B, first find the intersection of A and B, then their union, and divide the size of the intersection by the size of the union:
$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}$$
where J(A, B) is the Jaccard coefficient of A and B.
When computed in the model, the answer document D is first cut into windows: the length n of the question Q is used as the window length and $y_i$ as the window center, and where there are no words before or after $y_i$ the window is padded with the placeholder M. The set of windows obtained from the document is denoted $B=\{B_1,B_2,\dots,B_m\}$ and the question set is denoted A, with $A=Q=\{x_1,x_2,\dots,x_n\}$. The Jaccard similarity with A is computed for each window $B_i$ of the answer document. For example, to compute Jaccard(A, $B_1$): $A=\{x_1,x_2,\dots,x_n\}$ and $B_1=\{M,M,\dots,M,y_1,y_2,\dots,y_k\}$, where the number of M placeholders is half the window length, n/2, rounded down, and $|A|=|B_1|=n$. The placeholder M is ignored when computing $A\cap B_1$ and $A\cup B_1$. The formula then gives $j_1$. After Jaccard(A, $B_i$) has been computed for all $i\in\{1,\dots,m\}$, the Jaccard similarity feature vector $J=[j_1,j_2,\dots,j_m]$ of the answer document is obtained, where $j_m$ is the similarity between the m-th window of the answer document and the question.
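The Q-D exact match feature and the windowed Jaccard feature described above can be sketched as follows. The placeholder token `"<M>"` and the function names are hypothetical; the window layout (n/2 placeholders, rounded down, before the first word) follows the description of $B_1$.

```python
def exact_match_features(question, document):
    """t_i = 1 if document word y_i occurs in the question, else 0."""
    q = set(question)
    return [1 if y in q else 0 for y in document]

def windows(document, n, pad="<M>"):
    """One window of length n per document word y_i, centred on y_i and
    padded with the placeholder where the document has no words."""
    left = n // 2
    padded = [pad] * left + list(document) + [pad] * (n - left - 1)
    return [padded[i:i + n] for i in range(len(document))]

def jaccard_features(question, document, pad="<M>"):
    """j_i = |A n B_i| / |A u B_i| with the placeholder ignored."""
    a = set(question)
    features = []
    for window in windows(document, len(question), pad):
        b = set(window) - {pad}
        union = a | b
        features.append(len(a & b) / len(union) if union else 0.0)
    return features
```

For the question `["when", "exam", "time"]` and the document `["the", "exam", "time", "is", "june"]`, the exact match vector is `[0, 1, 1, 0, 0]` and the Jaccard vector peaks at the windows that cover both "exam" and "time".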
Edit distance is a common algorithm for computing the similarity between strings; it is the minimum number of operations needed to convert one string into another. The operations are substituting one character for another, deleting a character and inserting a character. The relative edit distance uses the question length as the window size, computes the edit distance between the question and each local region of the document, and divides it by the window size. The windows are divided and the calculation is organized in the same way as for the Jaccard similarity. Solving for the edit distance is relatively complex and requires a dynamic programming algorithm. The result is the relative edit distance feature vector $R=[r_1,r_2,\dots,r_m]$, where $r_m$ is the relative edit distance between the m-th window of the answer document and the question.
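The dynamic programming solution and the relative edit distance feature can be sketched as below. Two points are assumptions of this sketch rather than statements of the text: the distance is computed over word tokens instead of characters, and placeholder tokens are kept in the window and simply count as mismatches.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]

def relative_edit_distances(question, document, pad="<M>"):
    """r_i = edit distance between the question and window B_i, divided by n."""
    n = len(question)
    left = n // 2
    padded = [pad] * left + list(document) + [pad] * (n - left - 1)
    return [edit_distance(question, padded[i:i + n]) / n
            for i in range(len(document))]
```

Because `edit_distance` works on any sequence, passing strings instead of token lists gives the character-level distance described in the text.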
An open-source NLP tool is used to perform part-of-speech tagging and named entity recognition on the document $D=\{y_1,y_2,\dots,y_m\}$. This gives the part-of-speech feature vector $P=[p_1,p_2,\dots,p_m]$, $p_i\in[0,\ typenum(POS)-1]$, and the named entity recognition feature vector $N=[n_1,n_2,\dots,n_m]$, $n_i\in[0,\ typenum(NER)-1]$, where typenum(POS) and typenum(NER) are the total number of part-of-speech tag types and of named entity types respectively.
Finally, the word vector feature $V_d$ of the answer document, the part-of-speech feature P, the named entity recognition feature N and the handcrafted similarity feature vectors T, J and R are concatenated to obtain the vectorized representation $D_{m\times k}$ of the answer document, with $k=|v_i|+|p_i|+|n_i|+|t_i|+|j_i|+|r_i|$, where $v_i$ is the word vector, $p_i$ the part-of-speech feature vector, $n_i$ the named entity recognition feature vector, $t_i$ the Q-D exact match feature vector, $j_i$ the Jaccard similarity feature vector and $r_i$ the relative edit distance feature vector.
2) After the question and the answer document have been vectorized, both are encoded with multi-layer bidirectional LSTMs. The input question matrix $Q_{n\times v}$ and answer document matrix $D_{m\times k}$ are each encoded into a vector of fixed length. Because an RNN suffers from vanishing gradients when a sentence is too long, the encoder chosen here is a multi-layer bidirectional long short-term memory network (Stacked BiLSTM). The Stacked BiLSTM network that encodes the question is called the Q-encoder, and the one that encodes the answer document is called the D-encoder.
The BiLSTM model uses a forward LSTM network and a backward LSTM network to extract forward and backward semantic information. The forward network reads the sequence in the forward direction and computes the forward hidden state $\overrightarrow{h_i}$, which serves as one part of the word encoding. The backward network reads the sequence in reverse and computes the backward hidden state $\overleftarrow{h_i}$, which serves as the other part of the word encoding. Concatenating the forward hidden state $\overrightarrow{h_i}$ and the backward hidden state $\overleftarrow{h_i}$ gives the word encoding $h_i=[\overrightarrow{h_i};\overleftarrow{h_i}]$, which contains both the preceding and the following context. This gives the single-layer BiLSTM encoding of the question and the encoding of the answer document, where $k_s$ is the dimension of $h_i$.
The encoding model uses multiple BiLSTM layers, each of which produces an encoding of the question and of the answer document; k is the number of BiLSTM layers. A fully connected layer concatenates the hidden states of every BiLSTM layer, which gives the final question encoding and answer document encoding, each of dimension $k_c=k_s\times k$, where $k_s$ is the dimension of $h_i$.
3) The self-attention mechanism is applied to the question encoding to transform it again. It learns the word dependencies inside the sentence, captures the sentence's internal structure and extracts the internal relations between the words of the question, giving a weight for each word of the sentence; these weights are used to apply a weighted transformation to the question encoding within the sentence. First, a linear layer maps the $k_c$-dimensional encoding vector of each word to a single score; a softmax layer then gives the weight $W_n$ of each word in the sentence. Finally, the word weights $W_n$ are multiplied by the question encoding to obtain the new question encoding. This gives the overall semantic representation of the question, in which each word contributes to the overall meaning according to its importance.
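The weighting in step 3) can be sketched in plain Python. The linear layer is reduced to a weight vector `w` without bias, which is an assumption of this sketch; in the model `w` would be learned, whereas here it is passed in as a fixed list.

```python
import math

def self_attention_pool(encodings, w):
    """encodings: n word vectors of dimension k_c.
    w: weights of the k_c -> 1 linear layer (learned in the real model)."""
    # One score per word from the linear layer.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in encodings]
    # Softmax over the words; the maximum is subtracted for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the word encodings gives the new question encoding.
    dim = len(encodings[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, encodings)) for d in range(dim)]
    return weights, pooled
```

With all-zero `w`, every word receives the same weight and the result is the plain average of the word encodings; a trained `w` shifts the weight towards the words that matter most for the question.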
4) Bilinear similarity is used to measure the degree of match between the question and the document, and thereby to predict the start and end positions of the answer in the answer document.
Predicting the answer position requires determining the start and end positions of the answer in the answer document, so the intelligent retrieval model has to learn two similarity functions, $S_s$ and $S_e$, which describe the probability of the answer's start position and of its end position in the document respectively. Each probability is expressed through the similarity between the question vector and the vector of the word at that position in the document; the similarity is computed here with the bilinear algorithm.
Let the answer document vectors be $D=\{d_1,d_2,\dots,d_m\}$ and the question vector be q, where q and $d_i$ both have dimension $k_c$. The similarity functions $S_s$ and $S_e$ both take the question vector q and the answer document word vector $d_i$ as input, and are expressed as follows:
$$S_s(d_i,q)=d_i W_s q$$
$$S_e(d_i,q)=d_i W_e q$$
where $W_s$ and $W_e$ are the parameters to be learned.
To predict the start and end positions of the answer in the answer document, a normalization layer is added after the similarity functions, giving the probability that each word in the document is the answer start position, $P_{start}$, or the answer end position, $P_{end}$:
$$P_{start}\propto\exp(d_i W_s q)$$
$$P_{end}\propto\exp(d_i W_e q)$$
During training, the loss function is the negative log-likelihood loss, and the normalization layer uses the softmax function together with the log-likelihood cost function to learn $W_s$ and $W_e$. During prediction, the normalization layer uses the softmax function directly to obtain the probability that each word in the document is the answer's start word or end word. Finally, the sentence between $\max(P_{start})$ and $\max(P_{end})$ in the answer document is selected as the predicted answer.
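The prediction step can be sketched in plain Python with vectors as lists and $W_s$, $W_e$ as lists of rows. The sketch takes the most probable start and end words independently, as the text describes; it does not enforce that the start precedes the end, because no such constraint is stated. All function names are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bilinear(d, W, q):
    """Computes d W q for vectors d, q and a matrix W given as a list of rows."""
    return sum(d[r] * sum(W[r][c] * q[c] for c in range(len(q)))
               for r in range(len(d)))

def predict_span(doc_vectors, q, W_s, W_e):
    """Returns the indices of the most probable start and end words."""
    p_start = softmax([bilinear(d, W_s, q) for d in doc_vectors])
    p_end = softmax([bilinear(d, W_e, q) for d in doc_vectors])
    start = max(range(len(p_start)), key=p_start.__getitem__)
    end = max(range(len(p_end)), key=p_end.__getitem__)
    return start, end, p_start, p_end
```

The words of the answer document from index `start` to index `end` would then be returned as the predicted answer.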
(2) The offline submodule builds the DR-ASF intelligent retrieval model through iterative training;
(3) The offline submodule deploys the trained DR-ASF intelligent retrieval model to the online submodule;
(4) The online submodule uses the DR-ASF intelligent retrieval model to extract a more accurate answer from the candidate target paragraphs and returns it to the questioner. The steps are:
1) The target paragraphs passed in from the passage retrieval module are preprocessed in the same way as the answer documents in the training corpus: an open-source NLP tool segments the candidate target paragraphs and removes stop words, and extracts the part-of-speech feature P, the named entity recognition feature N and the initial word vectors $V_d$;
2) The full feature word set of the user's question, obtained by the problem analysis module, is vectorized with the pre-trained word vectors to build the two-dimensional matrix $Q_{n\times v}$;
3) The handcrafted similarity feature vectors of the target paragraph (Q-D exact match, Jaccard coefficient and relative edit distance) are extracted and concatenated with the word vector, part-of-speech and named entity recognition features obtained before, giving the vectorized matrix $D_{m\times k}$ of the target paragraph;
4) $Q_{n\times v}$ and $D_{m\times k}$ are given as input to the DR-ASF intelligent retrieval model;
5) The output of the model is the accurate answer to the question, which is returned to the questioner.
Claims (6)
1. An intelligent retrieval system for online education, characterized by comprising: a student status information module, a problem analysis module, a document retrieval module, a passage retrieval module and an answer extraction module, wherein:
the student status information module: when a user asks a question through online question answering, this module accesses the user profile database and generates the user's status information from the user ID, where in this online education system the user is a student and the user status information is specifically the student status information; the module is called by the document retrieval module, its input parameter is the student ID and its output parameter is the student status information;
the problem analysis module is divided into an offline submodule and an online submodule; the offline submodule uses the training corpus to train the intent classification model offline with the SVM classification algorithm and deploys the intent classification model to the online submodule; the online submodule is called by the document retrieval module, performs semantic parsing of the input user question, recognizes the intent with the trained intent classification model, and then performs word-sense expansion through question restatement to obtain a question feature word set with fuller coverage; this feature word set is returned to the document retrieval module as the output of the problem analysis module;
the document retrieval module retrieves the document that best matches the question from the document knowledge base by combining the student status information with domain business rules; the document retrieval module comprises a document knowledge base management submodule and a retrieval submodule; the document knowledge base management submodule stores notice documents, which are issued to enrolled students by institutions qualified for online education and whose titles and content follow a unified format specification, and uses regular expressions to parse the document titles and content and to extract and save keyword tables; the document knowledge base retrieval submodule takes the question feature word set produced by the problem analysis module and the student status information provided by the student status information module, and uses regular-expression matching and business rules to analyze and search the document keyword tables and document content stored in the document knowledge base and locate the target document; introducing the student status information effectively improves the ability to recognize the intent of the question and the accuracy of target document location; finally, the target document and the question feature word set are passed together to the passage retrieval module as input parameters;
the passage retrieval module, as the intermediate module between the document retrieval module and the answer extraction module, processes the question feature word set and the target document passed in from the document retrieval module; it performs semantic retrieval on the target document based on the question feature word set and extracts the target paragraphs that are most relevant to the question and most likely to contain the answer;
the answer extraction module is divided into an offline submodule and an online submodule; the offline submodule uses the training corpus to train the DR-ASF intelligent retrieval model offline with the deep learning DR-ASF algorithm and deploys the DR-ASF intelligent retrieval model to the online submodule; the online submodule, on the basis of the retrieval result passed in from the passage retrieval module, uses the DR-ASF intelligent retrieval model to locate the start and end positions of the answer in the target paragraph, extracts a more accurate answer from the paragraph and returns it to the questioner.
2. The intelligent retrieval system for online education according to claim 1, characterized in that: in the answer extraction module, the DR-ASF model is based on the DR (Document Reader) model in DrQA; handcrafted similarity features (Q-D exact match, Jaccard similarity and relative edit distance) are added when the document is vectorized, to perform semantic understanding of the document and extract the matching information between the question and the document sentences; the question and the document are encoded with multi-layer bidirectional long short-term memory networks, and bilinear similarity measures the degree of match between them to predict the position range of the answer, which improves question-answering accuracy and yields the DR-ASF intelligent retrieval model optimized with handcrafted similarity features.
3. The intelligent retrieval system for online education according to claim 1, characterized in that: in the answer extraction module, the deep learning DR-ASF algorithm is implemented as follows:
(1) The offline submodule uses the training corpus to train the intelligent retrieval model offline, with the following specific steps:
1) The offline submodule vectorizes the training corpus (questions and answer documents):
a) The question and the answer document are segmented and stop words are removed, giving the word sequences of the question and of the answer document respectively;
b) Using word vectors pre-trained on a large-scale corpus with the word2vec tool, each word of the question word sequence is represented by its word vector, which completes the vectorization of the question;
c) For the answer document, the word vectors are first obtained as in step b); the other feature values of the answer document are then obtained: the POS (part-of-speech tagging) feature vector, the NER (named entity recognition) feature vector and the handcrafted similarity feature vectors; the values of the POS feature vector and the NER feature vector range over the total number of part-of-speech tag types and of named entity types respectively; the handcrafted similarity features express the relationship between the question and the answer document and consist of three parts: Q-D exact match, the Jaccard coefficient and the relative edit distance; the word vector feature, the part-of-speech feature, the named entity recognition feature and the three handcrafted similarity feature vectors of the document are concatenated to obtain the vectorized representation of the answer document;
2) After the question and the answer document have been vectorized, a multi-layer bidirectional long short-term memory network (Stacked BiLSTM) is used as the encoder to encode the input question matrix and answer document matrix each into a vector of fixed length, giving the question encoding and the answer document encoding vector representations; because the handcrafted similarity features have been added, the encoding of the answer document gives more weight to words in the document that are similar to the question;
3) The question encoding produced in step 2) is taken as the input, and the self-attention mechanism applies a weighted transformation within the sentence to the question encoding, learning the word dependencies inside the sentence and giving a new vector representation of the question encoding;
4) Finally, the answer document encoding vector representation obtained in step 2) and the new question encoding vector representation obtained in step 3) are taken as input, and bilinear similarity measures the degree of match between the question and the answer document to predict the start and end positions of the answer in the answer document;
(2) The offline submodule builds the DR-ASF intelligent retrieval model through iterative training;
(3) The offline submodule deploys the DR-ASF intelligent retrieval model to the online submodule;
(4) The online submodule first preprocesses the question and the target paragraph: the feature word set of the user's question passed in from the passage retrieval module is vectorized with the pre-trained word vectors; the target paragraph is vectorized as an answer document, following the same steps the offline submodule applies to answer documents, to obtain the vectorized matrix of the target paragraph; the vectorized matrix of the question and the vectorized matrix of the target paragraph are then given as input to the DR-ASF intelligent retrieval model, whose output is the accurate answer to the question, which is returned to the questioner.
4. The intelligent retrieval system for online education according to claim 1, characterized in that the business rules of the document retrieval module operate as follows:
(1) In the document knowledge base management submodule, regular expressions parse the document title and content and extract the keyword table of each document, with the following specific steps:
1) Establish document type labels according to the business rules;
2) When a new document is stored in the document knowledge base, the regular expressions built into the system automatically process the notice title and content and extract the keyword fields, which completes the automatic generation of the keyword table;
3) Standardize the extracted keyword table, mainly through automatic document type matching and batch standardization;
4) Store the document content as a file under the directory of the corresponding document type;
5) Store the keyword table extracted from the document, the document title and the document storage path as database fields in a MySQL database table; the table automatically generates a document ID as the primary key;
(2) the problem of document repositories retrieval submodule parses problem analysis module feature set of words and student's shape
Student's status information that state information module provides, using canonical matching and business rule to document antistop list and document content into
Row analysis retrieval, positions destination document, and answer document ID and customer problem are sent to passage retrieval module, specific steps are such as
Under:
1) for different types of customer problem, different notification types can be distributed to according to business rule;
2) Doctype range of search is reduced using heuristic rule using the relevant information in student's status information;
3) issue date newest document is finally returned in same type notification of document set as destination document.
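A minimal sketch of the keyword extraction and target-document selection described in this claim follows. Everything specific in it is an assumption made for illustration: the keyword pattern `KEYWORD_RE`, the `INTENT_TO_TYPE` business rule, the `Notice` fields, and the use of an in-memory list in place of the MySQL table named in the claim.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword pattern; a real system would use its own business vocabulary.
KEYWORD_RE = re.compile(r"exam|enrollment|thesis|graduation", re.IGNORECASE)

# Hypothetical business rule mapping a question intent to a notice type.
INTENT_TO_TYPE = {"ask_exam_time": "exam", "ask_thesis_deadline": "thesis"}

@dataclass(frozen=True)
class Notice:
    doc_id: int
    doc_type: str
    batch: str
    published: date
    keywords: frozenset

def extract_keywords(title, content):
    """Build the keyword table of a notice from its title and content."""
    return frozenset(m.lower() for m in KEYWORD_RE.findall(title + " " + content))

def make_notice(doc_id, doc_type, batch, published, title, content):
    return Notice(doc_id, doc_type, batch, published, extract_keywords(title, content))

def find_target(notices, intent, student_state):
    """Pick the newest notice of the matching type, narrowed by the student's batch."""
    doc_type = INTENT_TO_TYPE[intent]
    candidates = [n for n in notices if n.doc_type == doc_type]
    batch = student_state.get(doc_type + "_batch")
    if batch is not None:
        narrowed = [n for n in candidates if n.batch == batch]
        candidates = narrowed or candidates  # fall back if the heuristic removes everything
    return max(candidates, key=lambda n: n.published, default=None)
```

For example, a student whose status record contains `{"exam_batch": "2019-spring"}` and who asks about exam times would be matched only against exam notices for that batch, and the most recent of those would be returned.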
5. The intelligent retrieval system for online education according to claim 1, characterized in that the problem analysis module performs intent classification and question paraphrasing on the question as follows:
(1) question intent classification: an SVM classification algorithm classifies the intent of the question using TF-IDF as the feature vector; because a question contains relatively little text, 1-gram and 2-gram models are used to increase the amount of information; the specific steps are as follows:
1) question feature items are selected and extracted by word frequency: the question is segmented into words and stop words are removed; the keywords of each class of questions and their frequencies are counted, and the 5 keywords with the highest word frequency are taken as the feature word set of that class; the feature word sets of the different classes are merged to form the overall feature word set;
2) TF-IDF features are used as the language-model representation of the question;
3) linear-function (min-max) normalization is then applied to bound the range of the data and reduce the gaps between the frequencies of different words, finally yielding the feature set representation of the question;
4) the feature set representation of the question is compared for similarity with the feature set of each class of questions, and the class whose feature set has the highest similarity is selected as the type of the question;
5) the offline submodule in the problem analysis module is responsible for training the intent classification model offline and deploying the trained intent classification model to the online submodule;
(2) question paraphrasing: a synonym table is compiled from users' frequently asked questions and the business norms and is used to expand the keywords in the question with synonyms; that is, each word in the keyword set is looked up in the synonym table, and if it has synonyms, all of them are added to the keyword set;
(3) the online submodule in the problem analysis module is called by the document retrieval module and is responsible for semantic parsing of the user question: it first performs intent recognition with the trained intent classification model and then performs word-sense expansion through question paraphrasing; the resulting keyword set is the question feature word set, which is returned to the document retrieval module as the output value of the problem analysis module.
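The feature construction and synonym expansion in this claim can be sketched as follows. The sketch covers only the 1-gram/2-gram TF-IDF features, the min-max normalization, and the synonym lookup; it does not train the SVM classifier. It assumes questions are already segmented with stop words removed, and the function names and the smoothed IDF formula are illustrative choices, not taken from the patent.

```python
import math
from collections import Counter

def ngrams(tokens):
    """Return the 1-gram and 2-gram features of a tokenized question."""
    return list(tokens) + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def tfidf(tokens, corpus):
    """TF-IDF weights for one question; corpus is a list of token lists."""
    grams = ngrams(tokens)
    counts = Counter(grams)
    doc_sets = [set(ngrams(doc)) for doc in corpus]
    weights = {}
    for gram, count in counts.items():
        df = sum(1 for s in doc_sets if gram in s)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0  # smoothed IDF (an assumption)
        weights[gram] = (count / len(grams)) * idf
    return weights

def min_max(weights):
    """Linear-function normalization: rescale all weights into [0, 1]."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        return {g: 1.0 for g in weights}
    return {g: (w - lo) / (hi - lo) for g, w in weights.items()}

def expand_synonyms(keywords, synonym_table):
    """Add every synonym listed in the table for each keyword."""
    expanded = set(keywords)
    for word in keywords:
        expanded.update(synonym_table.get(word, ()))
    return expanded
```

The normalized feature dictionary could then be fed to a classifier or compared against per-class feature sets as in step 4), while `expand_synonyms` produces the widened keyword set that is handed back to the document retrieval module.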
6. An intelligent retrieval method for online education, characterized by comprising an online process and an offline process;
wherein the online process is as follows:
(1) the student logs in to the system under their real name and enters a question described in natural language in the retrieval question-and-answer interface;
(2) the student ID is obtained from the user information, and the document retrieval module is called with the student ID and the user question as input parameters;
(3) the document retrieval submodule in the document retrieval module first calls the student status information module with the student ID as the input parameter;
the student status information module accesses the user-profile database, generates the student status information of the user according to the student ID, and returns it to the document retrieval submodule as the output value; the output value includes the student's school of attendance, enrollment batch, student-status batch, thesis batch, examination batch, graduation batch, and study progress information;
(4) the document retrieval submodule then calls the problem analysis module with the user question as the input parameter; the problem analysis module parses the user question, identifies its intent and feature information, and returns the parsed question feature word set to the document retrieval submodule as the output value;
(5) based on the return values of the two modules above, namely the student status information and the question feature word set, and on the business rules, the document retrieval submodule searches the document keyword tables and document content in the document repository using regular-expression matching and locates the target document that can answer the user question;
(6) the document retrieval submodule then passes the target document and the question feature word set returned by the problem analysis module as parameters to the passage retrieval module, which performs semantic retrieval on the target document using the BM25 algorithm and extracts the target paragraphs most relevant to the question as the output value;
(7) two question-answering response modes are provided according to user demand: if the user uses the default quick-response mode, the retrieved target paragraph is returned directly to the questioner as the answer, and the system responds quickly in this mode; if the user selects the precise-answer mode, the retrieved target paragraph is passed as an input parameter to the answer extraction module, which uses the DR-ASF intelligent retrieval model trained with the added manual similarity features to predict the result, locates the start and end positions of the answer within the candidate target paragraph, extracts a more precise answer, and returns it to the questioner;
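Steps (6) and (7) of the online process can be sketched with a standard Okapi BM25 scorer. This is an illustration under stated assumptions rather than the patented implementation: paragraphs and the question are assumed to be pre-tokenized and non-empty, `k1` and `b` use common default values, and `extractor` is a hypothetical stand-in for the DR-ASF answer extraction model that returns a (start, end) token range.

```python
import math
from collections import Counter

def bm25_scores(query_terms, paragraphs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized paragraph against the query terms."""
    n = len(paragraphs)
    avgdl = sum(len(p) for p in paragraphs) / n
    df = Counter()
    for p in paragraphs:
        df.update(set(p))  # document frequency counts each paragraph once per term
    scores = []
    for p in paragraphs:
        tf = Counter(p)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(p) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def answer(query_terms, paragraphs, precise=False, extractor=None):
    """Quick mode returns the best paragraph; precise mode returns the extracted span."""
    scores = bm25_scores(query_terms, paragraphs)
    best = paragraphs[max(range(len(paragraphs)), key=scores.__getitem__)]
    if precise and extractor is not None:
        start, end = extractor(query_terms, best)  # hypothetical span predictor
        return best[start:end + 1]
    return best
```

Keeping the span extractor as an optional callable mirrors the two response modes: the quick path never touches the heavier model, so its latency is bounded by the BM25 ranking alone.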
the offline process is as follows:
(1) the document repository management submodule of the document retrieval module is responsible for storing and processing notice documents in the offline phase; when a new notice document arrives, the submodule first saves its title and content and parses them with regular expressions, and the extracted keyword table is also stored in the document repository; the submodule is called only when a new notice document arrives, and after parsing the notice document it saves the output values, namely the document title, content, and keyword table, to the document repository and then ends the task;
(2) the offline submodule of the problem analysis module is responsible for using the training corpus and the SVM classification algorithm to train question intent recognition and for deploying the trained intent classification model to the online submodule, so as to provide the online question parsing function; this submodule runs while the system is offline, its input value is the training corpus, and its output value is the trained intent classification model;
(3) the offline submodule of the answer extraction module is responsible for using the training corpus and the DR-ASF algorithm with the added manual similarity features to train the DR-ASF intelligent retrieval model and for deploying the trained DR-ASF intelligent retrieval model to the online submodule, so as to provide the online answer extraction function; this submodule runs while the system is offline, its input value is the training corpus, and its output value is the trained DR-ASF intelligent retrieval model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910159421.1A CN109885672B (en) | 2019-03-04 | 2019-03-04 | Question-answering type intelligent retrieval system and method for online education |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910159421.1A CN109885672B (en) | 2019-03-04 | 2019-03-04 | Question-answering type intelligent retrieval system and method for online education |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109885672A true CN109885672A (en) | 2019-06-14 |
CN109885672B CN109885672B (en) | 2020-10-30 |
Family
ID=66930403
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910159421.1A Active CN109885672B (en) | 2019-03-04 | 2019-03-04 | Question-answering type intelligent retrieval system and method for online education |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109885672B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902652A (en) * | 2014-02-27 | 2014-07-02 | 深圳市智搜信息技术有限公司 | Automatic question-answering system |
US20150324456A1 (en) * | 2014-05-08 | 2015-11-12 | Electronics And Telecommunications Research Institute | Question answering system and method |
CN105824933A (en) * | 2016-03-18 | 2016-08-03 | 苏州大学 | Automatic question answering system based on main statement position and implementation method thereof |
CN107562792A (en) * | 2017-07-31 | 2018-01-09 | 同济大学 | A kind of question and answer matching process based on deep learning |
US20180089307A1 (en) * | 2016-09-26 | 2018-03-29 | International Business Machines Corporation | Minimum coordination passage scoring |
CN108959556A (en) * | 2018-06-29 | 2018-12-07 | 北京百度网讯科技有限公司 | Entity answering method, device and terminal neural network based |
CN108984778A (en) * | 2018-07-25 | 2018-12-11 | 南京瓦尔基里网络科技有限公司 | A kind of intelligent interaction automatically request-answering system and self-teaching method |
Non-Patent Citations (2)
Title |
---|
刘拼拼: ""领域问答系统中问句相似度计算方法研究"", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
毛先领等: ""问答系统研究综述"", 《计算机科学与探索》 * |
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110297897A (en) * | 2019-06-21 | 2019-10-01 | 科大讯飞(苏州)科技有限公司 | Question and answer processing method and Related product |
CN111881266A (en) * | 2019-07-19 | 2020-11-03 | 马上消费金融股份有限公司 | Response method and device |
CN111881266B (en) * | 2019-07-19 | 2024-06-07 | 马上消费金融股份有限公司 | Response method and device |
CN110598080A (en) * | 2019-08-13 | 2019-12-20 | 浙江省建工集团有限责任公司 | Intelligent enterprise knowledge management platform |
CN110647584A (en) * | 2019-09-23 | 2020-01-03 | 青岛聚好联科技有限公司 | Internet of things platform document data management method and device |
CN112711700A (en) * | 2019-10-24 | 2021-04-27 | 富驰律法(北京)科技有限公司 | Method and system for recommending case for fair litigation |
CN111159340A (en) * | 2019-12-24 | 2020-05-15 | 重庆兆光科技股份有限公司 | Answer matching method and system for machine reading understanding based on random optimization prediction |
CN111159340B (en) * | 2019-12-24 | 2023-11-03 | 重庆兆光科技股份有限公司 | Machine reading understanding answer matching method and system based on random optimization prediction |
CN111597314B (en) * | 2020-04-20 | 2023-01-17 | 科大讯飞股份有限公司 | Reasoning question-answering method, device and equipment |
CN111597314A (en) * | 2020-04-20 | 2020-08-28 | 科大讯飞股份有限公司 | Reasoning question-answering method, device and equipment |
CN111563378A (en) * | 2020-04-30 | 2020-08-21 | 神思电子技术股份有限公司 | Multi-document reading understanding realization method for combined learning |
CN111782759B (en) * | 2020-06-29 | 2024-04-19 | 数网金融有限公司 | Question-answering processing method and device and computer readable storage medium |
CN111782759A (en) * | 2020-06-29 | 2020-10-16 | 数网金融有限公司 | Question and answer processing method and device and computer readable storage medium |
CN111984703A (en) * | 2020-08-19 | 2020-11-24 | 中国银行股份有限公司 | Method and device for positioning problems in knowledge base |
CN112131366A (en) * | 2020-09-23 | 2020-12-25 | 腾讯科技(深圳)有限公司 | Method, device and storage medium for training text classification model and text classification |
CN112131366B (en) * | 2020-09-23 | 2024-02-09 | 腾讯科技(深圳)有限公司 | Method, device and storage medium for training text classification model and text classification |
CN112163079A (en) * | 2020-09-30 | 2021-01-01 | 民生科技有限责任公司 | Intelligent conversation method and system based on reading understanding model |
CN112163079B (en) * | 2020-09-30 | 2024-02-20 | 民生科技有限责任公司 | Intelligent dialogue method and system based on reading understanding model |
CN112489740A (en) * | 2020-12-17 | 2021-03-12 | 北京惠及智医科技有限公司 | Medical record detection method, training method of related model, related equipment and device |
CN112733674A (en) * | 2020-12-31 | 2021-04-30 | 北京华图宏阳网络科技有限公司 | Intelligent correction method and system for official application examination application documents |
CN113157884A (en) * | 2021-04-09 | 2021-07-23 | 杭州电子科技大学 | Question-answer retrieval method based on campus service |
CN113076431A (en) * | 2021-04-28 | 2021-07-06 | 平安科技(深圳)有限公司 | Question and answer method and device for machine reading understanding, computer equipment and storage medium |
CN113191148A (en) * | 2021-04-30 | 2021-07-30 | 西安理工大学 | Rail transit entity identification method based on semi-supervised learning and clustering |
CN113191148B (en) * | 2021-04-30 | 2024-05-28 | 西安理工大学 | Rail transit entity identification method based on semi-supervised learning and clustering |
CN113626575A (en) * | 2021-09-01 | 2021-11-09 | 浙江力石科技股份有限公司 | Intelligent recommendation method based on user question answering |
CN114495143A (en) * | 2021-12-24 | 2022-05-13 | 北京百度网讯科技有限公司 | Text object identification method and device, electronic equipment and storage medium |
CN114495143B (en) * | 2021-12-24 | 2024-03-22 | 北京百度网讯科技有限公司 | Text object recognition method and device, electronic equipment and storage medium |
CN114297362A (en) * | 2021-12-31 | 2022-04-08 | 浙江力石科技股份有限公司 | Question-answering system based on combination algorithm of text travel industry |
CN114661891A (en) * | 2022-04-11 | 2022-06-24 | 北京百度网讯科技有限公司 | Information extraction method, information extraction device, electronic equipment and medium |
CN115658860A (en) * | 2022-10-17 | 2023-01-31 | 吉林大学 | Automatic teacher self-supporting teaching behavior identification method |
CN115658860B (en) * | 2022-10-17 | 2023-06-06 | 吉林大学 | Automatic identification method for autonomous supporting teaching behavior of teacher |
CN116701579A (en) * | 2023-02-21 | 2023-09-05 | 中国人民解放军海军工程大学 | Information reply system, method and computer readable storage medium |
CN116108128B (en) * | 2023-04-13 | 2023-09-05 | 华南师范大学 | Open domain question-answering system and answer prediction method |
CN116108128A (en) * | 2023-04-13 | 2023-05-12 | 华南师范大学 | Open domain question-answering system and answer prediction method |
CN117172245A (en) * | 2023-05-26 | 2023-12-05 | 国家计算机网络与信息安全管理中心 | Control method and control system |
CN116501859B (en) * | 2023-06-26 | 2023-09-01 | 中国海洋大学 | Paragraph retrieval method, equipment and medium based on refrigerator field |
CN116501859A (en) * | 2023-06-26 | 2023-07-28 | 中国海洋大学 | Paragraph retrieval method, equipment and medium based on refrigerator field |
CN116932730B (en) * | 2023-09-14 | 2023-12-01 | 天津汇智星源信息技术有限公司 | Document question-answering method and related equipment based on multi-way tree and large-scale language model |
CN116932730A (en) * | 2023-09-14 | 2023-10-24 | 天津汇智星源信息技术有限公司 | Document question-answering method and related equipment based on multi-way tree and large-scale language model |
CN117171333A (en) * | 2023-11-03 | 2023-12-05 | 国网浙江省电力有限公司营销服务中心 | Electric power file question-answering type intelligent retrieval method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109885672B (en) | 2020-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109885672A (en) | A kind of question and answer mode intelligent retrieval system and method towards online education | |
CN111767741B (en) | Text emotion analysis method based on deep learning and TFIDF algorithm | |
CN112380325B (en) | Knowledge graph question-answering system based on joint knowledge embedded model and fact memory network | |
CN109299865B (en) | Psychological evaluation system and method based on semantic analysis and information data processing terminal | |
CN112667794A (en) | Intelligent question-answer matching method and system based on twin network BERT model | |
CN110489750A (en) | Burmese participle and part-of-speech tagging method and device based on two-way LSTM-CRF | |
CN111666376B (en) | Answer generation method and device based on paragraph boundary scan prediction and word shift distance cluster matching | |
CN113962219A (en) | Semantic matching method and system for knowledge retrieval and question answering of power transformer | |
CN113159187B (en) | Classification model training method and device and target text determining method and device | |
CN112989033B (en) | Microblog emotion classification method based on emotion category description | |
CN112883175B (en) | Meteorological service interaction method and system combining pre-training model and template generation | |
CN114254208A (en) | Identification method of weak knowledge points and planning method and device of learning path | |
CN115526590B (en) | Efficient person post matching and re-pushing method combining expert knowledge and algorithm | |
CN112463944A (en) | Retrieval type intelligent question-answering method and device based on multi-model fusion | |
CN111552773A (en) | Method and system for searching key sentence of question or not in reading and understanding task | |
CN111368058A (en) | Question-answer matching method based on transfer learning | |
CN113468304A (en) | Construction method of ship berthing knowledge question-answering query system based on knowledge graph | |
CN113934835B (en) | Retrieval type reply dialogue method and system combining keywords and semantic understanding representation | |
CN111666374A (en) | Method for integrating additional knowledge information into deep language model | |
CN116562265A (en) | Information intelligent analysis method, system and storage medium | |
CN115659947A (en) | Multi-item selection answering method and system based on machine reading understanding and text summarization | |
KR20230171234A (en) | Method for Providing Question-and-Answer Service Based on User Participation And Apparatus Therefor | |
CN114491023A (en) | Text processing method and device, electronic equipment and storage medium | |
Marivate et al. | An intelligent multi-agent recommender system for human capacity building | |
CN117574858A (en) | Automatic generation method of class case retrieval report based on large language model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||