CN109472305A - Answer quality determination model training method, answer quality determination method and device - Google Patents
- Publication number
- CN109472305A (application CN201811285467.XA)
- Authority: CN (China)
- Prior art keywords: answer, answer data, data, quality, feature
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides an answer quality determination model training method, an answer quality determination method, and corresponding devices. The model training method includes: obtaining a sample set that contains answer data corresponding to multiple sample questions, where each sample question corresponds to at least one piece of answer data and each piece of answer data has corresponding quality annotation information; for each piece of answer data, obtaining a set number of answer features for that answer data and constructing its feature vector; and training the answer quality determination model with the feature vectors of the answer data as input and the quality annotation information as output. The embodiments of this application make it possible to determine answer quality and improve the accuracy of recommended answers.
Description
Technical field
This application relates to the field of machine learning, and in particular to an answer quality determination model training method, an answer quality determination method, and corresponding devices.
Background
Community question answering is a popular and practical Internet application that gives users a platform for posting questions and answering other people's questions; examples include Baidu Zhidao, Sina iAsk, and Zhihu. On such a platform, users can post questions to meet their own information needs, answer questions raised by other users to share their knowledge, and search the answer library accumulated by the system to satisfy an information need quickly.
In practice, the same question may have several different answers, for example when several people each provide one. The quality of these answers also varies, because each person differs in how well they understand the question, how much they know, and how seriously they answer. For the convenience of users, a community question-answering platform needs to select, from all the answers, one with high quality and accuracy and show it to users as the best answer to the question.
Summary of the invention
The embodiments of this application aim to provide an answer quality determination model training method, an answer quality determination method, and corresponding devices that can determine answer quality and improve the accuracy of recommended answers.
In a first aspect, an embodiment of this application provides an answer quality determination model training method, comprising:
obtaining a sample set, where the sample set contains answer data corresponding to multiple sample questions, each sample question corresponds to at least one piece of answer data, and each piece of answer data has corresponding quality annotation information;
for each piece of answer data, obtaining a set number of answer features corresponding to the answer data and constructing the feature vector of the answer data;
training the answer quality determination model with the feature vector of the answer data as input and the quality annotation information as output.
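The first two steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the feature names in `FEATURE_ORDER`, the dictionary layout of the sample set, and the choice of 0.0 for a missing feature are all assumptions made for the example.

```python
# Hypothetical feature names; the text only requires a fixed, set number of features.
FEATURE_ORDER = ["url_count", "image_count", "code_snippet_count", "length", "answer_order"]

def build_feature_vector(features, order=FEATURE_ORDER):
    """Return the features as a list in a fixed order, using 0.0 for missing ones."""
    return [float(features.get(name, 0.0)) for name in order]

def build_training_set(sample_set):
    """Turn labelled answers into parallel lists X (inputs) and y (outputs)."""
    X, y = [], []
    for question in sample_set:
        for answer in question["answers"]:
            X.append(build_feature_vector(answer["features"]))
            y.append(answer["quality_label"])
    return X, y

# One sample question with two labelled answers (made-up values).
sample_set = [
    {"question": "q1", "answers": [
        {"features": {"url_count": 2, "length": 120, "answer_order": 1}, "quality_label": 1},
        {"features": {"length": 15, "answer_order": 2}, "quality_label": 0},
    ]},
]
X, y = build_training_set(sample_set)
```

Keeping the feature order fixed matters because the same order has to be used again when a feature vector is built for a new answer at prediction time.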
In an optional embodiment, the answer quality determination model is a random forest model, and training the answer quality determination model includes: with the feature vector of the answer data as input and the quality annotation information as output, constructing at least one decision tree, and building the random forest model from the at least one decision tree.
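To make the idea of building a forest from individual decision trees concrete, here is a deliberately simplified sketch in plain Python. Several things are assumptions for illustration only: each tree is a depth-1 stump rather than a full decision tree, each tree is trained on a bootstrap sample with a random subset of features, and predictions are combined by majority vote. The text does not specify these details, and a real system would more likely rely on an existing library implementation.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, feature_ids):
    """Fit a depth-1 tree: pick the (feature, threshold) with the lowest weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between two neighbouring values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no split possible: always predict the majority label
        return lambda row: majority
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        feats = rng.sample(range(n_features), k)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict(trees, row):
    """Majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

With feature vectors such as `[length, url_count]` and labels 1 (good) and 0 (poor), `fit_forest(X, y)` returns the list of trees and `predict(trees, row)` gives the voted label for a new answer.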
In an optional embodiment, the answer features include any one or more of the following: the content attributes of the answer data, the evaluation of the user who provided the answer data, the time attribute of the answer data, the degree of association between the answer data and the sample question it belongs to, and the degree of association between the answer data and the other answer data belonging to the same sample question.
In an optional embodiment, where the answer features include the content attributes of the answer data, the content attributes include any one or more of the following: the number of uniform resource locator (URL) tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
where the answer features include the evaluation of the user who provided the answer data, that evaluation includes any one or a combination of the following: the scores and/or voting results of that user's answers to other questions, and the scores and/or voting results of the questions asked by that user;
where the answer features include the time attribute of the answer data, the time attribute includes: the difference between the creation time of the answer data and that of its corresponding sample question;
where the answer features include the degree of association between the answer data and the sample question it belongs to, that degree of association includes: the similarity between the answer data and the sample question it belongs to;
where the answer features include the degree of association between the answer data and the other answer data belonging to the same sample question, that degree of association includes any one or more of the following: the average similarity between the answer data and the other answer data belonging to the same sample question, the minimum similarity between the answer data and those other answer data, the maximum similarity between the answer data and those other answer data, the number of other answer data of the sample question the answer data belongs to, and the order in which the answer data was created among all the answer data of the sample question it belongs to.
In an optional embodiment, where the content attributes of the answer data include the readability of the answer data, the readability is obtained as follows: the readability of the answer data is determined from the number of paragraphs in the answer data and the length of each paragraph;
where the degree of association between the answer data and the sample question it belongs to includes the similarity between them, the similarity is obtained as follows: it is determined from the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the sample question, constructed from the word vectors of the words in the sample question;
where the degree of association between the answer data and the other answer data belonging to the same sample question includes at least one of the average similarity, the minimum similarity, and the maximum similarity between the answer data and those other answer data, the degree of association is obtained as follows: the similarity between the answer data and each other answer data is determined from the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in that other answer data.
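The similarity features described above can be illustrated with a small sketch. The text says only that representation vectors are built from word vectors and that a similarity is computed from them; averaging the word vectors and using cosine similarity are assumptions chosen here for simplicity, and the two-dimensional word vectors are made up.

```python
import math

def representation_vector(words, word_vectors):
    """Average the word vectors of the known words (an assumed pooling choice)."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return []
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 when either vector is empty or all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def peer_similarity_features(answer_vec, other_vecs):
    """Average, minimum and maximum similarity to the other answers of the same question."""
    sims = [cosine_similarity(answer_vec, v) for v in other_vecs]
    if not sims:
        return {"avg": 0.0, "min": 0.0, "max": 0.0}
    return {"avg": sum(sims) / len(sims), "min": min(sims), "max": max(sims)}

# Toy 2-dimensional word vectors, made up for illustration only.
word_vectors = {"python": [1.0, 0.0], "list": [1.0, 0.0], "cake": [0.0, 1.0]}
question_vec = representation_vector(["python", "list"], word_vectors)
answer_vec = representation_vector(["python"], word_vectors)
off_topic_vec = representation_vector(["cake"], word_vectors)
```

The same `cosine_similarity` function serves both feature groups: applied to an answer and its question it gives the answer-question similarity, and applied to an answer and each sibling answer it gives the values that are averaged, minimized, and maximized.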
In an optional embodiment, the method further includes: for each decision tree, determining the importance of each answer feature in that tree using the Gini impurity method; and determining the importance of all answer features from the importance of each answer feature in each decision tree.
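As a rough sketch of how Gini impurity can yield feature importance: the impurity decrease produced by a split on a feature is credited to that feature within a tree, and the per-tree values are then combined across the forest. Averaging over trees and normalizing so the values sum to 1 are assumptions made here; the text does not say how the per-tree values are combined.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Weighted Gini decrease produced by splitting parent into left and right."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

def forest_importance(per_tree_importance):
    """Average each feature's importance over all trees, then normalise to sum to 1."""
    totals = Counter()
    for tree_importance in per_tree_importance:
        for feature, value in tree_importance.items():
            totals[feature] += value
    n_trees = len(per_tree_importance)
    averaged = {feature: value / n_trees for feature, value in totals.items()}
    norm = sum(averaged.values())
    return {f: v / norm for f, v in averaged.items()} if norm else averaged
```

For instance, a split that separates the labels `[1, 1, 0, 0]` into `[1, 1]` and `[0, 0]` removes all impurity at that node, so the feature used for the split is credited with the node's full impurity.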
In a second aspect, an embodiment of this application further provides an answer quality determination method, comprising:
obtaining a set number of answer features of a target answer and constructing the feature vector of the target answer;
inputting the feature vector of the target answer into an answer quality determination model trained by the answer quality determination model training method provided in the embodiments of this application, to obtain the quality information of the target answer.
In a third aspect, an embodiment of this application provides an answer quality determination model training device, comprising:
an obtaining module, configured to obtain a sample set, where the sample set contains answer data corresponding to multiple sample questions, each sample question corresponds to at least one piece of answer data, and each piece of answer data has corresponding quality annotation information;
a first feature vector construction module, configured to, for each piece of answer data, obtain a set number of answer features corresponding to the answer data and construct the feature vector of the answer data;
a training module, configured to train the answer quality determination model with the feature vector of the answer data as input and the quality annotation information as output.
In an optional embodiment, the answer quality determination model is a random forest model, and the training module is configured to train the answer quality determination model as follows: with the feature vector of the answer data as input and the quality annotation information as output, construct at least one decision tree, and build the random forest model from the at least one decision tree.
In an optional embodiment, the answer features include any one or more of the following: the content attributes of the answer data, the evaluation of the user who provided the answer data, the time attribute of the answer data, the degree of association between the answer data and the sample question it belongs to, and the degree of association between the answer data and the other answer data belonging to the same sample question.
In an optional embodiment, where the answer features include the content attributes of the answer data, the content attributes include any one or more of the following: the number of uniform resource locator (URL) tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
where the answer features include the evaluation of the user who provided the answer data, that evaluation includes any one or a combination of the following: the scores and/or voting results of that user's answers to other questions, and the scores and/or voting results of the questions asked by that user;
where the answer features include the time attribute of the answer data, the time attribute includes: the difference between the creation time of the answer data and that of its corresponding sample question;
where the answer features include the degree of association between the answer data and the sample question it belongs to, that degree of association includes: the similarity between the answer data and the sample question it belongs to;
where the answer features include the degree of association between the answer data and the other answer data belonging to the same sample question, that degree of association includes any one or more of the following: the average similarity between the answer data and the other answer data belonging to the same sample question, the minimum similarity between the answer data and those other answer data, the maximum similarity between the answer data and those other answer data, the number of other answer data of the sample question the answer data belongs to, and the order in which the answer data was created among all the answer data of the sample question it belongs to.
In an optional embodiment, where the content attributes of the answer data include the readability of the answer data, the readability is obtained as follows: the readability of the answer data is determined from the number of paragraphs in the answer data and the length of each paragraph;
where the degree of association between the answer data and the sample question it belongs to includes the similarity between them, the similarity is obtained as follows: it is determined from the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the sample question, constructed from the word vectors of the words in the sample question;
where the degree of association between the answer data and the other answer data belonging to the same sample question includes at least one of the average similarity, the minimum similarity, and the maximum similarity between the answer data and those other answer data, the degree of association is obtained as follows: the similarity between the answer data and each other answer data is determined from the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in that other answer data.
In an optional embodiment, the device further includes an importance determination module, configured to: for each decision tree, determine the importance of each answer feature in that tree using the Gini impurity method; and determine the importance of all answer features from the importance of each answer feature in each decision tree.
In a fourth aspect, an embodiment of this application further provides an answer quality determination device, comprising:
a second feature vector construction module, configured to obtain a set number of answer features of a target answer and construct the feature vector of the target answer;
a determination module, configured to input the feature vector of the target answer into an answer quality determination model trained by the answer quality determination model training method of any one of claims 1-6, to obtain the quality information of the target answer.
In a fifth aspect, an embodiment of this application provides an electronic device, comprising a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of any possible embodiment of the first aspect.
In a sixth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of any possible embodiment of the first aspect.
In a seventh aspect, an embodiment of this application provides an electronic device, comprising a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of any possible embodiment of the second aspect.
In an eighth aspect, an embodiment of this application further provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of any possible embodiment of the second aspect.
In the embodiments of this application, feature vectors characterizing each piece of answer data are constructed from the answer data corresponding to the multiple sample questions in the obtained sample set, and the answer quality determination model is trained with the feature vectors of the answer data as input and the quality annotation information as output. The model thereby learns the features that a best answer has and can be used to determine the quality of an answer; in this way, the quality of a newly generated answer can also be determined with the trained model.
To make the above objects, features, and advantages of this application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of this application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of this application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 shows a flow chart of an answer quality determination model training method provided by Embodiment One of this application;
Fig. 2 shows a flow chart of an answer quality determination method provided by Embodiment Two of this application;
Fig. 3 shows a schematic diagram of an answer quality determination model training device provided by Embodiment Three of this application;
Fig. 4 shows a schematic structural diagram of an electronic device 400 provided by Embodiment Four of this application;
Fig. 5 shows a schematic diagram of an answer quality determination device provided by Embodiment Five of this application;
Fig. 6 shows a schematic diagram of an electronic device 600 provided by Embodiment Six of this application.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of this application. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the claimed scope of this application but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from the embodiments of this application without creative effort fall within the protection scope of this application.
With the answer quality determination model training method, answer quality determination method, and devices provided by this application, feature vectors characterizing each piece of answer data can be constructed in advance from the answer data corresponding to the multiple sample questions in the obtained sample set, and the answer quality determination model can be trained with the feature vectors of the answer data as input and the quality annotation information as output. The model thereby learns the features that a best answer has and can be used to determine the quality of an answer. In this way, the quality of a newly generated answer can also be determined, for example to identify the best answer.
To facilitate understanding, the answer quality determination model training method disclosed herein is first described in detail. Note that the invention can be applied not only in community question answering systems but also in other scenarios that require determining answer quality or identifying the best answer.
Embodiment One
Fig. 1 shows a flow chart of the answer quality determination model training method provided by Embodiment One of this application. The method includes steps S101 to S103.
S101: Obtain a sample set, where the sample set contains answer data corresponding to multiple sample questions, each sample question corresponds to at least one piece of answer data, and each piece of answer data has corresponding quality annotation information.
The sample set contains labelled answer data. For example, it may be labelled answer data obtained from a community question-answering platform, or answers that are obtained first and labelled afterwards. The labelling may be manual: for example, designated personnel determine the quality of the answers to a given question and/or select the best answer; or it may be done by voting, that is, people who view the question and its answers vote on the answers, and the votes determine the quality of the answers and/or the best answer. Answer data may also be labelled by other means, for example by using a neural network model and/or semantic analysis to compute, from feature vectors, the relevance between an answer and its question, and thereby determine the quality of the answer and/or select the best answer.
When obtaining the sample set, multiple sample questions may first be determined according to certain criteria. In practice, sample questions can be selected based on factors such as time (for example, recent valid questions and answers), the number of answers (for example, the question must have at least a set number of answers), and the annotation information of the answers (for example, the answers given have been validly labelled).
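A minimal sketch of such a selection step is shown below. The record layout, the thresholds `max_age_days` and `min_answers`, and the rule that every answer must carry a label are assumptions for illustration; the text names the factors but not their concrete values.

```python
from datetime import datetime, timedelta

def select_sample_questions(questions, now, max_age_days=365, min_answers=2):
    """Keep questions that are recent, have enough answers, and are fully labelled."""
    cutoff = now - timedelta(days=max_age_days)
    selected = []
    for q in questions:
        recent = q["created_at"] >= cutoff
        enough = len(q["answers"]) >= min_answers
        labelled = all(a.get("quality_label") is not None for a in q["answers"])
        if recent and enough and labelled:
            selected.append(q)
    return selected
```

Each question is expected to be a dictionary with a `created_at` datetime and an `answers` list whose items may carry a `quality_label`; questions failing any of the three checks are left out of the sample set.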
After the sample questions have been determined, the sample answer data can be determined. For a given sample question, all of its answers may be included in the sample set, or only a subset of its answers may be selected for the sample set.
As described above, the quality annotation information of the answer data corresponding to a sample question may be information the answer data already carried when it was collected, or information added to the answer data after it was collected.
S102: For each piece of answer data, obtain a set number of answer features corresponding to the answer data and construct the feature vector of the answer data. In a specific implementation, the answer features include any one or more of the following:
A. The content attributes of the answer data. The content attributes generally characterize how rich the content covered by the answer data is; in general, provided the content of the answer is not wrong, the richer the content it covers, the higher the answer quality. Some embodiments of this application therefore use the content attributes of the answer data as features for measuring its quality. Specifically, the content attributes may include any one or more of the following:
1. The number of uniform resource locator (URL) tags in the answer data. A URL is a concise representation of the location of a resource available on the Internet and of the method for accessing it. The number of URLs in the answer data reflects, to some extent, how rich the content covered by the answer data is and how clearly certain content or concepts are described.
To obtain the URLs in the answer data, a keyword search can be used: first determine the key characters commonly found in URLs, such as "http" and "ftp", which denote the transfer protocol, and "/", " ", etc., which separate the different parts of a URL; then search the answer data for these key characters to obtain the number of URL tags in the answer data.
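The keyword search described above can be approximated with a regular expression. The pattern below is an assumption, not the patent's rule: it treats a protocol keyword ("http", "https", or "ftp") followed by "://" and a run of non-space characters as one URL.

```python
import re

# Assumed pattern: a protocol keyword, "://", then characters up to whitespace or a quote.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)

def count_urls(answer_text):
    """Return the number of URL-like substrings found in the answer text."""
    return len(URL_PATTERN.findall(answer_text))
```

A stricter or looser pattern may be appropriate depending on how answers are stored, for example when links appear as HTML anchor tags rather than as plain text.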
2. The number of pictures in the answer data. The number of pictures also reflects, to some extent, how rich the content covered by the answer data is and how clearly certain content or concepts are described.
3. The number of code snippets in the answer data.
4. The length of the answer data. In some embodiments, the number of words in the answer data may be used as the answer length; the file size of the answer data may be used; or the number of characters of all the content contained in the answer data may be used.
The length is obtained differently depending on how it is represented. Where the number of words is used, the content of the answer data can be segmented into words to obtain the set of words that make up the answer, and the words in the set are then counted; where the file size is used, the file attributes of the answer data can be read directly to obtain its file size; where the number of characters is used, the number of characters of all the content in the answer data can be read directly.
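The three length measures can be computed side by side, as in this sketch. Two simplifications are assumed: words are split on whitespace, which works for space-delimited languages but would need a proper word segmenter for Chinese text, and the UTF-8 encoded size of the text stands in for the file size.

```python
def answer_lengths(answer_text):
    """Return the word count, character count, and UTF-8 byte size of an answer."""
    words = answer_text.split()  # whitespace segmentation (assumed)
    return {
        "word_count": len(words),
        "char_count": len(answer_text),
        "byte_size": len(answer_text.encode("utf-8")),
    }
```

Only one of the three values would normally be placed in the feature vector, and the same choice has to be used for training and for prediction.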
5. The readability of the answer data. In some embodiments of this application, the readability of the answer data refers to how difficult the answer is to read. For example, the length of the longest paragraph in the answer data can be used as the readability measure; alternatively, the average length of the paragraphs in the answer data can be used.
To obtain the readability, first determine the number of paragraphs in the answer data and the length of each paragraph, then determine the readability from the number of paragraphs and their lengths. Here, the length of a paragraph may be its number of characters or the number of words it contains.
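Both readability measures mentioned above follow directly from the paragraph lengths. In this sketch, paragraphs are assumed to be separated by newlines, blank lines are ignored, and paragraph length is counted in characters.

```python
def paragraph_lengths(answer_text):
    """Split on newlines, drop blank lines, and return each paragraph's character count."""
    paragraphs = [p.strip() for p in answer_text.split("\n") if p.strip()]
    return [len(p) for p in paragraphs]

def readability_features(answer_text):
    """Return the paragraph count, longest paragraph length, and average paragraph length."""
    lengths = paragraph_lengths(answer_text)
    if not lengths:
        return {"paragraph_count": 0, "longest": 0, "average": 0.0}
    return {
        "paragraph_count": len(lengths),
        "longest": max(lengths),
        "average": sum(lengths) / len(lengths),
    }
```

Counting words instead of characters only requires changing `len(p)` to a word count in `paragraph_lengths`.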
B. The evaluation of the user who provided the answer data. It can generally be assumed that users who are good at answering and/or asking questions are more likely to give higher-quality answers. Therefore, to describe the quality of the answer data, the evaluation of the user who provided it can be used as an answer feature for measuring its quality. Any one or a combination of the following can serve as the evaluation of the user who provided the answer data:
1. The scores and/or voting results of the user's answers to other questions. This feature characterizes how sound the user's answers to other questions are. For example, the user name or identity of the user who provided the answer data can be used as a search condition to retrieve, from the database of the Internet platform, the scores and/or voting results of that user's answers to other questions.
2. The scores and/or voting results of the questions asked by the user who provided the answer data. This feature characterizes how capable and active the user is in raising meaningful questions. It is obtained in a way similar to that for the user's answers to other questions, which is not repeated here.
C. The time attribute of the answer data. Here, the difference between the creation time of the answer data and the creation time of its corresponding sample problem is generally used as the time attribute of the answer data.
When obtaining the time attribute of the answer data, the creation time of the answer data and the creation time of its corresponding sample problem may be obtained from the internet platform, and the creation time difference between the answer data and its corresponding sample problem is computed from them.
D. The degree of association between the answer data and the sample problem to which it belongs. Here, the similarity between the answer data and the sample problem to which it belongs is generally used to characterize the degree of association between them: the higher the similarity, the higher the degree of association between the answer data and the sample problem; the lower the similarity, the lower the degree of association.
Specifically, the similarity between the answer data and the sample problem to which it belongs may be determined from the representation vector of the answer data and the representation vector of the sample problem. A representation vector may be determined by semantic analysis and/or neural network training, for example: performing word segmentation on the answer data and the sample problem, extracting keywords, applying word embedding to the keywords to obtain the corresponding word vectors, and inputting the word vectors into a neural network for training to obtain the representation vectors of the answer data and the sample problem.
The similarity between the two is then calculated from the representation vector of the answer data and the representation vector of the sample problem to which it belongs. Here, the similarity may be expressed by any one of the following similarity measures: Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, standardized Euclidean distance, Mahalanobis distance, cosine of the included angle, Hamming distance, Jaccard distance or Jaccard similarity coefficient, correlation coefficient or correlation distance, information entropy, and so on.
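The following Python sketch illustrates two of the listed measures. As a simplifying assumption, the representation vector is taken to be the element-wise mean of the word vectors, standing in for the neural network described above; the function names are hypothetical.

```python
import math


def mean_vector(word_vectors):
    """Assumed stand-in for the representation vector: mean of the word vectors."""
    dim = len(word_vectors[0])
    count = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note that cosine similarity grows with relatedness while the distance measures shrink, so a distance would need to be inverted or negated before being used as a "higher is more associated" feature.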
E. The degree of association between the answer data and the other answer data belonging to the same sample problem.
Answer data that answer the same sample problem usually have a certain relevance to one another. Therefore, the degree of association between the answer data and the other answer data belonging to the same sample problem can characterize the quality of the answer data to a certain extent. In general, the greater the degree of association, the higher the quality of the answer data is considered to be. The degree of association between the answer data and the other answer data belonging to the same sample problem includes any one or more of the following:
1. The average similarity between the answer data and the other answer data belonging to the same sample problem.
2. The minimum similarity between the answer data and the other answer data belonging to the same sample problem.
3. The maximum similarity between the answer data and the other answer data belonging to the same sample problem.
In a specific implementation, when the degree of association between the answer data and the other answer data belonging to the same sample problem includes any of items 1, 2 and 3 of E above, the similarity between the answer data and each of the other answer data belonging to the same sample problem may first be obtained as follows: the similarity between the answer data and the other answer data is determined from the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in the other answer data. Here, the representation vector of the answer data is obtained in a way similar to that described in D above, which is not repeated here.
After the similarities between the answer data and the other answer data belonging to the same sample problem are obtained: where the degree of association includes item 1 of E above, the average similarity is calculated from these similarities; where it includes item 2 of E above, the smallest of these similarities is taken as the minimum similarity; where it includes item 3 of E above, the largest of these similarities is taken as the maximum similarity.
4. The number of other answer data of the sample problem to which the answer data belongs.
5. The order in which the answer data was created among all the answer data of the sample problem to which it belongs. Here, the creation time of each answer data may be obtained first, and the order in which the answer data was created among all the answer data of the sample problem to which it belongs is then determined from the chronological order of the creation times.
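Items 1 to 5 of E can be sketched together as follows. This is only an illustration: cosine similarity is assumed as the similarity measure, creation times are assumed to be comparable numbers, and the function and key names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity, used here as the assumed similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def peer_features(target_vec, target_time, others):
    """Features relating one answer to the other answers of the same problem.

    others: list of (representation_vector, creation_time) pairs for the
    other answer data belonging to the same sample problem.
    """
    sims = [cosine(target_vec, vec) for vec, _ in others]
    # Creation order: 1 + number of other answers created earlier.
    earlier = sum(1 for _, created in others if created < target_time)
    return {
        "avg_similarity": sum(sims) / len(sims) if sims else 0.0,
        "min_similarity": min(sims) if sims else 0.0,
        "max_similarity": max(sims) if sims else 0.0,
        "num_other_answers": len(others),
        "creation_order": earlier + 1,
    }
```

Returning 0.0 for the similarity features when a problem has only one answer is a choice made for the sketch; the application does not say how that case is handled.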
After the feature vector of each answer data is constructed, the answer quality determination model training method provided by the embodiments of the present application further includes:
S103: taking the feature vector of the answer data as input and the quality annotation information as output, training the answer quality determination model. In a specific implementation, the answer quality determination model includes any one of: a logistic regression model, an autoregressive model, a moving average model, an autoregressive moving average model, an autoregressive integrated moving average model, a generalized autoregressive conditional heteroskedasticity model, a deep learning model, a decision tree model, and a random forest model.
When the answer quality determination model is a logistic regression model, an autoregressive model, a moving average model, an autoregressive moving average model, an autoregressive integrated moving average model, or a generalized autoregressive conditional heteroskedasticity model, the answer quality determination model is trained by taking the feature vector of the answer data as the value of the explanatory variables and the quality annotation information as the value of the explained variable, and solving for the unknown parameters of the answer quality determination model.
Specifically, the unknown parameters of the answer quality determination model may be solved as follows: an explanatory variable matrix is constructed from the feature vectors of all the answer data in the sample set, an explained variable matrix is constructed from the quality annotation information corresponding to each answer data, and a parameter matrix is constructed from the unknown parameters of the answer quality determination model; the parameter matrix is then solved using the explanatory variable matrix and the explained variable matrix.
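For the logistic regression case, solving the parameter matrix could look like the sketch below. Batch gradient descent, the learning rate and the number of epochs are assumptions chosen for the example; the application does not name a particular solver.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Solve the parameters by batch gradient descent.

    X: explanatory variable matrix (one feature vector per answer data).
    y: explained variable values (1 = optimum answer, 0 = otherwise).
    Returns the weights followed by the bias as the last element.
    """
    n_features = len(X[0])
    params = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for row, label in zip(X, y):
            # zip stops at the end of row, so the bias (last entry) is added separately.
            z = params[-1] + sum(w * x for w, x in zip(params, row))
            error = sigmoid(z) - label
            for i, x in enumerate(row):
                grad[i] += error * x
            grad[-1] += error
        params = [p - lr * g / len(X) for p, g in zip(params, grad)]
    return params


def predict_proba(params, row):
    """Estimated probability that the answer is the optimum answer."""
    return sigmoid(params[-1] + sum(w * x for w, x in zip(params, row)))
```

A prediction above 0.5 can be read as "optimum answer" and one below 0.5 as "non-optimal answer".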
When the answer quality determination model is a deep learning model, the answer quality determination model is trained as follows: the feature vector of the answer data is input into the deep learning model to obtain a quality determination result for the answer data, and the deep learning model is trained according to the quality determination result of each answer data and the corresponding quality annotation information.
Here, training the deep learning model means adjusting the parameters of the deep learning model so that the quality determination results that the deep learning model produces for the answer data agree with the quality annotation information as closely as possible.
When the answer quality determination model is a random forest model, the answer quality determination model is trained as follows: taking the feature vector of the answer data as input and the quality annotation information as output, at least one decision tree is constructed, and the random forest model is constructed based on the at least one decision tree.
In a specific implementation, when constructing each decision tree, multiple elements at arbitrary positions are first determined from the feature vector of the answer data as the input of that decision tree, and an arbitrary number of answer data are selected from the sample set as the target training data of that decision tree. The decision tree is then constructed by taking the elements at the determined positions of the target training data as its input and the quality annotation information corresponding to the target training data as its output.
For example, the feature vector of the answer data includes 15 elements, U1~U15, and the answer data comprise one thousand items, A1~A1000.
When constructing the first decision tree M1, A1~A100 are used as the target training data for M1, U1~U5 are determined as the input for constructing M1, and the quality annotation information of A1~A100 is used as the output of M1. When constructing the second decision tree M2, A1~A100 are used as the target training data for M2, U3~U8 are determined as the input for constructing M2, and the quality annotation information of A1~A100 is used as the output of M2. When constructing the third decision tree M3, A101~A200 are used as the target training data for M3, U1~U5 are determined as the input for constructing M3, and the quality annotation information of A101~A200 is used as the output of M3. When constructing the fourth decision tree M4, A101~A200 are used as the target training data for M4, U6~U10 are determined as the input for constructing M4, and the quality annotation information of A101~A200 is used as the output of M4. After at least one decision tree has been constructed, the random forest model is constructed based on the constructed decision tree or trees.
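The selection step of the example above (which feature positions and which answer data each tree sees) can be sketched as follows. Only the selection is shown, not the tree construction itself; random sampling without replacement, the fixed seed and the function names are assumptions made for the sketch.

```python
import random


def plan_trees(n_features, n_samples, n_trees, features_per_tree, samples_per_tree, seed=0):
    """Pick, for each tree, the feature positions and the training answers to use.

    Returns a list of (feature_indices, sample_indices) pairs, one per tree.
    """
    rng = random.Random(seed)
    plans = []
    for _ in range(n_trees):
        feature_idx = sorted(rng.sample(range(n_features), features_per_tree))
        sample_idx = sorted(rng.sample(range(n_samples), samples_per_tree))
        plans.append((feature_idx, sample_idx))
    return plans


def project(feature_vectors, labels, plan):
    """Build one tree's input matrix and output labels from its plan."""
    feature_idx, sample_idx = plan
    inputs = [[feature_vectors[s][f] for f in feature_idx] for s in sample_idx]
    outputs = [labels[s] for s in sample_idx]
    return inputs, outputs
```

With `plan_trees(15, 1000, 4, 5, 100)` each of the four trees receives 5 of the 15 feature positions and 100 of the 1000 answer data, mirroring the sizes in the M1 to M4 example; each projected data set would then be passed to whatever decision tree learner is used.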
It should be noted that each sample problem corresponds to at least one answer data, of which usually one is the optimum answer and the others are all non-optimal answers. The answer data that are optimum answers may be used as positive samples and the non-optimal answers as negative samples; the number of negative samples may then be far larger than the number of positive samples, which leads to a class imbalance problem. To bring the numbers of positive and negative samples to a more balanced state, undersampling may be applied to the negative samples in the sample data. When undersampling the negative samples, as many negative samples as there are positive samples may be drawn at random from the negative samples and used as training samples for training the random forest model.
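The undersampling step can be sketched as follows, assuming labels of 1 for positive samples (optimum answers) and 0 for negative samples; the function name and the fixed seed are illustrative choices.

```python
import random


def undersample_negatives(samples, labels, seed=0):
    """Keep all positive samples and an equal-sized random subset of negatives."""
    rng = random.Random(seed)
    positive_idx = [i for i, label in enumerate(labels) if label == 1]
    negative_idx = [i for i, label in enumerate(labels) if label == 0]
    # Draw as many negatives as there are positives (or all of them if fewer).
    kept_negatives = rng.sample(negative_idx, min(len(positive_idx), len(negative_idx)))
    kept = sorted(positive_idx + kept_negatives)
    return [samples[i] for i in kept], [labels[i] for i in kept]
```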
In addition, in the answer quality determination model training provided by the embodiments of the present application, after the random forest model is constructed, the method further includes: for each decision tree, determining the importance of each answer feature in that decision tree using the Gini impurity method; and determining the importance of all the answer features according to the importance of each answer feature in each decision tree.
Gini impurity refers to the expected error rate when a result drawn at random from a set is applied to a data item in the set. The smaller the Gini impurity, the higher the purity and the more ordered the sample set, and the better the classification performance of the resulting random forest model. Based on this process, the generated random forest model can be verified. If the Gini impurity of the generated random forest model is relatively high, the precision of the currently generated random forest model is considered low, and a new random forest model may be generated, so as to obtain an answer quality determination model that meets the precision requirement.
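A minimal sketch of the Gini impurity calculation follows. Measuring a feature's importance in one tree by the impurity decrease of its splits, and averaging the per-tree values across the forest, are common conventions assumed here; the application does not spell out the aggregation rule, and the function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0.0 for a pure or empty set."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def split_gain(parent, left, right):
    """Impurity decrease obtained by splitting parent into left and right."""
    total = len(parent)
    weighted = (len(left) / total) * gini_impurity(left) \
        + (len(right) / total) * gini_impurity(right)
    return gini_impurity(parent) - weighted


def average_importance(per_tree_importance):
    """Average each feature's importance over all trees (assumed aggregation).

    per_tree_importance: one dict per tree mapping feature name to importance;
    a feature missing from a tree counts as 0 for that tree.
    """
    totals = {}
    for tree in per_tree_importance:
        for feature, value in tree.items():
            totals[feature] = totals.get(feature, 0.0) + value
    return {f: v / len(per_tree_importance) for f, v in totals.items()}
```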
When the answer quality determination model is a decision tree model, it can be regarded as a special case of the random forest model: when constructing the decision tree, all the elements of the feature vectors of all the answer data in the sample set are used as the input of the decision tree model, and the quality annotation information corresponding to each answer data is used as the output, to construct the decision tree model.
The answer quality determination model trained in this way can learn the features that an optimum answer has, and can determine whether an answer is an optimum answer.
Embodiment two
Referring to Figure 2, embodiment two of the present application further provides an answer quality determination method, including S201~S202:
S201: obtaining a set number of answer features of a target answer, and constructing the feature vector of the target answer. Here, the feature vector of the target answer is generated in a way similar to the way the feature vector of the answer data is determined, which is not repeated here.
S202: inputting the feature vector of the target answer into the answer quality determination model trained by the answer quality determination model training method provided by the embodiments of the present application, to obtain the quality information of the target answer.
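Steps S201 and S202 can be sketched as follows. The feature names, their order and the stand-in model are all hypothetical; in practice the model would be the trained answer quality determination model, and the feature order must match the one used during training.

```python
# Hypothetical feature order; it must be identical at training and inference time.
FEATURE_ORDER = ["url_count", "image_count", "code_snippet_count", "length", "readability"]


def build_feature_vector(features):
    """S201: arrange the answer features of the target answer in a fixed order."""
    return [float(features[name]) for name in FEATURE_ORDER]


def determine_quality(model, features):
    """S202: feed the feature vector to the trained model and return its output."""
    return model(build_feature_vector(features))


def toy_model(vector):
    """Stand-in for a trained model: flags answers with at least one code snippet."""
    return 1 if vector[2] >= 1 else 0
```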
In the embodiments of the present application, feature vectors characterizing each answer data are constructed in advance from the answer data corresponding to the multiple sample problems in the obtained sample set, and the answer quality determination model is trained with the feature vectors of the answer data as input and the quality annotation information as output, so that the answer quality determination model learns the features that an optimum answer has. Determining the quality of a target answer with this model gives higher accuracy; moreover, as soon as a new target answer is generated, it can be determined directly whether the new target answer is an optimum answer, which gives higher efficiency.
Based on the same inventive concept, the embodiments of the present application further provide an answer quality determination model training apparatus corresponding to the answer quality determination model training method. Since the principle by which the apparatus solves the problem is similar to that of the above answer quality determination model training method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
Embodiment three
Referring to Figure 3, embodiment three of the present application provides an answer quality determination model training apparatus, comprising:
an acquisition module 31, configured to obtain a sample set, the sample set including answer data corresponding to multiple sample problems, wherein each sample problem corresponds to at least one answer data, and each answer data has corresponding quality annotation information;
a first feature vector construction module 32, configured to, for each answer data, obtain a set number of answer features corresponding to the answer data and construct the feature vector of the answer data;
a training module 33, configured to train the answer quality determination model by taking the feature vector of the answer data as input and the quality annotation information as output.
In an optional embodiment, the answer quality determination model is a random forest model, and
the training module 33 is configured to train the answer quality determination model as follows: taking the feature vector of the answer data as input and the quality annotation information as output, constructing at least one decision tree, and constructing the random forest model based on the at least one decision tree.
In an optional embodiment, the answer feature includes any one or more of the following: the content attribute of the answer data, the evaluation of the user who provided the answer data, the time attribute of the answer data, the degree of association between the answer data and the sample problem to which it belongs, and the degree of association between the answer data and the other answer data belonging to the same sample problem.
In an optional embodiment, for the case where the answer feature includes the content attribute of the answer data:
the content attribute of the answer data includes any one or more of the following: the number of uniform resource locator (URL) tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
for the case where the answer feature includes the evaluation of the user who provided the answer data:
the evaluation of the user who provided the answer data includes any one or a combination of the following: the scores and/or voting results of the other problems answered by the user who provided the answer data, and the scores and/or voting results of the problems raised by the user who provided the answer data;
for the case where the answer feature includes the time attribute of the answer data:
the time attribute of the answer data includes: the creation time difference between the answer data and its corresponding sample problem;
for the case where the answer feature includes the degree of association between the answer data and the sample problem to which it belongs:
the degree of association between the answer data and the sample problem to which it belongs includes: the similarity between the answer data and the sample problem to which it belongs;
for the case where the answer feature includes the degree of association between the answer data and the other answer data belonging to the same sample problem:
the degree of association between the answer data and the other answer data belonging to the same sample problem includes any one or more of the following: the average similarity between the answer data and the other answer data belonging to the same sample problem, the minimum similarity between the answer data and the other answer data belonging to the same sample problem, the maximum similarity between the answer data and the other answer data belonging to the same sample problem, the number of other answer data of the sample problem to which the answer data belongs, and the order in which the answer data was created among all the answer data of the sample problem to which it belongs.
In an optional embodiment, for the case where the content attribute of the answer data includes the readability of the answer data, the readability of the answer data is obtained as follows:
determining the readability of the answer data according to the number of paragraphs in the answer data and the length of each paragraph;
for the case where the degree of association between the answer data and the sample problem to which it belongs includes the similarity between the answer data and the sample problem to which it belongs, the similarity between the answer data and the sample problem to which it belongs is obtained as follows:
determining the similarity between the answer data and the sample problem to which it belongs based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the sample problem, constructed from the word vectors of the words in the sample problem to which the answer data belongs;
for the case where the answer feature includes the degree of association between the answer data and the other answer data belonging to the same sample problem, and that degree of association includes at least one of the average similarity, the minimum similarity and the maximum similarity between the answer data and the other answer data belonging to the same sample problem, the degree of association between the answer data and the other answer data belonging to the same sample problem is obtained as follows:
determining the similarity between the answer data and the other answer data based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in the other answer data.
In an optional embodiment, the apparatus further includes an importance determination module 34, configured to:
for each decision tree, determine the importance of each answer feature in that decision tree using the Gini impurity method;
determine the importance of all the answer features according to the importance of each answer feature in each decision tree.
In the embodiments of the present application, feature vectors characterizing each answer data are constructed from the answer data corresponding to the multiple sample problems in the obtained sample set, and the answer quality determination model is trained with the feature vectors of the answer data as input and the quality annotation information as output, so that the answer quality determination model learns the features that an optimum answer has. The quality of an answer is determined with this model, and for a newly generated answer it can be determined whether it is an optimum answer.
Embodiment four
Figure 4 is a schematic diagram of the electronic device 400 provided by embodiment four of the present application. The electronic device 400 includes a processor 41, a memory 42 and a bus 43. The memory 42 stores executable instructions; when the device runs, the processor 41 and the memory 42 communicate through the bus 43, and the processor 41 executes the executable instructions so that the device performs the following method:
obtaining a sample set, the sample set including answer data corresponding to multiple sample problems, wherein each sample problem corresponds to at least one answer data, and each answer data has corresponding quality annotation information;
for each answer data, obtaining a set number of answer features corresponding to the answer data, and constructing the feature vector of the answer data;
training the answer quality determination model by taking the feature vector of the answer data as input and the quality annotation information as output.
Optionally, in the method executed by the processor 41, the answer quality determination model is a random forest model, and
training the answer quality determination model includes: taking the feature vector of the answer data as input and the quality annotation information as output, constructing at least one decision tree, and constructing the random forest model based on the at least one decision tree.
Optionally, in the method executed by the processor 41, the answer feature includes any one or more of the following: the content attribute of the answer data, the evaluation of the user who provided the answer data, the time attribute of the answer data, the degree of association between the answer data and the sample problem to which it belongs, and the degree of association between the answer data and the other answer data belonging to the same sample problem.
Optionally, in the method executed by the processor 41, for the case where the answer feature includes the content attribute of the answer data:
the content attribute of the answer data includes any one or more of the following: the number of uniform resource locator (URL) tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
for the case where the answer feature includes the evaluation of the user who provided the answer data: the evaluation of the user who provided the answer data includes any one or a combination of the following: the scores and/or voting results of the other problems answered by the user who provided the answer data, and the scores and/or voting results of the problems raised by the user who provided the answer data;
for the case where the answer feature includes the time attribute of the answer data: the time attribute of the answer data includes the creation time difference between the answer data and its corresponding sample problem;
for the case where the answer feature includes the degree of association between the answer data and the sample problem to which it belongs: the degree of association between the answer data and the sample problem to which it belongs includes the similarity between the answer data and the sample problem to which it belongs;
for the case where the answer feature includes the degree of association between the answer data and the other answer data belonging to the same sample problem: the degree of association between the answer data and the other answer data belonging to the same sample problem includes any one or more of the following: the average similarity between the answer data and the other answer data belonging to the same sample problem, the minimum similarity between the answer data and the other answer data belonging to the same sample problem, the maximum similarity between the answer data and the other answer data belonging to the same sample problem, the number of other answer data of the sample problem to which the answer data belongs, and the order in which the answer data was created among all the answer data of the sample problem to which it belongs.
Optionally, in the method executed by the processor 41, for the case where the content attribute of the answer data includes the readability of the answer data, the readability of the answer data is obtained as follows: determining the readability of the answer data according to the number of paragraphs in the answer data and the length of each paragraph;
for the case where the degree of association between the answer data and the sample problem to which it belongs includes the similarity between the answer data and the sample problem to which it belongs, the similarity between the answer data and the sample problem to which it belongs is obtained as follows: determining the similarity between the answer data and the sample problem to which it belongs based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the sample problem, constructed from the word vectors of the words in the sample problem to which the answer data belongs;
for the case where the answer feature includes the degree of association between the answer data and the other answer data belonging to the same sample problem, and that degree of association includes at least one of the average similarity, the minimum similarity and the maximum similarity between the answer data and the other answer data belonging to the same sample problem, the degree of association between the answer data and the other answer data belonging to the same sample problem is obtained as follows: determining the similarity between the answer data and the other answer data based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in the other answer data.
Optionally, in the method executed by the processor 41, the method further includes: for each decision tree, determining the importance of each answer feature in that decision tree using the Gini impurity method; and determining the importance of all the answer features according to the importance of each answer feature in each decision tree.
The embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when run by the processor 41, the computer program executes the steps of the above answer quality determination model training method. Specifically, the storage medium may be a general-purpose storage medium such as a removable disk or a hard disk; when the computer program on the storage medium is run, the above answer quality determination model training method can be executed, so that the answer quality and/or the optimum answer can be determined.
Based on the same inventive concept, the embodiments of the present application further provide an answer quality determination apparatus corresponding to the answer quality determination method. Since the principle by which the apparatus solves the problem is similar to that of the above answer quality determination method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
Embodiment five
Referring to Figure 5, embodiment five of the present application provides an answer quality determination apparatus, comprising:
a second feature vector construction module 51, configured to obtain a set number of answer features of a target answer and construct the feature vector of the target answer;
a determination module 52, configured to input the feature vector of the target answer into the answer quality determination model trained by the answer quality determination model training method according to any one of claims 1-6, to obtain the quality information of the target answer.
In the embodiments of the present application, feature vectors characterizing each answer data are constructed in advance from the answer data corresponding to the multiple sample problems in the obtained sample set, and the answer quality determination model is trained with the feature vectors of the answer data as input and the quality annotation information as output, so that the answer quality determination model learns the features that an optimum answer has. The quality of a target answer is determined with this model, and it can be determined whether a newly generated target answer is an optimum answer.
Embodiment six
Figure 6 is a schematic diagram of the electronic device 600 provided by Embodiment Six of the present application.
The electronic device 600 includes a processor 61, a memory 62 and a bus 63. The memory 62 stores execution
instructions. When the device runs, the processor 61 and the memory 62 communicate via the bus 63, and the
processor 61 executes the execution instructions so that the device performs the following method:
obtaining a set number of answer features of a target answer, and constructing the feature vector of the target answer;
inputting the feature vector of the target answer into an answer quality determination model trained by the answer
quality determination model training method provided by the embodiments of the present application, to obtain the
quality information of the target answer.
The embodiments of the present application also provide a computer-readable storage medium on which a computer
program is stored. When run by the processor 61, the computer program performs the steps of the above answer
quality determination model training method.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk.
When the computer program on the storage medium is run, it can perform the above answer quality determination
method, thereby addressing the low efficiency and low accuracy of current methods for determining the best answer,
and in turn improving the efficiency and accuracy of best-answer determination.
The computer program product of the answer quality determination model training method, the answer quality
determination method and the device provided by the embodiments of the present application includes a
computer-readable storage medium storing program code. The instructions included in the program code can be used to
perform the methods described in the foregoing method embodiments. For the specific implementation, refer to the
method embodiments; details are not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working
processes of the systems and devices described above may refer to the corresponding processes in the foregoing
method embodiments and are not repeated here. In the several embodiments provided by the present application, it
should be understood that the disclosed systems, devices and methods may be implemented in other ways. The device
embodiments described above are merely illustrative. For example, the division of the units is only a division by
logical function, and other divisions are possible in actual implementation. As another example, multiple units or
components may be combined or integrated into another system, or some features may be omitted or not performed. In
addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect
coupling or communication connection through some communication interfaces, devices or units, and may be
electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as
units may or may not be physical units; that is, they may be located in one place or distributed over multiple
network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the
solution of this embodiment. In addition, the functional units in the embodiments of the present application may be
integrated into one processing unit, each unit may exist physically on its own, or two or more units may be
integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent
product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this
understanding, the technical solution of the present application in essence, or the part that contributes to the
prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer
software product is stored in a storage medium and includes several instructions that cause a computer device
(which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of
the methods described in the embodiments of the present application. The aforementioned storage medium includes
various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory
(ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments described above are only specific implementations of the present
application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of
the present application is not limited to them. Although the present application has been described in detail with
reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in
the art may, within the technical scope disclosed by the present application, still modify the technical solutions
recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of
the technical features. Such modifications, variations or replacements do not cause the essence of the
corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments
of the present application, and shall all fall within the protection scope of the present application. Therefore,
the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. An answer quality determination model training method, characterized by comprising:
obtaining a sample set, the sample set including answer data corresponding to multiple sample questions, wherein
each sample question corresponds to at least one answer data item, and each answer data item has corresponding
quality annotation information;
for each answer data item, obtaining a set number of answer features corresponding to the answer data item, and
constructing a feature vector of the answer data item;
training the answer quality determination model with the feature vectors of the answer data as input and the
quality annotation information as output.
2. The method according to claim 1, characterized in that the answer quality determination model is a random
forest model, and
training the answer quality determination model comprises: constructing at least one decision tree with the
feature vectors of the answer data as input and the quality annotation information as output, and constructing the
random forest model based on the at least one decision tree.
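The training step of claims 1 and 2 can be illustrated with a small, dependency-free Python sketch. It is a toy under stated assumptions, not the applicant's implementation: each "decision tree" is reduced to a depth-1 stump, trees are built on bootstrap samples with a random feature subset, and the names `fit_stump`, `fit_forest` and `predict`, the two example features (code snippet count, answer length) and the labels (1 = best answer) are all hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: choose the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority label
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Build the forest from stumps trained on bootstrap samples."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # sample rows with replacement
        feats = rng.sample(range(n_features), k)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Feature vectors: [code snippet count, answer length]; label 1 = annotated best answer.
X = [[0, 20], [0, 35], [1, 200], [2, 340], [0, 15], [3, 280]]
y = [0, 0, 1, 1, 0, 1]
forest = fit_forest(X, y)
```

A production system would more likely use full-depth trees from an established library; the stump version is kept here only to show how the trees and the vote fit together.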
3. The method according to claim 1, characterized in that the answer features include any one or more of the
following: content attributes of the answer data, evaluations of the user who provided the answer data, time
attributes of the answer data, the degree of association between the answer data and the sample question to which
it belongs, and the degree of association between the answer data and other answer data belonging to the same
sample question.
4. The method according to claim 3, characterized in that, where the answer features include the content
attributes of the answer data:
the content attributes of the answer data include any one or more of the following: the number of uniform resource
locator (URL) tags in the answer data, the number of pictures in the answer data, the number of code snippets in
the answer data, the length of the answer data, and the readability of the answer data;
where the answer features include the evaluations of the user who provided the answer data:
the evaluations of the user who provided the answer data include any one or a combination of the following: the
scores and/or voting results of the answers given to other questions by the user who provided the answer data, and
the scores and/or voting results of the questions asked by the user who provided the answer data;
where the answer features include the time attributes of the answer data:
the time attributes of the answer data include: the difference between the creation time of the answer data and
the creation time of its corresponding sample question;
where the answer features include the degree of association between the answer data and the sample question to
which it belongs:
the degree of association between the answer data and the sample question to which it belongs includes: the
similarity between the answer data and the sample question to which it belongs;
where the answer features include the degree of association between the answer data and other answer data
belonging to the same sample question:
the degree of association between the answer data and other answer data belonging to the same sample question
includes any one or more of the following: the average similarity between the answer data and other answer data
belonging to the same sample question, the minimum similarity between the answer data and other answer data
belonging to the same sample question, the maximum similarity between the answer data and other answer data
belonging to the same sample question, the number of other answer data items of the sample question to which the
answer data belongs, and the order in which the answer data was created among all answer data of the sample
question to which it belongs.
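Some of the features listed in claim 4 can be made concrete with a short Python sketch. It assumes that the answer body is stored as HTML, that URL tags are `<a href=...>` elements, that pictures are `<img>` elements, that code snippets are `<code>` elements, and that length is measured in characters after tags are stripped. The claim does not state any of this, and the function names are hypothetical.

```python
import re
from datetime import datetime


def content_attributes(answer_html: str) -> dict:
    """Count URL tags, pictures and code snippets, and measure the text length."""
    url_tags = len(re.findall(r"<a\s[^>]*href=", answer_html, flags=re.I))
    images = len(re.findall(r"<img\b", answer_html, flags=re.I))
    code_snippets = len(re.findall(r"<code\b", answer_html, flags=re.I))
    text = re.sub(r"<[^>]+>", "", answer_html)  # drop tags, keep the visible text
    return {
        "url_count": url_tags,
        "image_count": images,
        "code_count": code_snippets,
        "length": len(text),
    }


def creation_time_gap_seconds(question_created: datetime,
                              answer_created: datetime) -> float:
    """Time attribute: seconds between question creation and answer creation."""
    return (answer_created - question_created).total_seconds()


answer = ('<p>See <a href="https://example.com/docs">the docs</a>.</p>'
          '<pre><code>print(1)</code></pre><img src="a.png">')
attrs = content_attributes(answer)
gap = creation_time_gap_seconds(datetime(2018, 10, 31, 10, 30),
                                datetime(2018, 10, 31, 12, 0))
```

Regular expressions are enough for a sketch; a real extractor would probably use an HTML parser so that malformed markup does not skew the counts.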
5. The method according to claim 4, characterized in that,
where the content attributes of the answer data include the readability of the answer data, the readability of the
answer data is obtained as follows:
determining the readability of the answer data according to the number of paragraphs in the answer data and the
length of each paragraph;
where the degree of association between the answer data and the sample question to which it belongs includes the
similarity between the answer data and the sample question to which it belongs, that similarity is obtained as
follows:
determining the similarity between the answer data and the sample question to which it belongs based on a
representation vector of the answer data constructed from the word vectors of the words in the answer data and a
representation vector of the sample question constructed from the word vectors of the words in the sample question
to which the answer data belongs;
where the answer features include the degree of association between the answer data and other answer data
belonging to the same sample question, and that degree of association includes at least one of the average
similarity, the minimum similarity and the maximum similarity between the answer data and other answer data
belonging to the same sample question, the degree of association is obtained as follows:
determining the similarity between the answer data and the other answer data based on a representation vector of
the answer data constructed from the word vectors of the words in the answer data and a representation vector of
the other answer data constructed from the word vectors of the words in the other answer data.
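The similarity computations of claim 5 can be sketched in Python as follows. The claim only says that a representation vector is constructed from word vectors and that a similarity is determined from the two representation vectors; averaging the word vectors and using cosine similarity are assumptions of this sketch, and the three-dimensional `WORD_VECTORS` table is a hypothetical stand-in for trained embeddings.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
WORD_VECTORS = {
    "python": [1.0, 0.0, 0.0],
    "list": [0.0, 1.0, 0.0],
    "sort": [0.0, 0.0, 1.0],
}
DIM = 3


def representation(text):
    """Representation vector: mean of the word vectors of the known words."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def question_similarity(answer, question):
    """Similarity between an answer and the question it belongs to."""
    return cosine(representation(answer), representation(question))


def peer_similarities(answer, others):
    """Average, minimum and maximum similarity to the other answers of the same question."""
    sims = [cosine(representation(answer), representation(o)) for o in others]
    if not sims:
        return {"avg": 0.0, "min": 0.0, "max": 0.0}
    return {"avg": sum(sims) / len(sims), "min": min(sims), "max": max(sims)}


sim_q = question_similarity("python sort", "sort python list")
peers = peer_similarities("python sort", ["python sort", "list"])
```

The three peer statistics, together with the question similarity, would each occupy one position in the answer's feature vector.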
6. The method according to claim 2, characterized in that the method further comprises:
for each decision tree, determining the importance of each answer feature in that decision tree using the Gini
impurity method;
determining the importance of all answer features according to the importance of each answer feature in each
decision tree.
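One common way to realize the two steps of claim 6 is the mean decrease in Gini impurity, sketched below in Python. The claim names only the Gini impurity method. Weighting each split by the share of samples that reach it, normalizing within a tree, and averaging across trees are assumptions of this sketch, as is the representation of a tree as a list of `(feature_index, parent_labels, left_labels, right_labels)` tuples.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Gini decrease of one split, weighted by the share of samples reaching the node."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


def tree_importance(splits, n_features, n_total):
    """Per-tree importance: summed decrease per feature, normalized to sum to 1."""
    scores = [0.0] * n_features
    for feature, parent, left, right in splits:
        scores[feature] += impurity_decrease(parent, left, right, n_total)
    total = sum(scores)
    return [s / total if total else 0.0 for s in scores]


def forest_importance(per_tree):
    """Overall importance: average of the per-tree importances."""
    n_trees = len(per_tree)
    return [sum(col) / n_trees for col in zip(*per_tree)]


# Tree A splits once, on feature 0, and separates the classes perfectly.
tree_a = [(0, [1, 1, 0, 0], [0, 0], [1, 1])]
# Tree B splits on feature 1 at the root, then on feature 0 in the left child.
tree_b = [(1, [1, 1, 0, 0], [0, 0, 1], [1]),
          (0, [0, 0, 1], [0, 0], [1])]

imp_a = tree_importance(tree_a, n_features=2, n_total=4)
imp_b = tree_importance(tree_b, n_features=2, n_total=4)
overall = forest_importance([imp_a, imp_b])
```

In this example feature 0 ends up with the larger overall importance because it accounts for most of the impurity reduction in both trees.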
7. An answer quality determination method, characterized by comprising:
obtaining a set number of answer features of a target answer, and constructing the feature vector of the target answer;
inputting the feature vector of the target answer into an answer quality determination model trained by the answer
quality determination model training method according to any one of claims 1 to 6, to obtain the quality
information of the target answer.
8. An answer quality determination model training device, characterized in that the device comprises:
an obtaining module, configured to obtain a sample set, the sample set including answer data corresponding to
multiple sample questions, wherein each sample question corresponds to at least one answer data item, and each
answer data item has corresponding quality annotation information;
a first feature vector construction module, configured to, for each answer data item, obtain a set number of
answer features corresponding to the answer data item and construct a feature vector of the answer data item;
a training module, configured to train the answer quality determination model with the feature vectors of the
answer data as input and the quality annotation information as output.
9. An answer quality determining device, characterized by comprising:
a second feature vector construction module, configured to obtain a set number of answer features of a target
answer and construct the feature vector of the target answer;
a determining module, configured to input the feature vector of the target answer into an answer quality
determination model trained by the answer quality determination model training method according to any one of
claims 1 to 6, to obtain the quality information of the target answer.
10. An electronic device, characterized by comprising: a processor, a memory and a bus, wherein the memory stores
machine-readable instructions executable by the processor; when the electronic device runs, the processor and the
memory communicate via the bus; and when executed by the processor, the machine-readable instructions perform the
steps of the answer quality determination model training method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811285467.XA CN109472305A (en) | 2018-10-31 | 2018-10-31 | Answer quality determines model training method, answer quality determination method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109472305A true CN109472305A (en) | 2019-03-15 |
Family
ID=65666908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811285467.XA Pending CN109472305A (en) | 2018-10-31 | 2018-10-31 | Answer quality determines model training method, answer quality determination method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109472305A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079026A (en) * | 2007-07-02 | 2007-11-28 | 北京百问百答网络技术有限公司 | Text similarity, acceptation similarity calculating method and system and application system |
CN104573028A (en) * | 2015-01-14 | 2015-04-29 | 百度在线网络技术(北京)有限公司 | Intelligent question-answer implementing method and system |
CN106844530A (en) * | 2016-12-29 | 2017-06-13 | 北京奇虎科技有限公司 | Training method and device of a kind of question and answer to disaggregated model |
CN107608999A (en) * | 2017-07-17 | 2018-01-19 | 南京邮电大学 | A kind of Question Classification method suitable for automatically request-answering system |
CN107977676A (en) * | 2017-11-24 | 2018-05-01 | 北京神州泰岳软件股份有限公司 | Text similarity computing method and device |
CN108182175A (en) * | 2017-12-29 | 2018-06-19 | 中国银联股份有限公司 | A kind of text quality's index selection method and device |
CN108205684A (en) * | 2017-04-25 | 2018-06-26 | 北京市商汤科技开发有限公司 | Image disambiguation method, device, storage medium and electronic equipment |
WO2018147543A1 (en) * | 2017-02-08 | 2018-08-16 | 한국과학기술원 | Concept graph based query-response system and context search method using same |
Non-Patent Citations (2)
Title |
---|
潘炜等: "《列表问答系统中的答案聚类重排序》", 《计算机应用与软件》 * |
蔡丽艳: "《数据挖掘算法及其应用研究》", 28 February 2013, 电子科技大学出版社 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119770A (en) * | 2019-04-28 | 2019-08-13 | 平安科技(深圳)有限公司 | Decision-tree model construction method, device, electronic equipment and medium |
CN110119770B (en) * | 2019-04-28 | 2024-05-14 | 平安科技(深圳)有限公司 | Decision tree model construction method, device, electronic equipment and medium |
CN110516027A (en) * | 2019-07-22 | 2019-11-29 | 北京达佳互联信息技术有限公司 | Update method, device, electronic equipment and the storage medium of information aggregate |
CN110516027B (en) * | 2019-07-22 | 2022-04-22 | 北京达佳互联信息技术有限公司 | Information set updating method and device, electronic equipment and storage medium |
CN110674276A (en) * | 2019-09-23 | 2020-01-10 | 深圳前海微众银行股份有限公司 | Robot self-learning method, robot terminal, device and readable storage medium |
CN110704597A (en) * | 2019-09-29 | 2020-01-17 | 北京金山安全软件有限公司 | Dialogue system reliability verification method, model generation method and device |
CN110704597B (en) * | 2019-09-29 | 2022-07-29 | 北京金山安全软件有限公司 | Dialogue system reliability verification method, model generation method and device |
CN111798285A (en) * | 2019-09-30 | 2020-10-20 | 北京京东尚科信息技术有限公司 | Information generation method and device |
CN110825930A (en) * | 2019-11-01 | 2020-02-21 | 北京邮电大学 | Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence |
CN111090742A (en) * | 2019-12-19 | 2020-05-01 | 东软集团股份有限公司 | Question and answer pair evaluation method and device, storage medium and equipment |
CN111090742B (en) * | 2019-12-19 | 2024-05-17 | 东软集团股份有限公司 | Question-answer pair evaluation method, question-answer pair evaluation device, storage medium and equipment |
CN111241258A (en) * | 2020-01-08 | 2020-06-05 | 泰康保险集团股份有限公司 | Data cleaning method and device, computer equipment and readable storage medium |
CN111783473A (en) * | 2020-07-14 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Method and device for identifying best answer in medical question and answer and computer equipment |
CN111783473B (en) * | 2020-07-14 | 2024-02-13 | 腾讯科技(深圳)有限公司 | Method and device for identifying best answer in medical question and answer and computer equipment |
CN112131354A (en) * | 2020-11-26 | 2020-12-25 | 广州华多网络科技有限公司 | Answer screening method and device, terminal equipment and computer readable storage medium |
CN112131354B (en) * | 2020-11-26 | 2021-04-16 | 广州华多网络科技有限公司 | Answer screening method and device, terminal equipment and computer readable storage medium |
CN113590790A (en) * | 2021-07-30 | 2021-11-02 | 北京壹心壹翼科技有限公司 | Question retrieval method, device, equipment and medium applied to multiple rounds of question answering |
CN113590790B (en) * | 2021-07-30 | 2023-11-28 | 北京壹心壹翼科技有限公司 | Question retrieval method, device, equipment and medium applied to multi-round question and answer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | CB02 | Change of applicant information | Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing; Applicant after: Guoxin Youyi Data Co., Ltd; Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing; Applicant before: SIC YOUE DATA Co.,Ltd. |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190315 |