CN109472305A - Answer quality determination model training method, answer quality determination method and device - Google Patents

Answer quality determination model training method, answer quality determination method and device

Info

Publication number
CN109472305A
Authority
CN
China
Prior art keywords
answer
answer data
data
quality
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811285467.XA
Other languages
Chinese (zh)
Inventor
朱月梅
郑凯
段立新
江建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd
Priority to CN201811285467.XA
Publication of CN109472305A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides an answer quality determination model training method, an answer quality determination method, and corresponding devices. The model training method includes: obtaining a sample set that contains answer data corresponding to a plurality of sample questions, where each sample question corresponds to at least one piece of answer data and each piece of answer data has corresponding quality annotation information; for each piece of answer data, obtaining a set number of answer features of that answer data and constructing a feature vector of the answer data; and training the answer quality determination model with the feature vector of the answer data as input and the quality annotation information as output. The embodiments of the application make it possible to determine answer quality and thereby improve the accuracy of recommended answers.

Description

Answer quality determination model training method, answer quality determination method and device
Technical field
This application relates to the field of machine learning, and in particular to an answer quality determination model training method, an answer quality determination method, and corresponding devices.
Background
Community question answering is a popular and practical Internet application that gives users a platform for posting questions and answering other people's questions, for example Baidu Zhidao, Sina iAsk, and Zhihu. On a community question answering platform, people can post questions to satisfy their own information needs and can answer questions raised by other users to share their knowledge. Users can also search the answer library accumulated by the system to satisfy their information needs quickly.
In practice, the same question may have several different answers, for example when several people each provide one. The quality of the answers to the same question also differs, because each person's understanding of the question, knowledge, and attitude toward answering differ. For the convenience of users, the community question answering platform needs to select, from all the answers, one answer of high quality and accuracy and show it to the user as the best answer to the question.
Summary of the invention
The purpose of the embodiments of the present application is to provide an answer quality determination model training method, an answer quality determination method, and devices that can determine answer quality and improve the accuracy of recommended answers.
In a first aspect, an embodiment of the present application provides an answer quality determination model training method, including:
obtaining a sample set, where the sample set contains answer data corresponding to a plurality of sample questions, each sample question corresponds to at least one piece of answer data, and each piece of answer data has corresponding quality annotation information;
for each piece of answer data, obtaining a set number of answer features of that answer data and constructing a feature vector of the answer data;
training the answer quality determination model with the feature vector of the answer data as input and the quality annotation information as output.
In an optional embodiment, the answer quality determination model is a random forest model, and
training the answer quality determination model includes: constructing at least one decision tree with the feature vector of the answer data as input and the quality annotation information as output, and constructing the random forest model based on the at least one decision tree.
In an optional embodiment, the answer features include any one or more of the following: content attributes of the answer data, an evaluation of the user who provided the answer data, a time attribute of the answer data, the degree of association between the answer data and the sample question it belongs to, and the degree of association between the answer data and the other answer data belonging to the same sample question.
In an optional embodiment, where the answer features include the content attributes of the answer data, the content attributes include any one or more of the following: the number of uniform resource locator tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
where the answer features include the evaluation of the user who provided the answer data:
the evaluation of the user who provided the answer data includes any one or a combination of the following: the scores and/or voting results of that user's answers to other questions, and the scores and/or voting results of the questions that user has asked;
where the answer features include the time attribute of the answer data: the time attribute includes the difference between the creation time of the answer data and that of its corresponding sample question;
where the answer features include the degree of association between the answer data and the sample question it belongs to: this degree of association includes the similarity between the answer data and the sample question it belongs to;
where the answer features include the degree of association between the answer data and the other answer data belonging to the same sample question: this degree of association includes any one or more of the following: the average similarity between the answer data and the other answer data belonging to the same sample question, the minimum similarity between the answer data and the other answer data belonging to the same sample question, the maximum similarity between the answer data and the other answer data belonging to the same sample question, the number of other answers to the sample question the answer data belongs to, and the order in which the answer data was created among all the answer data of that sample question.
In an optional embodiment, where the content attributes of the answer data include the readability of the answer data, the readability is obtained as follows: the readability of the answer data is determined from the number of paragraphs in the answer data and the length of each paragraph;
where the degree of association between the answer data and the sample question it belongs to includes their similarity, the similarity is obtained as follows:
the similarity between the answer data and the sample question it belongs to is determined from a representation vector of the answer data, constructed from the word vectors of the words in the answer data, and a representation vector of the sample question, constructed from the word vectors of the words in the sample question;
where the degree of association between the answer data and the other answer data belonging to the same sample question includes at least one of the average similarity, the minimum similarity, and the maximum similarity between the answer data and those other answer data, the degree of association is obtained as follows:
the similarity between the answer data and each other piece of answer data is determined from the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in the other answer data.
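The similarity features above can be sketched as follows. This is a minimal illustration, not the method fixed by the application: the text does not say how the representation vector is built from word vectors or which similarity measure is used, so averaging the word vectors and using cosine similarity are assumptions, and `WORD_VECTORS` is a hypothetical toy embedding table.

```python
import math

# Hypothetical toy word vectors; a real system would load pretrained embeddings.
DIM = 3
WORD_VECTORS = {
    "python": [1.0, 0.0, 0.0],
    "list": [0.0, 1.0, 0.0],
    "sort": [0.0, 0.0, 1.0],
}

def representation_vector(words):
    """Average the word vectors of the known words (an assumed construction)."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def question_similarity(answer_words, question_words):
    """Similarity between an answer and the question it belongs to."""
    return cosine_similarity(representation_vector(answer_words),
                             representation_vector(question_words))

def sibling_similarity_features(answer_words, other_answers_words):
    """Return (average, minimum, maximum) similarity to the other answers."""
    target = representation_vector(answer_words)
    sims = [cosine_similarity(target, representation_vector(other))
            for other in other_answers_words]
    if not sims:
        return (0.0, 0.0, 0.0)
    return (sum(sims) / len(sims), min(sims), max(sims))
```

The inputs are already-segmented word lists; how the text is segmented into words is left outside this sketch.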
In an optional embodiment, the method further includes: for each decision tree, determining the importance of each answer feature in that decision tree using the Gini impurity method; and determining the importance of all answer features according to the importance of each answer feature in each decision tree.
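As a hedged sketch of the Gini impurity idea: the impurity of a node with class proportions $p_k$ is $1 - \sum_k p_k^2$, and a split is credited with the drop from the parent's impurity to the size-weighted impurity of its children. The application does not state how per-tree importances are combined, so averaging across trees and normalizing to sum to 1 is an assumption here, and the feature names in the usage example are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Impurity drop achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = ((len(left) / n) * gini_impurity(left)
                         + (len(right) / n) * gini_impurity(right))
    return gini_impurity(parent) - weighted_children

def forest_importance(per_tree_importance):
    """Average per-tree importances (assumed rule), normalized to sum to 1."""
    if not per_tree_importance:
        return {}
    totals = {}
    for tree in per_tree_importance:
        for feature, value in tree.items():
            totals[feature] = totals.get(feature, 0.0) + value
    averaged = {f: v / len(per_tree_importance) for f, v in totals.items()}
    norm = sum(averaged.values())
    return {f: (v / norm if norm else 0.0) for f, v in averaged.items()}

# Usage: two trees, each reporting the summed impurity decrease per feature.
importance = forest_importance([
    {"length": 0.5, "url_count": 0.0},
    {"length": 0.25, "url_count": 0.25},
])
```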
In a second aspect, an embodiment of the present application further provides an answer quality determination method, including:
obtaining a set number of answer features of a target answer and constructing a feature vector of the target answer;
inputting the feature vector of the target answer into an answer quality determination model trained by the answer quality determination model training method provided by the embodiments of the present application, to obtain quality information of the target answer.
In a third aspect, an embodiment of the present application provides an answer quality determination model training device, including:
an obtaining module, configured to obtain a sample set, where the sample set contains answer data corresponding to a plurality of sample questions, each sample question corresponds to at least one piece of answer data, and each piece of answer data has corresponding quality annotation information;
a first feature vector construction module, configured to obtain, for each piece of answer data, a set number of answer features of that answer data and construct a feature vector of the answer data;
a training module, configured to train the answer quality determination model with the feature vector of the answer data as input and the quality annotation information as output.
In an optional embodiment, the answer quality determination model is a random forest model, and
the training module is configured to train the answer quality determination model as follows: constructing at least one decision tree with the feature vector of the answer data as input and the quality annotation information as output, and constructing the random forest model based on the at least one decision tree.
In an optional embodiment, the answer features include any one or more of the following: content attributes of the answer data, an evaluation of the user who provided the answer data, a time attribute of the answer data, the degree of association between the answer data and the sample question it belongs to, and the degree of association between the answer data and the other answer data belonging to the same sample question.
In an optional embodiment, where the answer features include the content attributes of the answer data:
the content attributes of the answer data include any one or more of the following: the number of uniform resource locator tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
where the answer features include the evaluation of the user who provided the answer data:
the evaluation of the user who provided the answer data includes any one or a combination of the following: the scores and/or voting results of that user's answers to other questions, and the scores and/or voting results of the questions that user has asked;
where the answer features include the time attribute of the answer data: the time attribute includes the difference between the creation time of the answer data and that of its corresponding sample question;
where the answer features include the degree of association between the answer data and the sample question it belongs to: this degree of association includes the similarity between the answer data and the sample question it belongs to;
where the answer features include the degree of association between the answer data and the other answer data belonging to the same sample question: this degree of association includes any one or more of the following: the average similarity between the answer data and the other answer data belonging to the same sample question, the minimum similarity between the answer data and the other answer data belonging to the same sample question, the maximum similarity between the answer data and the other answer data belonging to the same sample question, the number of other answers to the sample question the answer data belongs to, and the order in which the answer data was created among all the answer data of that sample question.
In an optional embodiment, where the content attributes of the answer data include the readability of the answer data, the readability is obtained as follows: the readability of the answer data is determined from the number of paragraphs in the answer data and the length of each paragraph;
where the degree of association between the answer data and the sample question it belongs to includes their similarity, the similarity is obtained as follows: the similarity between the answer data and the sample question it belongs to is determined from a representation vector of the answer data, constructed from the word vectors of the words in the answer data, and a representation vector of the sample question, constructed from the word vectors of the words in the sample question;
where the degree of association between the answer data and the other answer data belonging to the same sample question includes at least one of the average similarity, the minimum similarity, and the maximum similarity between the answer data and those other answer data, the degree of association is obtained as follows:
the similarity between the answer data and each other piece of answer data is determined from the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in the other answer data.
In an optional embodiment, the device further includes an importance determination module, configured to: for each decision tree, determine the importance of each answer feature in that decision tree using the Gini impurity method; and determine the importance of all answer features according to the importance of each answer feature in each decision tree.
In a fourth aspect, an embodiment of the present application further provides an answer quality determination device, including:
a second feature vector construction module, configured to obtain a set number of answer features of a target answer and construct a feature vector of the target answer;
a determination module, configured to input the feature vector of the target answer into an answer quality determination model trained by the answer quality determination model training method of any one of claims 1-6, to obtain quality information of the target answer.
In a fifth aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps in any possible embodiment of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps in any possible embodiment of the first aspect.
In a seventh aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps in any possible embodiment of the second aspect.
In an eighth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps in any possible embodiment of the second aspect.
In the embodiments of the present application, a feature vector characterizing each piece of answer data is constructed from the answer data corresponding to the plurality of sample questions in the obtained sample set, and the answer quality determination model is trained with the feature vector of the answer data as input and the quality annotation information as output. The model thereby learns the features that the best answers have, and the quality of an answer can be determined with it. In this way, the quality of a newly generated answer can also be determined with the trained model.
To make the above objects, features, and advantages of the application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the application and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flowchart of an answer quality determination model training method provided by Embodiment One of the present application;
Fig. 2 shows a flowchart of an answer quality determination method provided by Embodiment Two of the present application;
Fig. 3 shows a schematic diagram of an answer quality determination model training device provided by Embodiment Three of the present application;
Fig. 4 shows a structural schematic diagram of an electronic device 400 provided by Embodiment Four of the present application;
Fig. 5 shows a schematic diagram of an answer quality determination device provided by Embodiment Five of the present application;
Fig. 6 shows a schematic diagram of an electronic device 600 provided by Embodiment Six of the present application.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the application.
With the answer quality determination model training method, answer quality determination method, and devices provided by the present application, a feature vector characterizing each piece of answer data can be constructed in advance from the answer data corresponding to the plurality of sample questions in the obtained sample set, and the answer quality determination model can be trained with the feature vector of the answer data as input and the quality annotation information as output. The model thereby learns the features that the best answers have, and the quality of an answer can be determined with it. In this way, the quality of a newly generated answer can also be determined, for example to identify the best answer.
To facilitate understanding of the present application, the answer quality determination model training method disclosed herein is first described in detail. It should be noted that the present application can be applied not only in community question answering systems but also in other scenarios in which answer quality or the best answer needs to be determined.
Embodiment one
Fig. 1 shows a flowchart of the answer quality determination model training method provided by Embodiment One of the present application. The method includes steps S101 to S103.
S101: Obtain a sample set. The sample set contains answer data corresponding to a plurality of sample questions, where each sample question corresponds to at least one piece of answer data and each piece of answer data has corresponding quality annotation information.
The sample set contains annotated answer data. For example, the answer data may already be annotated when it is obtained from a community question answering platform, or the answers may be obtained first and annotated afterwards. The annotation may be manual: for example, the relevant personnel judge the quality of the answers to a given question and/or select the best answer, or a voting method is used, in which people who view the question and its answers vote on the answers to determine their quality and/or the best answer. Answer data can also be annotated by other means, for example by using a neural network model and/or semantic analysis to compute, from feature vectors, the relevance of each answer to the question and thereby determine answer quality and/or select the best answer.
When obtaining the sample set, the plurality of sample questions can first be determined according to certain criteria. In practice, sample questions can be determined based on factors such as time (for example, recent valid questions and answers), the amount of answer data (for example, the question must have a set number of answers), and the annotation information of the answers (for example, the answers provided have been validly annotated).
After the sample questions have been determined, the sample answer data can be determined. For a given sample question, either all of its answers can be included in the sample set, or a subset of its answers can be selected and included in the sample set.
As described above, the quality annotation information of the answer data corresponding to a sample question may be information the answer data already carried when it was collected, or information annotated for the answer data after it was obtained.
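The selection criteria above might be sketched as a simple filter. The `Question` and `Answer` types, their field names, and the thresholds `max_age_days` and `min_answers` are hypothetical and chosen only for illustration; the application does not fix concrete values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class Answer:
    text: str
    quality_label: Optional[int] = None  # None means not yet annotated

@dataclass
class Question:
    text: str
    created_at: datetime
    answers: List[Answer] = field(default_factory=list)

def select_sample_questions(questions, now, max_age_days=365, min_answers=2):
    """Keep questions that are recent, have enough answers, and carry labels."""
    cutoff = now - timedelta(days=max_age_days)
    selected = []
    for question in questions:
        recent = question.created_at >= cutoff
        enough = len(question.answers) >= min_answers
        labeled = any(a.quality_label is not None for a in question.answers)
        if recent and enough and labeled:
            selected.append(question)
    return selected
```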
S102: For each piece of answer data, obtain a set number of answer features of that answer data and construct a feature vector of the answer data. In a specific implementation, the answer features include any one or more of the following:
A. Content attributes of the answer data. The content attributes of answer data are generally used to characterize how rich the content covered by the answer data is. In general, provided the content of the answer data contains no errors, the richer the content it covers, the higher the corresponding answer quality. In some embodiments of the application, the content attributes of the answer data are therefore used as features for measuring answer data quality. Specifically, the content attributes of the answer data may include any one or more of the following:
(1) The number of uniform resource locator (URL) tags in the answer data. A URL is a concise representation of the location of a resource available on the Internet and of the method for accessing it. The number of URLs in the answer data can, to some extent, characterize how rich the content covered by the answer data is and how clearly certain content or concepts are described.
The URLs in the answer data can be obtained by a keyword search method. First, the key characters commonly used in URLs are determined, for example "http" and "ftp", which indicate the transfer protocol, and separator characters such as "/", which separate the different parts of a URL. The answer data is then searched for the determined key characters to obtain the number of URL tags in the answer data.
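A minimal sketch of such a key-character search, using a regular expression. The pattern is an assumption: it looks for an `http`, `https`, or `ftp` protocol prefix followed by `://` and is not a complete URL grammar.

```python
import re

# Assumed key characters: a transfer-protocol prefix followed by "://".
URL_PATTERN = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)

def count_urls(answer_text):
    """Count substrings of the answer text that look like URLs."""
    return len(URL_PATTERN.findall(answer_text))
```

A match may include trailing punctuation such as a comma, which does not affect the count.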
(2) The number of pictures in the answer data. The number of pictures can also, to some extent, characterize how rich the content covered by the answer data is and how clearly certain content or concepts are described.
(3) The number of code snippets in the answer data.
(4) The length of the answer data. In some embodiments, the number of words in the answer data can be used as the length of the answer; the file size of the answer data can also be used as the length of the answer; or the number of characters of all the content contained in the answer data can be used as the length of the answer.
The length of the answer data is obtained differently for each of these representations. Where the number of words is used as the length, the content of the answer data can be segmented into words to obtain the set of words that make up the answer, and the words in that set are then counted. Where the file size is used as the length, the file attributes of the answer data can be read directly to obtain the file size. Where the number of characters of all the content is used as the length, the number of characters contained in the answer data can be read directly.
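The three length measures can be sketched as follows. Two simplifications are assumptions: whitespace splitting stands in for word segmentation (Chinese text would need a real segmenter), and the encoded byte length stands in for reading the file size from file attributes.

```python
def length_in_words(text):
    """Word count; whitespace splitting is a stand-in for word segmentation."""
    return len(text.split())

def length_in_characters(text):
    """Number of characters in the answer content."""
    return len(text)

def length_in_bytes(text, encoding="utf-8"):
    """Encoded size in bytes, used here as a proxy for the file size."""
    return len(text.encode(encoding))
```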
(5) The readability of the answer data. In some embodiments of the application, the readability of answer data refers to how difficult the answer is to read. For example, the length of the longest paragraph in the answer data can be used as the readability measure characterizing how difficult the answer data is to read, or the average length of the paragraphs in the answer data can be used instead.
To obtain the readability of the answer data, the number of paragraphs in the answer data and the length of each paragraph are determined first, and the readability is then determined from them. The length of a paragraph may be its number of characters or the number of words it contains.
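A minimal sketch of these readability measures. It assumes paragraphs are separated by blank lines and measures paragraph length in characters; both are illustrative choices, and the function and key names are hypothetical.

```python
def split_paragraphs(text):
    """Split on blank lines (an assumed paragraph delimiter)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def readability_features(text):
    """Paragraph count plus longest and average paragraph length in characters."""
    lengths = [len(p) for p in split_paragraphs(text)]
    if not lengths:
        return {"paragraphs": 0, "longest": 0, "average": 0.0}
    return {
        "paragraphs": len(lengths),
        "longest": max(lengths),
        "average": sum(lengths) / len(lengths),
    }
```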
B. The evaluation of the user who provided the answer data. Here, a user who is good at answering and/or asking is considered more likely to give answer data of higher quality. Therefore, in order to describe the quality of the answer data, the evaluation of the user who provided the answer data can be used as one answer feature for measuring answer quality. Any one or a combination of the following may be used as the evaluation of the user who provided the answer data:
1. The scores and/or voting results that the user who provided the answer data received when answering other problems. This feature characterizes how sound the user's answers to other problems are. For example, the user name or identity of the user who provided the answer data may be used as a search condition to obtain, from the database of the Internet platform, the scores and/or voting results of that user's answers to other problems.
2. The scores and/or voting results that the user who provided the answer data received when asking questions. This feature characterizes how capable the user is of raising meaningful problems. It is obtained in a manner similar to the scores and/or voting results for the user's answers to other problems, and details are not repeated here.
C. The time attribute of the answer data. Here, the creation time difference between the answer data and its corresponding sample problem is generally used as the time attribute of the answer data.
When obtaining the time attribute of the answer data, the creation time of the answer data and the creation time of its corresponding sample problem can be obtained from the Internet platform, and the creation time difference between the answer data and its corresponding sample problem is computed from the two.
D. The degree of association between the answer data and the sample problem it belongs to. Here, the similarity between the answer data and the sample problem it belongs to is generally used to characterize the degree of association between them. The higher the similarity, the higher the degree of association between the answer data and the sample problem; the lower the similarity, the lower the degree of association.
Specifically, the similarity between the answer data and the sample problem it belongs to can be determined from the representation vector of the answer data and the representation vector of the sample problem. A representation vector can be determined by semantic analysis and/or neural network training, for example: word segmentation is performed on the answer data and the sample problem, keywords are extracted, the keywords are embedded to obtain the corresponding word vectors, and the word vectors are input into a neural network for training to obtain the representation vectors of the answer data and the sample problem.
The similarity between the two is then calculated from the representation vector of the answer data and the representation vector of the sample problem it belongs to. Here, the similarity may be expressed by any one of the following measures: Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, standardized Euclidean distance, Mahalanobis distance, cosine of the included angle, Hamming distance, Jaccard distance or Jaccard similarity coefficient, correlation coefficient or correlation distance, information entropy, and so on.
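Two of the measures listed above, the cosine of the included angle and the Euclidean distance, can be sketched as below. The representation vectors are assumed to be plain lists of floats of equal length; how they are produced (word embedding plus a neural network) is outside this sketch, and the function names are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two representation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two representation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note that the two measures point in opposite directions: a larger cosine means more similar, whereas a larger distance means less similar, so a distance would need to be inverted or negated before being used as a similarity.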
E. The degree of association between the answer data and the other answer data belonging to the same sample problem.
The answer data answering the same sample problem usually have a certain relevance to one another. Therefore, the degree of association between the answer data and the other answer data belonging to the same sample problem can characterize the quality of the answer data to a certain extent. In general, the greater the degree of association, the higher the quality of the answer data is considered to be. The degree of association between the answer data and the other answer data belonging to the same sample problem includes any one or more of the following:
1. The average similarity between the answer data and the other answer data belonging to the same sample problem.
2. The minimum similarity between the answer data and the other answer data belonging to the same sample problem.
3. The maximum similarity between the answer data and the other answer data belonging to the same sample problem.
In a specific implementation, when the degree of association between the answer data and the other answer data belonging to the same sample problem includes any one of items 1, 2 and 3 of E above, the similarity between the answer data and each of the other answer data belonging to the same sample problem is first obtained as follows: the similarity is determined from the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in the other answer data. The representation vector of answer data is obtained in a manner similar to that described in D above, and details are not repeated here.
After the similarities between the answer data and the other answer data belonging to the same sample problem have been obtained: for item 1 of E, the average similarity is calculated from these similarities; for item 2 of E, the smallest of these similarities is taken as the minimum similarity; for item 3 of E, the largest of these similarities is taken as the maximum similarity.
4. The number of other answer data of the sample problem that the answer data belongs to.
5. The order in which the answer data was created among all answer data of the sample problem it belongs to. Here, the creation time of each answer data can be obtained first, and the order in which the answer data was created among all answer data of the sample problem it belongs to is then determined from the chronological order of the creation times.
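The five features under E can be sketched as simple aggregations. The sketch assumes the pairwise similarities to the other answers have already been computed and that creation times are comparable numbers (for example, timestamps); the function names and the zero defaults for a problem with no other answers are assumptions made for illustration.

```python
from typing import Dict, Sequence


def peer_similarity_features(similarities: Sequence[float]) -> Dict[str, float]:
    """Average, minimum and maximum similarity to the other answers, plus their count."""
    if not similarities:
        # Assumed default when the answer is the only one for its problem.
        return {"avg": 0.0, "min": 0.0, "max": 0.0, "count": 0}
    return {
        "avg": sum(similarities) / len(similarities),
        "min": min(similarities),
        "max": max(similarities),
        "count": len(similarities),
    }


def creation_order(created_at: float, all_created_at: Sequence[float]) -> int:
    """1-based position of an answer among all answers of its problem, by creation time."""
    return sorted(all_created_at).index(created_at) + 1
```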
After the feature vector of each answer data has been constructed, the answer quality determination model training method provided by the embodiments of the present application further includes:
S103: training the answer quality determination model with the feature vector of the answer data as input and the quality annotation information as output. In a specific implementation, the answer quality determination model includes any one of: a logistic regression model, an autoregressive model, a moving average model, an autoregressive moving average model, an autoregressive integrated moving average model, a generalized autoregressive conditional heteroskedasticity model, a deep learning model, a decision tree model, and a random forest model.
When the answer quality determination model is a logistic regression model, an autoregressive model, a moving average model, an autoregressive moving average model, an autoregressive integrated moving average model, or a generalized autoregressive conditional heteroskedasticity model, the model is trained as follows: the feature vector of the answer data is used as the value of the explanatory variables, the quality annotation information is used as the value of the explained variable, and the unknown parameters of the answer quality determination model are solved.
Specifically, the unknown parameters of the answer quality determination model can be solved as follows: an explanatory variable matrix is constructed from the feature vectors of all answer data included in the sample set, an explained variable matrix is constructed from the quality annotation information corresponding to each answer data, a parameter matrix is constructed from the unknown parameters of the answer quality determination model, and the parameter matrix is then solved using the explanatory variable matrix and the explained variable matrix.
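For the logistic regression case, solving the unknown parameters can be sketched as below. The text does not say which solver is used; batch gradient descent on the log loss is chosen here purely as an assumption, the labels are assumed to be 1 for an optimum answer and 0 otherwise, and all names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Estimated probability that an answer with these features is an optimum answer."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def fit_logistic_regression(
    X: Sequence[Sequence[float]],  # explanatory variable matrix, one row per answer
    y: Sequence[int],              # explained variable, 1 = optimum answer (assumed)
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Solve the weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

In practice the feature vector would hold the answer features described above (content attributes, user evaluation, time attribute, and the two kinds of association degree), usually scaled to comparable ranges before fitting.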
When the answer quality determination model is a deep learning model, the model is trained as follows: the feature vector of the answer data is input into the deep learning model to obtain a quality determination result for the answer data, and the deep learning model is trained according to the quality determination result of each answer data and the corresponding quality annotation information.
Training the deep learning model means adjusting its parameters so that the quality determination results it produces for the answer data agree with the quality annotation information as closely as possible.
When the answer quality determination model is a random forest model, the model is trained as follows: with the feature vector of the answer data as input and the quality annotation information as output, at least one decision tree is constructed, and the random forest model is constructed based on the at least one decision tree.
In a specific implementation, when each decision tree is constructed, the elements at a number of arbitrarily chosen positions of the feature vector are first selected as the input of that decision tree, and an arbitrary number of answer data are selected from the sample set as the target training data of that decision tree. The decision tree is then constructed with the selected elements of the target training data as its input and the quality annotation information corresponding to the target training data as its output.
For example, the feature vector of the answer data includes 15 elements, U1 to U15, and the answer data include one thousand items, A1 to A1000.
When the first decision tree M1 is constructed, A1 to A100 are used as the target training data of M1, U1 to U5 are determined as the input of M1, and the quality annotation information of A1 to A100 is used as the output of M1. When the second decision tree M2 is constructed, A1 to A100 are used as the target training data of M2, U3 to U8 are determined as the input of M2, and the quality annotation information of A1 to A100 is used as the output of M2. When the third decision tree M3 is constructed, A101 to A200 are used as the target training data of M3, U1 to U5 are determined as the input of M3, and the quality annotation information of A101 to A200 is used as the output of M3. When the fourth decision tree M4 is constructed, A101 to A200 are used as the target training data of M4, U6 to U10 are determined as the input of M4, and the quality annotation information of A101 to A200 is used as the output of M4. After at least one decision tree has been constructed, the random forest model is constructed based on the constructed decision trees.
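The per-tree selection of answer data and feature positions in the example above can be sketched as follows. Only the selection step is shown; fitting the decision tree itself is left out. Indices are 0-based here (so U1 to U5 are positions 0 to 4), and the helper names `tree_training_view` and `draw_tree_plan` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def tree_training_view(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    sample_ids: Sequence[int],
    feature_ids: Sequence[int],
) -> Tuple[List[List[float]], List[int]]:
    """Restrict the sample set to the chosen answers and feature positions for one tree."""
    sub_features = [[features[i][j] for j in feature_ids] for i in sample_ids]
    sub_labels = [labels[i] for i in sample_ids]
    return sub_features, sub_labels


def draw_tree_plan(
    n_answers: int, n_elements: int, n_pick_answers: int, n_pick_elements: int,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Randomly choose which answers and which feature positions a tree will see."""
    sample_ids = sorted(rng.sample(range(n_answers), n_pick_answers))
    feature_ids = sorted(rng.sample(range(n_elements), n_pick_elements))
    return sample_ids, feature_ids
```

For instance, the view for M1 in the example corresponds to `tree_training_view(features, labels, range(100), range(5))`, and the view for M4 to `tree_training_view(features, labels, range(100, 200), range(5, 10))`.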
It should be noted that each sample problem corresponds to at least one answer data, of which usually only one is the optimum answer and the rest are non-optimum answers. The answer data that are optimum answers can be used as positive samples and the answer data that are non-optimum answers as negative samples. The number of negative samples may then be much larger than the number of positive samples, which leads to a class imbalance problem. To bring the numbers of positive and negative samples into a more balanced state, undersampling can be applied to the negative samples in the sample data. When undersampling the negative samples, a number of negative samples equal to the number of positive samples can be drawn at random from the negative samples and used, together with the positive samples, as the training samples for training the random forest model.
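The undersampling step can be sketched as below, assuming labels of 1 for positive samples (optimum answers) and 0 for negative samples; the function name and the fixed seed are illustrative choices, not part of the described method.

```python
import random
from typing import List, Sequence, Tuple


def undersample_negatives(
    samples: Sequence, labels: Sequence[int], seed: int = 0
) -> Tuple[List, List[int]]:
    """Keep every positive sample and an equally sized random subset of negatives."""
    rng = random.Random(seed)
    positive_ids = [i for i, label in enumerate(labels) if label == 1]
    negative_ids = [i for i, label in enumerate(labels) if label == 0]
    kept_negative_ids = rng.sample(negative_ids, min(len(positive_ids), len(negative_ids)))
    keep = sorted(positive_ids + kept_negative_ids)  # preserve the original order
    return [samples[i] for i in keep], [labels[i] for i in keep]
```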
In addition, after the random forest model has been constructed, the answer quality determination model training method provided by the embodiments of the present application further includes: for each decision tree, determining the importance of each answer feature in that decision tree using the Gini impurity method; and determining the importance of all the answer features according to the importance of each answer feature in each decision tree.
Gini impurity is the expected error rate when a label drawn at random from a set is applied to a randomly chosen data item of that set. The smaller the Gini impurity, the higher the purity and the more ordered the sample set, and the better the classification performance of the resulting random forest model. The generated random forest model can be verified on this basis. If the Gini impurity of the generated random forest model is relatively high, the precision of the currently generated random forest model is considered to be low, and a new random forest model can be generated until an answer quality determination model that meets the precision requirement is obtained.
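Gini impurity and a feature importance derived from it can be sketched as follows. Measuring the importance of a feature in one tree by the impurity decrease of the splits on that feature, and combining trees by averaging, are common conventions assumed here; the text only states that the overall importance is determined from the per-tree importance.

```python
from collections import Counter
from typing import Dict, List, Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """1 minus the sum of squared class proportions; 0 for a pure set."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent: Sequence[int], left: Sequence[int], right: Sequence[int]) -> float:
    """How much a split lowers the Gini impurity, weighting children by size."""
    n = len(parent)
    weighted = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
    return gini_impurity(parent) - weighted


def average_importance(per_tree: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each feature's importance over all trees (absent features count as 0)."""
    totals: Dict[str, float] = {}
    for tree in per_tree:
        for name, value in tree.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / len(per_tree) for name, total in totals.items()}
```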
When the answer quality determination model is a decision tree model, it can be regarded as a special case of the random forest model: when the decision tree is constructed, all elements of the feature vectors of all answer data in the sample set are used as the input of the decision tree model, and the quality annotation information corresponding to each answer data is used as the output.
The trained answer quality determination model thereby learns the features that an optimum answer has, and can determine whether an answer is the optimum answer.
Embodiment two
Referring to Fig. 2, Embodiment two of the present application further provides an answer quality determination method, including S201 to S202:
S201: obtaining a set number of answer features of a target answer, and constructing the feature vector of the target answer. The feature vector of the target answer is generated in a manner similar to the feature vector of the answer data, and details are not repeated here.
S202: inputting the feature vector of the target answer into the answer quality determination model trained by the answer quality determination model training method provided by the embodiments of the present application, to obtain the quality information of the target answer.
In the embodiments of the present application, the feature vector characterizing each answer data is constructed in advance from the answer data corresponding to the multiple sample problems included in the obtained sample set, and the answer quality determination model is trained with the feature vector of the answer data as input and the quality annotation information as output, so that the model learns the features that an optimum answer has. Determining the quality of a target answer with this model gives higher accuracy; moreover, whenever a new target answer is generated, it can be determined directly whether the newly generated target answer is the optimum answer, which gives higher efficiency.
Based on the same inventive concept, the embodiments of the present application further provide an answer quality determination model training apparatus corresponding to the answer quality determination model training method. Since the principle by which the apparatus solves the problem is similar to that of the above answer quality determination model training method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
Embodiment three
Referring to Fig. 3, Embodiment three of the present application provides an answer quality determination model training apparatus, including:
an obtaining module 31, configured to obtain a sample set, the sample set including answer data corresponding to multiple sample problems, wherein each sample problem corresponds to at least one answer data and each answer data has corresponding quality annotation information;
a first feature vector construction module 32, configured to obtain, for each answer data, a set number of answer features corresponding to the answer data, and construct the feature vector of the answer data;
a training module 33, configured to train the answer quality determination model with the feature vector of the answer data as input and the quality annotation information as output.
In an optional embodiment, the answer quality determination model is a random forest model, and
the training module 33 is configured to train the answer quality determination model as follows: with the feature vector of the answer data as input and the quality annotation information as output, constructing at least one decision tree, and constructing the random forest model based on the at least one decision tree.
In an optional embodiment, the answer feature includes any one or more of the following: the content attribute of the answer data, the evaluation of the user who provided the answer data, the time attribute of the answer data, the degree of association between the answer data and the sample problem it belongs to, and the degree of association between the answer data and the other answer data belonging to the same sample problem.
In an optional embodiment, where the answer feature includes the content attribute of the answer data:
the content attribute of the answer data includes any one or more of the following: the number of uniform resource locator tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
where the answer feature includes the evaluation of the user who provided the answer data:
the evaluation of the user who provided the answer data includes any one or a combination of the following: the scores and/or voting results that the user who provided the answer data received when answering other problems, and the scores and/or voting results that the user who provided the answer data received when asking questions;
where the answer feature includes the time attribute of the answer data:
the time attribute of the answer data includes: the creation time difference between the answer data and its corresponding sample problem;
where the answer feature includes the degree of association between the answer data and the sample problem it belongs to:
the degree of association between the answer data and the sample problem it belongs to includes: the similarity between the answer data and the sample problem it belongs to;
where the answer feature includes the degree of association between the answer data and the other answer data belonging to the same sample problem:
the degree of association between the answer data and the other answer data belonging to the same sample problem includes any one or more of the following: the average similarity between the answer data and the other answer data belonging to the same sample problem, the minimum similarity between the answer data and the other answer data belonging to the same sample problem, the maximum similarity between the answer data and the other answer data belonging to the same sample problem, the number of other answer data of the sample problem that the answer data belongs to, and the order in which the answer data was created among all answer data of the sample problem it belongs to.
In an optional embodiment, where the content attribute of the answer data includes the readability of the answer data, the readability of the answer data is obtained as follows:
determining the readability of the answer data according to the number of paragraphs in the answer data and the length of each paragraph;
where the degree of association between the answer data and the sample problem it belongs to includes the similarity between the answer data and the sample problem it belongs to, the similarity is obtained as follows:
determining the similarity between the answer data and the sample problem it belongs to based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the sample problem, constructed from the word vectors of the words in the sample problem it belongs to;
where the answer feature includes the degree of association between the answer data and the other answer data belonging to the same sample problem, and that degree of association includes at least one of the average similarity, the minimum similarity and the maximum similarity between the answer data and the other answer data belonging to the same sample problem, the degree of association is obtained as follows:
determining the similarity between the answer data and the other answer data based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in the other answer data.
In an optional embodiment, the apparatus further includes an importance determination module 34, configured to:
determine, for each decision tree, the importance of each answer feature in that decision tree using the Gini impurity method;
determine the importance of all the answer features according to the importance of each answer feature in each decision tree.
In the embodiments of the present application, the feature vector characterizing each answer data is constructed from the answer data corresponding to the multiple sample problems included in the obtained sample set, and the answer quality determination model is trained with the feature vector of the answer data as input and the quality annotation information as output, so that the model learns the features that an optimum answer has. The quality of an answer is determined by this model, and for a newly generated answer it can be determined whether it is the optimum answer.
Embodiment four
Fig. 4 is a schematic diagram of the electronic device 400 provided by Embodiment four of the present application. The electronic device 400 includes a processor 41, a memory 42 and a bus 43. The memory 42 stores execution instructions; when the device runs, the processor 41 and the memory 42 communicate through the bus 43, and the processor 41 executes the execution instructions so that the device performs the following method:
obtaining a sample set, the sample set including answer data corresponding to multiple sample problems, wherein each sample problem corresponds to at least one answer data and each answer data has corresponding quality annotation information;
for each answer data, obtaining a set number of answer features corresponding to the answer data, and constructing the feature vector of the answer data;
training the answer quality determination model with the feature vector of the answer data as input and the quality annotation information as output.
Optionally, in the method executed by the processor 41, the answer quality determination model is a random forest model, and
training the answer quality determination model includes: with the feature vector of the answer data as input and the quality annotation information as output, constructing at least one decision tree, and constructing the random forest model based on the at least one decision tree.
Optionally, in the method executed by the processor 41, the answer feature includes any one or more of the following: the content attribute of the answer data, the evaluation of the user who provided the answer data, the time attribute of the answer data, the degree of association between the answer data and the sample problem it belongs to, and the degree of association between the answer data and the other answer data belonging to the same sample problem.
Optionally, in the method executed by the processor 41, where the answer feature includes the content attribute of the answer data:
the content attribute of the answer data includes any one or more of the following: the number of uniform resource locator tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
where the answer feature includes the evaluation of the user who provided the answer data: the evaluation includes any one or a combination of the following: the scores and/or voting results that the user who provided the answer data received when answering other problems, and the scores and/or voting results that the user who provided the answer data received when asking questions;
where the answer feature includes the time attribute of the answer data: the time attribute of the answer data includes the creation time difference between the answer data and its corresponding sample problem;
where the answer feature includes the degree of association between the answer data and the sample problem it belongs to: the degree of association includes the similarity between the answer data and the sample problem it belongs to;
where the answer feature includes the degree of association between the answer data and the other answer data belonging to the same sample problem: the degree of association includes any one or more of the following: the average similarity between the answer data and the other answer data belonging to the same sample problem, the minimum similarity between the answer data and the other answer data belonging to the same sample problem, the maximum similarity between the answer data and the other answer data belonging to the same sample problem, the number of other answer data of the sample problem that the answer data belongs to, and the order in which the answer data was created among all answer data of the sample problem it belongs to.
Optionally, in the method executed by the processor 41, where the content attribute of the answer data includes the readability of the answer data, the readability is obtained as follows: determining the readability of the answer data according to the number of paragraphs in the answer data and the length of each paragraph;
where the degree of association between the answer data and the sample problem it belongs to includes the similarity between the answer data and the sample problem it belongs to, the similarity is obtained as follows: determining the similarity between the answer data and the sample problem it belongs to based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the sample problem, constructed from the word vectors of the words in the sample problem it belongs to;
where the answer feature includes the degree of association between the answer data and the other answer data belonging to the same sample problem, and that degree of association includes at least one of the average similarity, the minimum similarity and the maximum similarity between the answer data and the other answer data belonging to the same sample problem, the degree of association is obtained as follows: determining the similarity between the answer data and the other answer data based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vector of the other answer data, constructed from the word vectors of the words in the other answer data.
Optionally, the method executed by the processor 41 further includes: determining, for each decision tree, the importance of each answer feature in that decision tree using the Gini impurity method; and determining the importance of all the answer features according to the importance of each answer feature in each decision tree.
The embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when run by the processor 41, the computer program executes the steps of the above answer quality determination model training method. Specifically, the storage medium may be a general-purpose storage medium such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above answer quality determination model training method can be executed, thereby enabling the answer quality and/or the optimum answer to be determined.
Based on the same inventive concept, the embodiments of the present application further provide an answer quality determination apparatus corresponding to the answer quality determination method. Since the principle by which the apparatus solves the problem is similar to that of the above answer quality determination method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
Embodiment five
Referring to Fig. 5, Embodiment five of the present application provides an answer quality determination apparatus, including:
a second feature vector construction module 51, configured to obtain a set number of answer features of a target answer and construct the feature vector of the target answer;
a determination module 52, configured to input the feature vector of the target answer into the answer quality determination model trained by the answer quality determination model training method of any one of claims 1 to 6, to obtain the quality information of the target answer.
In the embodiments of the present application, the feature vector characterizing each answer data is constructed in advance from the answer data corresponding to the multiple sample problems included in the obtained sample set, and the answer quality determination model is trained with the feature vector of the answer data as input and the quality annotation information as output, so that the model learns the features that an optimum answer has. The quality of a target answer is determined by this model, and it can be determined whether a newly generated target answer is the optimum answer.
Embodiment six
Figure 6 is a schematic diagram of the electronic device 600 provided by Embodiment 6 of the present application. The electronic device 600 includes a processor 61, a memory 62 and a bus 63. The memory 62 stores execution instructions; when the device runs, the processor 61 and the memory 62 communicate through the bus 63, and the processor 61 executes the execution instructions so that the device performs the following method:
Obtaining a set number of answer features of a target answer, and constructing the feature vector of the target answer;
Inputting the feature vector of the target answer into the answer quality determination model trained by the answer quality determination model training method provided by the embodiments of the present application, to obtain quality information of the target answer.
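The two steps above might look like the following in code. This is a sketch under stated assumptions: the trained model is represented as a plain callable returning a score, `toy_model` is a hand-written placeholder rather than a trained model, and the feature names and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical feature order shared with training.
FEATURE_NAMES: Sequence[str] = ("url_count", "code_count", "length")

# Assumed interface: a trained model maps a feature vector to a quality score.
QualityModel = Callable[[List[float]], float]


def build_feature_vector(features: Dict[str, float]) -> List[float]:
    """Step 1: arrange the target answer's features in a fixed order."""
    return [float(features.get(name, 0.0)) for name in FEATURE_NAMES]


def determine_quality(
    model: QualityModel, features: Dict[str, float], threshold: float = 0.5
) -> Dict[str, object]:
    """Step 2: feed the vector to the model and report quality information."""
    score = model(build_feature_vector(features))
    return {"score": score, "is_best": score >= threshold}


def toy_model(vector: List[float]) -> float:
    """Placeholder standing in for a trained model; the rule is arbitrary."""
    _url_count, code_count, length = vector
    return 0.9 if code_count > 0 and length > 100 else 0.1


result = determine_quality(toy_model, {"url_count": 1, "code_count": 2, "length": 300})
```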
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when run by the processor 61, the computer program performs the steps of the above answer quality determination model training method.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above answer quality determination method can be executed, which addresses the low efficiency and low accuracy of current methods for determining the best answer and thereby improves the efficiency and accuracy of best-answer determination.
The computer program product of the answer quality determination model training method, the answer quality determination method and the device provided by the embodiments of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems and devices described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, each unit may exist physically alone, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features. Such modifications, changes or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An answer quality determination model training method, characterized by comprising:
Obtaining a sample set, the sample set including answer data corresponding to multiple sample questions, wherein each sample question corresponds to at least one answer data, and each answer data has corresponding quality annotation information;
For each answer data, obtaining a set number of answer features corresponding to the answer data, and constructing the feature vector of the answer data;
Training the answer quality determination model with the feature vectors of the answer data as input and the quality annotation information as output.
2. The method according to claim 1, characterized in that the answer quality determination model is a random forest model, and
Training the answer quality determination model comprises: with the feature vectors of the answer data as input and the quality annotation information as output, constructing at least one decision tree, and constructing the random forest model based on the at least one decision tree.
3. The method according to claim 1, characterized in that the answer features include any one or more of the following: a content attribute of the answer data, an evaluation of the user who provided the answer data, a time attribute of the answer data, a degree of association between the answer data and the sample question to which it belongs, and a degree of association between the answer data and other answer data belonging to the same sample question.
4. The method according to claim 3, characterized in that, where the answer features include the content attribute of the answer data:
The content attribute of the answer data includes any one or more of the following: the number of uniform resource locator (URL) tags in the answer data, the number of pictures in the answer data, the number of code snippets in the answer data, the length of the answer data, and the readability of the answer data;
Where the answer features include the evaluation of the user who provided the answer data:
The evaluation of the user who provided the answer data includes any one or a combination of the following: the scores and/or voting results of other questions asked by the user who provided the answer data, and the scores and/or voting results of the answers given by the user who provided the answer data;
Where the answer features include the time attribute of the answer data:
The time attribute of the answer data includes: the difference between the creation time of the answer data and the creation time of its corresponding sample question;
Where the answer features include the degree of association between the answer data and the sample question to which it belongs:
The degree of association between the answer data and the sample question to which it belongs includes: the similarity between the answer data and the sample question to which it belongs;
Where the answer features include the degree of association between the answer data and other answer data belonging to the same sample question:
The degree of association between the answer data and other answer data belonging to the same sample question includes any one or more of the following: the average similarity between the answer data and other answer data belonging to the same sample question, the minimum similarity between the answer data and other answer data belonging to the same sample question, the maximum similarity between the answer data and other answer data belonging to the same question, the number of other answer data of the sample question to which the answer data belongs, and the order in which the answer data was created among all answer data of the sample question to which it belongs.
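As a rough illustration of the content attributes listed in claim 4, the sketch below counts links, pictures and code snippets and measures length. It assumes, purely for the example, that answer bodies are HTML in which these appear as `<a>`, `<img>` and `<code>` tags; the claim does not fix a markup format. Readability is left out because the text gives no formula for it here.

```python
import re
from typing import Dict

# Assumption: answers are HTML, so links, pictures and code snippets show up
# as <a>, <img> and <code> tags respectively.
_LINK = re.compile(r"<a\b", re.IGNORECASE)
_IMAGE = re.compile(r"<img\b", re.IGNORECASE)
_CODE = re.compile(r"<code\b", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def content_attributes(answer_html: str) -> Dict[str, int]:
    """Count content attributes of one answer body."""
    text = _TAG.sub("", answer_html)  # visible text with markup stripped
    return {
        "url_tag_count": len(_LINK.findall(answer_html)),
        "image_count": len(_IMAGE.findall(answer_html)),
        "code_snippet_count": len(_CODE.findall(answer_html)),
        "length": len(text),
    }


sample = '<p>See <a href="https://example.com">docs</a>.</p><code>x = 1</code><img src="a.png">'
attrs = content_attributes(sample)
```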
5. The method according to claim 4, characterized in that,
Where the content attribute of the answer data includes the readability of the answer data, the readability of the answer data is obtained in the following manner:
Determining the readability of the answer data according to the number of paragraphs in the answer data and the length of each paragraph;
Where the degree of association between the answer data and the sample question to which it belongs includes the similarity between the answer data and the sample question to which it belongs, that similarity is obtained in the following manner:
Determining the similarity between the answer data and the sample question to which it belongs based on a representation vector of the answer data, constructed from the word vectors of the words in the answer data, and a representation vector of the sample question, constructed from the word vectors of the words in the sample question to which the answer data belongs;
Where the answer features include the degree of association between the answer data and other answer data belonging to the same sample question, and that degree of association includes at least one of the average similarity between the answer data and other answer data belonging to the same sample question, the minimum similarity between the answer data and other answer data belonging to the same sample question, and the maximum similarity between the answer data and other answer data belonging to the same question, the degree of association is obtained in the following manner:
Determining the similarity between the answer data and the other answer data based on the representation vector of the answer data, constructed from the word vectors of the words in the answer data, and the representation vectors of the other answer data, constructed from the word vectors of the words in the other answer data.
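The similarity features of claim 5 can be sketched as follows. Two choices here are assumptions, since the claim does not state them: the representation vector is taken as the average of the word vectors, and similarity is measured by cosine similarity. The toy embeddings are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_vector(words: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the word vectors of known words (assumed composition rule)."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def peer_similarity_features(target: Vector, others: List[Vector]) -> Dict[str, float]:
    """Average, minimum and maximum similarity to the other answers."""
    sims = [cosine(target, o) for o in others]
    if not sims:
        return {"avg": 0.0, "min": 0.0, "max": 0.0}
    return {"avg": sum(sims) / len(sims), "min": min(sims), "max": max(sims)}


# Toy 2-dimensional embeddings, purely illustrative.
emb = {"python": [1.0, 0.0], "list": [0.0, 1.0], "sort": [1.0, 1.0]}
question = sentence_vector(["python", "sort"], emb, 2)
answer = sentence_vector(["sort", "list"], emb, 2)
question_similarity = cosine(question, answer)
```

The same `cosine` call serves both the answer-to-question similarity and the answer-to-answer similarities; only the second argument changes.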
6. The method according to claim 2, characterized in that the method further comprises:
For each decision tree, determining the importance of each answer feature in that decision tree using the Gini impurity method;
Determining the importance of all the answer features according to the importance of each answer feature in each decision tree.
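One common way to realize the two steps of claim 6 is the weighted Gini impurity decrease, summed per feature within a tree and then averaged over the trees. The sketch below follows that convention as an assumption; the claim itself does not give the formula. The `Node` structure is a hypothetical stand-in for an already fitted decision tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    counts: List[int]               # samples per class reaching this node
    feature: Optional[int] = None   # split feature index; None for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(counts: List[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def tree_importance(root: Node, n_features: int) -> List[float]:
    """Sum the weighted Gini impurity decrease per feature in one tree."""
    total = sum(root.counts)
    scores = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None or node.left is None or node.right is None:
            continue  # leaf: no split, no contribution
        n, nl, nr = sum(node.counts), sum(node.left.counts), sum(node.right.counts)
        decrease = (
            gini(node.counts)
            - (nl / n) * gini(node.left.counts)
            - (nr / n) * gini(node.right.counts)
        )
        scores[node.feature] += (n / total) * decrease
        stack.extend([node.left, node.right])
    return scores


def forest_importance(trees: List[Node], n_features: int) -> List[float]:
    """Average the per-tree scores, then normalise so they sum to 1."""
    if not trees:
        return [0.0] * n_features
    sums = [0.0] * n_features
    for tree in trees:
        for i, s in enumerate(tree_importance(tree, n_features)):
            sums[i] += s
    mean = [s / len(trees) for s in sums]
    norm = sum(mean)
    return [m / norm for m in mean] if norm > 0 else mean


# A single split on feature 0 that separates the two classes perfectly.
tree = Node([4, 4], feature=0, left=Node([4, 0]), right=Node([0, 4]))
```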
7. An answer quality determination method, characterized by comprising:
Obtaining a set number of answer features of a target answer, and constructing the feature vector of the target answer;
Inputting the feature vector of the target answer into the answer quality determination model trained by the answer quality determination model training method of any one of claims 1 to 6, to obtain quality information of the target answer.
8. An answer quality determination model training device, characterized in that the device comprises:
An obtaining module, configured to obtain a sample set, the sample set including answer data corresponding to multiple sample questions, wherein each sample question corresponds to at least one answer data, and each answer data has corresponding quality annotation information;
A first feature vector construction module, configured to, for each answer data, obtain a set number of answer features corresponding to the answer data and construct the feature vector of the answer data;
A training module, configured to train the answer quality determination model with the feature vectors of the answer data as input and the quality annotation information as output.
9. An answer quality determination device, characterized by comprising:
A second feature vector construction module, configured to obtain a set number of answer features of a target answer and construct the feature vector of the target answer;
A determining module, configured to input the feature vector of the target answer into the answer quality determination model trained by the answer quality determination model training method of any one of claims 1 to 6, to obtain quality information of the target answer.
10. An electronic device, characterized by comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and when the machine-readable instructions are executed by the processor, the steps of the answer quality determination model training method of any one of claims 1 to 6 are performed.
CN201811285467.XA 2018-10-31 2018-10-31 Answer quality determines model training method, answer quality determination method and device Pending CN109472305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811285467.XA CN109472305A (en) 2018-10-31 2018-10-31 Answer quality determines model training method, answer quality determination method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811285467.XA CN109472305A (en) 2018-10-31 2018-10-31 Answer quality determines model training method, answer quality determination method and device

Publications (1)

Publication Number Publication Date
CN109472305A true CN109472305A (en) 2019-03-15

Family

ID=65666908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811285467.XA Pending CN109472305A (en) 2018-10-31 2018-10-31 Answer quality determines model training method, answer quality determination method and device

Country Status (1)

Country Link
CN (1) CN109472305A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079026A (en) * 2007-07-02 2007-11-28 北京百问百答网络技术有限公司 Text similarity, acceptation similarity calculating method and system and application system
CN104573028A (en) * 2015-01-14 2015-04-29 百度在线网络技术(北京)有限公司 Intelligent question-answer implementing method and system
CN106844530A (en) * 2016-12-29 2017-06-13 北京奇虎科技有限公司 Training method and device of a kind of question and answer to disaggregated model
WO2018147543A1 (en) * 2017-02-08 2018-08-16 한국과학기술원 Concept graph based query-response system and context search method using same
CN108205684A (en) * 2017-04-25 2018-06-26 北京市商汤科技开发有限公司 Image disambiguation method, device, storage medium and electronic equipment
CN107608999A (en) * 2017-07-17 2018-01-19 南京邮电大学 A kind of Question Classification method suitable for automatically request-answering system
CN107977676A (en) * 2017-11-24 2018-05-01 北京神州泰岳软件股份有限公司 Text similarity computing method and device
CN108182175A (en) * 2017-12-29 2018-06-19 中国银联股份有限公司 A kind of text quality's index selection method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
潘炜等: "《列表问答系统中的答案聚类重排序》", 《计算机应用与软件》 *
蔡丽艳: "《数据挖掘算法及其应用研究》", 28 February 2013, 电子科技大学出版社 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119770A (en) * 2019-04-28 2019-08-13 平安科技(深圳)有限公司 Decision-tree model construction method, device, electronic equipment and medium
CN110119770B (en) * 2019-04-28 2024-05-14 平安科技(深圳)有限公司 Decision tree model construction method, device, electronic equipment and medium
CN110516027A (en) * 2019-07-22 2019-11-29 北京达佳互联信息技术有限公司 Update method, device, electronic equipment and the storage medium of information aggregate
CN110516027B (en) * 2019-07-22 2022-04-22 北京达佳互联信息技术有限公司 Information set updating method and device, electronic equipment and storage medium
CN110674276A (en) * 2019-09-23 2020-01-10 深圳前海微众银行股份有限公司 Robot self-learning method, robot terminal, device and readable storage medium
CN110704597A (en) * 2019-09-29 2020-01-17 北京金山安全软件有限公司 Dialogue system reliability verification method, model generation method and device
CN110704597B (en) * 2019-09-29 2022-07-29 北京金山安全软件有限公司 Dialogue system reliability verification method, model generation method and device
CN111798285A (en) * 2019-09-30 2020-10-20 北京京东尚科信息技术有限公司 Information generation method and device
CN110825930A (en) * 2019-11-01 2020-02-21 北京邮电大学 Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence
CN111090742A (en) * 2019-12-19 2020-05-01 东软集团股份有限公司 Question and answer pair evaluation method and device, storage medium and equipment
CN111090742B (en) * 2019-12-19 2024-05-17 东软集团股份有限公司 Question-answer pair evaluation method, question-answer pair evaluation device, storage medium and equipment
CN111241258A (en) * 2020-01-08 2020-06-05 泰康保险集团股份有限公司 Data cleaning method and device, computer equipment and readable storage medium
CN111783473A (en) * 2020-07-14 2020-10-16 腾讯科技(深圳)有限公司 Method and device for identifying best answer in medical question and answer and computer equipment
CN111783473B (en) * 2020-07-14 2024-02-13 腾讯科技(深圳)有限公司 Method and device for identifying best answer in medical question and answer and computer equipment
CN112131354A (en) * 2020-11-26 2020-12-25 广州华多网络科技有限公司 Answer screening method and device, terminal equipment and computer readable storage medium
CN112131354B (en) * 2020-11-26 2021-04-16 广州华多网络科技有限公司 Answer screening method and device, terminal equipment and computer readable storage medium
CN113590790A (en) * 2021-07-30 2021-11-02 北京壹心壹翼科技有限公司 Question retrieval method, device, equipment and medium applied to multiple rounds of question answering
CN113590790B (en) * 2021-07-30 2023-11-28 北京壹心壹翼科技有限公司 Question retrieval method, device, equipment and medium applied to multi-round question and answer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing

Applicant after: Guoxin Youyi Data Co., Ltd

Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing

Applicant before: SIC YOUE DATA Co.,Ltd.

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication
Application publication date: 20190315