CN106844530A - Training method and device for a question-answer pair classification model - Google Patents
Training method and device for a question-answer pair classification model
- Publication number
- CN106844530A (application number CN201611249261.2A)
- Authority
- CN
- China
- Prior art keywords
- answer
- question
- data
- feature
- keyword
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
An embodiment of the invention provides a training method and device for a question-answer pair classification model. The method includes: obtaining question-answer pair data; extracting question-answer pair features from the question-answer pair data; labeling the question-answer pair data with classification labels according to the quality of the question-answer pair data; and training the question-answer pair classification model using the question-answer pair features and the classification labels. A large training set is labeled automatically according to the quality of the question-answer pairs, and the classification model is trained to classify question-answer pairs, that is, to predict a quality score. This avoids manual strategies and therefore their problems: they use little feature information, the rate of active user feedback is low, they rely on the subjective judgment of the questioner, advertising spam and cheating are serious, and the imbalance between user feedback on new and on historical question-answer pairs makes the strategy unstable. Good prediction accuracy is obtained on both historical and newly generated question-answer pairs.
Description
Technical field
The present invention relates to the technical field of computer processing, and in particular to a training method for a question-answer pair classification model and a training device for a question-answer pair classification model.
Background art
At present, there are many interactive question-and-answer (Q&A) platforms on the network. A user posts a question on a Q&A platform, and the platform mobilizes other users to answer it and resolve the questioner's query.
Q&A platforms have accumulated a large number of users and have produced massive amounts of question-answer pair data (i.e., questions and answers). The quality of this data ranges from high to low: low-quality question-answer pairs have little value and degrade the user experience, whereas high-quality question-answer pairs are an important data resource for a Q&A platform.
To mine high-quality question-answer pairs, the traditional approach computes a quality score based on manual strategies: a strategy is designed around the feedback that the questioner or other users give on an answer, in order to judge the quality of the question-answer pair.
For example, interactive buttons such as a "like" tag and a "dislike" tag are provided on the Q&A platform for other users. When the questioner marks an answer as the "best answer", or when the number of clicks on the "like" tag exceeds the number of clicks on the "dislike" tag, the answer can be judged to be of good quality.
However, manual strategies use little feature information, the rate of active user feedback is low, they rely on the subjective judgment of the questioner, advertising spam and cheating are serious, and the imbalance between user feedback on new and on historical question-answer pairs makes the strategy unstable. As a result, the accuracy of quality judgments for question-answer pairs is low.
In particular, for newly generated question-answer pairs, which lack user feedback, the accuracy is even lower.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a training method for a question-answer pair classification model, and a corresponding training device for a question-answer pair classification model, that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a training method for a question-answer pair classification model is provided, including:
obtaining question-answer pair data;
extracting question-answer pair features from the question-answer pair data;
labeling the question-answer pair data with classification labels according to the quality of the question-answer pair data;
training the question-answer pair classification model using the question-answer pair features and the classification labels.
Optionally, the question-answer pair features include one or more of the following:
questioner features, answerer features, question-answer pair text semantic features, question-answer pair numerical features, and user feedback features.
Optionally, the question-answer pair data includes a question and an answer, and the question-answer pair text semantic features include a question-answer pairing feature;
the step of extracting question-answer pair features from the question-answer pair data includes:
finding word pairs in which a lexical item in the question co-occurs with a lexical item in the answer;
counting the number of co-occurring word pairs as the question-answer pairing feature.
Optionally, the question-answer pair data includes a question and an answer, and the question-answer pair text semantic features include a question-answer word mover's distance;
the step of extracting question-answer pair features from the question-answer pair data includes:
extracting keywords from the question to generate a question keyword set;
extracting keywords from the answer to generate an answer keyword set;
calculating the similarity between the question keyword set and the answer keyword set;
accumulating the similarities to obtain the question-answer word mover's distance.
Optionally, the question-answer pair data includes a question and an answer, and the question-answer pair text semantic features include a question-answer sentence similarity;
the step of extracting question-answer pair features from the question-answer pair data includes:
converting the question into a first sentence vector;
converting the answer into a second sentence vector;
calculating the similarity between the first sentence vector and the second sentence vector as the question-answer sentence similarity.
Optionally, the step of labeling the question-answer pair data with classification labels according to the quality of the question-answer pair data includes:
looking up the search record data recorded when the question-answer pair data was searched;
labeling the question-answer pair data with classification labels according to the search record data.
Optionally, the step of labeling the question-answer pair data with classification labels according to the search record data includes:
mining the average click weight of the question-answer pair data under search keywords;
mining the last-click weight of the question-answer pair data under search keywords;
fitting a continuous score using the average click weight and the last-click weight;
discretizing the continuous score into classification labels.
Optionally, the step of mining the average click weight of the question-answer pair data under search keywords includes:
recording the address of the web page to which the question-answer pair data belongs;
calculating the click score of the address under a specified search keyword;
calculating the click score distribution information of the address under the specified search keyword using the click score;
calculating the average click weight of the question-answer pair data under search keywords using the click score distribution information.
Optionally, the step of calculating the click score of the address under a specified search keyword includes:
counting the number of clicks on the address under the specified keyword;
counting the number of searches for the specified keyword;
calculating the click score of the address under the specified search keyword using the number of clicks and the number of searches.
Optionally, the step of mining the last-click weight of the question-answer pair data under search keywords includes:
recording the address of the web page to which the question-answer pair data belongs;
calculating the last-click score of the address under a specified search keyword;
calculating the last-click weight of the question-answer pair data under search keywords using the last-click score.
Optionally, the step of calculating the last-click score of the address under a specified search keyword includes:
counting the number of last clicks on the address under the specified keyword;
counting the number of searches for the specified keyword;
calculating the last-click score of the address under the specified search keyword using the number of last clicks and the number of searches.
Optionally, after the step of labeling the question-answer pair data with classification labels according to the quality of the question-answer pair data, the method further includes:
normalizing the question-answer pair features.
Optionally, the step of normalizing the question-answer pair features includes:
computing the mean and standard deviation of each dimension of the question-answer pair features;
subtracting the mean from each dimension of the question-answer pair features and dividing the result by the standard deviation.
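The per-dimension normalization above is ordinary z-score standardization. The minimal Python sketch below assumes the features are already collected into a list of equal-length numeric vectors; it uses the population standard deviation and maps a constant dimension (standard deviation 0) to 0, both of which are assumptions the text does not address.

```python
import math
from typing import List


def zscore_normalize(features: List[List[float]]) -> List[List[float]]:
    """Standardize each feature dimension: (x - mean) / std, computed over all pairs."""
    if not features:
        return []
    n = len(features)
    dims = len(features[0])
    # Mean of each dimension across all question-answer pairs.
    means = [sum(row[d] for row in features) / n for d in range(dims)]
    # Population standard deviation of each dimension (assumed; the text does not say which).
    stds = [
        math.sqrt(sum((row[d] - means[d]) ** 2 for row in features) / n)
        for d in range(dims)
    ]
    # A constant dimension (std == 0) is mapped to 0 instead of dividing by zero.
    return [
        [(row[d] - means[d]) / stds[d] if stds[d] > 0 else 0.0 for d in range(dims)]
        for row in features
    ]
```

In a deployed system the means and standard deviations computed on the training set would typically be stored and reused when normalizing features of new question-answer pairs, so that training and prediction see the same scale.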
Optionally, after the step of labeling the question-answer pair data with classification labels according to the quality of the question-answer pair data, the method further includes:
adjusting the classification label of the current question-answer pair according to neighboring question-answer pairs.
Optionally, the step of adjusting the classification label of the current question-answer pair according to neighboring question-answer pairs includes:
clustering the question-answer pairs;
for each question-answer pair, selecting its N nearest neighboring question-answer pairs after clustering;
calculating the distance between the current question-answer pair and each neighboring question-answer pair;
re-fitting the classification label based on the distances.
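The text does not specify the clustering algorithm, the distance metric, or how the label is re-fitted. The Python sketch below is one hypothetical reading: cluster assignments are taken as given, the N nearest neighbors are chosen within the same cluster by Euclidean distance, and the new label is an inverse-distance-weighted average of the pair's own label and its neighbors' labels. The function name adjust_labels and the weight 1/(1 + d) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def adjust_labels(
    features: List[Sequence[float]],
    labels: List[float],
    clusters: List[int],
    n_neighbors: int = 3,
) -> List[float]:
    """Re-fit each pair's label from its N nearest neighbors in the same cluster (hypothetical scheme)."""
    adjusted = []
    for i, (x, y) in enumerate(zip(features, labels)):
        # Candidate neighbors: other pairs assigned to the same cluster.
        candidates = [
            j for j in range(len(features)) if j != i and clusters[j] == clusters[i]
        ]
        # Keep the N closest candidates by Euclidean distance (math.dist needs Python 3.8+).
        candidates.sort(key=lambda j: math.dist(x, features[j]))
        nearest = candidates[:n_neighbors]
        # Weighted average: the pair's own label has weight 1, each neighbor 1 / (1 + distance).
        num, den = float(y), 1.0
        for j in nearest:
            w = 1.0 / (1.0 + math.dist(x, features[j]))
            num += w * labels[j]
            den += w
        adjusted.append(num / den)
    return adjusted
```

The adjusted value is continuous; if discrete labels are required, it could be snapped back to the nearest allowed label value.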
Optionally, the method further includes:
identifying the importance of each question-answer pair feature to the question-answer pair classification model;
extending the M question-answer pair features with the highest importance and, with the extended question-answer pair features, returning to the step of labeling the question-answer pair data with classification labels according to the quality of the question-answer pair data.
According to another aspect of the present invention, a training device for a question-answer pair classification model is provided, including:
a question-answer pair data acquisition module, adapted to obtain question-answer pair data;
a question-answer pair feature extraction module, adapted to extract question-answer pair features from the question-answer pair data;
a classification label labeling module, adapted to label the question-answer pair data with classification labels according to the quality of the question-answer pair data;
a model training module, adapted to train the question-answer pair classification model using the question-answer pair features and the classification labels.
Optionally, the question-answer pair features include one or more of the following:
questioner features, answerer features, question-answer pair text semantic features, question-answer pair numerical features, and user feedback features.
Optionally, the question-answer pair data includes a question and an answer, and the question-answer pair text semantic features include a question-answer pairing feature;
the question-answer pair feature extraction module is further adapted to:
find word pairs in which a lexical item in the question co-occurs with a lexical item in the answer;
count the number of co-occurring word pairs as the question-answer pairing feature.
Optionally, the question-answer pair data includes a question and an answer, and the question-answer pair text semantic features include a question-answer word mover's distance;
the question-answer pair feature extraction module is further adapted to:
extract keywords from the question to generate a question keyword set;
extract keywords from the answer to generate an answer keyword set;
calculate the similarity between the question keyword set and the answer keyword set;
accumulate the similarities to obtain the question-answer word mover's distance.
Optionally, the question-answer pair data includes a question and an answer, and the question-answer pair text semantic features include a question-answer sentence similarity;
the question-answer pair feature extraction module is further adapted to:
convert the question into a first sentence vector;
convert the answer into a second sentence vector;
calculate the similarity between the first sentence vector and the second sentence vector as the question-answer sentence similarity.
Optionally, the classification label labeling module is further adapted to:
look up the search record data recorded when the question-answer pair data was searched;
label the question-answer pair data with classification labels according to the search record data.
Optionally, the classification label labeling module is further adapted to:
mine the average click weight of the question-answer pair data under search keywords;
mine the last-click weight of the question-answer pair data under search keywords;
fit a continuous score using the average click weight and the last-click weight;
discretize the continuous score into classification labels.
Optionally, the classification label labeling module is further adapted to:
record the address of the web page to which the question-answer pair data belongs;
calculate the click score of the address under a specified search keyword;
calculate the click score distribution information of the address under the specified search keyword using the click score;
calculate the average click weight of the question-answer pair data under search keywords using the click score distribution information.
Optionally, the classification label labeling module is further adapted to:
count the number of clicks on the address under the specified keyword;
count the number of searches for the specified keyword;
calculate the click score of the address under the specified search keyword using the number of clicks and the number of searches.
Optionally, the classification label labeling module is further adapted to:
record the address of the web page to which the question-answer pair data belongs;
calculate the last-click score of the address under a specified search keyword;
calculate the last-click weight of the question-answer pair data under search keywords using the last-click score.
Optionally, the classification label labeling module is further adapted to:
count the number of last clicks on the address under the specified keyword;
count the number of searches for the specified keyword;
calculate the last-click score of the address under the specified search keyword using the number of last clicks and the number of searches.
Optionally, the device further includes:
a normalization module, adapted to normalize the question-answer pair features.
Optionally, the normalization module is further adapted to:
compute the mean and standard deviation of each dimension of the question-answer pair features;
subtract the mean from each dimension of the question-answer pair features and divide the result by the standard deviation.
Optionally, the device further includes:
a classification label adjustment module, adapted to adjust the classification label of the current question-answer pair according to neighboring question-answer pairs.
Optionally, the classification label adjustment module is further adapted to:
cluster the question-answer pairs;
for each question-answer pair, select its N nearest neighboring question-answer pairs after clustering;
calculate the distance between the current question-answer pair and each neighboring question-answer pair;
re-fit the classification label based on the distances.
Optionally, the device further includes:
an importance identification module, adapted to identify the importance of each question-answer pair feature to the question-answer pair classification model;
a question-answer pair feature extension module, adapted to extend the M question-answer pair features with the highest importance and, with the extended question-answer pair features, return to invoke the model training module.
The embodiment of the present invention proposes a machine-learning-based method for computing quality scores. It makes comprehensive use of question-answer pair features of various dimensions, automatically labels a large training set according to the quality of the question-answer pairs, and trains the question-answer pair classification model to classify question-answer pairs, that is, to predict quality scores. This avoids manual strategies and therefore their problems: they use little feature information, the rate of active user feedback is low, they rely on the subjective judgment of the questioner, advertising spam and cheating are serious, and the imbalance between user feedback on new and on historical question-answer pairs makes the strategy unstable. Good prediction accuracy is obtained on both historical and newly generated question-answer pairs.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention can be understood more clearly and implemented according to the contents of the specification, and in order that the above and other objects, features, and advantages of the present invention become more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same parts are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a flowchart of the steps of a training method for a question-answer pair classification model according to an embodiment of the present invention;
Fig. 2 shows a flowchart of the steps of another training method for a question-answer pair classification model according to an embodiment of the present invention;
Fig. 3 shows a structural block diagram of a training device for a question-answer pair classification model according to an embodiment of the present invention; and
Fig. 4 shows a structural block diagram of another training device for a question-answer pair classification model according to an embodiment of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Referring to Fig. 1, a flowchart of the steps of a training method for a question-answer pair classification model according to an embodiment of the present invention is shown. The method may specifically include the following steps:
Step 101: obtain question-answer pair data.
Question-answer pair data (Question & Answer, Q&A) consists of a question and an answer.
For example, the question "How high is Mount Everest?" and the answer "8844 meters" form one question-answer pair.
Because a question may have one or more answers, a single question can be combined with its answers into one or more question-answer pairs.
Step 102: extract question-answer pair features from the question-answer pair data.
In the embodiment of the present invention, question-answer pair features, which embody the characteristics of the question-answer pair data, are extracted from the question-answer pair data through feature engineering.
In a specific implementation, the question-answer pair features include one or more of the following:
1. Questioner features
Questioner features are features of the user who posed the question (the questioner), for example:
Feature | Description |
Answer_count_questioner | Number of answers the questioner has given |
Question_posted_count | Number of questions the questioner has posted |
bestA_count_questioner | Number of best answers given by the questioner |
bestA_ratio_questioner | Proportion of the questioner's answers that are best answers |
2. Answerer features
Answerer features are features of the user who answered the question (the answerer), for example:
Feature | Description |
bestA_ratio_answerer | Proportion of the answerer's answers that were best answers within one quarter |
A_count_answerer | Number of answers given by the answerer within one quarter |
bestA_ratio_answerer | Number of best answers given by the answerer within one quarter |
Q_count_answerer | Number of questions posted by the answerer within one quarter |
Status_answerer | Status (identity) of the answerer on the Q&A website |
Accept_percent_answerer | Acceptance rate of the answerer's answers on the Q&A website |
3. Question-answer pair text semantic features
Text semantic features describe the semantics of the question-answer pair.
In one example of the embodiment of the present invention, the text semantic features include a question-answer pairing feature (topic_focus_count_qa); in this example, step 102 may include the following sub-steps:
Sub-step 1021: find word pairs in which a lexical item in the question co-occurs with a lexical item in the answer;
Sub-step 1022: count the number of co-occurring word pairs as the question-answer pairing feature.
The question-answer pairing feature is a numerical feature: the number of word pairs that co-occur in the question and the answer.
During mining, a pairing dictionary is generated: over a large number of question-answer pairs, word pairs are counted in which a lexical item of the question, such as an entity lexical item or a focus lexical item, co-occurs with a lexical item of the answer.
For example, in the question "How high is Mount Everest", "Mount Everest" is the subject of the question (an entity lexical item) and "how high" is the focus of the question (a focus lexical item); these form high-frequency co-occurring word pairs with "8848" and "8848 meters" in the answer.
Because a question can be phrased in many ways, this feature is taken as the number of co-occurring word pairs, for example:
Lexical item in question | Lexical item in answer | Statistical indicators |
Mount Everest | 8848 | 2.759 2.951 5.710 211 3466 1230 |
Mount Everest | 8844 | 10.255 10.752 21.007 408 3466 1419 |
Mount Everest | 8848 meters | 0.477 0.534 1.011 78 3466 231 |
Mount Everest | 8844 meters | 0.282 0.316 0.598 73 3466 134 |
how many | 8848 meters | 0.000 0.000 0.000 1 45878 231 |
how many | 8848 | 0.000 0.000 0.000 2 45878 1230 |
how many | 008848 meter | 0.000 0.000 0.000 2 45878 3 |
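The pairing feature can be sketched in Python as two steps: mining a pairing dictionary from co-occurrence counts over a corpus of question-answer pairs, and then counting how many (question term, answer term) combinations of one pair appear in that dictionary. Tokenization and entity/focus term recognition are assumed to happen upstream, and the min_count cut-off is a hypothetical stand-in for whatever statistical indicators the real mining step uses.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def mine_pair_dict(
    corpus: Iterable[Tuple[List[str], List[str]]], min_count: int = 2
) -> Set[Pair]:
    """Keep (question term, answer term) pairs that co-occur in at least min_count QA pairs."""
    counts: Counter = Counter()
    for q_terms, a_terms in corpus:
        # Count each distinct term pair at most once per question-answer pair.
        for qt in set(q_terms):
            for at in set(a_terms):
                counts[(qt, at)] += 1
    return {pair for pair, c in counts.items() if c >= min_count}


def pairing_feature(
    q_terms: Iterable[str], a_terms: Iterable[str], pair_dict: Set[Pair]
) -> int:
    """topic_focus_count_qa: number of co-occurring word pairs found in the dictionary."""
    q, a = set(q_terms), set(a_terms)
    return sum(1 for qt in q for at in a if (qt, at) in pair_dict)
```

For instance, with two training pairs in which "Mount Everest" appears in the question and "8848" in the answer, mine_pair_dict(corpus, min_count=2) keeps that pair, and a new pair whose question contains "Mount Everest" and whose answer contains "8848" gets a pairing feature of 1.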
In another example of the embodiment of the present invention, the text semantic features include a question-answer word mover's distance (Word_mover_distance); in this example, step 102 may include the following sub-steps:
Sub-step 1023: extract keywords from the question to generate a question keyword set;
Sub-step 1024: extract keywords from the answer to generate an answer keyword set;
Sub-step 1025: calculate the similarity between the question keyword set and the answer keyword set;
Sub-step 1026: accumulate the similarities to obtain the question-answer word mover's distance.
In this example, the question-answer word mover's distance can be the cumulative sum over the Cartesian product of the question keyword set and the answer keyword set.
That is, the similarity (such as cosine similarity) is first computed for every pair formed by one lexical item from the question keyword set and one from the answer keyword set, and these similarities are then summed into a single value.
For example, the top 5 lexical items are selected from the question keyword set and the top 15 from the answer keyword set; the cosine similarities of the resulting 75 lexical-item pairs are computed and summed to obtain the question-answer word mover's distance.
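The accumulation over the Cartesian product can be sketched in Python as follows. The keyword lists are assumed to be already ranked by importance, and the word vectors are a hypothetical lookup table (for example, from a word-embedding model trained elsewhere); qa_keyword_similarity_sum is an illustrative name. Note that, as described in the text, this value is a sum of cosine similarities; despite the feature's name it is not the optimal-transport Word Mover's Distance known from the literature.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def qa_keyword_similarity_sum(
    q_keywords: List[str],
    a_keywords: List[str],
    vectors: Dict[str, Sequence[float]],
    top_q: int = 5,
    top_a: int = 15,
) -> float:
    """Sum cosine similarity over all (question keyword, answer keyword) pairs."""
    # Take the top-ranked keywords, skipping any that have no vector.
    q = [w for w in q_keywords[:top_q] if w in vectors]
    a = [w for w in a_keywords[:top_a] if w in vectors]
    # With 5 and 15 keywords this sums 5 * 15 = 75 similarities.
    return sum(cosine(vectors[x], vectors[y]) for x in q for y in a)
```

Because the value is a sum rather than an average, it grows with the number of keywords; capping the lists at fixed sizes, as in the 5 and 15 example, keeps it comparable across pairs.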
In another example of the embodiment of the present invention, the text semantic features include a question-answer sentence similarity (Cosine_sim_qa); in this example, step 102 may include the following sub-steps:
Sub-step 1027: convert the question into a first sentence vector;
Sub-step 1028: convert the answer into a second sentence vector;
Sub-step 1029: calculate the similarity between the first sentence vector and the second sentence vector as the question-answer sentence similarity.
In this example, the question is converted into one sentence vector and the answer into another, and the similarity (such as cosine similarity) between the two sentence vectors is calculated.
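The text does not say how a sentence is turned into a vector. The Python sketch below assumes the common baseline of averaging the word vectors of a sentence's tokens (a hypothetical vectors lookup), then takes the cosine similarity of the two sentence vectors as Cosine_sim_qa.

```python
import math
from typing import Dict, List, Sequence


def sentence_vector(
    tokens: List[str], vectors: Dict[str, Sequence[float]], dim: int
) -> List[float]:
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def cosine_sim_qa(
    q_tokens: List[str],
    a_tokens: List[str],
    vectors: Dict[str, Sequence[float]],
    dim: int,
) -> float:
    """Cosine similarity between the question and answer sentence vectors."""
    return cosine(
        sentence_vector(q_tokens, vectors, dim),
        sentence_vector(a_tokens, vectors, dim),
    )
```

Averaging is only one option; any sentence encoder that maps the question and the answer into the same vector space could be substituted without changing cosine_sim_qa.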
4. Question-answer pair numerical features
Numerical features are features of the numeric information of the question-answer pair, for example:
5. User feedback features
User feedback features are features of the feedback that other users (neither the questioner nor the answerer) give on the question-answer pair.
Of course, the above features are merely examples. When implementing the embodiment of the present invention, other question-answer pair features may be set according to actual conditions, and the embodiment of the present invention is not limited in this respect. In addition to the above question-answer pair features, those skilled in the art may also use other question-answer pair features according to actual needs, and the embodiment of the present invention is likewise not limited in this respect.
Step 103: label the question-answer pair data with classification labels according to the quality of the question-answer pair data.
In the embodiment of the present invention, the quality of question-answer pairs may be divided into multiple grades, each grade corresponding to a classification label, so that quality assessment is treated as a multi-class classification problem.
For example, quality may be divided into three grades (good, average, and poor), corresponding to the classification labels 4, 2, and 0, respectively.
In one embodiment of the present invention, step 103 may include the following sub-steps:
Sub-step 1031: look up the search record data recorded when the question-answer pair data was searched;
Sub-step 1032: label the question-answer pair data with classification labels according to the search record data.
In the embodiment of the present invention, when users search with a search engine, question-answer pairs from Q&A websites often appear among the search results. Recording users' operations on these question-answer pairs forms search record data, which is stored in the search engine's session logs.
Because user behavior reflects the quality of a question-answer pair to a certain extent, the question-answer pairs can be labeled with classification labels based on the search record data.
In one embodiment of the present invention, sub-step 1032 may further include the following sub-steps:
Sub-step 10321: mine the average click weight (avg_click_docwei) of the question-answer pair under search keywords (queries).
In one example of the embodiment of the present invention, sub-step 10321 may further include the following sub-steps:
Sub-step 103211: record the address of the web page to which the question-answer pair (pair) belongs, such as its URL (Uniform Resource Locator).
It should be noted that one question-answer pair corresponds to one document, i.e., one URL.
Sub-step 103212: calculate the click score (score) of the address under a specified search keyword (query).
In one calculation method, the number of clicks (click) on the address (URL) under the specified keyword (query) is counted, i.e., the click count of the query_url_pair;
the number of searches (search_count) for the specified keyword (query) is counted;
the click score of the address under the specified search keyword is then calculated from the number of clicks and the number of searches, for example as the number of clicks multiplied by itself and divided by the number of searches, i.e., score = click*click/search_count.
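A minimal Python sketch of this click score follows. It assumes the session log has already been reduced to a list of (query, url) click events plus a per-query search count; this log schema is hypothetical, since the text does not describe the session log format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def click_scores(
    clicks: Iterable[Tuple[str, str]],
    search_count: Dict[str, int],
) -> Dict[Tuple[str, str], float]:
    """score = click * click / search_count for every (query, url) that was clicked."""
    click = Counter(clicks)  # number of clicks per (query, url) pair
    scores = {}
    for (query, url), c in click.items():
        n = search_count.get(query, 0)
        if n > 0:  # skip queries with no recorded searches
            scores[(query, url)] = c * c / n
    return scores
```

For example, if "q1" is searched 10 times and "u1" is clicked 3 times under it, its score is 3 * 3 / 10 = 0.9. Because the click count is squared, the score grows quadratically with clicks while falling only linearly with search volume.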
Sub-step 103213: calculate the click score distribution information (dwei) of the address (URL) under the specified search keyword using the click score (score).
For example, dwei = score/norm*100, where norm is a normalization factor.
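Sub-step 103214: use the click score distribution information to compute the average click weight (avg_click_docwei) of the QA pair under search keywords.
For example, avg_click_docwei = (dwei_1 + dwei_2 + ... + dwei_n)/n, where n is the number of keywords (queries) under which the address (URL) was clicked and dwei_i is the click score distribution information of the address under the i-th such keyword.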
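Sub-steps 103212 to 103214 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the per-query input layout (one `(clicks, search_count)` tuple per query), the helper names, and the value of the normalization factor `norm` (which the text does not define) are hypothetical. The average is taken over the n queries under which the URL was clicked.

```python
from typing import List, Tuple


def click_score(clicks: int, search_count: int) -> float:
    """Sub-step 103212: score = click * click / search_count."""
    if search_count <= 0:
        return 0.0  # assumption: a query with no recorded searches contributes nothing
    return clicks * clicks / search_count


def avg_click_docwei(per_query: List[Tuple[int, int]], norm: float) -> float:
    """Sub-steps 103213-103214 for one URL (i.e. one QA pair).

    per_query: hypothetical layout, one (clicks, search_count) tuple per query.
    norm: normalization factor; how it is chosen is not specified in the text.
    """
    # dwei = score / norm * 100, kept only for queries under which the URL was clicked
    dweis = [click_score(c, s) / norm * 100 for c, s in per_query if c > 0]
    if not dweis:
        return 0.0
    # average over the n clicking queries
    return sum(dweis) / len(dweis)


print(avg_click_docwei([(10, 100), (5, 50)], norm=1.0))  # 75.0
```

Squaring the click count before dividing by the search count rewards URLs that attract many clicks in absolute terms, not just a high click-through ratio on rare queries.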
Sub-step 10322: mine the last click weight (last_click_docwei) of the QA pair under search keywords.
In one example of the embodiment of the present invention, sub-step 10322 may further include the following sub-steps:
Sub-step 103221: record the address (URL) of the web page to which the QA pair (pair) belongs.
Sub-step 103222: compute the last click score (last_click_score) of the address under a specified search keyword.
In one calculation, count the number of last clicks (last_click) on the address (URL) under the specified keyword (query).
Count the number of searches (search_count) for the specified keyword (query).
Compute the last click score (last_click_score) of the address under the specified search keyword (query) from the number of last clicks (last_click) and the number of searches (search_count).
For example, the square of the last-click count divided by the search count can serve as the score: last_click_score = last_click*last_click/search_count.
Sub-step 103223: use the last click score (last_click_score) to compute the last click weight (last_click_docwei) of the QA pair under search keywords.
For example, multiplying the last click score by a preset weight gives the last click weight, e.g.
last_click_docwei = 0.60*last_click_score.
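A minimal Python sketch of sub-steps 103222 and 103223 for one URL under one query, assuming the example weight of 0.60 given above. How per-query values would be combined across several queries is not stated in the text, so this sketch deliberately stops at a single query; the function name is hypothetical.

```python
def last_click_docwei(last_click: int, search_count: int, weight: float = 0.60) -> float:
    """Last click weight of one URL under one query (sub-steps 103222-103223)."""
    if search_count <= 0:
        return 0.0  # assumption: no recorded searches means no evidence
    # last_click_score = last_click * last_click / search_count
    last_click_score = last_click * last_click / search_count
    # last_click_docwei = weight * last_click_score
    return weight * last_click_score


print(last_click_docwei(4, 8))
```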
Sub-step 10323: fit a continuous score (QA_score) from the average click weight and the last click weight.
In a specific implementation, the continuous score is obtained by adding the average click weight and the last click weight, i.e.
QA_score = avg_click_docwei + last_click_docwei.
Sub-step 10324: discretize the continuous score into a classification label (label).
The value of the continuous score (QA_score) after discretization serves as the classification label (label).
For example, QA_score may be discretized into 4, 2 or 0, indicating that the quality of the QA pair is good, average or poor, respectively.
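Sub-steps 10323 and 10324 can be sketched as below. The sum follows the text; the two cut-off values are HYPOTHETICAL, since the text does not state the discretization thresholds, and would in practice be chosen from the observed QA_score distribution.

```python
def qa_score(avg_click_docwei: float, last_click_docwei: float) -> float:
    """Sub-step 10323: QA_score = avg_click_docwei + last_click_docwei."""
    return avg_click_docwei + last_click_docwei


# HYPOTHETICAL cut-offs: the text does not give the discretization thresholds.
HIGH_THRESHOLD = 60.0
LOW_THRESHOLD = 20.0


def discretize(score: float) -> int:
    """Sub-step 10324: map the continuous score to 4 (good), 2 (average) or 0 (poor)."""
    if score >= HIGH_THRESHOLD:
        return 4
    if score >= LOW_THRESHOLD:
        return 2
    return 0


print(discretize(qa_score(75.0, 1.2)), discretize(qa_score(20.0, 5.0)), discretize(3.0))  # 4 2 0
```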
Step 104: train the QA pair classification model using the QA pair features and the classification labels.
Random forest (RF) is an ensemble learning algorithm that is relatively robust to missing and imbalanced data. Therefore, in embodiments of the present invention, a random forest model can be trained on the QA pair features and classification labels to obtain the QA pair classification model. This model can be used to classify QA pairs (i.e. to assign them quality grades) and achieves good results on both new and historical QA pairs.
Of course, besides random forest, the QA pair classification model can also be trained in other ways, for example with an SVM (Support Vector Machine) or a CNN (Convolutional Neural Network); embodiments of the present invention impose no limitation on this.
Embodiments of the present invention propose a machine-learning-based method for computing quality scores. The method makes comprehensive use of QA pair features across multiple dimensions, automatically labels a large training set according to QA pair quality, and trains a QA pair classification model to classify QA pairs, i.e. to predict their quality scores. This avoids manual strategies and therefore their problems: they exploit little feature information, suffer from a low rate of active user feedback, rely on the asker's subjective judgment, are vulnerable to serious advertising fraud, and are unstable because user feedback is imbalanced between new and historical QA pairs. Good prediction accuracy is obtained on both historical and newly generated QA pairs.
Referring to FIG. 2, a flowchart of the steps of another training method for a QA pair classification model according to an embodiment of the invention is shown; the method may specifically include the following steps:
Step 201: obtain QA pair data.
Step 202: extract QA pair features from the QA pair data.
Step 203: label the QA pair data with classification labels according to the quality of the QA pair data.
Step 204: normalize the QA pair features.
In embodiments of the present invention, the multi-dimensional (e.g. 24-dimensional) features of the QA pairs are standardized, with each feature dimension normalized separately.
In a specific implementation, the mean and standard deviation of each dimension of the QA pair features are computed; the mean is subtracted from each dimension and the result is divided by the standard deviation. The mean and standard deviation are saved for use during model prediction.
According to the embodiments of the present invention, normalization allows positive and negative random noise to cancel out and strengthens the effect of informative features, so that models such as random forests can be trained effectively and achieve better generalization.
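The per-dimension standardization of step 204 can be sketched with the Python standard library as below. Assumptions: the population standard deviation is used (the text does not say population or sample), and a constant dimension, whose standard deviation is zero, is mapped to 0.0 rather than raising a division error. The returned means and standard deviations are what would be saved and reused at prediction time.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_standardizer(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute the per-dimension mean and standard deviation over the training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # population std (assumption)
    return means, stds


def standardize(row: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    """(x - mean) / std per dimension; constant dimensions map to 0.0 (assumption)."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]


train = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_standardizer(train)  # saved for reuse at prediction time
print(standardize([1.0, 10.0], means, stds))  # [-1.0, 0.0]
```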
Step 205: adjust the classification label of the current QA pair according to the classification labels of neighboring QA pairs.
Because users' click behavior may contain noise, the fitted classification labels (label) may also be noisy. Therefore, in embodiments of the present invention, the distribution of the classification labels (label) can be fine-tuned.
In a specific implementation, QA pairs with similar features also have close continuous scores (QA_score), but an improperly chosen threshold during discretization may give them different labels. The classification label of the current QA pair can therefore be adjusted using its neighboring QA pairs.
In one embodiment of the invention, step 205 may include the following sub-steps:
Sub-step 2051: cluster the QA pairs;
Sub-step 2052: for each QA pair, select its N nearest neighboring QA pairs after clustering;
Sub-step 2053: compute the distance between the current QA pair and each of its neighboring QA pairs;
Sub-step 2054: refit the classification label based on the distances.
In embodiments of the present invention, an algorithm such as KNN (k-Nearest Neighbor) can be used to cluster the QA pairs.
For each QA pair, select its N nearest neighboring QA pairs (N is a positive integer, e.g. 100) and compute the distance (e.g. the Euclidean distance) between the QA pair and each neighbor.
A Gaussian-kernel weighting algorithm based on the Euclidean distance is then used to refit the value of the classification label (label), which is discretized into a classification label again. This effectively reduces the noise in the classification labels (label).
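Sub-steps 2052 to 2054 can be sketched as below: each QA pair's continuous score is refit as a Gaussian-kernel weighted average of its own score and those of its N nearest neighbors by Euclidean distance, after which it would be discretized again with the same thresholds as sub-step 10324. Assumptions: the value being smoothed is the continuous QA_score, the QA pair itself is included with weight 1, and the kernel bandwidth `sigma` is a hypothetical parameter. The neighbor search is brute force for clarity; a real system handling millions of QA pairs would need an indexed nearest-neighbor search.

```python
import math
from typing import List, Sequence


def smooth_scores(features: Sequence[Sequence[float]], scores: Sequence[float],
                  n_neighbors: int = 100, sigma: float = 1.0) -> List[float]:
    """Gaussian-kernel weighted KNN smoothing of continuous QA scores."""
    smoothed = []
    for i, xi in enumerate(features):
        # N nearest neighbors by Euclidean distance (brute force, O(n^2) overall)
        neighbors = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(features) if j != i
        )[:n_neighbors]
        num, den = scores[i], 1.0  # the QA pair itself: distance 0, weight 1
        for d, j in neighbors:
            w = math.exp(-(d * d) / (2.0 * sigma * sigma))  # Gaussian kernel weight
            num += w * scores[j]
            den += w
        smoothed.append(num / den)
    return smoothed


feats = [[0.0, 0.0], [0.0, 0.0], [10.0, 10.0]]
print(smooth_scores(feats, [0.0, 2.0, 100.0], n_neighbors=1))
```

The two identical QA pairs pull each other's scores together, while the distant third pair keeps essentially its own score because its kernel weight to the others is negligible.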
Step 206: train the QA pair classification model using the QA pair features and the classification labels.
In one example, about 50 million QA pairs were collected, from which 500,000 QA pairs were randomly selected to train a random forest model.
The random forest model used 200 trees with a tree depth of 50. The model's oobrmse (out-of-bag estimate, a measure of the prediction error of an RF model) was about 0.652314, and its average prediction accuracy on new and old QA pairs reached 81%.
Step 207: identify the importance of each QA pair feature to the QA pair classification model.
Step 208: expand the M most important QA pair features to obtain expanded QA pair features, and return to step 206.
For the QA pair classification model, the importance of each QA pair feature can be analyzed.
In one example, the 10 most important QA pair features are shown in the following table:
Most of these 10 important QA pair features are answerer features and QA pair text semantic features, which play an effective role in predicting (classifying) the quality of QA pairs.
The random forest model in step 206 uses 24 basic QA pair features (the features before expansion), and its average prediction accuracy on new and old QA pairs reaches 81%.
The M (M is a positive integer) most important basic QA pair features are expanded by a Cartesian product transformation. The expanded QA pair features represent interaction effects between the basic features and express the synergy among them, which improves the generalization ability of the QA pair classification model and thereby its prediction accuracy.
If the 10 most important QA pair features are selected for the Cartesian product transformation, 45 expanded QA pair features are obtained (one for each of the 10×9/2 = 45 feature pairs). Some of the expanded QA pair features are as follows:
The 24 basic and 45 expanded QA pair features, 69 features in total, were used to retrain the random forest model with all other parameters unchanged. The model's oobrmse was about 0.414505, and its average prediction accuracy rose by 3 percentage points to 84%.
After feature expansion, the QA pair classification model reaches an average prediction accuracy of 84% on new and old QA pairs. This is better than the 74% accuracy of the conventional manual-strategy method on old QA pairs; moreover, the conventional method cannot be applied to predicting new QA pairs.
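Steps 207 and 208 can be sketched as below. The text does not say which operation the "Cartesian product transformation" applies to each feature pair, so this sketch ASSUMES the crossed feature is the product of the two feature values; the importance scores are likewise hypothetical inputs (for example, the impurity-based importances of a trained random forest). With 24 basic features and the top 10 selected, it yields 24 + 45 = 69 features, matching the counts reported above.

```python
from itertools import combinations
from typing import List, Sequence


def expand_features(row: Sequence[float], importances: Sequence[float],
                    top_m: int = 10) -> List[float]:
    """Append one crossed feature per pair among the top_m most important features."""
    # indices of the top_m most important basic features (step 207)
    top = sorted(range(len(row)), key=lambda i: importances[i], reverse=True)[:top_m]
    # ASSUMPTION: each crossed feature is the product of a pair of feature values (step 208)
    crossed = [row[i] * row[j] for i, j in combinations(sorted(top), 2)]
    return list(row) + crossed


basic = [float(i) for i in range(24)]       # 24 basic features of one QA pair
importance = [float(i) for i in range(24)]  # hypothetical importance scores
expanded = expand_features(basic, importance)
print(len(expanded))  # 69
```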
For simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that embodiments of the present invention are not limited by the described order of actions, because according to embodiments of the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by embodiments of the present invention.
Referring to FIG. 3, a structural block diagram of a training apparatus for a QA pair classification model according to an embodiment of the invention is shown; the apparatus may specifically include the following modules:
QA pair data acquisition module 301, adapted to obtain QA pair data;
QA pair feature extraction module 302, adapted to extract QA pair features from the QA pair data;
classification label labeling module 303, adapted to label the QA pair data with classification labels according to the quality of the QA pair data;
model training module 304, adapted to train the QA pair classification model using the QA pair features and the classification labels.
In a specific implementation, the QA pair features include one or more of the following:
asker features, answerer features, QA pair text semantic features, QA pair numerical features, and user feedback features.
In one example of the embodiment of the present invention, the QA pair data include a question and an answer, and the QA pair text semantic features include a QA pair matching feature;
the QA pair feature extraction module 302 is further adapted to:
search for word pairs in which a term in the question co-occurs with a term in the answer;
count the number of such co-occurring word pairs as the QA pair matching feature.
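One possible reading of the matching feature is sketched below. Assumptions: the question and answer are already tokenized into terms, "co-occurring word pairs" are looked up in a precomputed table of known co-occurring (question term, answer term) pairs (the hypothetical `cooccurring_pairs`, e.g. mined from a corpus), and each distinct pair is counted once.

```python
from typing import Iterable, Set, Tuple


def matching_feature(question_terms: Iterable[str], answer_terms: Iterable[str],
                     cooccurring_pairs: Set[Tuple[str, str]]) -> int:
    """Count distinct (question term, answer term) pairs found in a co-occurrence table."""
    q_terms, a_terms = set(question_terms), set(answer_terms)
    return sum(1 for q in q_terms for a in a_terms if (q, a) in cooccurring_pairs)


pairs = {("install", "pip"), ("python", "pip"), ("java", "jdk")}  # hypothetical table
print(matching_feature(["how", "install", "python"], ["use", "pip"], pairs))  # 2
```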
In another example of the embodiment of the present invention, the QA pair data include a question and an answer, and the QA pair text semantic features include a QA pair minimum path distance;
the QA pair feature extraction module 302 is further adapted to:
extract keywords from the question to generate a question keyword set;
extract keywords from the answer to generate an answer keyword set;
compute the similarity between the question keyword set and the answer keyword set;
accumulate the similarities to obtain the QA pair minimum path distance.
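The text only says that similarities between the two keyword sets are computed and accumulated, so the sketch below fills in the details with labeled assumptions: keywords are already extracted, each keyword has a vector in a hypothetical lookup table `vectors`, similarity is cosine similarity, and each question keyword is matched to its most similar answer keyword before the similarities are summed.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def accumulated_keyword_similarity(q_keywords: Sequence[str], a_keywords: Sequence[str],
                                   vectors: Dict[str, List[float]]) -> float:
    total = 0.0
    for qk in q_keywords:
        if qk not in vectors:
            continue  # assumption: keywords without a vector are skipped
        sims = [cosine(vectors[qk], vectors[ak]) for ak in a_keywords if ak in vectors]
        if sims:
            total += max(sims)  # ASSUMPTION: best-matching answer keyword per question keyword
    return total


vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 0.0]}  # hypothetical word vectors
print(accumulated_keyword_similarity(["a", "b"], ["c"], vecs))
```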
In yet another example of the embodiment of the present invention, the QA pair data include a question and an answer, and the QA pair text semantic features include a QA pair sentence similarity;
the QA pair feature extraction module 302 is further adapted to:
convert the question into a first sentence vector;
convert the answer into a second sentence vector;
compute the similarity between the first sentence vector and the second sentence vector as the QA pair sentence similarity.
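A minimal sketch of the sentence similarity feature, assuming a simple sentence encoder: each sentence vector is the average of its tokens' word vectors, taken from a hypothetical lookup table `vectors`, and cosine similarity compares the two sentence vectors. The text does not specify how sentences are converted to vectors, so any other encoder could be substituted.

```python
import math
from typing import Dict, List, Sequence


def sentence_vector(tokens: Sequence[str], vectors: Dict[str, List[float]]) -> List[float]:
    """ASSUMPTION: a sentence vector is the mean of the known word vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return []
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    if not u or not v:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def sentence_similarity(question: Sequence[str], answer: Sequence[str],
                        vectors: Dict[str, List[float]]) -> float:
    return cosine(sentence_vector(question, vectors), sentence_vector(answer, vectors))


wv = {"x": [1.0, 0.0], "y": [0.0, 1.0]}  # hypothetical word vectors
print(sentence_similarity(["x"], ["x", "y"], wv))
```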
In one embodiment of the invention, the classification label labeling module 303 is further adapted to:
look up the search record data recorded when the QA pair data were searched;
label the QA pair data with classification labels according to the search record data.
In one embodiment of the invention, the classification label labeling module 303 is further adapted to:
mine the average click weight of the QA pair data under search keywords;
mine the last click weight of the QA pair data under search keywords;
fit a continuous score using the average click weight and the last click weight;
discretize the continuous score into a classification label.
In one embodiment of the invention, the classification label labeling module 303 is further adapted to:
record the address of the web page to which the QA pair data belong;
compute the click score of the address under a specified search keyword;
compute the click score distribution information of the address under the specified search keyword using the click score;
compute the average click weight of the QA pair data under search keywords using the click score distribution information.
In one embodiment of the invention, the classification label labeling module 303 is further adapted to:
count the number of clicks on the address under the specified keyword;
count the number of searches for the specified keyword;
compute the click score of the address under the specified search keyword using the number of clicks and the number of searches.
In one embodiment of the invention, the classification label labeling module 303 is further adapted to:
record the address of the web page to which the QA pair data belong;
compute the last click score of the address under a specified search keyword;
compute the last click weight of the QA pair data under search keywords using the last click score.
In one embodiment of the invention, the classification label labeling module 303 is further adapted to:
count the number of last clicks on the address under the specified keyword;
count the number of searches for the specified keyword;
compute the last click score of the address under the specified search keyword using the number of last clicks and the number of searches.
Referring to FIG. 4, a structural block diagram of another training apparatus for a QA pair classification model according to an embodiment of the invention is shown; the apparatus may specifically include the following modules:
QA pair data acquisition module 401, adapted to obtain QA pair data;
QA pair feature extraction module 402, adapted to extract QA pair features from the QA pair data;
classification label labeling module 403, adapted to label the QA pair data with classification labels according to the quality of the QA pair data;
normalization module 404, adapted to normalize the QA pair features;
classification label adjustment module 405, adapted to adjust the classification label of the current QA pair according to the classification labels of neighboring QA pairs;
model training module 406, adapted to train the QA pair classification model using the QA pair features and the classification labels;
importance identification module 407, adapted to identify the importance of the QA pair features to the QA pair classification model;
QA pair feature expansion module 408, adapted to expand the M most important QA pair features to obtain expanded QA pair features, and then call the model training module 406 again.
In one embodiment of the invention, the normalization module 404 is further adapted to:
compute the mean and standard deviation of each dimension of the QA pair features;
subtract the mean from each dimension of the QA pair features and divide by the standard deviation.
In one embodiment of the invention, the classification label adjustment module 405 is further adapted to:
cluster the QA pair data;
for each QA pair, select its N nearest neighboring QA pairs after clustering;
compute the distance between the current QA pair and each of its neighboring QA pairs;
refit the classification label based on the distances.
Since the apparatus embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the above description. Moreover, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the invention described herein, and the above description of a specific language is provided to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be appreciated that, in order to streamline the disclosure and aid the understanding of one or more of the inventive aspects, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the training apparatus for the QA pair classification model according to embodiments of the invention. The invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing some or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.
Claims (10)
1. A training method for a QA pair classification model, comprising:
obtaining QA pair data;
extracting QA pair features from the QA pair data;
labeling the QA pair data with classification labels according to the quality of the QA pair data;
training a QA pair classification model using the QA pair features and the classification labels.
2. The method of claim 1, wherein the QA pair features include one or more of the following:
asker features, answerer features, QA pair text semantic features, QA pair numerical features, and user feedback features.
3. The method of any one of claims 1-2, wherein the QA pair data include a question and an answer, and the QA pair text semantic features include a QA pair matching feature;
the step of extracting QA pair features from the QA pair data comprises:
searching for word pairs in which a term in the question co-occurs with a term in the answer;
counting the number of co-occurring word pairs as the QA pair matching feature.
4. The method of any one of claims 1-3, wherein the QA pair data include a question and an answer, and the QA pair text semantic features include a QA pair minimum path distance;
the step of extracting QA pair features from the QA pair data comprises:
extracting keywords from the question to generate a question keyword set;
extracting keywords from the answer to generate an answer keyword set;
computing the similarity between the question keyword set and the answer keyword set;
accumulating the similarities to obtain the QA pair minimum path distance.
5. The method of any one of claims 1-4, wherein the QA pair data include a question and an answer, and the QA pair text semantic features include a QA pair sentence similarity;
the step of extracting QA pair features from the QA pair data comprises:
converting the question into a first sentence vector;
converting the answer into a second sentence vector;
computing the similarity between the first sentence vector and the second sentence vector as the QA pair sentence similarity.
6. The method of any one of claims 1-5, wherein the step of labeling the QA pair data with classification labels according to the quality of the QA pair data comprises:
looking up the search record data recorded when the QA pair data were searched;
labeling the QA pair data with classification labels according to the search record data.
7. The method of any one of claims 1-6, wherein the step of labeling the QA pair data with classification labels according to the search record data comprises:
mining the average click weight of the QA pair data under search keywords;
mining the last click weight of the QA pair data under search keywords;
fitting a continuous score using the average click weight and the last click weight;
discretizing the continuous score into a classification label.
8. The method of any one of claims 1-7, wherein the step of mining the average click weight of the QA pair data under search keywords comprises:
recording the address of the web page to which the QA pair data belong;
computing the click score of the address under a specified search keyword;
computing the click score distribution information of the address under the specified search keyword using the click score;
computing the average click weight of the QA pair data under search keywords using the click score distribution information.
9. The method of any one of claims 1-8, wherein the step of computing the click score of the address under a specified search keyword comprises:
counting the number of clicks on the address under the specified keyword;
counting the number of searches for the specified keyword;
computing the click score of the address under the specified search keyword using the number of clicks and the number of searches.
10. A training apparatus for a QA pair classification model, comprising:
a QA pair data acquisition module, adapted to obtain QA pair data;
a QA pair feature extraction module, adapted to extract QA pair features from the QA pair data;
a classification label labeling module, adapted to label the QA pair data with classification labels according to the quality of the QA pair data;
a model training module, adapted to train a QA pair classification model using the QA pair features and the classification labels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611249261.2A CN106844530A (en) | 2016-12-29 | 2016-12-29 | Training method and device of a kind of question and answer to disaggregated model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611249261.2A CN106844530A (en) | 2016-12-29 | 2016-12-29 | Training method and device of a kind of question and answer to disaggregated model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106844530A true CN106844530A (en) | 2017-06-13 |
Family
ID=59114191
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611249261.2A Pending CN106844530A (en) | 2016-12-29 | 2016-12-29 | Training method and device of a kind of question and answer to disaggregated model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844530A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108932289A (en) * | 2018-05-23 | 2018-12-04 | 北京华建蓝海科技有限责任公司 | One kind being based on the problem of information extraction and deep learning answer treatment method and system |
CN109308319A (en) * | 2018-08-21 | 2019-02-05 | 深圳中兴网信科技有限公司 | File classification method, document sorting apparatus and computer readable storage medium |
CN109472305A (en) * | 2018-10-31 | 2019-03-15 | 国信优易数据有限公司 | Answer quality determines model training method, answer quality determination method and device |
CN109543153A (en) * | 2018-11-13 | 2019-03-29 | 成都数联铭品科技有限公司 | A kind of sequence labelling system and method |
CN109635088A (en) * | 2018-12-13 | 2019-04-16 | 深圳市思迪信息技术股份有限公司 | The training method and device of robot long article notebook data chat |
CN109783617A (en) * | 2018-12-11 | 2019-05-21 | 平安科技(深圳)有限公司 | For replying model training method, device, equipment and the storage medium of problem |
CN109840274A (en) * | 2018-12-28 | 2019-06-04 | 北京百度网讯科技有限公司 | Data processing method and device, storage medium |
CN109995756A (en) * | 2019-02-26 | 2019-07-09 | 西安电子科技大学 | Online single classification active machine learning method for information system intrusion detection |
CN110046230A (en) * | 2018-12-18 | 2019-07-23 | 阿里巴巴集团控股有限公司 | Generate the method for recommending words art set, the method and apparatus for recommending words art |
CN110377721A (en) * | 2019-07-26 | 2019-10-25 | 京东方科技集团股份有限公司 | Automatic question-answering method, device, storage medium and electronic equipment |
CN110442689A (en) * | 2019-06-25 | 2019-11-12 | 平安科技(深圳)有限公司 | A kind of question and answer relationship sort method, device, computer equipment and storage medium |
CN111095234A (en) * | 2017-09-15 | 2020-05-01 | 国际商业机器公司 | Training data update |
CN111125387A (en) * | 2019-12-12 | 2020-05-08 | 科大讯飞股份有限公司 | Multimedia list generation and naming method and device, electronic equipment and storage medium |
CN111259918A (en) * | 2018-11-30 | 2020-06-09 | 重庆小雨点小额贷款有限公司 | Method and device for labeling intention label, server and storage medium |
CN111340218A (en) * | 2020-02-24 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Method and system for training problem recognition model |
CN111382264A (en) * | 2018-12-27 | 2020-07-07 | 阿里巴巴集团控股有限公司 | Session quality evaluation method and device and electronic equipment |
CN111914062A (en) * | 2020-07-13 | 2020-11-10 | 上海乐言信息科技有限公司 | Long text question-answer pair generation system based on keywords |
US10878197B2 (en) | 2018-11-27 | 2020-12-29 | International Business Machines Corporation | Self-learning user interface with image-processed QA-pair corpus |
CN114490965A (en) * | 2021-12-23 | 2022-05-13 | 北京百度网讯科技有限公司 | Question processing method and device, electronic equipment and storage medium |
2016-12-29: Application CN201611249261.2A (CN106844530A) filed in CN; status: Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101566998A (en) * | 2009-05-26 | 2009-10-28 | 华中师范大学 | Chinese question-answering system based on neural network |
CN102681992A (en) * | 2011-03-07 | 2012-09-19 | 腾讯科技(深圳)有限公司 | Method and system for data hierarchy |
CN103577557A (en) * | 2013-10-21 | 2014-02-12 | 北京奇虎科技有限公司 | Device and method for determining capturing frequency of network resource point |
Non-Patent Citations (1)
Title |
---|
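崔敏君 (Cui Minjun): "多特征层次化答案质量评价方法研究" (Research on a multi-feature hierarchical answer quality evaluation method), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology series) * |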
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111095234A (en) * | 2017-09-15 | 2020-05-01 | 国际商业机器公司 | Training data update |
CN108932289B (en) * | 2018-05-23 | 2021-10-15 | 北京华健蓝海医疗科技有限责任公司 | Question answer processing method and system based on information extraction and deep learning |
CN108932289A (en) * | 2018-05-23 | 2018-12-04 | 北京华建蓝海科技有限责任公司 | One kind being based on the problem of information extraction and deep learning answer treatment method and system |
CN109308319A (en) * | 2018-08-21 | 2019-02-05 | 深圳中兴网信科技有限公司 | File classification method, document sorting apparatus and computer readable storage medium |
CN109472305A (en) * | 2018-10-31 | 2019-03-15 | 国信优易数据有限公司 | Answer quality determines model training method, answer quality determination method and device |
CN109543153A (en) * | 2018-11-13 | 2019-03-29 | 成都数联铭品科技有限公司 | A kind of sequence labelling system and method |
CN109543153B (en) * | 2018-11-13 | 2023-08-18 | 成都数联铭品科技有限公司 | Sequence labeling system and method |
US10878197B2 (en) | 2018-11-27 | 2020-12-29 | International Business Machines Corporation | Self-learning user interface with image-processed QA-pair corpus |
CN111259918B (en) * | 2018-11-30 | 2023-06-20 | 重庆小雨点小额贷款有限公司 | Method and device for labeling intention labels, server and storage medium |
CN111259918A (en) * | 2018-11-30 | 2020-06-09 | 重庆小雨点小额贷款有限公司 | Method and device for labeling intention label, server and storage medium |
CN109783617A (en) * | 2018-12-11 | 2019-05-21 | 平安科技(深圳)有限公司 | For replying model training method, device, equipment and the storage medium of problem |
CN109783617B (en) * | 2018-12-11 | 2024-01-26 | 平安科技(深圳)有限公司 | Model training method, device, equipment and storage medium for replying to questions |
CN109635088A (en) * | 2018-12-13 | 2019-04-16 | 深圳市思迪信息技术股份有限公司 | The training method and device of robot long article notebook data chat |
CN110046230B (en) * | 2018-12-18 | 2023-06-23 | 创新先进技术有限公司 | Method for generating recommended speaking collection, and recommended speaking method and device |
CN110046230A (en) * | 2018-12-18 | 2019-07-23 | 阿里巴巴集团控股有限公司 | Generate the method for recommending words art set, the method and apparatus for recommending words art |
CN111382264A (en) * | 2018-12-27 | 2020-07-07 | 阿里巴巴集团控股有限公司 | Session quality evaluation method and device and electronic equipment |
CN111382264B (en) * | 2018-12-27 | 2023-06-09 | 阿里巴巴集团控股有限公司 | Session quality evaluation method and device and electronic equipment |
CN109840274A (en) * | 2018-12-28 | 2019-06-04 | 北京百度网讯科技有限公司 | Data processing method and device, storage medium |
CN109840274B (en) * | 2018-12-28 | 2021-11-30 | 北京百度网讯科技有限公司 | Data processing method and device and storage medium |
CN109995756B (en) * | 2019-02-26 | 2022-02-01 | 西安电子科技大学 | Online single-classification active machine learning method for information system intrusion detection |
CN109995756A (en) * | 2019-02-26 | 2019-07-09 | 西安电子科技大学 | Online single classification active machine learning method for information system intrusion detection |
CN110442689A (en) * | 2019-06-25 | 2019-11-12 | 平安科技(深圳)有限公司 | A kind of question and answer relationship sort method, device, computer equipment and storage medium |
US11475068B2 (en) | 2019-07-26 | 2022-10-18 | Beijing Boe Technology Development Co., Ltd. | Automatic question answering method and apparatus, storage medium and server |
CN110377721A (en) * | 2019-07-26 | 2019-10-25 | 京东方科技集团股份有限公司 | Automatic question-answering method, device, storage medium and electronic equipment |
CN111125387A (en) * | 2019-12-12 | 2020-05-08 | 科大讯飞股份有限公司 | Multimedia list generation and naming method and device, electronic equipment and storage medium |
CN111340218A (en) * | 2020-02-24 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Method and system for training problem recognition model |
CN111914062A (en) * | 2020-07-13 | 2020-11-10 | 上海乐言信息科技有限公司 | Long text question-answer pair generation system based on keywords |
CN114490965B (en) * | 2021-12-23 | 2022-11-08 | 北京百度网讯科技有限公司 | Question processing method and device, electronic equipment and storage medium |
CN114490965A (en) * | 2021-12-23 | 2022-05-13 | 北京百度网讯科技有限公司 | Question processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Title |
---|---|
CN106844530A (en) | Training method and device of a kind of question and answer to disaggregated model | |
CN103425635B (en) | Method and apparatus are recommended in a kind of answer | |
US8346701B2 (en) | Answer ranking in community question-answering sites | |
CN108073568A (en) | keyword extracting method and device | |
CN105808590B (en) | Search engine implementation method, searching method and device | |
CN106294783A (en) | A kind of video recommendation method and device | |
CN105045875B (en) | Personalized search and device | |
Singh et al. | Sentiment analysis of textual reviews; Evaluating machine learning, unsupervised and SentiWordNet approaches | |
CN108182279A (en) | Object classification method, device and computer equipment based on text feature | |
CN106547871A (en) | Method and apparatus is recalled based on the Search Results of neutral net | |
KR20160055930A (en) | Systems and methods for actively composing content for use in continuous social communication | |
CN110364146A (en) | Audio recognition method, device, speech recognition apparatus and storage medium | |
WO2020135642A1 (en) | Model training method and apparatus employing generative adversarial network | |
CN105117398A (en) | Software development problem automatic answering method based on crowdsourcing | |
CN108322317A (en) | A kind of account identification correlating method and server | |
CN102637179B (en) | Method and device for determining lexical item weighting functions and searching based on functions | |
CN110134792A (en) | Text recognition method, device, electronic equipment and storage medium | |
Lin et al. | Learning comment generation by leveraging user-generated data | |
CN109145083A (en) | A kind of candidate answers choosing method based on deep learning | |
Leopairote et al. | Software quality in use characteristic mining from customer reviews | |
Arai et al. | Predicting quality of answer in collaborative Q/A community | |
KR101621735B1 (en) | Recommended search word providing method and system | |
Kane et al. | Do the communities we choose shape our political beliefs? A study of the politicization of topics in online social groups | |
CN106997340A (en) | The generation of dictionary and the Document Classification Method and device using dictionary | |
CN110633410A (en) | Information processing method and device, storage medium, and electronic device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-06-13 |