CN108960574A - Quality determination method, device, server and the storage medium of question and answer - Google Patents

Publication number
CN108960574A
Authority
CN
China
Prior art keywords
answer
content
vector
question
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810580409.3A
Other languages
Chinese (zh)
Inventor
姚后清
孟子扬
吴广发
田彤
施鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810580409.3A
Publication of CN108960574A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395: Quality analysis or management

Abstract

Embodiments of the invention disclose a method, device, server and storage medium for determining the quality of question-and-answer (Q&A) data. The method comprises: determining a vector representation of the question content and a vector representation of the answer content in the Q&A data; and inputting the vector representation of the question content and the vector representation of the answer content into a pre-built Q&A quality analysis model to obtain quality data for the Q&A data. Embodiments of the invention enable more effective Q&A quality scoring, address the poor user experience caused in the prior art by the display of large numbers of low-quality answers, and improve the accuracy of Q&A quality scoring.

Description

Quality determination method, device, server and the storage medium of question and answer
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a method, device, server and storage medium for determining the quality of question-and-answer (Q&A) data.
Background
With the continuing development of science and Internet technology, search-based interactive knowledge Q&A sharing platforms have become an important channel through which people acquire and share knowledge in daily life and at work.
On such a platform, a user poses a specific question and other users answer it. The answers can in turn be served as search results to other users with similar questions, so that knowledge is shared. To improve the sharing effect of a knowledge Q&A sharing platform, the Q&A data must undergo quality analysis so that low-quality answers are removed and the display proportion of high-quality answers is increased.
At present, knowledge Q&A sharing platforms mostly rely on adoption (by the asker, by a machine, or by an administrator) as the final basis for pushing and displaying Q&A content, and the adopted state is permanent. Owing to the asker's own limitations, user cheating, and limits on machine accuracy and timeliness, large numbers of low-quality answers are displayed, which seriously degrades users' search experience.
Summary of the invention
Embodiments of the present invention provide a method, device, server and storage medium for determining Q&A quality, which enable more effective Q&A quality scoring.
In a first aspect, an embodiment of the present invention provides a method for determining Q&A quality, comprising:
determining a vector representation of the question content and a vector representation of the answer content in Q&A data; and
inputting the vector representation of the question content and the vector representation of the answer content into a pre-built Q&A quality analysis model to obtain quality data for the Q&A data.
In a second aspect, an embodiment of the present invention further provides a device for determining Q&A quality, the device comprising:
a vector module, configured to determine a vector representation of the question content and a vector representation of the answer content in Q&A data; and
a quality module, configured to input the vector representation of the question content and the vector representation of the answer content into a pre-built Q&A quality analysis model to obtain quality data for the Q&A data.
In a third aspect, an embodiment of the present invention further provides a server, the server comprising:
one or more processors; and
a storage device, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining Q&A quality described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for determining Q&A quality described above.
In embodiments of the present invention, the vector representation of the question content and the vector representation of the answer content in the Q&A data are input into a pre-built Q&A quality analysis model to obtain quality data for the Q&A data. Because the Q&A quality analysis model is trained in advance, more effective Q&A quality scoring can be achieved, which addresses the poor user experience caused in the prior art by the display of large numbers of low-quality answers and improves the accuracy of Q&A quality scoring.
Brief description of the drawings
Fig. 1 is a flow chart of the method for determining Q&A quality in Embodiment one of the present invention;
Fig. 2 is a schematic diagram of the pairwise learning-to-rank model in Embodiment one of the present invention;
Fig. 3 is a flow chart of the method for determining Q&A quality in Embodiment two of the present invention;
Fig. 4 is a schematic diagram of the relevance analysis in Embodiment two of the present invention;
Fig. 5 is a flow chart of obtaining the clusters in Embodiment two of the present invention;
Fig. 6 is a flow chart of determining the aggregation similarity in Embodiment two of the present invention;
Fig. 7 is a schematic diagram of the behavior feedback model in Embodiment two of the present invention;
Fig. 8 is a schematic diagram of the overall flow of Q&A quality determination in Embodiment three of the present invention;
Fig. 9 is a structural diagram of the device for determining Q&A quality in Embodiment four of the present invention;
Fig. 10 is a structural diagram of the server in Embodiment five of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of the method for determining Q&A quality in Embodiment one of the present invention. This embodiment is applicable to determining the quality of Q&A data. The method may be executed by a device for determining Q&A quality; the device may be implemented in software and/or hardware and may, for example, be configured in a server. As shown in Fig. 1, the method may specifically include:
S110: Determine the vector representation of the question content and the vector representation of the answer content in the Q&A data.
The Q&A data are obtained from a knowledge Q&A sharing platform. Most such platforms currently hold massive knowledge reserves and have a strong capacity for producing user-generated content (UGC), which allows them to quickly cover knowledge needs in every field.
Optionally, determining the vector representation of the question content in the Q&A data comprises: inputting the vector representation of each word in the question content into a pre-trained recurrent neural network model to obtain the vector representation of the question content.
Because the question content in Q&A data is generally short, the recurrent neural network model in this embodiment is preferably a generalized recurrent neural network (General Regression Neural Network, GRNN). The GRNN can extract the sequential structure of the question content, its core knowledge points and the positional relationships between words, thereby enriching the question features.
Optionally, determining the vector representation of the answer content comprises: inputting the vector representation of each word in the answer content into a pre-trained convolutional neural network model to obtain the vector representation of the answer content.
Answer content in Q&A data is generally long and contains many knowledge points. In this embodiment, a convolutional neural network (Convolutional Neural Network, CNN) can be used to extract the key knowledge points in the answer content and compress the answer vector.
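The way a sequence of word vectors is compressed into one fixed-length vector can be illustrated with a minimal sketch. It is not the patented model: the filter weights are hand-written rather than trained, the ReLU activation is an assumption, and the names `mean_pool` and `conv1d_max_pool` are hypothetical. Average pooling stands in for the question branch and convolution followed by max pooling for the answer branch.

```python
from typing import List

Vector = List[float]


def mean_pool(states: List[Vector]) -> Vector:
    """Average a sequence of vectors component-wise (question branch)."""
    count = len(states)
    return [sum(column) / count for column in zip(*states)]


def conv1d_max_pool(word_vecs: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Slide each filter over the word sequence and keep its strongest response.

    Each filter is a list of `width` rows, one row per word in the window.
    The ReLU activation is an assumption made for this sketch.
    """
    pooled = []
    for filt in filters:
        width = len(filt)
        responses = []
        for start in range(len(word_vecs) - width + 1):
            window = word_vecs[start:start + width]
            activation = sum(
                w * x
                for f_row, w_row in zip(filt, window)
                for w, x in zip(f_row, w_row)
            )
            responses.append(max(0.0, activation))
        # Max-over-time pooling: one number per filter, whatever the answer length.
        pooled.append(max(responses) if responses else 0.0)
    return pooled
```

Because max pooling keeps only one value per filter, a long answer and a short answer both map to vectors of the same length, which is what allows them to be compared by a downstream model.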
S120: Input the vector representation of the question content and the vector representation of the answer content into the pre-built Q&A quality analysis model to obtain the quality data of the Q&A data.
Once the vector representations of the question content and the answer content in the Q&A data have been determined, they can be input into the pre-built Q&A quality analysis model to obtain the quality data of the Q&A data.
Optionally, building the Q&A quality analysis model comprises: determining the vector representation of sample question content; determining the vector representation of a first answer content and the vector representation of a second answer content for the sample question content; and training a pairwise learning-to-rank model with the vector representations of the sample question content, the first answer content and the second answer content as its input and the ranking result of the first and second answer contents as its output, thereby obtaining the Q&A quality analysis model.
In this embodiment, the Q&A quality analysis model can be built on a pairwise learning-to-rank model (i.e. a Pairwise model). Fig. 2 is a schematic diagram of the pairwise learning-to-rank model in Embodiment one of the present invention. As shown in Fig. 2, the sample question content from the sample Q&A data in the training corpus is fed into the embedding layer (Embedding) in the figure, and its vector representation is obtained through the pre-trained GRNN model and average pooling. The first answer content and the second answer content corresponding to the sample question content are each fed into the embedding layer, and their vector representations are obtained through filters, the CNN model and max pooling. The vector representations of the sample question content, the first answer content and the second answer content are used as the input of the pairwise learning-to-rank model and pass through a concatenation layer (Concat Layer), a fully connected layer (Full Connect Layer) and an activation function (Tanh Layer) to produce a ranking result, which is compared with the manually annotated ranking result to train the pairwise learning-to-rank model.
For example, a question and two answers to it can be input into the trained pairwise learning-to-rank model; the output indicates which of the two answers is of higher quality.
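The concatenate, fully connect and tanh path described above can be sketched in a few lines. This is a toy under stated assumptions, not the patented network: the fully connected layer is reduced to a single output unit, the squared-error loss and plain gradient descent are choices made for the sketch, and `rank_pair` and `train_step` are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]


def rank_pair(weights: Vector, bias: float, q: Vector, a1: Vector, a2: Vector) -> float:
    """Concat -> one fully connected unit -> tanh.

    A positive output means the first answer is ranked above the second.
    """
    features = q + a1 + a2  # list concatenation plays the role of the Concat layer
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return math.tanh(z)


def train_step(weights: Vector, bias: float, q: Vector, a1: Vector, a2: Vector,
               label: float, lr: float = 0.1) -> Tuple[Vector, float]:
    """One gradient-descent step on squared error (loss choice is an assumption).

    `label` is +1.0 if annotators ranked a1 above a2, otherwise -1.0.
    """
    features = q + a1 + a2
    out = rank_pair(weights, bias, q, a1, a2)
    # d/dz of (out - label)^2 with out = tanh(z)
    grad_z = 2.0 * (out - label) * (1.0 - out * out)
    new_weights = [w - lr * grad_z * x for w, x in zip(weights, features)]
    new_bias = bias - lr * grad_z
    return new_weights, new_bias
```

Feeding each annotated pair in both orders, with labels +1.0 and -1.0, keeps the toy model from learning a positional bias toward the first answer slot.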
Based on the ranking results of the pairwise learning-to-rank model, the Q&A quality analysis model can derive quality scores for different answer contents, which serve as the quality data. Optionally, the quality score grades of answer content are divided, from the perspective of how well the user's demand is satisfied, into five grades: featured answer, high-quality answer, ordinary answer, low-quality answer and cheating, in decreasing order of satisfaction. For example, the answer content ranked first is a featured answer: it fully satisfies the user's demand, offers extended knowledge and authority, and provides a good reading experience.
The training corpus can be obtained from the Q&A data on the knowledge Q&A sharing platform. Because a corpus of absolutely high quality is difficult to build, this embodiment uses the Pairwise method to obtain ranking results for sample questions and their answer contents and builds the corpus from them. That is, samples are selected by comparing answer contents: for example, if answer A is better than answer B, the quality ranking is A above B. This approach yields a large corpus, including long versus long answers, long versus short answers, short versus long answers, short versus short answers, relevant versus irrelevant answers, and so on. Corpus generation can take quality, relevance, length and user behavior into account: among relevant answers, the longer the answer, the higher it ranks; and with respect to length, a relevant short answer ranks above an irrelevant long answer. For example, corpus A can consist of a question and its answers, ordered as high-quality answer, adopted answer, ordinary answer and random high-quality answer, with the answers ordered by answer ranking or by number of likes.
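Turning one ordered answer list into pairwise training samples can be sketched as follows. The function name `make_pairs` and the tuple layout are hypothetical; the only assumption is that the answers arrive already ordered from best to worst.

```python
from itertools import combinations
from typing import List, Tuple


def make_pairs(question: str, ranked_answers: List[str]) -> List[Tuple[str, str, str]]:
    """Emit (question, better_answer, worse_answer) for every ordered pair.

    `ranked_answers` must be ordered from best to worst; combinations()
    preserves that order, so the first element of each pair is the better one.
    """
    return [
        (question, better, worse)
        for better, worse in combinations(ranked_answers, 2)
    ]
```

A list of n ranked answers yields n(n-1)/2 samples, which is one reason the pairwise approach produces a large corpus from comparatively little annotation.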
In the technical solution of this embodiment, the vector representation of the question content and the vector representation of the answer content in the Q&A data are input into the pre-built Q&A quality analysis model to obtain the quality data of the Q&A data. Because the Q&A quality analysis model is trained in advance on a pairwise learning-to-rank model, more effective Q&A quality scoring can be achieved on the basis of comparing and ranking answer contents. This addresses the poor user experience caused in the prior art by the display of large numbers of low-quality answers and improves the accuracy of Q&A quality scoring.
On the basis of the above technical solution, optionally, the method further comprises: determining the relevance between the question content and the answer content in the Q&A data; and correcting the quality data of the Q&A data according to the quality data of the Q&A data and the relevance.
Optionally, determining the relevance between the question content and the answer content in the Q&A data comprises determining the relevance according to at least one of: the keyword similarity between the keywords contained in the question content and the keywords contained in the answer content; the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs; the label similarity between the labels of the question content and the labels of the answer content; and the topic similarity between the topic of the question content and the topic of the answer content.
Optionally, determining the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs comprises: determining the word vector of each word in the corpus; clustering the word vectors to obtain the cluster to which each word belongs; and obtaining the aggregation similarity from the clusters to which the words in the question content belong and the clusters to which the words in the answer content belong.
Optionally, the method further comprises: obtaining user feedback behavior data for the answer content and the credit level of the user to whom the feedback behavior data belong; and correcting the quality data of the answer content according to the feedback behavior data and the credit level.
Embodiment two
Fig. 3 is a flow chart of the method for determining Q&A quality in Embodiment two of the present invention. On the basis of the above embodiment, this embodiment further optimizes the method for determining Q&A quality. Accordingly, the method of this embodiment specifically includes:
S210: Determine the vector representation of the question content and the vector representation of the answer content in the Q&A data.
The Q&A data are obtained from the knowledge Q&A sharing platform. The vector representation of each word in the question content is input into the pre-trained recurrent neural network model to obtain the vector representation of the question content; the vector representation of each word in the answer content is input into the pre-trained convolutional neural network model to obtain the vector representation of the answer content.
S220: Input the vector representation of the question content and the vector representation of the answer content into the pre-built Q&A quality analysis model to obtain the quality data of the Q&A data.
Once the vector representations of the question content and the answer content in the Q&A data have been determined, they can be input into the pre-built Q&A quality analysis model to obtain the quality data of the Q&A data. The Q&A quality analysis model can be built on a pairwise learning-to-rank model (i.e. a Pairwise model).
S230: Determine the relevance between the question content and the answer content in the Q&A data.
Determining the relevance between the question content and the answer content amounts to a multi-dimensional relevance analysis of the Q&A data. Fig. 4 is a schematic diagram of the relevance analysis in Embodiment two of the present invention. As shown in Fig. 4, the relevance analysis can be divided into semantic relevance and knowledge-point coverage. Semantic relevance may include aggregation similarity, LDA (Latent Dirichlet Allocation) topic similarity and keyword matching (i.e. keyword similarity); knowledge-point coverage may include label similarity and knowledge-point coverage based on Fasttext (a text classifier).
Specifically, determining the relevance between the question content and the answer content in the Q&A data may include determining the relevance according to at least one of: the keyword similarity between the keywords contained in the question content and the keywords contained in the answer content; the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs; the label similarity between the labels of the question content and the labels of the answer content; and the topic similarity between the topic of the question content and the topic of the answer content.
Determining the keyword similarity between the keywords contained in the question content and the keywords contained in the answer content may include: determining the Jaccard similarity of the keywords, which mainly involves computing how often the keywords of the question content are hit in the answer content and what proportion of the question content they account for; and computing the Jaccard similarity of character-based bigrams (Bigram) and the Jaccard similarity of character-based trigrams (Trigram). Computing similarity over character n-grams of different sizes can eliminate the problems caused by word segmentation errors.
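The character-level Jaccard computation described above can be sketched directly. The helper names (`char_ngrams`, `jaccard`, `keyword_hit_ratio`) are hypothetical, and dropping whitespace before building n-grams is an assumption of the sketch. Since the n-grams are built from characters, no word segmenter is involved.

```python
from typing import Iterable, Set


def char_ngrams(text: str, n: int) -> Set[str]:
    """Return the set of character n-grams of `text`, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def keyword_hit_ratio(question_keywords: Iterable[str], answer_text: str) -> float:
    """Fraction of the question's keywords that occur in the answer text."""
    keywords = list(question_keywords)
    if not keywords:
        return 0.0
    hits = sum(1 for keyword in keywords if keyword in answer_text)
    return hits / len(keywords)


def ngram_similarity(question: str, answer: str, n: int) -> float:
    """Jaccard similarity of the character n-grams of two texts."""
    return jaccard(char_ngrams(question, n), char_ngrams(answer, n))
```

Calling `ngram_similarity` with `n=2` and `n=3` gives the bigram and trigram scores; how they are weighted against the keyword hit ratio is not specified in the text and is left open here.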
The label system used to determine the label similarity between the labels of the question content and the labels of the answer content is a vector of about 2,500 dimensions, which can be obtained by training a deep supervised learning model. Determining the topic similarity between the topic of the question content and the topic of the answer content may include: using the LDA topic model from natural language processing (Natural Language Processing, NLP) to obtain topic vectors for the question content and the answer content, and computing the LDA topic similarity.
Optionally, determining the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs comprises: determining the word vector of each word in the corpus; clustering the word vectors to obtain the cluster to which each word belongs; and obtaining the aggregation similarity from the clusters to which the words in the question content belong and the clusters to which the words in the answer content belong.
Specifically, semantic analysis based on the Word2vec word vector model can be applied to each word in the corpus so that words with the same or related meanings are aggregated into the same cluster; the detailed process is shown in Fig. 5. Fig. 5 is a flow chart of obtaining the clusters in Embodiment two of the present invention. Obtaining the clusters may include: building a training corpus for the Word2vec word vector model from the Q&A data on the knowledge Q&A sharing platform, and applying keyword gluing to the word segmentation results (to repair segmentation errors); inputting the training corpus into the word vector model and training it to obtain a trained word vector model; determining the word vector of each word in the corpus with the trained word vector model, and applying K-means clustering to the word vectors to obtain the cluster to which each word belongs; and processing the word vectors of the words in the different clusters based on a bag-of-words model, after which the process ends.
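The K-means step on word vectors can be illustrated with a small pure-Python sketch. It assumes the word vectors already exist (in practice they would come from a trained Word2vec model, which is not reproduced here), seeds the centroids with the first `k` points for determinism, and uses the hypothetical name `kmeans`; a production system would more likely rely on a library implementation.

```python
from typing import List

Vector = List[float]


def kmeans(points: List[Vector], k: int, iterations: int = 10) -> List[int]:
    """Return the cluster index of each point after a fixed number of iterations.

    Centroids are seeded with the first k points (an assumption of this sketch).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for i, point in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

Pairing the returned indices with the vocabulary, for example `dict(zip(words, kmeans(vectors, k)))`, gives the word-to-cluster mapping used by the later aggregation step.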
Following the cluster acquisition process shown in Fig. 5, the clusters to which the words in the question content belong and the clusters to which the words in the answer content belong can be obtained, and from them the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs; the detailed process is shown in Fig. 6. Fig. 6 is a flow chart of determining the aggregation similarity in Embodiment two of the present invention. Aggregation vectors are built from the clusters to which the words in the question content belong and the clusters to which the words in the answer content belong, and the Pearson correlation coefficient of the aggregation vectors is computed. This gives the degree to which the question content and the answer content match across clusters, i.e. the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs.
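A minimal sketch of the aggregation similarity follows. It assumes the aggregation vector is a per-cluster word count, which the text does not specify, and that a word-to-cluster mapping is already available; `aggregate_vector` and `pearson` are hypothetical names.

```python
import math
from typing import Dict, List


def aggregate_vector(words: List[str], word_to_cluster: Dict[str, int],
                     num_clusters: int) -> List[float]:
    """Count how many words of a text fall into each cluster.

    Words missing from the mapping are skipped (an assumption of this sketch).
    """
    vector = [0.0] * num_clusters
    for word in words:
        cluster = word_to_cluster.get(word)
        if cluster is not None:
            vector[cluster] += 1.0
    return vector


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        return 0.0
    return covariance / (spread_x * spread_y)
```

A coefficient near 1 means the question and the answer draw on the same clusters in similar proportions, while a value near or below 0 suggests that the answer concentrates on clusters the question does not touch.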
S240: Correct the quality data of the Q&A data according to the quality data of the Q&A data and the relevance.
After the relevance between the question content and the answer content in the Q&A data has been determined, the quality data of the Q&A data can be corrected according to the relevance, which addresses the errors in Q&A quality scores that arise when scoring relies on semantic analysis alone.
S250: Obtain the user feedback behavior data for the answer content and the credit level of the user to whom the feedback behavior data belong.
The feedback behavior data may include browsing behavior data and answering behavior data. The behaviors behind these two types of data include explicit and implicit behaviors. For example, sharing, liking, disliking, reporting, adoption by the asker, answer correction and clicking related questions are explicit behaviors; commenting on an answer, clicking for more answers, turning pages, clicking internal links, looking for a new answer, viewing the user behind the best answer and clicking related knowledge are implicit behaviors. Because explicit behaviors are of relatively high quality, explicit behavior data are generally collected; however, explicit behavior data are sparse and cannot solve the coverage problem, so implicit behavior data are added to improve coverage.
A user's credit level can be determined by scoring according to the industry, education level, rank and historical behavior data in the user profile, together with the weight from historical usage; users at different credit levels have different behavioral credibility.
S260: Correct the quality data of the answer content according to the feedback behavior data and the credit level.
After the user feedback behavior data for the answer content and the credit level of the user to whom the feedback behavior data belong have been obtained, a behavior feedback model can be built, and the quality data of the answer content corrected on the basis of it. Fig. 7 is a schematic diagram of the behavior feedback model in Embodiment two of the present invention. Behavior feedback is modeled from the browsing behavior data and answering behavior data of users for the answer content and from the users' credit levels: long-term features, short-term features and time-series features are extracted and normalized, and an online model and an offline model are obtained, yielding the final behavior feedback model.
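The text does not give a formula for the correction, so the following is only a hypothetical illustration of the idea that feedback from more credible users should count for more. The signal weights, the 0 to 1 credit scale, the 0 to 1 quality scale, the `max_shift` cap and the function names are all assumptions; the patent's behavior feedback model is a trained model, not this hand-written rule.

```python
from typing import List, Tuple

# Hypothetical weights for a few explicit feedback signals.
EXPLICIT_SIGNALS = {
    "like": 1.0,
    "share": 1.0,
    "adopt": 2.0,
    "dislike": -1.0,
    "report": -2.0,
}


def feedback_adjustment(events: List[Tuple[str, float]], max_shift: float = 0.2) -> float:
    """Credit-weighted mean of signal weights, scaled into [-max_shift, max_shift].

    Each event is (signal_name, user_credit) with user_credit assumed in [0, 1].
    Unknown signals are ignored.
    """
    total = sum(EXPLICIT_SIGNALS.get(signal, 0.0) * credit for signal, credit in events)
    weight = sum(credit for signal, credit in events if signal in EXPLICIT_SIGNALS)
    if weight == 0.0:
        return 0.0
    mean = total / weight  # lies in [-2, 2] given the weights above
    return max(-max_shift, min(max_shift, mean / 2.0 * max_shift))


def corrected_quality(base_quality: float, events: List[Tuple[str, float]]) -> float:
    """Shift a base quality score in [0, 1] by the feedback adjustment and clamp it."""
    return min(1.0, max(0.0, base_quality + feedback_adjustment(events)))
```

Capping the shift keeps a burst of feedback from overriding the model-based score entirely, which is one plausible way to limit the effect of the user cheating mentioned in the background section.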
In the technical solution of this embodiment, the vector representation of the question content and the vector representation of the answer content in the Q&A data are input into the pre-built Q&A quality analysis model to obtain the quality data of the Q&A data; the quality data of the Q&A data are corrected according to the determined relevance between the question content and the answer content; and the quality data of the answer content are corrected according to the obtained user feedback behavior data for the answer content and the corresponding credit level. Because the Q&A quality analysis model is trained in advance on a pairwise learning-to-rank model, more effective Q&A quality scoring can be achieved on the basis of comparing and ranking answer contents, which addresses the poor user experience caused in the prior art by the display of large numbers of low-quality answers. In addition, correcting the quality according to multi-dimensional relevance and multi-dimensional user feedback data further improves the accuracy of Q&A quality scoring.
Embodiment three
Fig. 8 is a schematic diagram of the overall flow of Q&A quality determination in Embodiment three of the present invention. On the basis of the above embodiments, this embodiment further describes the overall flow of Q&A quality determination. The method may specifically include:
S310: Demand analysis.
In this embodiment, demand analysis can first be performed on the question before the Q&A quality is determined; the demand analysis may include feature analysis and demand understanding. Specifically, the question is segmented into words, and features such as quality, cheating and knowledge points are extracted with pre-built models; the demand type of the question is then sorted out and refined, for example whether in-depth Q&A is required, whether the question is an open discussion, whether it belongs to a demand in some special field, and whether it involves cheating.
S320: Determine the quality of the Q&A data.
Specifically, determining Q&A quality may include rule-based analysis, basic quality analysis, relevance analysis and behavior feedback. Rule-based analysis may include: identifying low-quality answers with rules (such as reply length (replylen), whether abusive words are present, colloquial style and low-quality patterns); text rules (such as text length and keyword association) can also be combined to make a preliminary estimate of whether the demand identified in S310 is met, and corresponding weights are set for the subsequent quality analysis.
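A rule-based pre-filter of the kind described above can be sketched as follows. The length threshold, the word list and the flag names are hypothetical placeholders; a real system would tune them on platform data and would cover the colloquial-style and low-quality-pattern rules as well.

```python
from typing import List, Sequence


def rule_flags(answer: str, min_length: int = 10,
               banned_words: Sequence[str] = ("idiot", "stupid")) -> List[str]:
    """Return the names of the low-quality rules that an answer triggers.

    Only two illustrative rules are implemented: reply length and abusive words.
    """
    flags = []
    if len(answer.strip()) < min_length:
        flags.append("too_short")
    lowered = answer.lower()
    if any(word in lowered for word in banned_words):
        flags.append("abusive")
    return flags
```

An answer that triggers no rule is not thereby judged good; it simply proceeds to the model-based quality, relevance and behavior-feedback stages.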
Basic quality analysis may include: inputting the vector representation of the question content and the vector representation of the answer content in the question-and-answer data into a question-and-answer quality analysis model constructed in advance, to obtain the quality data of the question-and-answer data.
Relevance analysis may include: correcting the quality data of the question-and-answer data according to the determined relevance between the question content and the answer content.
Behavior feedback may include: correcting the quality data of the answer content according to the obtained user feedback behavior data of the answer content and the corresponding credit level.
S330, quality correction of the question and answer.
After the quality of the question and answer is determined, it may also be corrected from multiple angles, which may include correction based on reading experience, authority analysis and timeliness analysis. Specifically, the quality is corrected for readability based on the typesetting and rich media of the question and answer; the quality is corrected based on the authority of the answering user (such as a doctor or a programmer) and on high-level users; and timeliness identification is performed on the question-and-answer data, with the quality corrected based on timeliness.
S340, question-and-answer quality.
The question-and-answer quality is finally determined. Quality score grades are divided from the perspective of demand satisfaction into five grades: selected answer, high-quality answer, ordinary answer, low-quality answer and cheating, with the degree of demand satisfaction decreasing in that order.
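One way to map a final quality score onto the five grades is sketched below. The score range of 0 to 1, the threshold values and the separate cheating flag are assumptions made for illustration; the source names only the grades and their order.

```python
from bisect import bisect_right

# Grades in ascending order of demand satisfaction (cheating is handled separately).
GRADES = ["low-quality answer", "ordinary answer", "high-quality answer", "selected answer"]
THRESHOLDS = [0.25, 0.5, 0.75]  # assumed cut points on a 0..1 score


def grade(score: float, is_cheating: bool = False) -> str:
    """Map a corrected quality score to one of the five grades."""
    if is_cheating:
        return "cheating"
    return GRADES[bisect_right(THRESHOLDS, score)]
```

For example, under these assumed thresholds `grade(0.4)` falls in the "ordinary answer" band.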
In the technical solution of this embodiment, demand analysis is performed on the question before the quality of the question and answer is determined; the quality is determined and corrected based on rules, text quality, relevance and behavior feedback; and the quality is further corrected based on reading experience, authority and timeliness analysis. This embodiment provides a systematic quality scoring method for question-and-answer data, further improves the accuracy of quality determination through multi-angle correction, and can serve as a basis for a question-and-answer platform to push answer content to other platforms.
Embodiment four
Fig. 9 is a schematic structural diagram of a question-and-answer quality determining device in Embodiment four of the present invention. This embodiment is applicable to determining the quality of a question and answer. The device provided by the embodiment of the present invention can execute the question-and-answer quality determination method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method. As shown in Fig. 9, the device specifically includes:
Vector module 410, configured to determine the vector representation of the question content and the vector representation of the answer content in the question-and-answer data;
Quality module 420, configured to input the vector representation of the question content and the vector representation of the answer content into a question-and-answer quality analysis model constructed in advance, to obtain the quality data of the question-and-answer data.
Optionally, the device further includes a construction module, the construction module being configured to construct the question-and-answer quality analysis model, including:
Determining the vector representation of sample question content;
Determining the vector representation of first answer content of the sample question content and the vector representation of second answer content of the sample question content;
Training a pairwise learning-to-rank model with the vector representation of the sample question content, the vector representation of the first answer content and the vector representation of the second answer content as the input, and the ranking result of the first answer content and the second answer content as the output, to obtain the question-and-answer quality analysis model.
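The pairwise training step can be sketched as follows. This is a minimal illustration, not the patent's model: it assumes a linear scorer over the concatenated question and answer vectors and a logistic loss on the score difference (in the style of RankNet), and the names `score` and `train_pairwise` are hypothetical.

```python
import math
import random


def score(w, q, a):
    """Linear quality score over the concatenated question and answer vectors."""
    return sum(wi * xi for wi, xi in zip(w, q + a))


def train_pairwise(samples, dim, epochs=200, lr=0.1, seed=0):
    """Train on (q, a1, a2, label) tuples; label is 1 if a1 ranks above a2, else 0."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    for _ in range(epochs):
        for q, a1, a2, label in samples:
            diff = score(w, q, a1) - score(w, q, a2)
            p = 1.0 / (1.0 + math.exp(-diff))  # modelled P(a1 ranks above a2)
            g = p - label                      # gradient of the logistic loss w.r.t. diff
            x1, x2 = q + a1, q + a2
            for i in range(dim):
                w[i] -= lr * g * (x1[i] - x2[i])
    return w


# Toy data: one-dimensional question vectors, two-dimensional answer vectors.
samples = [
    ([1.0], [0.9, 0.1], [0.2, 0.3], 1),
    ([0.5], [0.1, 0.4], [0.8, 0.2], 0),
]
w = train_pairwise(samples, dim=3)
```

With a purely linear scorer the question components cancel in the score difference, so a practical model would need a non-linear interaction between question and answer; the sketch only shows how a pair of answers and their ranking result drive the update.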
Optionally, vector module 410 is specifically configured to:
Input the vector representation of each word in the question content into a recurrent neural network model trained in advance, to obtain the vector representation of the question content.
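As an illustration of encoding a question with a recurrent network, the sketch below runs a simple (Elman) recurrent cell over the word vectors and uses the final hidden state as the question vector. The cell type, the use of the last hidden state and the parameter names are assumptions; the source says only that a pre-trained recurrent neural network model is used.

```python
import math


def rnn_encode(word_vectors, w_in, w_rec, bias):
    """Run a simple recurrent cell over the word vectors; return the final hidden state.

    w_in[j][i] weights input dimension i for hidden unit j,
    w_rec[j][k] weights previous hidden unit k for hidden unit j.
    """
    hidden = [0.0] * len(bias)
    for x in word_vectors:
        hidden = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden
```

In practice the weights would come from the pre-trained model rather than being supplied by hand.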
Optionally, vector module 410 is further configured to:
Input the vector representation of each word in the answer content into a convolutional neural network model trained in advance, to obtain the vector representation of the answer content.
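A convolutional encoder for the answer can be sketched in the same spirit. The window width, the ReLU activation and max-over-time pooling are assumptions for illustration, and `cnn_encode` is a hypothetical name.

```python
def cnn_encode(word_vectors, filters, width=2):
    """1-D convolution over windows of `width` words, ReLU, then max-over-time pooling.

    Each filter holds width * embedding_dim weights; the result has one feature per filter.
    """
    # Flatten every window of consecutive word vectors into one list.
    windows = [
        [v for vec in word_vectors[i:i + width] for v in vec]
        for i in range(len(word_vectors) - width + 1)
    ]
    features = []
    for f in filters:
        activations = [max(0.0, sum(w * v for w, v in zip(f, win))) for win in windows]
        features.append(max(activations, default=0.0))
    return features
```

Because of the pooling step, answers of different lengths map to vectors of the same size, which is convenient when the answer vector is fed to the quality model together with the question vector.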
Optionally, the device further includes a relevance module, the relevance module including:
A relevance determination unit, configured to determine the relevance between the question content and the answer content in the question-and-answer data;
A relevance correction unit, configured to correct the quality data of the question-and-answer data according to the quality data of the question-and-answer data and the relevance.
Optionally, the relevance determination unit is specifically configured to:
Determine the relevance between the question content and the answer content according to at least one of: the keyword similarity between the keywords included in the question content and the keywords included in the answer content; the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs; the label similarity between the labels of the question content and the labels of the answer content; and the topic similarity between the topic of the question content and the topic of the answer content.
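The combination of similarity items into a relevance value, and its use to correct the quality data, might look like the sketch below. Jaccard overlap as the set similarity, the weighted average and the multiplicative correction are all assumptions; the source does not state how the items are computed or combined.

```python
def jaccard(a, b):
    """Set overlap, used here for keyword or label similarity (an assumption)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def relevance(similarities, weights=None):
    """Weighted average of whichever similarity items are available (at least one)."""
    if not similarities:
        raise ValueError("at least one similarity item is required")
    weights = weights or {name: 1.0 for name in similarities}
    total = sum(weights[name] for name in similarities)
    return sum(weights[name] * value for name, value in similarities.items()) / total


def correct_by_relevance(quality, rel):
    """Scale the model's quality score by the relevance (assumed correction rule)."""
    return quality * rel


keyword_sim = jaccard(["router", "wifi"], ["wifi", "modem"])
rel = relevance({"keyword": keyword_sim, "label": 1.0})
```

An answer that scores well on text quality but shares little with the question would then have its quality data reduced.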
Optionally, the relevance determination unit is further configured to:
Determine the word vector of each word included in a corpus;
Cluster the word vectors of the words to obtain the cluster to which each word belongs;
Obtain the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs, according to the clusters to which the words in the question content belong and the clusters to which the words in the answer content belong.
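The three steps above can be sketched as follows, assuming k-means as the clustering method and the overlap of cluster sets as the aggregation similarity; neither choice is stated in the source, and the toy word vectors are invented for the example.

```python
def kmeans(vectors, k, iterations=10):
    """Tiny k-means; the first k vectors seed the centroids (for determinism)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        for idx, v in enumerate(vectors):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_similarity(question_words, answer_words, word_to_cluster):
    """Overlap between the cluster sets of the question words and the answer words."""
    q = {word_to_cluster[w] for w in question_words if w in word_to_cluster}
    a = {word_to_cluster[w] for w in answer_words if w in word_to_cluster}
    return len(q & a) / len(q | a) if q | a else 0.0


# Toy corpus word vectors (hypothetical values).
vocab = {"router": [0.0, 0.1], "banana": [5.0, 5.1], "wifi": [0.1, 0.0], "apple": [5.1, 5.0]}
words = list(vocab)
word_to_cluster = dict(zip(words, kmeans([vocab[w] for w in words], k=2)))
```

Here "router" and "wifi" land in one cluster and "banana" and "apple" in the other, so a question about routers and an answer about wifi are treated as belonging to the same category even though they share no keyword.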
Optionally, the device further includes a feedback correction module, the feedback correction module being configured to:
Obtain user feedback behavior data for the answer content, and the credit level of the user to whom the feedback behavior data belongs;
Correct the quality data of the answer content according to the feedback behavior data and the credit level.
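A credit-weighted feedback correction could be sketched as below. The encoding of feedback as +1 or -1, the credit weights, the `strength` factor and the clamping to the range 0 to 1 are assumptions for illustration only.

```python
def feedback_adjustment(feedback, credit_weights):
    """Credit-weighted mean of feedback actions.

    feedback: list of (action, credit_level); action is +1 (e.g. upvote) or -1 (e.g. downvote).
    """
    total = sum(credit_weights[level] for _, level in feedback)
    if total == 0:
        return 0.0
    return sum(action * credit_weights[level] for action, level in feedback) / total


def apply_feedback(quality, feedback, credit_weights, strength=0.2):
    """Shift the quality score by the weighted feedback and clamp it to [0, 1]."""
    adjusted = quality + strength * feedback_adjustment(feedback, credit_weights)
    return min(1.0, max(0.0, adjusted))


credit_weights = {"high": 3.0, "low": 1.0}  # hypothetical credit levels
feedback = [(+1, "high"), (-1, "low")]
```

Weighting by credit level means that feedback from a user with a high credit level moves the score more than feedback from a user with a low credit level, which limits the effect of unreliable or manipulative feedback.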
In the technical solution of this embodiment, the vector representation of the question content and the vector representation of the answer content in the question-and-answer data are input into a question-and-answer quality analysis model constructed in advance, to obtain the quality data of the question-and-answer data. Because the question-and-answer quality analysis model is trained in advance based on a pairwise learning-to-rank model, more effective quality scoring can be achieved on the basis of comparing and ranking answer contents. This solves the prior-art problem of poor user experience caused by the display of a large number of low-quality answers, and improves the accuracy of question-and-answer quality scoring.
Embodiment five
Figure 10 is a schematic structural diagram of a server in Embodiment five of the present invention. Figure 10 shows a block diagram of an exemplary server 512 suitable for implementing embodiments of the present invention. The server 512 shown in Figure 10 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Figure 10, server 512 takes the form of a general-purpose server. The components of server 512 may include, but are not limited to: one or more processors 516, a storage device 528, and a bus 518 connecting the different system components (including the storage device 528 and the processors 516).
Bus 518 represents one or more of several types of bus structures, including a storage device bus or storage device controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Server 512 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by server 512, including volatile and non-volatile media, and removable and non-removable media.
Storage device 528 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 530 and/or cache memory 532. Server 512 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 534 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Figure 10, commonly called a "hard disk drive"). Although not shown in Figure 10, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media). In these cases, each drive may be connected to bus 518 through one or more data media interfaces. Storage device 528 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 540 having a set of (at least one) program modules 542 may be stored in, for example, storage device 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 542 generally perform the functions and/or methods of the embodiments described in the present invention.
Server 512 may also communicate with one or more external devices 514 (such as a keyboard, a pointing device, a display 524, etc.), with one or more terminals that enable a user to interact with server 512, and/or with any terminal (such as a network card, a modem, etc.) that enables server 512 to communicate with one or more other computing terminals. Such communication may take place through input/output (I/O) interface 522. Server 512 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through network adapter 520. As shown in Figure 10, network adapter 520 communicates with the other modules of server 512 through bus 518. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with server 512, including but not limited to: microcode, terminal drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
Processor 516 runs the programs stored in storage device 528 to execute various functional applications and data processing, for example to implement the question-and-answer quality determination method provided by the embodiments of the present invention, the method including:
Determining the vector representation of the question content and the vector representation of the answer content in the question-and-answer data;
Inputting the vector representation of the question content and the vector representation of the answer content into a question-and-answer quality analysis model constructed in advance, to obtain the quality data of the question-and-answer data.
Embodiment six
Embodiment six of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the question-and-answer quality determination method provided by the embodiments of the present invention, the method including:
Determining the vector representation of the question content and the vector representation of the answer content in the question-and-answer data;
Inputting the vector representation of the question content and the vector representation of the answer content into a question-and-answer quality analysis model constructed in advance, to obtain the quality data of the question-and-answer data.
The computer storage medium of the embodiment of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; without departing from the inventive concept, it may include other equivalent embodiments, and the scope of the invention is determined by the scope of the appended claims.

Claims (18)

1. A method for determining the quality of a question and answer, characterized by comprising:
determining a vector representation of question content and a vector representation of answer content in question-and-answer data;
inputting the vector representation of the question content and the vector representation of the answer content into a question-and-answer quality analysis model constructed in advance, to obtain quality data of the question-and-answer data.
2. The method according to claim 1, characterized in that constructing the question-and-answer quality analysis model comprises:
determining a vector representation of sample question content;
determining a vector representation of first answer content of the sample question content and a vector representation of second answer content of the sample question content;
training a pairwise learning-to-rank model with the vector representation of the sample question content, the vector representation of the first answer content and the vector representation of the second answer content as the input, and the ranking result of the first answer content and the second answer content as the output, to obtain the question-and-answer quality analysis model.
3. The method according to claim 1, characterized in that determining the vector representation of the question content in the question-and-answer data comprises:
inputting the vector representation of each word in the question content into a recurrent neural network model trained in advance, to obtain the vector representation of the question content.
4. The method according to claim 1, characterized in that determining the vector representation of the answer content comprises:
inputting the vector representation of each word in the answer content into a convolutional neural network model trained in advance, to obtain the vector representation of the answer content.
5. The method according to claim 1, characterized by further comprising:
determining the relevance between the question content and the answer content in the question-and-answer data;
correcting the quality data of the question-and-answer data according to the quality data of the question-and-answer data and the relevance.
6. The method according to claim 5, characterized in that determining the relevance between the question content and the answer content in the question-and-answer data comprises:
determining the relevance between the question content and the answer content according to at least one of: the keyword similarity between the keywords included in the question content and the keywords included in the answer content; the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs; the label similarity between the labels of the question content and the labels of the answer content; and the topic similarity between the topic of the question content and the topic of the answer content.
7. The method according to claim 6, characterized in that determining the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs comprises:
determining the word vector of each word included in a corpus;
clustering the word vectors of the words to obtain the cluster to which each word belongs;
obtaining the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs, according to the clusters to which the words in the question content belong and the clusters to which the words in the answer content belong.
8. The method according to claim 1, characterized by further comprising:
obtaining user feedback behavior data for the answer content, and the credit level of the user to whom the feedback behavior data belongs;
correcting the quality data of the answer content according to the feedback behavior data and the credit level.
9. A device for determining the quality of a question and answer, characterized by comprising:
a vector module, configured to determine a vector representation of question content and a vector representation of answer content in question-and-answer data;
a quality module, configured to input the vector representation of the question content and the vector representation of the answer content into a question-and-answer quality analysis model constructed in advance, to obtain quality data of the question-and-answer data.
10. The device according to claim 9, characterized by further comprising a construction module, the construction module being configured to construct the question-and-answer quality analysis model, including:
determining a vector representation of sample question content;
determining a vector representation of first answer content of the sample question content and a vector representation of second answer content of the sample question content;
training a pairwise learning-to-rank model with the vector representation of the sample question content, the vector representation of the first answer content and the vector representation of the second answer content as the input, and the ranking result of the first answer content and the second answer content as the output, to obtain the question-and-answer quality analysis model.
11. The device according to claim 9, characterized in that the vector module is specifically configured to:
input the vector representation of each word in the question content into a recurrent neural network model trained in advance, to obtain the vector representation of the question content.
12. The device according to claim 9, characterized in that the vector module is further configured to:
input the vector representation of each word in the answer content into a convolutional neural network model trained in advance, to obtain the vector representation of the answer content.
13. The device according to claim 9, characterized by further comprising a relevance module, the relevance module including:
a relevance determination unit, configured to determine the relevance between the question content and the answer content in the question-and-answer data;
a relevance correction unit, configured to correct the quality data of the question-and-answer data according to the quality data of the question-and-answer data and the relevance.
14. The device according to claim 13, characterized in that the relevance determination unit is specifically configured to:
determine the relevance between the question content and the answer content according to at least one of: the keyword similarity between the keywords included in the question content and the keywords included in the answer content; the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs; the label similarity between the labels of the question content and the labels of the answer content; and the topic similarity between the topic of the question content and the topic of the answer content.
15. The device according to claim 14, characterized in that the relevance determination unit is further configured to:
determine the word vector of each word included in a corpus;
cluster the word vectors of the words to obtain the cluster to which each word belongs;
obtain the aggregation similarity between the category to which the question content belongs and the category to which the answer content belongs, according to the clusters to which the words in the question content belong and the clusters to which the words in the answer content belong.
16. The device according to claim 9, characterized by further comprising a feedback correction module, the feedback correction module being configured to:
obtain user feedback behavior data for the answer content, and the credit level of the user to whom the feedback behavior data belongs;
correct the quality data of the answer content according to the feedback behavior data and the credit level.
17. A server, characterized in that the server comprises:
one or more processors;
a storage device, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining the quality of a question and answer according to any one of claims 1-8.
18. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the method for determining the quality of a question and answer according to any one of claims 1-8.
CN201810580409.3A 2018-06-07 2018-06-07 Quality determination method, device, server and the storage medium of question and answer Pending CN108960574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810580409.3A CN108960574A (en) 2018-06-07 2018-06-07 Quality determination method, device, server and the storage medium of question and answer

Publications (1)

Publication Number Publication Date
CN108960574A true CN108960574A (en) 2018-12-07

Family

ID=64493639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810580409.3A Pending CN108960574A (en) 2018-06-07 2018-06-07 Quality determination method, device, server and the storage medium of question and answer

Country Status (1)

Country Link
CN (1) CN108960574A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657127A (en) * 2018-12-17 2019-04-19 北京百度网讯科技有限公司 A kind of answer acquisition methods, device, server and storage medium
CN110164447A (en) * 2019-04-03 2019-08-23 苏州驰声信息科技有限公司 A kind of spoken language methods of marking and device
CN111382264A (en) * 2018-12-27 2020-07-07 阿里巴巴集团控股有限公司 Session quality evaluation method and device and electronic equipment
CN111444724A (en) * 2020-03-23 2020-07-24 腾讯科技(深圳)有限公司 Medical question-answer quality testing method and device, computer equipment and storage medium
WO2020181800A1 (en) * 2019-03-12 2020-09-17 平安科技(深圳)有限公司 Apparatus and method for predicting score for question and answer content, and storage medium
CN111783473A (en) * 2020-07-14 2020-10-16 腾讯科技(深圳)有限公司 Method and device for identifying best answer in medical question and answer and computer equipment
WO2020237872A1 (en) * 2019-05-24 2020-12-03 平安科技(深圳)有限公司 Method and apparatus for testing accuracy of semantic analysis model, storage medium, and device
CN115048944A (en) * 2022-08-16 2022-09-13 之江实验室 Open domain dialogue reply method and system based on theme enhancement

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102346753A (en) * 2010-08-01 2012-02-08 青岛理工大学 Semi-supervised text clustering method and device fusing pairwise constraints and keywords
CN104573000A (en) * 2015-01-07 2015-04-29 北京云知声信息技术有限公司 Sequential learning based automatic questions and answers device and method
CN106095872A (en) * 2016-06-07 2016-11-09 北京高地信息技术有限公司 Answer sort method and device for Intelligent Answer System
CN107203600A (en) * 2017-05-12 2017-09-26 浙江大学 It is a kind of to utilize the evaluation method for portraying cause and effect dependence and sequential influencing mechanism enhancing answer quality-ordered
CN107368547A (en) * 2017-06-28 2017-11-21 西安交通大学 A kind of intelligent medical automatic question-answering method based on deep learning
CN107391729A (en) * 2017-08-02 2017-11-24 掌阅科技股份有限公司 Sort method, electronic equipment and the computer-readable storage medium of user comment
CN107507073A (en) * 2017-09-14 2017-12-22 中国人民解放军信息工程大学 Based on the service recommendation method for trusting extension and the sequence study of list level
US20180039702A1 (en) * 2016-08-04 2018-02-08 Facebook, Inc. Systems and methods for providing feed preference surveys in a social networking system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YI TAY 等: "《Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering》", 《ARXIV》 *
应文豪 等: "《一种利用语义相似度改进问答摘要的方法》", 《北京大学学报(自然科学版)》 *

Similar Documents

Publication Publication Date Title
CN108960574A (en) Quality determination method, device, server and the storage medium of question and answer
CN110377759B (en) Method and device for constructing event relation graph
Deng et al. Introducing shared-hidden-layer autoencoders for transfer learning and their application in acoustic emotion recognition
TW202009749A (en) Human-machine dialog method, device, electronic apparatus and computer readable medium
WO2022095380A1 (en) Ai-based virtual interaction model generation method and apparatus, computer device and storage medium
US11729120B2 (en) Generating responses in automated chatting
CN108363790A (en) For the method, apparatus, equipment and storage medium to being assessed
CN105022754B (en) Object classification method and device based on social network
CN103577989B (en) Information classification method and information classification system based on product identification
CN107273861A (en) Subjective question scoring method, device and terminal device
CN107832432A (en) Search result ranking method, device, server and storage medium
CN107844533A (en) Intelligent question answering system and analysis method
CN109284502B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN109992781B (en) Text feature processing method and device and storage medium
CN111694940A (en) User report generation method and terminal equipment
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
Deng et al. Linked source and target domain subspace feature transfer learning--exemplified by speech emotion recognition
CN111694937A (en) Interviewing method and device based on artificial intelligence, computer equipment and storage medium
CN110209875A (en) User content portrait determination method, access object recommendation method and related apparatus
CN113505198A (en) Keyword-driven generating type dialogue reply method and device and electronic equipment
CN109271624A (en) Target word determination method, apparatus and storage medium
TWI749349B (en) Text restoration method, device, electronic equipment and computer readable storage medium
CN112307048B (en) Semantic matching model training method, matching method, device, equipment and storage medium
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN110895656A (en) Text similarity calculation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181207