CN109062973A - Method, apparatus, server and storage medium for mining question-answer resources - Google Patents

Method, apparatus, server and storage medium for mining question-answer resources

Info

Publication number
CN109062973A
Authority
CN
China
Prior art keywords
answer
initial
question
resource
initial problem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810696978.4A
Other languages
Chinese (zh)
Inventor
程耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810696978.4A
Publication of CN109062973A
Legal status: Pending

Abstract

Embodiments of the present invention disclose a method, apparatus, server and storage medium for mining question-answer resources. The method comprises: extracting the initial answer corresponding to each initial question from the question-answer pairs in a community question-answer resource; determining the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in a vertical-category resource; and mining a target question-answer resource according to each initial question and its corresponding target answer. The method not only reduces mining cost but also improves mining efficiency and mining accuracy.

Description

Method, apparatus, server and storage medium for mining question-answer resources
Technical field
Embodiments of the present invention relate to the field of Internet technology, and in particular to a method, apparatus, server and storage medium for mining question-answer resources.
Background
With the rapid development of the Internet, search engines have become increasingly powerful and users' expectations of them have risen accordingly, shifting from basic retrieval of relevant web pages toward intelligent question answering. When a user enters a question into a search engine, the user no longer wants a list of relevant web pages as the search result but wants to obtain the answer to the question directly.
Deep question answering refers to understanding human language, intelligently recognizing the meaning of a question, and extracting the answer to the question from massive Internet data. One of the important tasks of a deep question answering system is to build a good question-answer resource. On the Internet, community question-answer resources can provide question-answer resources for users, but the quality of the question-answer pairs in them is difficult to guarantee. UGC (User Generated Content) can provide a source of answers for an offline question answering system, but UGC is commonly of poor quality and is sometimes simply wrong. Manual review and manual correction can mine a batch of high-quality question-answer resources from massive and heterogeneous UGC content, but the labor cost of this approach is too high and its efficiency too low for it to be applied in real products.
Summary of the invention
In view of this, embodiments of the present invention provide a method, apparatus, server and storage medium for mining question-answer resources, which not only reduce mining cost but also improve mining efficiency and mining accuracy.
In a first aspect, an embodiment of the present invention provides a method for mining question-answer resources, the method comprising:
extracting the initial answer corresponding to each initial question from the question-answer pairs in a community question-answer resource;
determining the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in a vertical-category resource;
mining a target question-answer resource according to each initial question and its corresponding target answer.
In the above embodiment, determining the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource comprises:
determining the candidate answers corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource;
determining the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the candidate answers corresponding to that initial question.
In the above embodiment, determining the candidate answers corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource comprises:
calculating the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content item;
determining the candidate answers corresponding to each initial question according to the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content item.
In the above embodiment, calculating the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content item comprises:
dividing each initial answer and each UGC content item, in units of basic sentences, into a first sentence dictionary and a second sentence dictionary respectively;
calculating the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content item according to the first sentence dictionary and the second sentence dictionary.
In the above embodiment, determining the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the candidate answers corresponding to that initial question comprises:
calculating the word vector corresponding to each initial answer and the word vector corresponding to each candidate answer;
determining the target answer corresponding to each initial question according to the word vector corresponding to each initial answer and the word vector corresponding to each candidate answer.
In a second aspect, an embodiment of the present invention provides an apparatus for mining question-answer resources, the apparatus comprising an extraction module, a determination module and a mining module, wherein:
the extraction module is configured to extract the initial answer corresponding to each initial question from the question-answer pairs in a community question-answer resource;
the determination module is configured to determine the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in a vertical-category resource;
the mining module is configured to mine a target question-answer resource according to each initial question and its corresponding target answer.
In the above embodiment, the determination module is specifically configured to determine the candidate answers corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource, and to determine the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the candidate answers corresponding to that initial question.
In the above embodiment, the determination module comprises a calculation submodule and a determination submodule, wherein:
the calculation submodule is configured to calculate the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content item;
the determination submodule is configured to determine the candidate answers corresponding to each initial question according to the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content item.
In the above embodiment, the calculation submodule is specifically configured to divide each initial answer and each UGC content item, in units of basic sentences, into a first sentence dictionary and a second sentence dictionary respectively, and to calculate the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content item according to the first sentence dictionary and the second sentence dictionary.
In the above embodiment, the calculation submodule is further configured to calculate the word vector corresponding to each initial answer and the word vector corresponding to each candidate answer;
the determination submodule is further configured to determine the target answer corresponding to each initial question according to the word vector corresponding to each initial answer and the word vector corresponding to each candidate answer.
In a third aspect, an embodiment of the present invention provides a server, comprising:
one or more processors;
a memory for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for mining question-answer resources described in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for mining question-answer resources described in any embodiment of the present invention.
Embodiments of the present invention propose a method, apparatus, server and storage medium for mining question-answer resources. First, the initial answer corresponding to each initial question is extracted from the question-answer pairs in a community question-answer resource; then the target answer corresponding to each initial question is determined according to the initial answer corresponding to that initial question and the UGC content items in a vertical-category resource; finally, a target question-answer resource is mined according to each initial question and its corresponding target answer. That is, in the technical solution of the present invention, the target answer corresponding to each initial question can be determined according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource, and the target question-answer resource can then be mined according to each initial question and its corresponding target answer. Existing methods for mining question-answer resources rely on manual review and manual correction to mine a batch of high-quality question-answer resources from massive and heterogeneous UGC content; their labor cost is too high and their efficiency too low for them to be applied in real products. Compared with the prior art, therefore, the method, apparatus, server and storage medium proposed by embodiments of the present invention not only reduce mining cost but also improve mining efficiency and mining accuracy. Moreover, the technical solution of the embodiments is simple and convenient to implement, easy to popularize, and widely applicable.
Brief description of the drawings
Fig. 1 is a flowchart of the method for mining question-answer resources provided by Embodiment One of the present invention;
Fig. 2 is a flowchart of the method for mining question-answer resources provided by Embodiment Two of the present invention;
Fig. 3 is a flowchart of the method for mining question-answer resources provided by Embodiment Three of the present invention;
Fig. 4 is a first schematic structural diagram of the apparatus for mining question-answer resources provided by Embodiment Four of the present invention;
Fig. 5 is a second schematic structural diagram of the apparatus for mining question-answer resources provided by Embodiment Four of the present invention;
Fig. 6 is a schematic structural diagram of the server provided by Embodiment Five of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire content.
Embodiment One
Fig. 1 is a flowchart of the method for mining question-answer resources provided by Embodiment One of the present invention. As shown in Fig. 1, the method for mining question-answer resources may include the following steps:
S101: Extract the initial answer corresponding to each initial question from the question-answer pairs in a community question-answer resource.
In the prior art, vertical-category resources contain high-quality, authoritative UGC content but lack matching questions; community question-answer resources consist of question-answer pairs, but the quality of those pairs cannot be guaranteed. If a vertical-category resource and a community question-answer resource overlap substantially, the two can be verified against each other, so that a batch of good question-answer resources can be mined from them automatically. In a specific embodiment of the present invention, the server may extract the initial answer corresponding to each initial question from the question-answer pairs in the community question-answer resource. Specifically, a community question-answer resource may include multiple <question, answer> pairs, and the server may extract the initial answer corresponding to each initial question from each <question, answer> pair. For example, the server may extract initial answer 1 corresponding to initial question 1 from question-answer pair 1 <initial question 1, initial answer 1>; extract initial answer 2 corresponding to initial question 2 from question-answer pair 2 <initial question 2, initial answer 2>; ...; and extract initial answer N corresponding to initial question N from question-answer pair N <initial question N, initial answer N>, where N is a natural number greater than or equal to 1.
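The extraction in S101 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the names `QAPair` and `extract_initial_answers`, the sample data, and the policy of keeping the first answer seen for a repeated question are all assumptions introduced here.

```python
from typing import Dict, List, NamedTuple


class QAPair(NamedTuple):
    """One <question, answer> pair from a community question-answer resource."""
    question: str  # initial question
    answer: str    # initial answer


def extract_initial_answers(community_qa: List[QAPair]) -> Dict[str, str]:
    """Map each initial question to its initial answer (step S101)."""
    initial_answers: Dict[str, str] = {}
    for pair in community_qa:
        # Assumption: if a question appears more than once, keep the first answer.
        initial_answers.setdefault(pair.question, pair.answer)
    return initial_answers


# Hypothetical sample data.
community_qa = [
    QAPair("How long should eggs be boiled?", "About eight minutes."),
    QAPair("What is the capital of France?", "Paris."),
]
initial_answers = extract_initial_answers(community_qa)
```

The resulting mapping is the input to the later steps, which compare each initial answer against the UGC content in the vertical-category resource.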
S102: Determine the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in a vertical-category resource.
In a specific embodiment of the present invention, the server may determine the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource. Specifically, the server may extract <initial question 1, initial answer 1>, <initial question 2, initial answer 2>, ..., <initial question N, initial answer N> from the question-answer pairs in the community question-answer resource, where N is a natural number greater than or equal to 1. The server may then determine target answer 1, target answer 2, ..., target answer N corresponding to initial question 1, initial question 2, ..., initial question N according to <initial question 1, initial answer 1>, <initial question 2, initial answer 2>, ..., <initial question N, initial answer N> and UGC content 1, UGC content 2, ..., UGC content M in the vertical-category resource, where M is a natural number greater than or equal to 1.
S103: Mine a target question-answer resource according to each initial question and its corresponding target answer.
In a specific embodiment of the present invention, after determining the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource, the server may mine the target question-answer resource according to each initial question and its corresponding target answer. Specifically, according to initial question 1 and its corresponding target answer 1, initial question 2 and its corresponding target answer 2, ..., and initial question N and its corresponding target answer N, the server may mine the target question-answer pairs <initial question 1, target answer 1>, <initial question 2, target answer 2>, ..., <initial question N, target answer N>; these target question-answer pairs together make up the target question-answer resource.
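The assembly of the target question-answer resource in S103 can be sketched as follows. The function name `mine_target_qa` is hypothetical, and skipping questions for which no target answer was found is an assumption; the text does not say how such questions are handled.

```python
from typing import Dict, List, Optional, Tuple


def mine_target_qa(
    initial_questions: List[str],
    target_answers: Dict[str, Optional[str]],
) -> List[Tuple[str, str]]:
    """Pair each initial question with its target answer (step S103)."""
    mined: List[Tuple[str, str]] = []
    for question in initial_questions:
        answer = target_answers.get(question)
        # Assumption: a question without a target answer is left out.
        if answer is not None:
            mined.append((question, answer))
    return mined


# Hypothetical sample data.
initial_questions = ["initial question 1", "initial question 2", "initial question 3"]
target_answers = {
    "initial question 1": "target answer 1",
    "initial question 2": None,
    "initial question 3": "target answer 3",
}
target_qa_resource = mine_target_qa(initial_questions, target_answers)
```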
The method for mining question-answer resources proposed by this embodiment of the present invention first extracts the initial answer corresponding to each initial question from the question-answer pairs in a community question-answer resource; then determines the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in a vertical-category resource; and finally mines a target question-answer resource according to each initial question and its corresponding target answer. That is, in the technical solution of the present invention, the target answer corresponding to each initial question can be determined according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource, and the target question-answer resource can then be mined according to each initial question and its corresponding target answer. Existing methods for mining question-answer resources rely on manual review and manual correction to mine a batch of high-quality question-answer resources from massive and heterogeneous UGC content; their labor cost is too high and their efficiency too low for them to be applied in real products. Compared with the prior art, therefore, the method proposed by this embodiment not only reduces mining cost but also improves mining efficiency and mining accuracy. Moreover, the technical solution of this embodiment is simple and convenient to implement, easy to popularize, and widely applicable.
Embodiment Two
Fig. 2 is a flowchart of the method for mining question-answer resources provided by Embodiment Two of the present invention. As shown in Fig. 2, the method for mining question-answer resources may include the following steps:
S201: Extract the initial answer corresponding to each initial question from the question-answer pairs in a community question-answer resource.
In the prior art, vertical-category resources contain high-quality, authoritative UGC content but lack matching questions; community question-answer resources consist of question-answer pairs, but the quality of those pairs cannot be guaranteed. If a vertical-category resource and a community question-answer resource overlap substantially, the two can be verified against each other, so that a batch of good question-answer resources can be mined from them automatically. In a specific embodiment of the present invention, the server may extract the initial answer corresponding to each initial question from the question-answer pairs in the community question-answer resource. Specifically, a community question-answer resource may include multiple <question, answer> pairs, and the server may extract the initial answer corresponding to each initial question from each <question, answer> pair. For example, the server may extract initial answer 1 corresponding to initial question 1 from question-answer pair 1 <initial question 1, initial answer 1>; extract initial answer 2 corresponding to initial question 2 from question-answer pair 2 <initial question 2, initial answer 2>; ...; and extract initial answer N corresponding to initial question N from question-answer pair N <initial question N, initial answer N>, where N is a natural number greater than or equal to 1.
S202: Determine the candidate answers corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in a vertical-category resource.
In a specific embodiment of the present invention, the server may determine the candidate answers corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource. Specifically, according to initial answer 1 corresponding to initial question 1, initial answer 2 corresponding to initial question 2, ..., initial answer N corresponding to initial question N, and UGC content 1, UGC content 2, ..., UGC content M in the vertical-category resource, the server may determine the candidate answers corresponding to initial question 1, the candidate answers corresponding to initial question 2, ..., and the candidate answers corresponding to initial question N, where M is a natural number greater than or equal to 1. That is, in a specific embodiment of the present invention, the server may first screen out the candidate answers corresponding to each initial question from the UGC content items, and then determine the target answer corresponding to each initial question from the screened candidate answers. Because a large amount of UGC content unrelated to the initial questions can be excluded during screening, the mining efficiency of the server is effectively improved.
S203: Determine the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the candidate answers corresponding to that initial question.
In a specific embodiment of the present invention, the server may determine the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the candidate answers corresponding to that initial question. Specifically, according to initial answer 1 corresponding to initial question 1, initial answer 2 corresponding to initial question 2, ..., initial answer N corresponding to initial question N, and the candidate answers corresponding to initial question 1, the candidate answers corresponding to initial question 2, ..., the candidate answers corresponding to initial question N, the server may determine target answer 1 corresponding to initial question 1, target answer 2 corresponding to initial question 2, ..., and target answer N corresponding to initial question N. That is, in a specific embodiment of the present invention, the server may first screen out the candidate answers corresponding to each initial question from the UGC content items, and then determine the target answer corresponding to each initial question from the screened candidate answers. Because a large amount of UGC content unrelated to the initial questions can be excluded during screening, the mining efficiency of the server is effectively improved.
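The two-stage selection of S202 and S203 can be sketched as follows. The sketch only shows the control flow of screening first and selecting second. The scorer `word_overlap` is a toy stand-in, and the function name `select_target_answer`, the threshold value, and the rule of picking the highest-scoring candidate as the target answer are assumptions that the text does not specify.

```python
from typing import Callable, List, Optional

Scorer = Callable[[str, str], float]


def word_overlap(a: str, b: str) -> float:
    """Toy scorer: Jaccard overlap of lowercase whitespace tokens."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def select_target_answer(
    initial_answer: str,
    ugc_contents: List[str],
    coarse: Scorer,
    fine: Scorer,
    coarse_threshold: float,
) -> Optional[str]:
    # Stage 1 (S202): screen out UGC content unrelated to the initial answer.
    candidates = [u for u in ugc_contents if coarse(initial_answer, u) > coarse_threshold]
    if not candidates:
        return None
    # Stage 2 (S203): assumption, take the candidate with the highest fine score.
    return max(candidates, key=lambda u: fine(initial_answer, u))


# Hypothetical sample data.
ugc_contents = [
    "boil the eggs for eight minutes then cool",
    "paris is the capital of france",
    "eggs need eight minutes",
]
target = select_target_answer(
    "boil eggs for eight minutes", ugc_contents, word_overlap, word_overlap, 0.2
)
```

Only the candidates that survive the first stage reach the second, more detailed comparison, which is where the efficiency gain described above comes from.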
S204: Mine a target question-answer resource according to each initial question and its corresponding target answer.
In a specific embodiment of the present invention, after determining the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource, the server may mine the target question-answer resource according to each initial question and its corresponding target answer. Specifically, according to initial question 1 and its corresponding target answer 1, initial question 2 and its corresponding target answer 2, ..., and initial question N and its corresponding target answer N, the server may mine the target question-answer pairs <initial question 1, target answer 1>, <initial question 2, target answer 2>, ..., <initial question N, target answer N>; these target question-answer pairs together make up the target question-answer resource.
The method for mining question-answer resources proposed by this embodiment of the present invention first extracts the initial answer corresponding to each initial question from the question-answer pairs in a community question-answer resource; then determines the target answer corresponding to each initial question according to the initial answer corresponding to that initial question and the UGC content items in a vertical-category resource; and finally mines a target question-answer resource according to each initial question and its corresponding target answer. That is, in the technical solution of the present invention, the target answer corresponding to each initial question can be determined according to the initial answer corresponding to that initial question and the UGC content items in the vertical-category resource, and the target question-answer resource can then be mined according to each initial question and its corresponding target answer. Existing methods for mining question-answer resources rely on manual review and manual correction to mine a batch of high-quality question-answer resources from massive and heterogeneous UGC content; their labor cost is too high and their efficiency too low for them to be applied in real products. Compared with the prior art, therefore, the method proposed by this embodiment not only reduces mining cost but also improves mining efficiency and mining accuracy. Moreover, the technical solution of this embodiment is simple and convenient to implement, easy to popularize, and widely applicable.
Embodiment Three
Fig. 3 is a flowchart of the method for mining question-answer resources provided by Embodiment Three of the present invention. As shown in Fig. 3, the method for mining question-answer resources may include the following steps:
S301: Extract the initial answer corresponding to each initial question from the question-answer pairs in a community question-answer resource.
In the prior art, vertical-category resources contain high-quality, authoritative UGC content but lack matching questions; community question-answer resources consist of question-answer pairs, but the quality of those pairs cannot be guaranteed. If a vertical-category resource and a community question-answer resource overlap substantially, the two can be verified against each other, so that a batch of good question-answer resources can be mined from them automatically. In a specific embodiment of the present invention, the server may extract the initial answer corresponding to each initial question from the question-answer pairs in the community question-answer resource. Specifically, a community question-answer resource may include multiple <question, answer> pairs, and the server may extract the initial answer corresponding to each initial question from each <question, answer> pair. For example, the server may extract initial answer 1 corresponding to initial question 1 from question-answer pair 1 <initial question 1, initial answer 1>; extract initial answer 2 corresponding to initial question 2 from question-answer pair 2 <initial question 2, initial answer 2>; ...; and extract initial answer N corresponding to initial question N from question-answer pair N <initial question N, initial answer N>, where N is a natural number greater than or equal to 1.
S302, the corresponding sentence vector of the corresponding each UGC content of sentence vector sum of each initial answer is calculated.
In a specific embodiment of the present invention, the server can calculate the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content. Specifically, the server can divide each initial answer and each UGC content, in units of basic sentences, into a first sentence dictionary and a second sentence dictionary, respectively, and then calculate the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content according to the first sentence dictionary and the second sentence dictionary. For example, assume that initial answer 1 includes initial sentence 1, initial sentence 2, ..., initial sentence X, and that UGC content 1 includes UGC sentence 1, UGC sentence 2, ..., UGC sentence Y, where X and Y are natural numbers greater than or equal to 1. In this step, the server can first extract each basic sentence in initial answer 1 (initial sentence 1, initial sentence 2, ..., initial sentence X) and then extract each basic sentence in UGC content 1 (UGC sentence 1, UGC sentence 2, ..., UGC sentence Y). The basic sentences in each initial answer and the basic sentences in each UGC content are then combined into one basic sentence dictionary, and the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content are calculated according to that basic sentence dictionary. Specifically, each basic sentence dictionary may include dictionary entry 1, dictionary entry 2, ..., dictionary entry P, where P is a natural number less than or equal to the sum of X and Y. Dictionary entry 1 includes sentence identifier 1 and basic sentence 1; dictionary entry 2 includes sentence identifier 2 and basic sentence 2; ...; dictionary entry P includes sentence identifier P and basic sentence P. It should be noted that if an initial sentence in initial answer 1 is identical to a UGC sentence in UGC content 1, that initial sentence and that UGC sentence can be merged into the same dictionary entry of the basic sentence dictionary.
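The dictionary construction described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the sentence delimiters, the helper names (`split_sentences`, `build_sentence_dictionary`, `sentence_vector`) and the choice of a count vector over dictionary entries are all assumptions, since the text does not specify how a sentence vector is derived from the dictionary.

```python
from collections import Counter

SENTENCE_ENDINGS = ".!?。！？"  # assumed delimiters; the text does not specify them


def split_sentences(text):
    """Split text into basic sentences at the assumed end-of-sentence marks."""
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in SENTENCE_ENDINGS:
            sentences.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return [s for s in sentences if s]


def build_sentence_dictionary(initial_answer, ugc_content):
    """Map each distinct basic sentence to a sentence identifier (1..P).

    Identical sentences from the two texts share one entry, so P <= X + Y.
    """
    dictionary = {}
    for sentence in split_sentences(initial_answer) + split_sentences(ugc_content):
        if sentence not in dictionary:
            dictionary[sentence] = len(dictionary) + 1
    return dictionary


def sentence_vector(text, dictionary):
    """Hypothetical sentence vector: occurrences of each dictionary entry in text."""
    counts = Counter(split_sentences(text))
    return [counts[sentence] for sentence in dictionary]  # dict keeps insertion order


answer = "Boil the water. Add the noodles. Serve hot."
ugc = "Add the noodles. Stir gently."
dictionary = build_sentence_dictionary(answer, ugc)
print(len(dictionary))                   # 4 entries: one sentence is shared
print(sentence_vector(answer, dictionary))  # [1, 1, 1, 0]
print(sentence_vector(ugc, dictionary))     # [0, 1, 0, 1]
```

Because both vectors are indexed by the same merged dictionary, they have the same length P and can be compared directly in the next step.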
S303, determine the alternative answer corresponding to each initial question according to the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content.
In a specific embodiment of the present invention, the server can determine the alternative answer corresponding to each initial question according to the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content. Specifically, the server can calculate the similarity between the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content; when that similarity is greater than a preset threshold, the server can determine the UGC content whose similarity is greater than the preset threshold as an alternative answer corresponding to the initial question.
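The threshold filter in this step can be sketched as below. The text only speaks of a "similarity"; cosine similarity is used here as an assumption, and the function names and the threshold value 0.5 are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_alternative_answers(answer_vector, ugc_vectors, threshold):
    """Return the indices of UGC contents whose similarity exceeds the threshold."""
    return [
        index
        for index, ugc_vector in enumerate(ugc_vectors)
        if cosine_similarity(answer_vector, ugc_vector) > threshold
    ]


answer_vector = [1, 1, 1, 0]
ugc_vectors = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(select_alternative_answers(answer_vector, ugc_vectors, threshold=0.5))  # [1]
```

Only the second UGC content (similarity about 0.82) passes the assumed threshold; the first (about 0.41) and the third (0) are discarded.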
S304, calculate the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer.
In a specific embodiment of the present invention, the server can calculate the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer. Specifically, the server can divide each initial answer and each alternative answer, in units of basic words, into a first word dictionary and a second word dictionary, respectively, and then calculate the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer according to the first word dictionary and the second word dictionary.
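A minimal sketch of the word-level step follows. Several points are assumptions rather than part of the embodiment: words are obtained by whitespace splitting (Chinese text would need a word segmenter instead), the two word dictionaries are merged into one shared dictionary by analogy with the sentence step so that the vectors are comparable, and a word vector is taken to be a count vector. The function names are hypothetical.

```python
from collections import Counter

PUNCTUATION = ".,!?;:"


def tokenize(text):
    """Assumed tokenizer: lowercase, split on whitespace, strip punctuation."""
    words = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in words if word]


def build_word_dictionary(texts):
    """Map each distinct basic word to a position, in order of first appearance."""
    dictionary = {}
    for text in texts:
        for word in tokenize(text):
            dictionary.setdefault(word, len(dictionary))
    return dictionary


def word_vector(text, dictionary):
    """Hypothetical word vector: occurrences of each dictionary word in text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in dictionary]


initial_answer = "Add the noodles to the water."
alternative_answer = "Add salt to the water."
dictionary = build_word_dictionary([initial_answer, alternative_answer])
print(list(dictionary))                         # ['add', 'the', 'noodles', 'to', 'water', 'salt']
print(word_vector(initial_answer, dictionary))      # [1, 2, 1, 1, 1, 0]
print(word_vector(alternative_answer, dictionary))  # [1, 1, 0, 1, 1, 1]
```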
S305, determine the target answer corresponding to each initial question according to the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer.
In a specific embodiment of the present invention, the server can determine the target answer corresponding to each initial question according to the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer. Specifically, the server can calculate the similarity between the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer, and can determine the alternative answer with the greatest similarity as the target answer corresponding to the initial question. For example, the word vector corresponding to the initial answer is A=(0,1,1 ..., 2,1), and the word vectors corresponding to the alternative answers are C1=(3,0,0 ..., 2,1), C2=(0,3,1 ..., 1,0), C3=(1,2,0 ..., 4,0) and C4=(2,1,1 ..., 0,0). In this step, the server can separately calculate the similarity between C1 and A, between C2 and A, between C3 and A, and between C4 and A, and then determine the alternative answer with the highest similarity as the target answer.
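The selection of the most similar alternative answer can be sketched as follows. The vectors below are five-component stand-ins obtained by dropping the elided middle components of the example above, so the outcome is illustrative only and says nothing about the full vectors; cosine similarity and the function names are likewise assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def pick_target_answer(initial_vector, alternative_vectors):
    """Return the name of the alternative answer most similar to the initial answer."""
    return max(
        alternative_vectors,
        key=lambda name: cosine_similarity(initial_vector, alternative_vectors[name]),
    )


# Hypothetical, shortened vectors (the elided components are simply omitted).
A = (0, 1, 1, 2, 1)
alternatives = {
    "C1": (3, 0, 0, 2, 1),
    "C2": (0, 3, 1, 1, 0),
    "C3": (1, 2, 0, 4, 0),
    "C4": (2, 1, 1, 0, 0),
}
print(pick_target_answer(A, alternatives))  # C3
```

With these stand-in vectors the similarities are roughly 0.51, 0.68, 0.82 and 0.31, so C3 would be chosen as the target answer.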
S306, mine the target question-and-answer resource according to each initial question and the target answer corresponding to each initial question.
In a specific embodiment of the present invention, the server can mine the target question-and-answer resource according to each initial question and the target answer corresponding to each initial question. Specifically, according to initial question 1 and its corresponding target answer 1, initial question 2 and its corresponding target answer 2, ..., and initial question N and its corresponding target answer N, the server can mine the target question-answer pairs <initial question 1, target answer 1>, <initial question 2, target answer 2>, ..., <initial question N, target answer N>; these target question-answer pairs together constitute the target question-and-answer resource.
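Assembling the pairs is straightforward; a small sketch under stated assumptions is given below. The function name and the data layout (a list of questions and a mapping from question to target answer) are hypothetical, and skipping a question for which no target answer was found is an assumption that the text does not address.

```python
def mine_target_qa_resource(initial_questions, target_answers):
    """Build the list of <initial question, target answer> pairs.

    initial_questions: list of question strings, in their original order.
    target_answers: dict mapping a question to its target answer (may lack keys).
    """
    return [
        (question, target_answers[question])
        for question in initial_questions
        if target_answers.get(question) is not None  # assumed: skip unanswered ones
    ]


questions = ["initial question 1", "initial question 2", "initial question 3"]
targets = {
    "initial question 1": "target answer 1",
    "initial question 3": "target answer 3",
}
for pair in mine_target_qa_resource(questions, targets):
    print(pair)
```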
The mining method for question-and-answer resources proposed by the embodiment of the present invention first extracts the initial answer corresponding to each initial question from each question-answer pair in the community question-and-answer resource; then determines the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in the vertical-category resource; and finally mines the target question-and-answer resource according to each initial question and the target answer corresponding to each initial question. That is, in the technical solution of the present invention, the target answer corresponding to each initial question can be determined according to the initial answer corresponding to each initial question and each UGC content in the vertical-category resource, and the target question-and-answer resource can then be mined according to each initial question and its corresponding target answer. In existing mining methods for question-and-answer resources, a batch of high-quality question-and-answer resources is mined from massive and miscellaneous UGC content by manual review and manual correction. With such existing methods, the labor cost is too high and the efficiency is too low, which makes them difficult to apply in actual products. Therefore, compared with the prior art, the mining method proposed by the embodiment of the present invention can not only save mining cost but also improve mining efficiency and mining accuracy; moreover, the technical solution of the embodiment of the present invention is simple and convenient to implement, easy to popularize, and widely applicable.
Embodiment four
Fig. 4 is a first structural schematic diagram of the mining apparatus for question-and-answer resources provided by embodiment four of the present invention. As shown in Fig. 4, the mining apparatus for question-and-answer resources includes an extraction module 401, a determining module 402 and a mining module 403; wherein,
The extraction module 401 is configured to extract the initial answer corresponding to each initial question from each question-answer pair in the community question-and-answer resource;
The determining module 402 is configured to determine the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in the vertical-category resource;
The mining module 403 is configured to mine the target question-and-answer resource according to each initial question and the target answer corresponding to each initial question.
Further, the determining module 402 is specifically configured to determine the alternative answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in the vertical-category resource, and to determine the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and the alternative answer corresponding to each initial question.
Fig. 5 is a second structural schematic diagram of the mining apparatus for question-and-answer resources provided by embodiment four of the present invention. As shown in Fig. 5, the determining module 402 includes a computational submodule 4021 and a determining submodule 4022; wherein,
The computational submodule 4021 is configured to calculate the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content;
The determining submodule 4022 is configured to determine the alternative answer corresponding to each initial question according to the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content.
Further, the computational submodule 4021 is specifically configured to divide each initial answer and each UGC content, in units of basic sentences, into a first sentence dictionary and a second sentence dictionary, respectively, and to calculate the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content according to the first sentence dictionary and the second sentence dictionary.
Further, the computational submodule 4021 is also configured to calculate the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer;
The determining submodule 4022 is also configured to determine the target answer corresponding to each initial question according to the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer.
The above mining apparatus for question-and-answer resources can execute the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, refer to the mining method for question-and-answer resources provided by any embodiment of the present invention.
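The module structure of this embodiment can be pictured with the following sketch. It is only one possible arrangement under stated assumptions: the class and method names are hypothetical, the two similarity measures are passed in as functions rather than computed by separate submodules 4021 and 4022, and the toy `word_overlap` measure (Jaccard overlap of word sets) merely stands in for the sentence-vector and word-vector similarities.

```python
class ExtractionModule:  # corresponds to module 401
    def extract(self, community_qa_pairs):
        """Return {initial question: initial answer} from (question, answer) pairs."""
        return dict(community_qa_pairs)


class DeterminingModule:  # corresponds to module 402
    def __init__(self, coarse_similarity, fine_similarity, threshold):
        self.coarse_similarity = coarse_similarity  # stand-in for sentence vectors
        self.fine_similarity = fine_similarity      # stand-in for word vectors
        self.threshold = threshold

    def determine(self, initial_answers, ugc_contents):
        """Pick, per question, the best UGC content among those above the threshold."""
        targets = {}
        for question, answer in initial_answers.items():
            alternatives = [
                ugc for ugc in ugc_contents
                if self.coarse_similarity(answer, ugc) > self.threshold
            ]
            if alternatives:
                targets[question] = max(
                    alternatives, key=lambda ugc: self.fine_similarity(answer, ugc)
                )
        return targets


class MiningModule:  # corresponds to module 403
    def mine(self, targets):
        """Return the target resource as a list of (question, target answer) pairs."""
        return list(targets.items())


class MiningApparatus:
    def __init__(self, determining_module):
        self.extraction = ExtractionModule()
        self.determining = determining_module
        self.mining = MiningModule()

    def run(self, community_qa_pairs, ugc_contents):
        initial_answers = self.extraction.extract(community_qa_pairs)
        targets = self.determining.determine(initial_answers, ugc_contents)
        return self.mining.mine(targets)


def word_overlap(a, b):
    """Toy similarity (Jaccard overlap of word sets), used for both stages here."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


apparatus = MiningApparatus(DeterminingModule(word_overlap, word_overlap, threshold=0.3))
qa_pairs = [("How long to boil eggs?", "boil eggs for ten minutes")]
ugc = ["boil eggs for about ten minutes", "paint the fence white"]
print(apparatus.run(qa_pairs, ugc))
```

Injecting the similarity functions keeps the sketch independent of any particular vectorization, which the embodiment leaves to the computational submodule.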
Embodiment five
Fig. 6 is a structural schematic diagram of the server provided by embodiment five of the present invention. Fig. 6 shows a block diagram of an exemplary server suitable for implementing embodiments of the present invention. The server 12 shown in Fig. 6 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
As shown in Fig. 6, server 12 takes the form of a general-purpose computing device. The components of server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including system memory 28 and processing unit 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
Server 12 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by server 12, including volatile and non-volatile media, and removable and non-removable media.
System memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. Merely as an example, storage system 34 can be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical media) can be provided. In these cases, each drive can be connected to bus 18 through one or more data media interfaces. Memory 28 may include at least one program product having a set of (for example, at least one) program modules that are configured to perform the functions of the embodiments of the present invention.
Program/utility 40, having a set of (at least one) program modules 42, may be stored in, for example, memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
Server 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with server 12, and/or with any device (such as a network card, a modem, etc.) that enables server 12 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interface 22. In addition, server 12 can communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through network adapter 20. As shown in the figure, network adapter 20 communicates with the other modules of server 12 through bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with server 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
Processing unit 16 executes various functional applications and data processing by running programs stored in system memory 28, for example implementing the mining method for question-and-answer resources provided by the embodiments of the present invention.
Embodiment six
Embodiment six of the present invention provides a computer storage medium.
The computer-readable storage medium of the embodiment of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, and that computer-readable medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk and C++, and also conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments; it may also include other equivalent embodiments without departing from the inventive concept, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. A mining method for question-and-answer resources, characterized in that the method comprises:
extracting the initial answer corresponding to each initial question from each question-answer pair in a community question-and-answer resource;
determining the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in a vertical-category resource;
mining a target question-and-answer resource according to each initial question and the target answer corresponding to each initial question.
2. The method according to claim 1, characterized in that determining the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in the vertical-category resource comprises:
determining the alternative answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in the vertical-category resource;
determining the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and the alternative answer corresponding to each initial question.
3. The method according to claim 2, characterized in that determining the alternative answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in the vertical-category resource comprises:
calculating the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content;
determining the alternative answer corresponding to each initial question according to the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content.
4. The method according to claim 3, characterized in that calculating the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content comprises:
dividing each initial answer and each UGC content, in units of basic sentences, into a first sentence dictionary and a second sentence dictionary, respectively;
calculating the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content according to the first sentence dictionary and the second sentence dictionary.
5. The method according to claim 2, characterized in that determining the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and the alternative answer corresponding to each initial question comprises:
calculating the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer;
determining the target answer corresponding to each initial question according to the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer.
6. A mining apparatus for question-and-answer resources, characterized in that the apparatus comprises an extraction module, a determining module and a mining module; wherein,
the extraction module is configured to extract the initial answer corresponding to each initial question from each question-answer pair in a community question-and-answer resource;
the determining module is configured to determine the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in a vertical-category resource;
the mining module is configured to mine a target question-and-answer resource according to each initial question and the target answer corresponding to each initial question.
7. The apparatus according to claim 6, characterized in that:
the determining module is specifically configured to determine the alternative answer corresponding to each initial question according to the initial answer corresponding to each initial question and each UGC content in the vertical-category resource, and to determine the target answer corresponding to each initial question according to the initial answer corresponding to each initial question and the alternative answer corresponding to each initial question.
8. The apparatus according to claim 7, characterized in that the determining module comprises a computational submodule and a determining submodule; wherein,
the computational submodule is configured to calculate the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content;
the determining submodule is configured to determine the alternative answer corresponding to each initial question according to the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content.
9. The apparatus according to claim 8, characterized in that:
the computational submodule is specifically configured to divide each initial answer and each UGC content, in units of basic sentences, into a first sentence dictionary and a second sentence dictionary, respectively, and to calculate the sentence vector corresponding to each initial answer and the sentence vector corresponding to each UGC content according to the first sentence dictionary and the second sentence dictionary.
10. The apparatus according to claim 7, characterized in that the computational submodule is also configured to calculate the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer;
the determining submodule is also configured to determine the target answer corresponding to each initial question according to the word vector corresponding to each initial answer and the word vector corresponding to each alternative answer.
11. A server, characterized by comprising:
one or more processors;
a memory for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the mining method for question-and-answer resources according to any one of claims 1 to 5.
12. A storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the mining method for question-and-answer resources according to any one of claims 1 to 5 is implemented.
CN201810696978.4A 2018-06-29 2018-06-29 A kind of method for digging, device, server and the storage medium of question and answer resource Pending CN109062973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810696978.4A CN109062973A (en) 2018-06-29 2018-06-29 A kind of method for digging, device, server and the storage medium of question and answer resource

Publications (1)

Publication Number Publication Date
CN109062973A true CN109062973A (en) 2018-12-21

Family

ID=64818461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810696978.4A Pending CN109062973A (en) 2018-06-29 2018-06-29 A kind of method for digging, device, server and the storage medium of question and answer resource

Country Status (1)

Country Link
CN (1) CN109062973A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1928864A (en) * 2006-09-22 2007-03-14 浙江大学 FAQ based Chinese natural language ask and answer method
CN102254039A (en) * 2011-08-11 2011-11-23 武汉安问科技发展有限责任公司 Searching engine-based network searching method
CN105159996A (en) * 2015-09-07 2015-12-16 百度在线网络技术(北京)有限公司 Deep question-and-answer service providing method and device based on artificial intelligence
US20160110364A1 (en) * 2014-10-18 2016-04-21 International Business Machines Corporation Realtime Ingestion via Multi-Corpus Knowledge Base with Weighting
US20170192976A1 (en) * 2016-01-06 2017-07-06 International Business Machines Corporation Ranking answers in ground truth of a question-answering system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783631A (en) * 2019-02-02 2019-05-21 北京百度网讯科技有限公司 Method of calibration, device, computer equipment and the storage medium of community's question and answer data
CN109783631B (en) * 2019-02-02 2022-05-17 北京百度网讯科技有限公司 Community question-answer data verification method and device, computer equipment and storage medium
CN111767366A (en) * 2019-04-01 2020-10-13 北京百度网讯科技有限公司 Question and answer resource mining method and device, computer equipment and storage medium
CN111767366B (en) * 2019-04-01 2023-07-14 北京百度网讯科技有限公司 Question and answer resource mining method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221
