CN110390005A - A kind of data processing method and device

A kind of data processing method and device

Info

Publication number
CN110390005A
CN110390005A (application number CN201910666576.4A)
Authority
CN
China
Prior art keywords
question
answer
word
representation
embedded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910666576.4A
Other languages
Chinese (zh)
Inventor
吴玮 (Wu Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shannon Huiyu Technology Co Ltd
Original Assignee
Beijing Shannon Huiyu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shannon Huiyu Technology Co Ltd filed Critical Beijing Shannon Huiyu Technology Co Ltd
Priority to CN201910666576.4A
Publication of CN110390005A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/374 Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention provides a data processing method and device. The method comprises: processing a question and an answer to the question to obtain a word-embedded representation of the question and a word-embedded representation of the answer; compressing the word-embedded representation of the question to obtain a compressed word-embedded representation of the question; and calculating a matching value between the answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and ranking the answers according to the obtained matching values. With the data processing method and device provided by the embodiments of the present invention, no manual participation is needed in the ranking of answers, which saves time and labor and makes the ranking efficient.

Description

Data processing method and device
Technical Field
The invention relates to the technical field of computers, in particular to a data processing method and device.
Background
Currently, with the development of Web 2.0 technology, internet product models in which content is generated primarily by users are flourishing. In web community forums, people can freely ask all kinds of questions and answer the questions of others.
Because the number of questions and answers keeps growing and the quality of answers varies widely, the quality of the answers must be checked manually and the multiple answers to a question ranked according to their quality.
Manually checking the quality of each answer is time-consuming, labor-intensive and inefficient.
Disclosure of Invention
In order to solve the above problem, embodiments of the present invention provide a data processing method and apparatus.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
processing a question and an answer to the question to obtain a word-embedded representation of the question and a word-embedded representation of the answer;
compressing the word-embedded representation of the question to obtain a compressed word-embedded representation of the question;
and calculating a matching value between the answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and ranking the answers according to the obtained matching values.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus, including:
a first processing module, configured to process a question and an answer to the question to obtain a word-embedded representation of the question and a word-embedded representation of the answer;
a second processing module, configured to compress the word-embedded representation of the question to obtain a compressed word-embedded representation of the question;
and a ranking module, configured to calculate a matching value between the answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and rank the answers according to the obtained matching values.
In the solutions provided in the first and second aspects of the embodiments of the present invention, the word-embedded representation of the question is compressed to obtain a compressed word-embedded representation of the question; the matching value between each answer and the question is then calculated according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and the answers are ranked according to the obtained matching values. Compared with manually checking answer quality, no manual participation is needed in the ranking of the answers, which saves time and labor and makes the ranking efficient.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating a data processing method according to embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of a data processing apparatus according to embodiment 2 of the present invention.
Detailed Description
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted", "connected", "secured" and the like are to be construed broadly and may, for example, denote a fixed connection, a detachable connection, or an integral connection; a mechanical or an electrical connection; a direct connection or an indirect connection through an intermediate medium; or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The scheme was originally designed to solve the problem of re-ranking answers in community question answering.
With the development of Web 2.0 technology, internet product models in which users generate the content (for example, well-known community question-and-answer sites) are flourishing. In web community forums, people can freely ask all kinds of questions and answer the questions of others. Because the number of questions and answers keeps growing and the quality of answers is uneven, manually checking the quality of answers and ranking the multiple answers to a question by quality is time-consuming and labor-intensive.
Community question answering has two characteristics that ordinary question answering does not. First, a question includes both a title part, which gives a brief overview of the question, and a body part, which describes the question in detail. Questioners usually convey their primary focus and key information in the title part, and then provide more detailed information about the topic in the question body, seeking help from or expressing their feelings to potential respondents. Second, redundancy and noise are common in community question answering: both the question and the answer may contain auxiliary sentences that provide no meaningful information.
Previous studies have generally treated every word in the question and answer representations equally. However, because questions are redundant and noisy, only part of the text of the question and of the answer is useful for judging the quality of the answer. Worse still, previous studies ignored the differences between the question title and the question body and simply concatenated them into a single question representation. Given the title-body relationship described above, this simple concatenation may aggravate the redundancy in the question.
On this basis, the scheme provides a data processing method and device: the answers can be ranked simply by calculating the matching value between each answer and the question, no manual participation is needed in the whole ranking process, time and labor are saved, and the ranking is efficient.
In the data processing method and device described below, the steps are executed by a deep learning network. In the following embodiments, the parameters of the deep learning network are all stored in the server, and when the parameters are needed, the server reads them from its own storage.
Example 1
This embodiment provides a data processing method whose execution subject is a server.
The server may be any computing device in the prior art that is capable of processing the text of the question and the answers to obtain the matching values between the answers and the question; it is not described in detail here.
Referring to a flow chart of a data processing method shown in fig. 1, the data processing method may include the following specific steps:
Step 100: process the question and the answer to the question to obtain a word-embedded representation of the question and a word-embedded representation of the answer.
In step 100, to obtain the word-embedded representation of the question and the word-embedded representation of the answer, the following steps (1) to (2) may be performed:
(1) inputting the text of the question and the text of the answer to the question into dictionaries to obtain a word vector and a character vector of the question, and a word vector and a character vector of the answer, respectively;
(2) concatenating the word vector and the character vector of the question to obtain the word-embedded representation of the question, and concatenating the word vector and the character vector of the answer to obtain the word-embedded representation of the answer.
In step (1), the dictionaries include, but are not limited to: a GloVe word-vector dictionary trained on an unlabeled corpus, and a character-vector dictionary based on a convolutional neural network.
Because the web text in community question-and-answer forums differs greatly from standardized text in spelling and grammar, specially trained GloVe vectors can model individual word interactions more accurately. Character embeddings have proven very useful for unknown words, which makes them particularly suitable for the noisy web text of community question-and-answer forums.
The text of the question is input into the GloVe word-vector dictionary trained on the unlabeled corpus to obtain the word vectors of the question, and into the character-vector dictionary based on the convolutional neural network to obtain the character vectors of the question.
Similarly, the text of the answer is input into the GloVe word-vector dictionary trained on the unlabeled corpus to obtain the word vectors of the answer, and into the character-vector dictionary based on the convolutional neural network to obtain the character vectors of the answer.
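For illustration only, a minimal sketch of this step, assuming PyTorch (all module names, dimensions and the character-CNN design are assumptions, not the patent's implementation):

import torch
import torch.nn as nn

class WordCharEmbedder(nn.Module):
    """Concatenate pretrained GloVe word vectors with CNN-derived character
    vectors to form the word-embedded representation of a text."""
    def __init__(self, glove_weights, n_chars, char_dim=16, char_out=32):
        super().__init__()
        self.word_emb = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_out, kernel_size=3, padding=1)

    def forward(self, word_ids, char_ids):
        # word_ids: (seq_len,); char_ids: (seq_len, word_len)
        w = self.word_emb(word_ids)                         # (seq_len, glove_dim)
        c = self.char_emb(char_ids).transpose(1, 2)         # (seq_len, char_dim, word_len)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values  # (seq_len, char_out)
        return torch.cat([w, c], dim=-1)                    # concatenated representation

Max-pooling over the character convolution is a common choice here; the patent does not specify the pooling.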
Step 102: compress the word-embedded representation of the question to obtain a compressed word-embedded representation of the question.
Step 102 may specifically include the following steps (1) to (2):
(1) performing orthogonal decomposition on the word-embedded representation of the question to obtain a word-embedded parallel component and a word-embedded orthogonal component of the question;
(2) concatenating the word-embedded parallel component and the orthogonal component of the question to obtain the compressed word-embedded representation of the question.
In step (1) above, the word-embedded parallel component of the question is computed by projecting the word-embedded representation of the body part of the question onto the word-embedded representation of the i-th word in the title part of the question.
The word-embedded orthogonal component of the question is the component of the same representation that is orthogonal to that projection.
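For reference, the decomposition just described is the standard vector projection. In our own notation (an assumption, since the original formulas are not reproduced in this text), with b the word-embedded representation of the body part of the question and t_i the word-embedded representation of the i-th word in the title part:

    S_para,i = ((b · t_i) / (t_i · t_i)) t_i
    S_orth,i = b − S_para,i

The parallel component keeps the part of the representation that points along the title word, the orthogonal component keeps the remainder, and together they lose no information, since S_para,i + S_orth,i = b.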
In step (2) above, a fusion gate may be used to combine the word-embedded parallel component and the orthogonal component of the question. Fusion gates are prior art and are not described in detail in this embodiment.
To obtain the compressed word-embedded representation of the question, the following steps (21) to (23) may be performed:
(21) computing alignment scores for the parallel components based on the word-embedded parallel components of the question;
(22) calculating, based on the alignment scores of the parallel components and the word-embedded representation of the body part of the question, a summarized representation of the body part of the question obtained according to the title part of the question;
(23) concatenating the word-embedded parallel component and the orthogonal component of the question according to the summarized representation of the body part of the question obtained according to the title part of the question, to obtain the compressed word-embedded representation of the question.
In step (21) above, the alignment scores of the parallel components are computed from the word-embedded parallel components, wherein c denotes an alignment parameter and W_p1 and b_p1 are parameters of the deep learning network.
The alignment parameter is preset in the server.
In step (22) above, the summarized representation of the body part of the question obtained according to the title part of the question, denoted S_ap below, is calculated from the alignment scores of the parallel components.
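A plausible concrete form of steps (21) and (22), assuming standard additive attention (both the notation and the aggregation target are assumptions, not the patent's verbatim formulas): with S_para,i the i-th word-embedded parallel component and S_body,i the i-th word-embedded representation of the body part,

    a_i = c^T tanh(W_p1 S_para,i + b_p1)
    α_i = exp(a_i) / Σ_k exp(a_k)
    S_ap = Σ_i α_i S_body,i

that is, the alignment scores a_i are normalized by a softmax and used as weights to pool the body representations into the summarized representation S_ap.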
In step (23), the word-embedded parallel component and the orthogonal component of the question are combined by the following formulas to obtain the compressed word-embedded representation of the question:
F_para = σ(W_p2 S_emb + W_p3 S_ap + b_p2)
S_para = F_para ⊙ S_emb + (1 − F_para) ⊙ S_ap
wherein W_p2, W_p3 and b_p2 denote the parameters of the fusion gate in the deep learning network; F_para denotes the fusion gate; S_para denotes the compressed word-embedded representation of the parallel component of the question; S_emb denotes the word-embedded representation of the title part of the question; and S_ap denotes the summarized representation of the body part of the question obtained according to the title part of the question.
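Because the fusion-gate formulas are given explicitly above, they translate directly into code. A minimal sketch, assuming PyTorch (the class and attribute names are ours, not the patent's):

import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Implements F_para = sigmoid(W_p2 S_emb + W_p3 S_ap + b_p2) and
    S_para = F_para * S_emb + (1 - F_para) * S_ap."""
    def __init__(self, dim):
        super().__init__()
        self.w_emb = nn.Linear(dim, dim, bias=True)   # W_p2 and b_p2
        self.w_ap = nn.Linear(dim, dim, bias=False)   # W_p3

    def forward(self, s_emb, s_ap):
        f = torch.sigmoid(self.w_emb(s_emb) + self.w_ap(s_ap))  # F_para
        return f * s_emb + (1 - f) * s_ap                       # S_para

The gate F_para decides, element by element, how much of the title representation S_emb to keep and how much of the summarized representation S_ap to mix in.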
Step 104: calculate the matching value between the answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and rank the answers according to the obtained matching values.
To calculate the matching value between each answer and the question and rank the answers according to the obtained matching values, the following steps (1) to (9) may be performed:
(1) mapping the words in the answer from the word-vector space into an interaction space of the same dimension as the question representation, to obtain a compressed word-embedded representation of the answer;
(2) calculating, according to the compressed word-embedded representation of the question and the compressed word-embedded representation of the answer, the similarity between the question title and the question body within the compressed word-embedded representation of the question;
(3) calculating the similarity between the question and the answer according to the calculated similarities of the question title and the question body;
(4) calculating, from the question side, a first similarity between the question and the answer based on the calculated similarity between the question and the answer and the compressed word-embedded representation of the answer;
(5) calculating, from the answer side, a second similarity between the question and the answer based on the calculated similarity between the question and the answer and the compressed word-embedded representation of the question;
(6) concatenating the first similarity with the word-embedded representation of the question to obtain a summarized representation of the question obtained according to the answer;
(7) concatenating the second similarity with the word-embedded representation of the answer to obtain a summarized representation of the answer obtained according to the question;
(8) calculating the matching value between the answer and the question based on the obtained summarized representation of the question obtained according to the answer and the summarized representation of the answer obtained according to the question;
(9) ranking the answers to the question according to the obtained matching values.
In step (1) above, the compressed word-embedded representation of the answer is obtained by the following formula:
C_rep = σ(W_c1 C_emb + b_c1) ⊙ tanh(W_c2 C_emb + b_c2)
wherein C_rep denotes the compressed word-embedded representation of the answer; W_c1, W_c2, b_c1 and b_c2 are parameters of the deep learning network; and C_emb denotes the word-embedded representation of the answer.
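The formula above is a gated mapping: a sigmoid gate multiplied element-wise by a tanh projection. A minimal sketch, assuming PyTorch (names are ours):

import torch
import torch.nn as nn

class GatedMapping(nn.Module):
    """Implements C_rep = sigmoid(W_c1 C_emb + b_c1) * tanh(W_c2 C_emb + b_c2),
    mapping answer word embeddings into the interaction space."""
    def __init__(self, emb_dim, rep_dim):
        super().__init__()
        self.gate = nn.Linear(emb_dim, rep_dim)   # W_c1, b_c1
        self.proj = nn.Linear(emb_dim, rep_dim)   # W_c2, b_c2

    def forward(self, c_emb):
        return torch.sigmoid(self.gate(c_emb)) * torch.tanh(self.proj(c_emb))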
In step (2) above, the similarity between the question title and the question body within the compressed word-embedded representation of the question is calculated from the compressed word-embedded representation of the question and the mapped representation of the answer, wherein W_a1, W_a2 and b_a are parameters of the deep learning network.
In step (3), the similarity between the question and the answer is calculated from the similarities obtained in step (2), wherein c denotes an alignment parameter.
In step (4) above, the first similarity between the question and the answer is calculated from the similarity between the question and the answer obtained in step (3) and the mapped representation of the answer.
In step (5) above, the second similarity between the question and the answer is calculated from the similarity between the question and the answer obtained in step (3) and the compressed word-embedded representation of the question.
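Steps (2) to (5) together describe a bidirectional attention between the question and the answer. A sketch of one standard formulation, in our own notation and under explicit assumptions (the exact formulas are not reproduced in this text): let s̃_i be the i-th vector of the compressed question representation and c̃_j the j-th vector of the mapped answer representation; a similarity matrix consistent with the parameters named above would be

    M_ij = c^T tanh(W_a1 s̃_i + W_a2 c̃_j + b_a)

with a row-wise softmax over M giving, for each question word, attention weights over the answer words (the first similarity, from the question side), and a column-wise softmax giving, for each answer word, attention weights over the question words (the second similarity, from the answer side).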
In step (8) above, the following steps (81) to (83) may be performed to calculate the matching value between the answer and the question:
(81) calculating a question representation based on the summarized representation of the question obtained according to the answer;
(82) calculating an answer representation based on the summarized representation of the answer obtained according to the question;
(83) calculating the matching value between the answer and the question from the question representation and the answer representation.
In step (81) above, the question representation is calculated via the attention-matching result obtained by the following formula:
A_s1 = W_s2 tanh(W_s1 S_att + b_s1) + b_s2
wherein s_sum denotes the question representation; A_s1 denotes the result of attention matching when computing the question representation; S_att denotes the summarized representation of the question obtained according to the answer; and W_s1, W_s2, b_s1 and b_s2 are parameters of the deep learning network.
In step (82), the answer representation is calculated via the attention-matching result obtained by the following formula:
A_s2 = W_s2 tanh(W_s1 C_att + b_s1) + b_s2
wherein c_sum denotes the answer representation; A_s2 denotes the result of attention matching when computing the answer representation; C_att denotes the summarized representation of the answer obtained according to the question; and W_s1, W_s2, b_s1 and b_s2 are parameters of the deep learning network.
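One standard way to turn the attention-matching results A_s1 and A_s2 into the pooled representations s_sum and c_sum is a softmax over positions; the pooling step below is an assumption, since only the attention-result formulas are given above. A minimal sketch, assuming PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentivePooling(nn.Module):
    """A = W_s2 tanh(W_s1 X + b_s1) + b_s2, followed by an (assumed) softmax
    over positions that pools the sequence X into a single summary vector."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden)   # W_s1, b_s1
        self.w2 = nn.Linear(hidden, 1)     # W_s2, b_s2

    def forward(self, x):                    # x: (seq_len, dim)
        a = self.w2(torch.tanh(self.w1(x)))  # attention-matching result
        weights = F.softmax(a, dim=0)        # assumed normalization over positions
        return (weights * x).sum(dim=0)      # pooled summary vector

Since the same parameter names W_s1, W_s2, b_s1 and b_s2 appear in the formulas for both A_s1 and A_s2, a single AttentivePooling instance would be applied to both S_att and C_att, i.e. the pooling parameters appear to be shared between the question and answer sides.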
In step (83), the matching value between the answer and the question is calculated by the following formula, which yields the probability that the answer is a "good answer", a "medium answer" or a "poor answer":
Pr(y | S, B, C) = softmax(W_2 tanh(W_1 [s_sum; c_sum] + b_1) + b_2)
wherein Pr(y | S, B, C) denotes the matching value between the answer and the question; s_sum denotes the question representation; c_sum denotes the answer representation; and W_1, W_2, b_1 and b_2 are parameters of the deep learning network.
In step (9), the answers to the question may be sorted in descending order of the calculated matching values.
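Combining step (83) with step (9), a minimal sketch of scoring and ranking, assuming PyTorch (the class names and the choice of the "good answer" probability as the matching value are assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Matcher(nn.Module):
    """Pr(y | S, B, C) = softmax(W_2 tanh(W_1 [s_sum; c_sum] + b_1) + b_2)
    over the classes good / medium / poor answer."""
    def __init__(self, dim, n_classes=3):
        super().__init__()
        self.hidden = nn.Linear(2 * dim, dim)   # W_1, b_1
        self.out = nn.Linear(dim, n_classes)    # W_2, b_2

    def forward(self, s_sum, c_sum):
        h = torch.tanh(self.hidden(torch.cat([s_sum, c_sum], dim=-1)))
        return F.softmax(self.out(h), dim=-1)

def rank_answers(matcher, s_sum, answer_reps):
    # Use the probability of the "good answer" class (assumed to be index 0)
    # as the matching value, then sort the answers in descending order of it.
    scores = [matcher(s_sum, c_sum)[0].item() for c_sum in answer_reps]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)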
In summary, in the data processing method provided by this embodiment, the word-embedded representation of the question is compressed to obtain a compressed word-embedded representation of the question; the matching value between each answer and the question is then calculated according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and the answers are ranked according to the obtained matching values. No manual participation is needed in the ranking of the answers, which saves time and labor and makes the ranking efficient.
Example 2
The present embodiment proposes a data processing apparatus for executing the data processing method of embodiment 1 described above.
Referring to the schematic structural diagram of the data processing apparatus shown in fig. 2, the present embodiment provides a data processing apparatus, including:
a first processing module 200, configured to process a question and an answer to the question to obtain a word-embedded representation of the question and a word-embedded representation of the answer;
a second processing module 202, configured to compress the word-embedded representation of the question to obtain a compressed word-embedded representation of the question;
and a ranking module 204, configured to calculate a matching value between the answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and rank the answers according to the obtained matching values.
In summary, the data processing apparatus provided by this embodiment compresses the word-embedded representation of the question to obtain a compressed word-embedded representation of the question, then calculates the matching value between each answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and ranks the answers according to the obtained matching values, so that no manual participation is needed and the ranking is time-saving, labor-saving and efficient.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can easily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (20)

1. A data processing method, comprising:
processing a question and an answer to the question to obtain a word-embedded representation of the question and a word-embedded representation of the answer;
compressing the word-embedded representation of the question to obtain a compressed word-embedded representation of the question;
and calculating a matching value between the answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and ranking the answers according to the obtained matching values.
2. The method of claim 1, wherein processing a question and an answer to the question to obtain a word-embedded representation of the question and a word-embedded representation of the answer comprises:
inputting the text of the question and the text of the answer to the question into dictionaries to obtain a word vector and a character vector of the question, and a word vector and a character vector of the answer, respectively;
and concatenating the word vector and the character vector of the question to obtain the word-embedded representation of the question, and concatenating the word vector and the character vector of the answer to obtain the word-embedded representation of the answer.
3. The method of claim 1, wherein compressing the word-embedded representation of the question to obtain a compressed word-embedded representation of the question comprises:
performing orthogonal decomposition on the word-embedded representation of the question to obtain a word-embedded parallel component and a word-embedded orthogonal component of the question;
and concatenating the word-embedded parallel component and the orthogonal component of the question to obtain the compressed word-embedded representation of the question.
4. The method of claim 3, wherein performing orthogonal decomposition on the word-embedded representation of the question to obtain the word-embedded parallel component of the question comprises:
obtaining the word-embedded parallel component of the question by projecting the word-embedded representation of the body part of the question onto the word-embedded representation of the i-th word in the title part of the question.
5. The method of claim 4, wherein performing orthogonal decomposition on the word-embedded representation of the question to obtain the word-embedded orthogonal component of the question comprises:
obtaining the word-embedded orthogonal component of the question as the component of the same representation that is orthogonal to the word-embedded parallel component.
6. The method of claim 4, wherein concatenating the word-embedded parallel component and the orthogonal component of the question to obtain the compressed word-embedded representation of the question comprises:
computing alignment scores for the parallel components based on the word-embedded parallel components of the question;
calculating, based on the alignment scores of the parallel components and the word-embedded representation of the body part of the question, a summarized representation of the body part of the question obtained according to the title part of the question;
and concatenating the word-embedded parallel component and the orthogonal component of the question according to the summarized representation of the body part of the question obtained according to the title part of the question, to obtain the compressed word-embedded representation of the question.
7. The method of claim 6, wherein computing alignment scores for the parallel components based on the word-embedded parallel components of the question comprises:
computing the alignment scores of the parallel components from the word-embedded parallel components, wherein c denotes an alignment parameter and W_p1 and b_p1 are parameters of the deep learning network.
8. The method of claim 7, wherein calculating a summarized representation of the body part of the question based on the alignment scores of the parallel components and the word-embedded representation of the body part of the question comprises:
calculating, from the alignment scores, the summarized representation of the body part of the question obtained according to the title part of the question.
9. The method of claim 7, wherein concatenating the word-embedded parallel component and the orthogonal component of the question according to the summarized representation of the body part of the question obtained according to the title part of the question, to obtain the compressed word-embedded representation of the question, comprises:
combining the word-embedded parallel component and the orthogonal component of the question by the following formulas to obtain the compressed word-embedded representation of the question:
F_para = σ(W_p2 S_emb + W_p3 S_ap + b_p2)
S_para = F_para ⊙ S_emb + (1 − F_para) ⊙ S_ap
wherein W_p2, W_p3 and b_p2 denote the parameters of the fusion gate in the deep learning network; F_para denotes the fusion gate; S_para denotes the compressed word-embedded representation of the parallel component of the question; S_emb denotes the word-embedded representation of the title part of the question; and S_ap denotes the summarized representation of the body part of the question obtained according to the title part of the question.
10. The method of claim 1, wherein calculating a matching value between the answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and ranking the answers according to the obtained matching values, comprises:
mapping the words in the answer from the word-vector space into an interaction space of the same dimension as the question representation, to obtain a compressed word-embedded representation of the answer;
calculating, according to the compressed word-embedded representation of the question and the compressed word-embedded representation of the answer, the similarity between the question title and the question body within the compressed word-embedded representation of the question;
calculating the similarity between the question and the answer according to the calculated similarities of the question title and the question body;
calculating, from the question side, a first similarity between the question and the answer based on the calculated similarity between the question and the answer and the compressed word-embedded representation of the answer;
calculating, from the answer side, a second similarity between the question and the answer based on the calculated similarity between the question and the answer and the compressed word-embedded representation of the question;
concatenating the first similarity with the word-embedded representation of the question to obtain a summarized representation of the question obtained according to the answer;
concatenating the second similarity with the word-embedded representation of the answer to obtain a summarized representation of the answer obtained according to the question;
calculating the matching value between the answer and the question based on the obtained summarized representation of the question obtained according to the answer and the summarized representation of the answer obtained according to the question;
and ranking the answers to the question according to the obtained matching values.
11. The method of claim 10, wherein mapping the words in the answer from the word-vector space into an interaction space of the same dimension as the question representation, to obtain a compressed word-embedded representation of the answer, comprises:
obtaining the compressed word-embedded representation of the answer by the following formula:
C_rep = σ(W_c1 C_emb + b_c1) ⊙ tanh(W_c2 C_emb + b_c2)
wherein C_rep denotes the compressed word-embedded representation of the answer; W_c1, W_c2, b_c1 and b_c2 are parameters of the deep learning network; and C_emb denotes the word-embedded representation of the answer.
12. The method of claim 11, wherein calculating, according to the compressed word-embedded representation of the question and the compressed word-embedded representation of the answer, the similarity between the question title and the question body within the compressed word-embedded representation of the question comprises:
calculating the similarity between the question title and the question body from the compressed word-embedded representation of the question and the mapped representation of the answer, wherein W_a1, W_a2 and b_a are parameters of the deep learning network.
13. The method of claim 12, wherein calculating the similarity between the question and the answer according to the calculated similarities of the question title and the question body comprises:
calculating the similarity between the question and the answer from the calculated similarities, wherein c denotes an alignment parameter.
14. The method of claim 13, wherein calculating, from the question side, a first similarity between the question and the answer based on the calculated similarity between the question and the answer and the compressed word-embedded representation of the answer comprises:
calculating the first similarity between the question and the answer from the calculated similarity and the mapped representation of the answer.
15. The method of claim 13, wherein calculating, from the answer side, a second similarity between the question and the answer based on the calculated similarity between the question and the answer and the compressed word-embedded representation of the question comprises:
calculating the second similarity between the question and the answer from the calculated similarity and the compressed word-embedded representation of the question.
16. The method of claim 13, wherein calculating the matching value between the answer and the question based on the obtained summarized representation of the question obtained according to the answer and the summarized representation of the answer obtained according to the question comprises:
calculating a question representation based on the summarized representation of the question obtained according to the answer;
calculating an answer representation based on the summarized representation of the answer obtained according to the question;
and calculating the matching value between the answer and the question from the question representation and the answer representation.
17. The method of claim 16, wherein calculating a question representation based on the summarized representation of the question obtained according to the answer comprises:
calculating the question representation via the attention-matching result obtained by the following formula:
A_s1 = W_s2 tanh(W_s1 S_att + b_s1) + b_s2
wherein s_sum denotes the question representation; A_s1 denotes the result of attention matching when computing the question representation; S_att denotes the summarized representation of the question obtained according to the answer; and W_s1, W_s2, b_s1 and b_s2 are parameters of the deep learning network.
18. The method of claim 16, wherein calculating an answer representation based on the summarized representation of the answer obtained according to the question comprises:
calculating the answer representation via the attention-matching result obtained by the following formula:
A_s2 = W_s2 tanh(W_s1 C_att + b_s1) + b_s2
wherein c_sum denotes the answer representation; A_s2 denotes the result of attention matching when computing the answer representation; C_att denotes the summarized representation of the answer obtained according to the question; and W_s1, W_s2, b_s1 and b_s2 are parameters of the deep learning network.
19. The method of claim 16, wherein the matching value between the answer and the question is calculated from the question representation and the answer representation by the following formula:
Pr(y | S, B, C) = softmax(W_2 tanh(W_1 [s_sum; c_sum] + b_1) + b_2)
wherein Pr(y | S, B, C) denotes the matching value between the answer and the question; s_sum denotes the question representation; c_sum denotes the answer representation; and W_1, W_2, b_1 and b_2 are parameters of the deep learning network.
20. A data processing apparatus, comprising:
a first processing module, configured to process a question and an answer to the question to obtain a word-embedded representation of the question and a word-embedded representation of the answer;
a second processing module, configured to compress the word-embedded representation of the question to obtain a compressed word-embedded representation of the question;
and a ranking module, configured to calculate a matching value between the answer and the question according to the compressed word-embedded representation of the question and the word-embedded representation of the answer, and rank the answers according to the obtained matching values.
CN201910666576.4A (priority date 2019-07-23, filing date 2019-07-23): A kind of data processing method and device. Status: Pending. Published as CN110390005A (en).

Priority Applications (1)

Application Number: CN201910666576.4A; Priority Date: 2019-07-23; Filing Date: 2019-07-23; Title: A kind of data processing method and device

Applications Claiming Priority (1)

Application Number: CN201910666576.4A; Priority Date: 2019-07-23; Filing Date: 2019-07-23; Title: A kind of data processing method and device

Publications (1)

Publication Number: CN110390005A (en); Publication Date: 2019-10-29

Family

ID=68287149

Family Applications (1)

Application Number: CN201910666576.4A (status: Pending); Publication: CN110390005A (en); Title: A kind of data processing method and device

Country Status (1)

Country Link
CN (1) CN110390005A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190079921A1 (en) * 2015-01-23 2019-03-14 Conversica, Inc. Systems and methods for automated question response
US20190065576A1 (en) * 2017-08-23 2019-02-28 Rsvp Technologies Inc. Single-entity-single-relation question answering systems, and methods
CN108132931A (en) * 2018-01-12 2018-06-08 北京神州泰岳软件股份有限公司 A kind of matched method and device of text semantic
CN108829818A (en) * 2018-06-12 2018-11-16 中国科学院计算技术研究所 A kind of file classification method
CN109656952A (en) * 2018-10-31 2019-04-19 北京百度网讯科技有限公司 Inquiry processing method, device and electronic equipment
CN109271505A (en) * 2018-11-12 2019-01-25 深圳智能思创科技有限公司 A kind of question answering system implementation method based on problem answers pair
CN109726396A (en) * 2018-12-20 2019-05-07 泰康保险集团股份有限公司 Semantic matching method, device, medium and the electronic equipment of question and answer text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
霍欢 (Huo Huan): "一种基于关键词扩展的答案块提取模型" [An answer block extraction model based on keyword expansion], 《小型微型计算机系统》 (Journal of Chinese Computer Systems) *

Similar Documents

Publication Publication Date Title
CN109213999B (en) Subjective question scoring method
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN107330130B (en) Method for realizing conversation robot recommending reply content to manual customer service
US8818926B2 (en) Method for personalizing chat bots
CN108228576B (en) Text translation method and device
CN104657923B (en) Method and device for double checking and judging of test questions
CN104731777A (en) Translation evaluation method and device
CN110895553A (en) Semantic matching model training method, semantic matching method and answer obtaining method
KR20080021017A (en) Comparing text based documents
CN109614480B (en) Method and device for generating automatic abstract based on generation type countermeasure network
Pramukantoro et al. Comparative analysis of string similarity and corpus-based similarity for automatic essay scoring system on e-learning gamification
Yoshimura et al. SOME: Reference-less sub-metrics optimized for manual evaluations of grammatical error correction
Gruzd et al. Coding and classifying knowledge exchange on social media: A comparative analysis of the# Twitterstorians and AskHistorians communities
CN117972434A (en) Training method, training device, training equipment, training medium and training program product for text processing model
JP2020160159A (en) Scoring device, scoring method, and program
CN111680134B (en) Method for measuring inquiry and answer consultation information by information entropy
CN117034956A (en) Text quality evaluation method and device
CN113011154A (en) Job duplicate checking method based on deep learning
CN110390005A (en) A kind of data processing method and device
CN109657250B (en) Text translation method, device, equipment and readable storage medium
CN108959467B (en) Method for calculating correlation degree of question sentences and answer sentences based on reinforcement learning
KR102330970B1 (en) Assessment system and method for education based on artificial intelligence
CN115934891A (en) Question understanding method and device
CN115795007A (en) Intelligent question-answering method, intelligent question-answering device, electronic equipment and storage medium
Willis et al. Identifying domain reasoning to support computer monitoring in typed-chat problem solving dialogues

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191029)