CN111143531A - Question-answer pair construction method, system, device and computer readable storage medium - Google Patents

Info

Publication number
CN111143531A
Authority
CN
China
Prior art keywords
question
answer
answer pairs
sentences
pairs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911349116.5A
Other languages
Chinese (zh)
Inventor
蒋芳清
熊友军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN201911349116.5A priority Critical patent/CN111143531A/en
Publication of CN111143531A publication Critical patent/CN111143531A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The invention discloses a question-answer pair construction method, system, device and computer readable storage medium. The method comprises the following steps: extracting sentences of potential question-answer pairs in the text paragraphs; sorting the sentences of the potential question-answer pairs to generate candidate question-answer pairs; and scoring and screening the candidate question-answer pairs to obtain the question-answer pairs with scores higher than a set threshold value. In this way, potential question-answer pairs in documents are extracted automatically to construct high-quality question-answer pairs, so that question-answer pairs are constructed faster and more accurately and the quality of the question-answer knowledge base is improved.

Description

Question-answer pair construction method, system, device and computer readable storage medium
Technical Field
The invention relates to the technical field of natural language processing and knowledge base storage, in particular to a question-answer pair construction method, a question-answer pair construction system, a question-answer pair construction device and a computer readable storage medium.
Background
An existing question-answer knowledge base consists of scenarios, questions and corresponding answers. Its knowledge comes mainly from documents such as rules and terms and user manuals, all of which contain simple statements of fact, such as "pets may not be brought aboard a high-speed train" and "goods may be returned without reason within 7 days of inspection and acceptance".
In a question-answering system based on question-answer pairs, the knowledge base formed by those pairs is the system's knowledge source, and the accuracy and richness of that knowledge determine the quality of the system. The knowledge base is therefore an important component of the question-answering system.
The construction of the existing knowledge base depends on the traditional manual editing mode, question-answer pairs are extracted from text documents such as rule terms and user manuals, and the question-answer pairs are scored in a manual screening mode. This construction requires manual intervention, which not only requires high operating and maintenance costs, but also makes it difficult to control the quality of the knowledge base.
Disclosure of Invention
In view of the defects in the prior art, the main technical problem solved by the invention is to provide a question-answer pair construction method, system, device and computer readable storage medium that build a question-answer knowledge base by automatically extracting question-answer pairs from documents using natural language processing and deep learning techniques, thereby automating construction of the knowledge base, reducing labor cost, and improving the quality of the knowledge base.
In order to solve the technical problems, one technical scheme adopted by the invention is to provide a question-answer pair construction method, which comprises the following steps: extracting sentences of potential question-answer pairs in the text paragraphs; sorting the sentences of the potential question-answer pairs to generate candidate question-answer pairs; and scoring the candidate question-answer pairs and screening to obtain the question-answer pairs with scores higher than a set threshold value.
Before the step of extracting sentences of potential question-answer pairs in the text paragraphs, the method comprises the following steps: extracting text paragraphs from an input text document, and performing segmentation processing on the text paragraphs by adopting a segmentation method; and performing text preprocessing on the text paragraphs after the segmentation processing.
The step of extracting the sentences of the potential question-answer pairs in the text paragraphs specifically comprises the following steps: performing syntactic dependency analysis on the sentences in the text paragraphs and outputting the dependency relationships among the words in each sentence; extracting the stem of the sentence according to the dependency relationships; judging whether the stem of the sentence contains potential question-answer knowledge; and, when the judgment result is yes, extracting the sentence as a sentence of a potential question-answer pair.
The step of sorting the sentences of the potential question-answer pairs to generate candidate question-answer pairs specifically comprises the following steps: simplifying the sentences of the potential question-answer pairs; performing entity recognition on the simplified sentences; extracting the entities of the sentences to construct question-answer pairs; and rewriting the question-answer pairs with a rewriting method based on a deep generative model to obtain candidate question-answer pairs.
The step of scoring and screening the candidate question-answer pairs to obtain the question-answer pairs specifically comprises the following steps: scoring the candidate question-answer pairs by classification with a scoring method based on a fast text classification model; and, according to the scoring results of the candidate question-answer pairs, obtaining the question-answer pairs with scores higher than the set threshold value with a screening method based on sorting and filtering.
In order to solve the above technical problem, another technical solution adopted by the present invention is to provide a question-answer pair construction system, including: an extraction module for extracting sentences of potential question-answer pairs in the text paragraphs; a candidate question-answer pair generating module for sorting the sentences of the potential question-answer pairs and generating candidate question-answer pairs; a scoring module for training a fast text classification model to score the candidate question-answer pairs by classification; and a screening module for screening out, through sorting and filtering, the question-answer pairs with scores higher than a set threshold value.
Wherein, the question-answer pair construction system further comprises: the input module is used for inputting a text document; and the preprocessing module is used for preprocessing the text paragraphs.
Wherein, the question-answer pair construction system further comprises: a judging module for judging whether the sentences in the text paragraphs contain potential question-answer knowledge; and an output module for outputting the question-answer pairs with scores higher than the set threshold value.
In order to solve the above technical problem, another technical solution adopted by the present invention is to provide a question-answer pair constructing apparatus, including: a memory for storing program data which, when executed, implements the steps of the question-answer pair construction method described in any one of the above; a processor for executing the program instructions stored in the memory to implement the steps of the question-answer pair construction method described in any one of the above.
In order to solve the above technical problem, another technical solution adopted by the present invention is to provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps in the question-answer pair construction method described in any one of the above.
The invention has the following beneficial effects. Unlike the prior art, sentences of potential question-answer pairs are extracted automatically from the document to generate candidate question-answer pairs. The candidate question-answer pairs are rewritten by a method that combines question templates with a deep learning model, which ensures the accuracy and diversity of the generated questions. Finally, the candidate question-answer pairs are scored and screened based on a scoring model and a fast text classification model, which ensures the relevance between the generated questions and answers and yields high-quality question-answer pairs. With this method, system and device, question-answer pairs are constructed automatically, dependence on traditional manual editing and labor cost are reduced, question-answer pairs are constructed faster and more accurately, and the quality of the question-answer knowledge base is improved.
Drawings
FIG. 1 is a schematic flow chart diagram of an embodiment of a question-answer pair construction method according to the present invention;
FIG. 2 is a schematic flow chart of one embodiment of step 11 of FIG. 1;
FIG. 3 is a schematic flow chart of one embodiment of step 12 of FIG. 1;
FIG. 4 is a schematic diagram of a question template and a corresponding question and answer pair according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram of one embodiment of step 13 of FIG. 1;
FIG. 6 is a block diagram of an embodiment of a question-answer pair construction system of the present invention;
FIG. 7 is a schematic structural diagram of an embodiment of a question-answer pair constructing apparatus according to the present invention;
FIG. 8 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise; "a plurality of" generally includes at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
It should be understood that the terms "comprises," "comprising," or any other variation thereof, as used herein, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
A question-answer pair construction method, system, apparatus, and computer-readable storage medium according to embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that these examples are not intended to limit the scope of the present disclosure.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a question-answer pair construction method according to the present invention, in which the method includes:
S11: Extract sentences of potential question-answer pairs from the text paragraphs.
In an embodiment of the invention, text paragraphs are extracted from the input original text document. After the text paragraphs are extracted from the text document, the text paragraphs are segmented by adopting a segmentation method.
Specifically, the line feed character is used as a mark for paragraph distinction, and the number of characters of each paragraph is controlled within a preset interval range.
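The segmentation rule above can be illustrated with a short Python sketch. The text only says that line feeds mark paragraph boundaries and that paragraph length is kept within a preset interval; the merge-short/split-long strategy, the function name `segment_paragraphs`, and the default bounds below are assumptions made for illustration.

```python
def segment_paragraphs(text: str, min_chars: int = 20, max_chars: int = 500) -> list[str]:
    """Split on line feeds, then keep each paragraph roughly within [min_chars, max_chars].

    Assumed strategy: lines shorter than min_chars are merged with the next
    line, and text longer than max_chars is cut into fixed-size chunks.
    """
    paragraphs: list[str] = []
    buffer = ""
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip empty lines between paragraphs
        buffer = f"{buffer} {line}" if buffer else line
        # Emit fixed-size chunks while the buffer exceeds the upper bound.
        while len(buffer) > max_chars:
            paragraphs.append(buffer[:max_chars])
            buffer = buffer[max_chars:]
        # Emit the buffer once it has reached the lower bound.
        if len(buffer) >= min_chars:
            paragraphs.append(buffer)
            buffer = ""
    if buffer:
        paragraphs.append(buffer)  # trailing remainder may be shorter than min_chars
    return paragraphs
```

Cutting at a fixed character offset can split a sentence in two; a production version would more likely cut at the nearest sentence boundary.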
Further, the segmented text paragraphs are preprocessed.
Specifically, the text paragraphs are processed by at least one of abnormal character removal, upper/lower case conversion, simplified/traditional Chinese conversion, sentence splitting and word segmentation, part-of-speech tagging, and sentence simplification; reducing character noise in this way lowers the difficulty of subsequent processing.
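A minimal Python sketch of some of these preprocessing operations follows, using only the standard library. It covers abnormal character removal, case conversion, and sentence splitting. Simplified/traditional conversion, word segmentation, and part-of-speech tagging are omitted because they need external tools that the text does not name. Treating Unicode control characters as the "abnormal characters" is an assumption, as is the function name `preprocess`.

```python
import re
import unicodedata


def preprocess(paragraph: str) -> list[str]:
    """Return a list of cleaned, lower-cased sentences."""
    # Normalize full-width and compatibility forms, then lowercase.
    text = unicodedata.normalize("NFKC", paragraph).lower()
    # Collapse runs of whitespace (including tabs and line feeds) to one space.
    text = re.sub(r"\s+", " ", text).strip()
    # Drop remaining control/format characters (assumed "abnormal characters").
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    # Split after sentence-ending punctuation (Latin or CJK full stop).
    sentences = re.split(r"(?<=[.!?。])\s*", text)
    return [s for s in sentences if s]
```

The sentence splitter is deliberately naive: it would also split inside decimal numbers such as "7.5" and after abbreviations.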
In the embodiment of the invention, syntactic dependency analysis is carried out on the preprocessed sentence, the dependency relationship among the words in the sentence is output through the syntactic dependency analysis, the stem of the sentence is extracted according to the dependency relationship, and finally whether potential question-answering knowledge exists in the sentence is judged through the stem.
Specifically, referring to fig. 2, fig. 2 is a schematic flow chart of an embodiment of step 11 in fig. 1, in which the method includes:
S211: Perform syntactic dependency analysis on the preprocessed sentences and output the dependency relationships among the words in each sentence.
Syntactic dependency analysis, called dependency analysis for short, is used for identifying interdependencies among words in a sentence.
Specifically, dependency syntax represents the structure of a whole sentence through the dependency relationships among its words, which express the semantic dependencies among the components of the sentence. The dependency relationships among all the words form a syntax tree whose root node is the head word of the sentence and conveys the core content of the whole sentence. That is, each sentence has exactly one head word, and each word in the sentence is related, directly or through other words, to the head word.
In an embodiment of the invention, syntactic dependency analysis is to label dependencies between words in a sentence.
Specifically, the dependency relationship between words is at least one of a subject-predicate relationship, a verb-object relationship, an indirect-object relationship, a fronted object, a pivot construction, an attributive-head relationship, an adverbial-head structure, a verb-complement structure, a coordinate relationship, a preposition-object relationship, a left adjunct relationship, a right adjunct relationship, an independent structure, and a core relationship.
In the embodiment of the present invention, the syntax dependency analysis method or tool is not specifically limited, but the format of the syntax analysis result is limited.
Specifically, the output format of the syntactic dependency analysis result is defined as the CoNLL format.
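As an illustration of consuming such output, the Python sketch below reads a parse in the ten-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The text does not say which CoNLL variant is meant, so the column layout is an assumption, and the relation labels `SBV`, `HED`, and `VOB` in the sample are hypothetical stand-ins for the subject-predicate, core, and verb-object relationships.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position of the word in the sentence
    form: str    # surface form of the word
    head: int    # index of the governing word, 0 for the root
    deprel: str  # dependency relation label


def parse_conll(block: str) -> list[Token]:
    """Parse one sentence given as tab-separated CoNLL-X lines."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        # Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        tokens.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


# Hypothetical parser output for "Passengers carry tickets".
SAMPLE = "\n".join([
    "1\tPassengers\t_\tn\tn\t_\t2\tSBV\t_\t_",
    "2\tcarry\t_\tv\tv\t_\t0\tHED\t_\t_",
    "3\ttickets\t_\tn\tn\t_\t2\tVOB\t_\t_",
])
```

Fixing the exchange format in this way lets any parser be swapped in, as long as it emits the same columns.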
S212: Extract the stem of the sentence according to the dependency relationships.
In an embodiment of the present invention, the stem of the sentence is extracted from the result of the syntactic dependency analysis.
Specifically, the word whose dependency relationship is the core relationship, that is, the head word, is extracted as the predicate.
The head word is the center of the sentence: it governs the other components of the sentence and is not governed by any of them.
Further, all subject-predicate relationships in the sentence are traversed, and the words whose dependency on the predicate is a subject-predicate relationship are extracted as subjects.
Further, all verb-object relationships in the sentence are traversed, and the words whose dependency on the predicate is a verb-object relationship are extracted as objects.
Further, the subject, predicate, and object extracted in the above steps are combined to form the stem of the sentence.
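The extraction in S212 can be sketched in Python as follows. Each token is a tuple of (index, word, head index, relation), and the labels `HED`, `SBV`, and `VOB` are hypothetical names for the core, subject-predicate, and verb-object relationships; a real parser may use different labels.

```python
def extract_stem(tokens):
    """Return the subject(s), predicate, and object(s) of one parsed sentence.

    tokens: list of (index, word, head_index, relation) tuples, indices 1-based.
    """
    # The word carrying the core relation is taken as the predicate.
    predicate = next((t for t in tokens if t[3] == "HED"), None)
    if predicate is None:
        return {"subject": [], "predicate": None, "object": []}
    pid = predicate[0]
    # Words attached to the predicate by a subject-predicate relation.
    subjects = [t[1] for t in tokens if t[3] == "SBV" and t[2] == pid]
    # Words attached to the predicate by a verb-object relation.
    objects = [t[1] for t in tokens if t[3] == "VOB" and t[2] == pid]
    return {"subject": subjects, "predicate": predicate[1], "object": objects}
```

Modifiers such as attributes are left out of the stem here, which matches the later stem supplement step that adds them back when needed.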
S213: Judge whether the stem of the sentence contains potential question-answer knowledge.
In the embodiment of the invention, it is judged whether the extracted stem meets one of the conditions of containing a subject, a predicate and an object or containing a subject, a predicate and an object. If so, the sentence is judged to contain potential question-answer knowledge; otherwise, it is judged not to contain such knowledge.
S214: When the judgment result is yes, extract the sentence as a sentence of a potential question-answer pair.
In this step, the sentences with potential question-answer knowledge are extracted as the sentences of potential question-answer pairs.
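A minimal Python sketch of the judgment in S213 and the extraction in S214 follows. The two alternative conditions read identically in the text, so this sketch assumes the single condition that the stem contains a subject, a predicate, and an object. The stem is represented as a dictionary, and both function names are hypothetical.

```python
def has_potential_qa(stem: dict) -> bool:
    """Assumed condition: the stem has a subject, a predicate, and an object."""
    return bool(stem.get("subject")) and bool(stem.get("predicate")) and bool(stem.get("object"))


def select_candidate_sentences(sentences_with_stems):
    """Keep only sentences whose stem passes the check (S214)."""
    return [sentence for sentence, stem in sentences_with_stems if has_potential_qa(stem)]
```

If the intended second condition is looser, for example subject and predicate only, only `has_potential_qa` would need to change.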
S12: Sort the sentences of the potential question-answer pairs to generate candidate question-answer pairs.
In an embodiment of the invention, sentences identified as potentially capable of extracting knowledge of question-answer pairs are collated to generate question-answer pairs, and the question-answer pairs generated in this step are candidate question-answer pairs.
Specifically, please refer to fig. 3, where fig. 3 is a schematic flowchart of an embodiment of step 12 in fig. 1, and in this embodiment, the method includes:
S311: Simplify the sentences of the potential question-answer pairs.
In embodiments of the present invention, simplifying a sentence of a complex potential question-and-answer pair refers to deleting meaningless clauses or components in the sentence.
The complex sentence is a sentence composed of a plurality of clauses or having a complex structure.
In the embodiment of the invention, whether the sentences of the potential question-answer pairs are complex sentences is judged firstly.
Wherein, the complex sentence is a sentence with the number of clauses larger than 1 or the syntactic dependency coefficient larger than 3.
Further, when the judgment result is yes, the sentence is simplified; when the judgment result is no, the process proceeds directly to S312.
In the embodiment of the invention, sentence simplification comprises three steps: meaningless clause deletion, stem extraction, and stem supplement.
In the meaningless clause deletion step, a set of meaningless clauses is first defined; the clauses of the complex sentence are then matched one by one against the clauses in the defined set, and a clause is deleted if the matching succeeds.
The stem extraction step is the same as S212 and is not described again here.
In the stem supplement step, the modifying components of the stem are combined with the extracted stem to form a new stem.
The embodiment of the present invention does not limit the manner of the stem supplement.
Optionally, in other embodiments of the present invention, the stem supplement step may be omitted.
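The complexity check and the meaningless clause deletion can be sketched in Python as below. The contents of `NONSENSE_CLAUSES` are hypothetical, clause splitting on commas and semicolons is an assumption, and the syntactic dependency coefficient is passed in as a number because the text does not define how it is computed.

```python
import re

# Hypothetical set of meaningless clauses, for illustration only.
NONSENSE_CLAUSES = {"as is well known", "generally speaking", "in other words"}


def split_clauses(sentence: str) -> list[str]:
    """Split a sentence into clauses on Latin or CJK commas and semicolons."""
    return [c.strip() for c in re.split(r"[,;，；]", sentence) if c.strip()]


def is_complex(sentence: str, dependency_coefficient: int = 0) -> bool:
    """Complex if it has more than one clause or a dependency coefficient above 3."""
    return len(split_clauses(sentence)) > 1 or dependency_coefficient > 3


def delete_nonsense_clauses(sentence: str) -> str:
    """Drop clauses that match an entry of the predefined set."""
    kept = [c for c in split_clauses(sentence)
            if c.lower().rstrip(".") not in NONSENSE_CLAUSES]
    return ", ".join(kept)
```

Exact matching keeps the step predictable; fuzzy matching would catch more filler clauses at the cost of occasionally deleting meaningful ones.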
S312: Perform entity recognition on the simplified sentence.
In an embodiment of the invention, the identified entity type includes at least one of a person name, a place name, an organization name, and a time.
Specifically, the embodiment of the present invention does not limit the recognition method, and the entity may be recognized by at least one of dictionary matching, training of an entity recognition model, and direct utilization of an open source tool.
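Of the recognition options listed, dictionary matching is the simplest to illustrate. The Python sketch below assumes a small hand-built dictionary; its entries and the function name `recognize_entities` are hypothetical, and only the first occurrence of each entry is reported.

```python
# Hypothetical entity dictionary mapping surface strings to entity types.
ENTITY_DICTIONARY = {
    "Rome": "place",
    "Ubtech Robotics Corp": "organization",
    "eight o'clock": "time",
}


def recognize_entities(sentence: str) -> list[tuple[str, str, int]]:
    """Return (entity, type, start offset) triples found by dictionary lookup."""
    found = []
    # Try longer entries first so a long name wins over a shorter one inside it.
    for name in sorted(ENTITY_DICTIONARY, key=len, reverse=True):
        start = sentence.find(name)
        inside_previous = any(s <= start < s + len(n) for n, _, s in found)
        if start != -1 and not inside_previous:
            found.append((name, ENTITY_DICTIONARY[name], start))
    # Report entities in reading order.
    return sorted(found, key=lambda item: item[2])
```

A trained recognition model or an open source tool would replace this lookup without changing the rest of the flow, provided it returns the same entity types.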
S313: Extract the entities of the sentences to construct question-answer pairs.
In the embodiment of the invention, question-answer pairs are constructed with a construction method based on question templates.
In embodiments of the present invention, the defined question templates fall into two broad categories: entity replacement and template filling.
In a specific implementation scenario, with a question template based on entity replacement, the corresponding question word is used directly for each of several entity types, such as person name, place name, organization name and time, to generate a question, and the replaced entity is used as the answer.
For example, when the identified entity is "Rome", since Rome is a place name, the corresponding question word is "where", and "Rome" is used directly as the answer; when the identified entity is "at eight o'clock", since eight o'clock is a time, the corresponding question word is "when", and "at eight o'clock" is used directly as the answer.
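A minimal Python sketch of entity replacement follows. The question word for place names comes from the example above; the remaining entries of `QUESTION_WORDS` and the function name are assumptions. The entity span is replaced in place, so the resulting question keeps the word order of the source sentence.

```python
# Question word per entity type; only "where" for places is given in the text,
# the other entries are assumed for illustration.
QUESTION_WORDS = {
    "person": "who",
    "place": "where",
    "organization": "which organization",
    "time": "when",
}


def entity_replacement_pair(sentence: str, entity: str, entity_type: str) -> tuple[str, str]:
    """Replace the entity with its question word; the entity becomes the answer."""
    question_word = QUESTION_WORDS[entity_type]
    question = sentence.replace(entity, question_word, 1).rstrip(".") + "?"
    return question, entity
```

For the sentence "The train leaves at eight o'clock." and the time entity "at eight o'clock", this yields the question "The train leaves when?" with the entity as the answer. The stiff word order is typical of template output.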
In another specific implementation scenario, with a question template based on template filling, the slots are first determined, and phrases are then extracted from the document to fill the slots, complete the sentences, and adjust the sentence structure.
Specifically, please refer to fig. 4, fig. 4 is a schematic structural diagram of a question template and a corresponding question and answer pair according to an embodiment of the present invention.
In practical applications, when "X is Y" is entered in the template slot 41, the following questions can be constructed in the question slot 42: "Is X Y?", "What is Y?", "Where is Y?", "Why is Y?", "Who is Y?", and the corresponding answer slot 43 generates the following answers: "Yes.", "X.".
Alternatively, when "The X verbs Y" is entered in the template slot 41, the following questions may be constructed in the question slot 42: "Does X verb Y?", "What does the X verb?", and the corresponding answer slot 43 generates the following answers: "X.", "Y.".
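Template filling for the "X is Y" pattern can be sketched in Python with a regular expression that captures the two slots. Only two of the question forms are shown, the regular expression is an assumption about how the slots might be located, and the names `TEMPLATES` and `fill_templates` are hypothetical.

```python
import re

# Each entry pairs a sentence pattern with (question, answer) templates.
TEMPLATES = [
    (
        re.compile(r"^(?P<X>.+?) is (?P<Y>.+?)\.?$"),
        [("Is {X} {Y}?", "Yes."), ("What is {Y}?", "{X}.")],
    ),
]


def fill_templates(sentence: str) -> list[tuple[str, str]]:
    """Return the question-answer pairs produced by every matching template."""
    pairs = []
    for pattern, qa_templates in TEMPLATES:
        match = pattern.match(sentence)
        if match:
            slots = match.groupdict()  # slot name -> phrase taken from the sentence
            pairs.extend((q.format(**slots), a.format(**slots)) for q, a in qa_templates)
    return pairs
```

Questions such as "Where is Y?" or "Who is Y?" only make sense for certain entity types, so in practice the choice of template would depend on the entity recognition result.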
S314: Rewrite the question-answer pairs with a rewriting method based on a deep generative model to obtain the candidate question-answer pairs.
As the question-answer pairs generated in the preceding steps show, questions constructed from predefined question templates suffer from uniform structure and poor diversity of expression.
In the embodiment of the invention, a question-answer pair rewriting method based on a trained deep generative model is provided: the deep generative model is trained in advance by supervised learning, the trained model is used to rewrite the questions generated in S313, and the rewritten question-answer pairs are output.
Supervised learning is a machine learning mode, is often used in a scene with sufficient data, can learn a function (model parameters) from a given training data set, and can predict a result according to the function when new data comes.
The training requirement of supervised learning comprises input and output, targets in a training set are labeled by people, and a training sample set consists of samples with labels.
Specifically, an optimal model is obtained through training of an existing training sample, namely known data and corresponding output of the known data, all input is mapped into corresponding output by the optimal model, and the output is simply judged so as to achieve the purpose of classification.
In the embodiment of the invention, question-answer pairs whose labeled questions have varied structure and highly diverse expression are used as the targets of the optimal model.
Deep learning performs well on many tasks such as text classification, sequence labeling, and machine translation. In embodiments of the present invention, the answer in a candidate question-answer pair may be generated from the user question by a deep generative model.
Specifically, the deep generative model used in the embodiment of the present invention is a sequence-to-sequence model.
A sequence-to-sequence model can translate a sequence in one language into a sequence in another language; the whole process uses a deep neural network to map an input sequence to an output sequence.
Specifically, the deep neural network is an LSTM (long short-term memory network) or an RNN (recurrent neural network).
S13: Score and screen the candidate question-answer pairs to obtain the question-answer pairs with scores higher than a set threshold value.
Specifically, referring to fig. 5, fig. 5 is a schematic flow chart of an embodiment of step 13 in fig. 1, in which the method includes:
S511: Score the candidate question-answer pairs by classification, using a scoring method based on a fast text classification model.
In the embodiment of the invention, the fast text classification model is trained in advance in a supervised learning mode, and then the trained classification model is applied to the classification scoring of the candidate question-answer pairs.
Specifically, training data are obtained with the question-answer pair generation method and then reviewed manually: question-answer pairs with the highest matching degree are labeled with output class 1, and question-answer pairs with the lowest matching degree are labeled with output class 0. With the question-answer pairs in the training data as model input and the matching-degree score of each pair as model output, a fastText classification model is trained.
Further, the trained fastText classification model classifies and scores the generated candidate question-answer pairs and outputs their scoring results.
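The manually labeled pairs can be prepared for training with a few lines of Python. fastText's supervised mode reads one example per line, with each label prefixed by `__label__`. Joining the question and answer with a space is an assumption, since the text does not say how a pair is encoded as model input, and both function names are hypothetical.

```python
def to_fasttext_line(question: str, answer: str, label: int) -> str:
    """Format one reviewed question-answer pair as a fastText training line."""
    # Assumed encoding: question and answer joined into one text, line feeds removed.
    text = f"{question} {answer}".replace("\n", " ").strip()
    return f"__label__{label} {text}"


def write_training_file(examples, path: str) -> None:
    """Write (question, answer, label) triples to a fastText training file."""
    with open(path, "w", encoding="utf-8") as handle:
        for question, answer, label in examples:
            handle.write(to_fasttext_line(question, answer, label) + "\n")
```

With the `fasttext` Python package, such a file could then be passed to `fasttext.train_supervised(input=path)`, and the probability that `model.predict` returns for `__label__1` could serve as the matching score of a candidate pair.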
S512: According to the scoring results, obtain the question-answer pairs with scores higher than the set threshold value using a question-answer pair screening method based on sorting and filtering.
In the embodiment of the invention, the candidate question-answer pairs are sorted from high to low according to the score value, a score threshold value is preset, the candidate question-answer pairs with the score value lower than the set threshold value are filtered, and the final question-answer pairs are screened out to construct the high-quality question-answer pairs.
Optionally, the score threshold is set to 0.8-0.9.
In a specific implementation scenario, if the score threshold is set to 0.9, the candidate question-answer pairs with output scores higher than 0.9 are retained, and the candidate question-answer pairs with output scores lower than 0.9 are deleted.
In another specific implementation scenario, if the score threshold is set to 0.8, the candidate question-answer pairs with output scores higher than 0.8 are retained, and the candidate question-answer pairs with output scores lower than 0.8 are deleted.
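The sorting and filtering in S512 amount to a few lines of Python. Candidates are represented as (question, answer, score) tuples. The text does not say what happens to a score exactly equal to the threshold, so this sketch keeps only scores strictly above it, matching the wording "higher than"; the function name `screen_pairs` is hypothetical.

```python
def screen_pairs(scored_pairs, threshold: float = 0.9):
    """Sort candidates by score, high to low, and keep those above the threshold.

    scored_pairs: iterable of (question, answer, score) tuples.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[2], reverse=True)
    return [item for item in ranked if item[2] > threshold]
```

Lowering the threshold from 0.9 to 0.8 trades precision for coverage: more pairs enter the knowledge base, and more of them are borderline.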
Referring to fig. 6, fig. 6 is a schematic diagram of a framework of an embodiment of a question-answer pair construction system according to the present invention, where the question-answer pair construction system includes an input module 61, a preprocessing module 62, a judging module 63, an extracting module 64, a candidate question-answer pair generating module 65, a scoring module 66, a screening module 67, and an output module 68.
An input module 61 for inputting a text document.
After the input module 61 obtains the original text document, the text paragraphs are further segmented.
And the preprocessing module 62 is connected with the input module 61 and is used for preprocessing the segmented text paragraphs.
The preprocessing comprises at least one of abnormal character removal, case and case conversion, simplified and traditional body conversion, sentence breaking and word segmentation, part of speech tagging and sentence simplification.
And the judging module 63 is connected with the preprocessing module 62 and is used for judging whether the sentences in the preprocessed text paragraphs have potential question-answer knowledge.
In an embodiment of the present invention, the step of determining, by the determining module 63, whether there is potential question-answering knowledge in the sentences in the preprocessed text paragraphs includes: and performing syntactic dependency analysis on the preprocessed sentence, outputting the dependency relationship among words in the sentence through the syntactic dependency analysis, extracting a main stem of the sentence according to the dependency relationship, and finally judging whether potential question-answer knowledge exists in the sentence through the main stem.
And the extracting module 64 is connected to the judging module 63 and is configured to extract the sentences of the identified potential question-answer pairs in the text paragraphs.
The candidate question-answer pair generating module 65 is connected to the extracting module 64, and is configured to sort the sentences of the potential question-answer pairs and generate candidate question-answer pairs.
In the embodiment of the present invention, the step of generating the candidate question-answer pair by the candidate question-answer pair generating module 65 specifically includes: simplifying the sentences of the potential question-answer pairs, carrying out entity recognition on the simplified sentences, extracting the entities of the sentences, constructing question-answer pairs based on the question templates, and finally training a deep generation model through a supervised learning mode to rewrite the question-answer pairs.
The scoring module 66 is connected to the candidate question-answer pair generating module 65 and is configured to train a rapid text classification model and use it to score the candidate question-answer pairs.
The screening module 67 is connected to the scoring module 66 and is configured to select, by sorting and filtering, the question-answer pairs whose scores are higher than the set threshold.
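A minimal sketch of the sort-and-filter screening is given below. The trained classification model is replaced by a toy stand-in, toy_score, whose rules are invented purely so that the example runs; in practice the score would come from the trained model. The names screen and toy_score are hypothetical.

```python
from typing import Callable

QAPair = tuple[str, str]  # (question, answer)


def toy_score(pair: QAPair) -> float:
    """Stand-in for a trained classifier's score; the rules are illustrative only."""
    question, answer = pair
    score = 0.0
    if question.endswith("?"):
        score += 0.5
    if answer and answer.lower() not in question.lower():
        score += 0.5
    return score


def screen(
    candidates: list[QAPair],
    score: Callable[[QAPair], float],
    threshold: float,
) -> list[tuple[QAPair, float]]:
    """Score every candidate, sort by score (descending), keep those above threshold."""
    scored = [(pair, score(pair)) for pair in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(pair, s) for pair, s in scored if s > threshold]
```

The comparison is strict, matching the wording "higher than the set threshold"; a pair scoring exactly at the threshold is dropped.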
The output module 68 is connected to the screening module 67 and is configured to output the question-answer pairs whose scores are higher than the set threshold.
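The chain of connected modules can be pictured as a sequence of functions, each consuming the output of the previous one. The sketch below is one possible wiring under that assumption; the class QAPipeline and its field names are hypothetical, and each field is a placeholder for whatever implementation the corresponding module uses.

```python
from dataclasses import dataclass
from typing import Callable

QAPair = tuple[str, str]  # (question, answer)


@dataclass
class QAPipeline:
    preprocess: Callable[[str], list[str]]     # role of preprocessing module 62
    has_knowledge: Callable[[str], bool]       # role of judging module 63
    generate: Callable[[str], list[QAPair]]    # role of generating module 65
    score: Callable[[QAPair], float]           # role of scoring module 66
    threshold: float

    def run(self, paragraph: str) -> list[QAPair]:
        sentences = self.preprocess(paragraph)
        # Extraction (module 64): keep only sentences judged to carry knowledge.
        selected = [s for s in sentences if self.has_knowledge(s)]
        candidates = [pair for s in selected for pair in self.generate(s)]
        # Screening (module 67): sort by score, keep pairs above the threshold.
        scored = [(self.score(pair), pair) for pair in candidates]
        scored.sort(key=lambda item: item[0], reverse=True)
        # Output (module 68): return the surviving pairs, best first.
        return [pair for s, pair in scored if s > self.threshold]
```

Keeping each stage behind a plain callable means any one module can be swapped (for example, a different scoring model) without touching the others.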
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a question-answer pair constructing device according to the present invention, which includes a processor 71 and a memory 72.
Processor 71 is configured to execute program instructions stored in memory 72 to implement the steps of the question-answer pair construction method described in any of the above-described method embodiments.
Specifically, the processor 71 is configured to control itself and the memory 72 to implement the specific steps of the question-answer pair construction method described in any one of the above method embodiments. The processor 71 may also be referred to as a CPU (Central Processing Unit). The processor 71 may be an integrated circuit chip having signal processing capabilities. The processor 71 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor. In addition, the processor 71 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
The computer-readable storage medium 80 includes a computer program 801 stored on the computer-readable storage medium 80, and when executed by a processor, the computer program 801 implements the specific steps of the question-answer pair construction method described in any one of the above method embodiments.
Specifically, the integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the computer-readable storage medium 80. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in the computer-readable storage medium 80 and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned computer-readable storage medium 80 includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In the several embodiments provided in the present application, it should be understood that the disclosed method, system, and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or in another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present invention and is not intended to limit the scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the present specification and drawings, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of protection of the present invention.

Claims (10)

1. A question-answer pair construction method is characterized by comprising the following steps:
extracting sentences of potential question-answer pairs in the text paragraphs;
sorting the sentences of the potential question-answer pairs to generate candidate question-answer pairs;
and scoring the candidate question-answer pairs and screening to obtain the question-answer pairs with the scores higher than a set threshold value.
2. The question-answer pair construction method according to claim 1, characterized by comprising, before the step of extracting sentences of potential question-answer pairs in a text passage, the steps of:
extracting text paragraphs from an input original text document, and performing segmentation processing on the text paragraphs by adopting a segmentation method;
and performing text preprocessing on the text paragraphs after the segmentation processing.
3. The question-answer pair construction method according to claim 1, wherein in the step of extracting sentences of potential question-answer pairs in a text passage, the method specifically comprises:
performing syntactic dependency analysis on sentences in the text paragraphs and outputting dependency relationships among words in the sentences;
extracting a main stem of the sentence according to the dependency relationship;
judging whether potential question-answer knowledge exists in the main stem of the sentence;
and when the judgment result is yes, extracting the sentence as a sentence of the potential question-answer pair.
4. The question-answer pair construction method according to claim 1, wherein in the step of sorting the sentences of the potential question-answer pairs to generate candidate question-answer pairs, the method specifically comprises:
simplifying the sentences of the potential question-answer pairs;
performing entity recognition on the simplified sentence;
extracting the entities of the sentence and constructing question-answer pairs;
and rewriting the question-answer pairs by adopting a question-answer pair rewriting method based on a deep generative model to obtain the candidate question-answer pairs.
5. The question-answer pair construction method according to claim 1, wherein in the step of scoring and screening the candidate question-answer pairs to obtain question-answer pairs with scores higher than a set threshold, the method specifically comprises:
scoring the candidate question-answer pairs by adopting a scoring method based on a rapid text classification model;
and obtaining the question-answer pairs with the scores higher than a set threshold value by adopting a screening method based on sorting filtering according to the scoring results of the candidate question-answer pairs.
6. A question-answer pair construction system, comprising:
the extraction module is used for extracting sentences of potential question-answer pairs in the text paragraphs;
the candidate question-answer pair generating module is used for sorting the sentences of the potential question-answer pairs and generating candidate question-answer pairs;
the scoring module is used for training a rapid text classification model to score the candidate question-answer pairs by classification;
and the screening module is used for selecting, by sorting and filtering, the question-answer pairs whose scores are higher than a set threshold value.
7. The question-answer pair construction system according to claim 6, characterized by further comprising:
the input module is used for inputting a text document;
and the preprocessing module is used for preprocessing the text paragraphs.
8. The question-answer pair construction system according to claim 7, characterized in that the question-answer pair construction system further comprises:
the judging module is used for judging whether the sentences in the text paragraphs have potential question-answer knowledge or not;
and the output module is used for outputting the question-answer pairs with the scores higher than the set threshold value.
9. A question-answer pair construction apparatus comprising:
a memory for storing program data which, when executed, implements the steps in the question-answer pair construction method according to any one of claims 1 to 5;
a processor for executing the program instructions stored by the memory to implement the steps in the question-answer pair construction method according to any one of claims 1 to 5.
10. A computer-readable storage medium, having a computer program stored thereon, which, when being executed by a processor, implements the steps in the question-answer pair construction method according to any one of claims 1 to 5.
CN201911349116.5A 2019-12-24 2019-12-24 Question-answer pair construction method, system, device and computer readable storage medium Pending CN111143531A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911349116.5A CN111143531A (en) 2019-12-24 2019-12-24 Question-answer pair construction method, system, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111143531A (en) 2020-05-12

Family

ID=70519704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911349116.5A Pending CN111143531A (en) 2019-12-24 2019-12-24 Question-answer pair construction method, system, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111143531A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108231059A (en) * 2017-11-27 2018-06-29 北京搜狗科技发展有限公司 Processing method and apparatus, and apparatus for processing
CN109933661A (en) * 2019-04-03 2019-06-25 上海乐言信息科技有限公司 Semi-supervised question-answer pair induction method and system based on a deep generative model
CN110110054A (en) * 2019-03-22 2019-08-09 北京中科汇联科技股份有限公司 Method for obtaining question-answer pairs from unstructured text based on deep learning
CN110532369A (en) * 2019-09-04 2019-12-03 腾讯科技(深圳)有限公司 Question-answer pair generation method, device and server
CN110532348A (en) * 2019-09-04 2019-12-03 网易(杭州)网络有限公司 Question-answer pair data generation method and device, and electronic equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858880A (en) * 2020-06-18 2020-10-30 北京百度网讯科技有限公司 Method and device for obtaining query result, electronic equipment and readable storage medium
CN111858880B (en) * 2020-06-18 2024-01-26 北京百度网讯科技有限公司 Method, device, electronic equipment and readable storage medium for obtaining query result
CN111914062A (en) * 2020-07-13 2020-11-10 上海乐言信息科技有限公司 Long text question-answer pair generation system based on keywords
CN111914062B (en) * 2020-07-13 2021-04-06 上海乐言科技股份有限公司 Long text question-answer pair generation system based on keywords
CN111949779A (en) * 2020-07-29 2020-11-17 交控科技股份有限公司 Intelligent rail transit response method and system based on knowledge graph
CN112347226A (en) * 2020-11-06 2021-02-09 平安科技(深圳)有限公司 Document knowledge extraction method and device, computer equipment and readable storage medium
CN112347226B (en) * 2020-11-06 2023-05-26 平安科技(深圳)有限公司 Document knowledge extraction method, device, computer equipment and readable storage medium
CN113901793A (en) * 2021-12-08 2022-01-07 北京来也网络科技有限公司 Event extraction method and device combining RPA and AI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200512)