CN113836296A - Method, device, equipment and storage medium for generating Buddhist question-answer abstract - Google Patents

Publication number
CN113836296A
Authority
CN
China
Prior art keywords
answer
buddhist
question
paragraph
asked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111146330.8A
Other languages
Chinese (zh)
Inventor
杜江楠
李剑锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202111146330.8A priority Critical patent/CN113836296A/en
Publication of CN113836296A publication Critical patent/CN113836296A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a method, a device, equipment and a storage medium for generating a Buddhist question-answer abstract. The method comprises: acquiring a plurality of answer documents corresponding to the Buddhist question asked by a user, and extracting from each answer document an answer paragraph related to the asked Buddhist question; segmenting each answer paragraph into sentences and adding the resulting sentences to a candidate sentence set; determining topic keywords belonging to the field of Buddhism based on historical Buddhist questions asked by the user; for each sentence in the candidate sentence set, determining the semantic similarity of the sentence to the asked Buddhist question according to the topic keywords; and selecting sentences from the candidate sentence set according to the semantic similarity to generate a Buddhist answer abstract. The resulting abstract takes into account both the relevance between question and answer and the user's points of interest, so it can meet the personalized requirements of different users, improve reading efficiency, and lower the reading threshold.

Description

Method, device, equipment and storage medium for generating Buddhist question-answer abstract
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method, a device, equipment and a storage medium for generating a Buddhist question-answer abstract.
Background
With the development of science and technology, Artificial Intelligence (AI) gradually enters the industrial, commercial and living fields. Artificial intelligence is a theory, method and technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results.
Intelligent question answering based on human-computer interaction is an important application direction of artificial intelligence and is widely used in consultation scenarios in various fields. With the development of internet technology, more and more scenarios require searching for answers according to questions. For example, in a Buddhist question-and-answer scenario, a user may consult an online robot about Buddhist questions, and the robot searches the question-and-answer data (including questions and answers) for appropriate answers to return to the user.
However, Buddhist question-answer data is relatively long, and one Buddhist question usually corresponds to a plurality of answer documents. A large amount of answer data therefore has to be returned to the user, who must spend considerable time reading it, and the user experience is poor.
Disclosure of Invention
To overcome the above defects of the prior art, the present invention provides a method, an apparatus, a device and a storage medium for generating a Buddhist question-answer abstract, realized through the following technical solutions.
The first aspect of the present invention provides a method for generating a Buddhist question-answer abstract, wherein the method comprises:
acquiring a plurality of answer documents corresponding to the Buddhist question asked by the user, and extracting from each answer document an answer paragraph related to the asked Buddhist question;
segmenting each answer paragraph into sentences, and adding the resulting sentences to a candidate sentence set;
determining topic keywords belonging to the field of Buddhism based on historical Buddhist questions asked by the user;
for each sentence in the candidate sentence set, determining the semantic similarity of the sentence to the asked Buddhist question according to the topic keywords;
and selecting sentences from the candidate sentence set according to the semantic similarity to generate the Buddhist answer abstract.
In some embodiments of the present application, said extracting from each answer document an answer paragraph related to the asked Buddhist question, comprises:
for each answer document, predicting the paragraph beginning probability and the paragraph end probability corresponding to each word in the answer document according to the answer document and the asked Buddhist question; and determining the answer paragraph related to the asked Buddhist question in the answer document according to the paragraph beginning probability and the paragraph end probability corresponding to each word.
In some embodiments of the present application, the determining of the answer paragraph in the answer document that is related to the asked Buddhist question according to the paragraph beginning probability and the paragraph end probability corresponding to each word includes:
for each word in the answer document, obtaining a first difference by subtracting the word's paragraph end probability from its paragraph beginning probability, and obtaining a second difference by subtracting the word's paragraph beginning probability from its paragraph end probability; selecting the word with the largest first difference in the answer document as the beginning position of the answer paragraph, and selecting the word with the largest second difference as the end position of the answer paragraph; and taking the words located between the beginning position and the end position in the answer document as the answer paragraph related to the asked Buddhist question.
In some embodiments of the present application, the determining the topic keywords belonging to the field of Buddhism based on the historical Buddhism questions asked by the user includes:
acquiring historical Buddhist questions asked by the user; inputting the obtained historical Buddhist questions into a trained multi-label classification model, so that the multi-label classification model assigns Buddhism-related category labels to the user according to the historical Buddhist questions; and, in a preset correspondence between category labels and topic keywords, looking up the topic keywords contained in the assigned category labels as the topic keywords belonging to the field of Buddhism.
In some embodiments of the present application, the process of configuring the correspondence between category labels and topic keywords includes:
collecting various historical Buddhist questions, and performing topic distribution calculation on the collected historical Buddhist questions to obtain the topic keywords contained in each topic; determining the category labels contained in the multi-label classification model, and outputting a prompt based on the topic keywords contained in each topic and the determined category labels, so that a user configures topic keywords for each category label; and receiving the topic keywords configured by the user for each category label.
In some embodiments of the present application, selecting sentences from the candidate sentence set and generating a Buddhist answer summary according to semantic similarity, comprises:
sorting the sentences in the candidate sentence set in descending order of semantic similarity to obtain a sorting result; and selecting sentences from the sorting result according to a preset abstract length to generate the Buddhist answer abstract.
A second aspect of the present invention provides a device for generating a Buddhist question-answer abstract, the device comprising:
the answer paragraph extraction module is used for acquiring a plurality of answer documents corresponding to the Buddhist questions asked by the user and extracting the answer paragraphs related to the Buddhist questions asked from each answer document;
the candidate sentence set determining module is used for carrying out sentence segmentation on each answer paragraph and adding sentences obtained by the sentence segmentation into the candidate sentence set;
the theme keyword determining module is used for determining theme keywords belonging to the field of Buddhism based on historical Buddhism questions asked by the user;
the matching module is used for determining, for each sentence in the candidate sentence set, the semantic similarity of the sentence to the asked Buddhist question according to the topic keywords;
and the abstract generating module is used for selecting sentences from the candidate sentence set according to the semantic similarity to generate the Buddhist answer abstract.
A third aspect of the present invention proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect when executing the program.
A fourth aspect of the present invention proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method according to the first aspect as described above.
Based on the method and the device for generating the Buddhist question-answer abstract in the first aspect and the second aspect, the invention has at least the following beneficial effects or advantages:
An answer paragraph related to the asked Buddhist question is extracted from each of the plurality of answer documents, and each answer paragraph is segmented into sentences to form a candidate sentence set, so that every sentence in the set is an answer related to the asked Buddhist question. Topic keywords that interest the user are then obtained from the historical Buddhist questions asked by the user, and the semantic similarity between the asked Buddhist question and each answer sentence is determined according to these topic keywords. Finally, the answer sentences with higher semantic similarity are selected to generate the Buddhist answer abstract. The abstract obtained through this series of refinements takes into account both the relevance between question and answer and the user's points of interest, so it can meet the individual requirements of different users. Because the abstract is a condensed extract of the answers, it also improves the user's reading efficiency and lowers the reading threshold.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating an embodiment of a method for generating a Buddhist question-answer summary according to an exemplary embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an answer paragraph extraction process according to the embodiment of FIG. 1;
FIG. 3 is a schematic diagram illustrating a process of determining topic keywords belonging to the field of Buddhism according to the embodiment of FIG. 1;
FIG. 4 is a schematic structural diagram illustrating an apparatus for generating a Buddhist question-answer summary according to an exemplary embodiment of the present invention;
FIG. 5 is a diagram illustrating a hardware configuration of an electronic device according to an exemplary embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a structure of a storage medium according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The question-answering system is an important component of the field of artificial intelligence, particularly Natural Language Processing (NLP), and there is growing demand for question-answering systems that can answer specialized questions in the vertical field of Buddhism.
Buddhist knowledge question-answer data has two problems. First, the data is relatively long and highly specialized, which sets a certain threshold for lay readers. Second, Buddhism belongs to the category of philosophy, and a question often has more than one answer, so one question may correspond to a plurality of documents; different questioners also focus on different points, so personalized answers need to be provided for different users. Taken together, these two points mean that the problem to be solved is personalized summarization of answers.
To solve the above technical problems, the invention provides a method for generating a Buddhist question-answer abstract. An answer paragraph related to the asked Buddhist question is extracted from each of a plurality of answer documents, and each answer paragraph is segmented into sentences to form a candidate sentence set, so that every sentence in the set is an answer related to the asked Buddhist question. Topic keywords that interest the user are then obtained from the historical Buddhist questions asked by the user, and the semantic similarity between the asked Buddhist question and each answer sentence is determined according to these topic keywords. Finally, the answer sentences with higher semantic similarity are selected to generate the Buddhist answer abstract. The abstract obtained in this way takes into account both the relevance between question and answer and the user's points of interest, so it can meet the individual requirements of different users. Because the abstract is a condensed extract of the answers, it also improves the user's reading efficiency and lowers the reading threshold.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Example one:
FIG. 1 is a flowchart illustrating an embodiment of a method for generating a Buddhist question-answer abstract according to an exemplary embodiment of the present invention. The method may be applied to a computer device, which may be a terminal device, a mobile terminal, a PC, a server, or the like. As illustrated in FIG. 1, the method includes the following steps:
Step 101: Acquire a plurality of answer documents corresponding to the Buddhist question asked by the user.
In Buddhist question-answer data, one question usually corresponds to multiple correct answers. The answer documents corresponding to a question may be obtained using related technology, for example a database matching technique, which is not specifically limited in the present invention.
Step 102: Extract an answer paragraph related to the asked Buddhist question from each answer document.
In this embodiment, because each answer document for the question is long and contains redundant content, each answer document is condensed by extracting from it the answer paragraph most relevant to the question.
For the extraction process of the answer paragraphs, reference may be made to the description of the following embodiments, and details are not given here.
Step 103: Segment each answer paragraph into sentences, and add the resulting sentences to the candidate sentence set.
Segmenting the answer paragraphs splits each answer into a plurality of sentences, all of which are candidate sentences related to the question.
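Step 103 can be illustrated with a minimal sketch. The patent does not specify how sentences are segmented; the version below assumes that sentences end at common Chinese or Western sentence-final punctuation, and the function name `build_candidate_set` is hypothetical.

```python
import re

# Assumption: a sentence is a run of non-terminal characters followed by
# one or more sentence-final punctuation marks (Chinese or Western).
_SENTENCE = re.compile(r"[^。！？!?.]+[。！？!?.]*")


def build_candidate_set(answer_paragraphs):
    """Split every answer paragraph into sentences and collect them."""
    candidates = []
    for paragraph in answer_paragraphs:
        for chunk in _SENTENCE.findall(paragraph):
            sentence = chunk.strip()
            if sentence:
                candidates.append(sentence)
    return candidates


if __name__ == "__main__":
    paragraphs = ["First point. Second point!", "修行是什么？修行在于心。"]
    for s in build_candidate_set(paragraphs):
        print(s)
```

A punctuation-based splitter like this will also split at decimal points and abbreviations, so a production system would likely use a dedicated sentence segmenter.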
It should be noted that, to further condense the answers to the question, the similarity between the sentences within the candidate sentence set may be calculated, and sentences whose similarity to another sentence exceeds a certain value may be removed as redundant.
Optionally, the similarity may be calculated with the TF-IDF (term frequency-inverse document frequency) method, using the term statistics of each sentence.
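The redundancy filtering described above can be sketched as follows. The patent names TF-IDF but not the similarity measure or the threshold; this sketch assumes cosine similarity over TF-IDF vectors, whitespace tokenization, a smoothed IDF, and a hypothetical threshold of 0.8. Chinese text would need a word segmenter in place of `str.split`.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized):
    """Build one sparse TF-IDF vector (dict) per tokenized sentence."""
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption) keeps weights positive.
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def deduplicate(sentences, threshold=0.8, tokenize=str.split):
    """Keep a sentence only if it is not too similar to any kept sentence."""
    vectors = tfidf_vectors([tokenize(s.lower()) for s in sentences])
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) <= threshold for j in kept):
            kept.append(i)
    return [sentences[i] for i in kept]
```

Keeping the first occurrence of each near-duplicate group is a design choice of this sketch; the patent only states that sentences above the similarity value may be removed.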
Step 104: the topic keywords belonging to the field of Buddhism are determined based on historical Buddhism questions asked by the user.
The topic keywords determined by the historical Buddhist questions asked by the user are words that the user is interested in, such as Buddhist scriptures, Buddhist senses, joys, sadness, repairment and the like.
It should be noted that, for the process of determining the topic keywords belonging to the field of Buddhism based on the historical Buddhism questions asked by the user, reference may be made to the following description of the embodiments, and the detailed description of the present invention is omitted here.
It should be further noted that, if there is no historical Buddhist question asked by the user, one answer paragraph may be directly selected from the extracted answer paragraphs and pushed to the user.
Step 105: For each sentence in the candidate sentence set, determine the semantic similarity of the sentence to the asked Buddhist question according to the topic keywords.
In an alternative embodiment, the topic keywords, the sentences and the asked Buddhist questions are input into a trained matching model, so that the matching model determines the semantic similarity of the sentences relative to the asked Buddhist questions according to the topic keywords.
The semantic similarity output by the matching model is a scoring result determined under the condition of considering the topic keywords, and the semantic similarity can reflect the attention point of the user and simultaneously represents the matching degree of the answer sentence and the question.
In particular, the matching model may employ a keyword-based BERT model, i.e., a keyword-BERT model.
The following describes the training process of the model, taking the keyword-BERT model as an example:
Sample answers with high similarity to the question are collected as positive samples, sample answers with low similarity to the question are collected as negative samples, and subject terms in the field of Buddhism are collected; the keyword-BERT model is then trained with the collected positive samples, negative samples and subject terms until the model converges.
In the training process, a triplet loss function is used to calculate the loss value, and the loss value calculation formula is as follows:
$$L = \max\bigl(\mathrm{margin} + d(a, n) - d(a, p),\ 0\bigr)$$
where a is the question, n is a negative sample, p is a positive sample, margin is a boundary value, d(a, n) is the prediction score of the negative sample, and d(a, p) is the prediction score of the positive sample.
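As a small worked illustration of this loss, read as the maximum of margin + d(a, n) - d(a, p) and zero: the loss is zero once the positive sample's score exceeds the negative sample's score by at least the margin. The function name and the margin value of 0.2 below are assumptions for illustration; the patent does not state a margin value.

```python
def triplet_loss(pos_score, neg_score, margin=0.2):
    """Triplet loss on prediction scores: max(margin + d(a,n) - d(a,p), 0)."""
    return max(margin + neg_score - pos_score, 0.0)


if __name__ == "__main__":
    # Positive answer clearly outranks the negative one: no loss.
    print(triplet_loss(pos_score=0.9, neg_score=0.1))
    # Negative answer scores higher than the positive one: positive loss.
    print(triplet_loss(pos_score=0.5, neg_score=0.6))
```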
Those skilled in the art will understand that the method for determining the semantic similarity of a sentence to the asked Buddhist question is not specifically limited in the present application; the keyword-BERT model given above is only an exemplary illustration, and other determination methods are also within the scope of the present application.
Step 106: Select sentences from the candidate sentence set according to the semantic similarity to generate the Buddhist answer abstract.
In an optional embodiment, the sentences in the candidate sentence set are sorted in descending order of semantic similarity, and sentences meeting a preset condition are extracted from the sorting result according to a preset abstract length to generate the Buddhist answer abstract.
The preset condition means that sentences with high semantic similarity are selected from the candidate sentence set while the length of the generated abstract is controlled, so that the abstract stays within the preset length and still contains the sentences that best match the question.
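The selection in step 106 can be sketched as a greedy pass over the ranked sentences. The patent does not define how the preset abstract length is measured or what happens when a sentence does not fit; this sketch assumes the length is a character budget and that a sentence that would exceed it is skipped so that shorter, lower-ranked sentences can still be used. The function name is hypothetical.

```python
def select_abstract_sentences(scored_sentences, max_length):
    """Pick the highest-scoring sentences that fit within max_length characters.

    scored_sentences: iterable of (sentence, semantic_similarity) pairs.
    """
    ranked = sorted(scored_sentences, key=lambda pair: pair[1], reverse=True)
    chosen, used = [], 0
    for sentence, _score in ranked:
        if used + len(sentence) > max_length:
            continue  # assumption: skip sentences that would exceed the budget
        chosen.append(sentence)
        used += len(sentence)
    return chosen


if __name__ == "__main__":
    scored = [
        ("A short answer.", 0.9),
        ("A much longer answer sentence that will not fit.", 0.8),
        ("Tiny.", 0.4),
    ]
    print(select_abstract_sentences(scored, max_length=25))
```

Stopping at the first sentence that does not fit would be an equally valid reading of the preset condition.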
At this point, the process shown in FIG. 1 is complete. An answer paragraph related to the asked Buddhist question is extracted from each of the plurality of answer documents, and each answer paragraph is segmented into sentences to form a candidate sentence set, so that every sentence in the set is an answer related to the asked Buddhist question. Topic keywords that interest the user are then obtained from the historical Buddhist questions asked by the user, and the semantic similarity between the asked Buddhist question and each answer sentence is determined according to these topic keywords. Finally, the answer sentences with higher semantic similarity are selected to generate the Buddhist answer abstract. The abstract obtained in this way takes into account both the relevance between question and answer and the user's points of interest, so it can meet the individual requirements of different users, and because it is a condensed extract of the answers, it improves the user's reading efficiency and lowers the reading threshold.
Example two:
FIG. 2 is a schematic diagram illustrating an extraction flow of answer paragraphs according to an exemplary embodiment of the present invention. Based on the embodiment illustrated in FIG. 1, the process in step 102 of extracting an answer paragraph related to the asked Buddhist question from each answer document includes the following steps:
Step 201: For each answer document, predict the paragraph beginning probability and the paragraph end probability corresponding to each word in the answer document according to the answer document and the asked Buddhist question.
The paragraph beginning probability and the paragraph end probability of each word are independent of each other, and each takes a value in the range 0 to 1.
In an alternative embodiment, the answer document and the asked Buddhist question may be input into a trained reading comprehension model, so that the reading comprehension model predicts and outputs the paragraph beginning probability and the paragraph end probability corresponding to each word in the answer document.
The reading comprehension model can be implemented with a RoBERTa model: the document and the question are encoded by the BERT layer in the model, and the sequence labeling layer in the model then predicts the paragraph beginning probability and the paragraph end probability of each word in the document from the output of the BERT layer.
Before step 201 is executed, the RoBERTa model needs to be fine-tuned with Buddhist data. That is, various Buddhist questions and the answer documents of each question are collected, the beginning and end of the answer paragraph are labeled in each answer document, and the RoBERTa model is trained with the Buddhist questions and the corresponding labeled answer documents until convergence.
In the training process, a cross-entropy loss function is used to calculate the loss value. The loss formula appears in the original publication as an image (reference BDA0003285531120000081) and is not reproduced in this text.
In that formula, M is the number of words actually output by the RoBERTa model, p_ij is the predicted probability that the i-th word is the beginning or end of a paragraph, and y_ij is the label indicating whether the i-th word is the beginning or end of a paragraph.
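The formula image itself cannot be recovered from this text. One form consistent with the stated definitions of M, p_ij and y_ij is a binary cross-entropy averaged over the output words. This is an assumption, including the normalization by M and the reading of j as indexing the two per-word predictions (paragraph beginning and paragraph end), and it should not be taken as the patent's exact formula:

$$L = -\frac{1}{M}\sum_{i=1}^{M}\ \sum_{j \in \{\text{begin},\,\text{end}\}} \Bigl[\, y_{ij}\log p_{ij} + (1 - y_{ij})\log(1 - p_{ij}) \,\Bigr]$$

Under this reading, each word contributes two independent binary terms, which matches the statement that the beginning and end probabilities of a word are predicted separately and each lie between 0 and 1.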
It will be understood by those skilled in the art that the present application is not limited to the method for predicting the paragraph beginning probability and the paragraph ending probability, and the above-mentioned implementation of the reading comprehension model to predict the paragraph beginning probability and the paragraph ending probability of each word in the answer document is only an exemplary illustration, and other prediction methods are also within the scope of the present application.
Step 202: Determine the answer paragraph related to the asked Buddhist question in the answer document according to the paragraph beginning probability and the paragraph end probability corresponding to each word.
In an alternative embodiment, for each word in the answer document, a first difference is obtained by subtracting the word's paragraph end probability from its paragraph beginning probability, and a second difference is obtained by subtracting the word's paragraph beginning probability from its paragraph end probability. The word with the largest first difference in the answer document is then selected as the beginning position of the answer paragraph, the word with the largest second difference is selected as the end position of the answer paragraph, and the words located between the beginning position and the end position in the answer document are taken as the answer paragraph related to the asked Buddhist question.
The first difference (paragraph beginning probability minus paragraph end probability) represents the degree to which the word belongs to the beginning of a paragraph. Likewise, the second difference (paragraph end probability minus paragraph beginning probability) represents the degree to which the word belongs to the end of a paragraph.
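The difference-based selection can be sketched as follows. The probabilities would come from the reading comprehension model; here they are plain lists. Two points are assumptions of this sketch rather than statements from the patent: the begin and end words are included in the returned paragraph, and an empty result is returned if the selected end precedes the selected beginning. The function name is hypothetical.

```python
def extract_answer_span(words, begin_probs, end_probs):
    """Select the answer paragraph from per-word begin/end probabilities."""
    if not (len(words) == len(begin_probs) == len(end_probs)):
        raise ValueError("words and probability lists must have equal length")
    if not words:
        return []

    # First difference: how strongly each word looks like a paragraph beginning.
    first_diff = [b - e for b, e in zip(begin_probs, end_probs)]
    # Second difference: how strongly each word looks like a paragraph end.
    second_diff = [e - b for b, e in zip(begin_probs, end_probs)]

    begin = max(range(len(words)), key=first_diff.__getitem__)
    end = max(range(len(words)), key=second_diff.__getitem__)

    if end < begin:
        return []  # assumption: no valid paragraph when the end precedes the beginning
    return words[begin:end + 1]


if __name__ == "__main__":
    words = ["the", "path", "is", "right", "view", "today"]
    begin_probs = [0.1, 0.9, 0.2, 0.1, 0.3, 0.1]
    end_probs = [0.1, 0.1, 0.1, 0.2, 0.8, 0.3]
    print(extract_answer_span(words, begin_probs, end_probs))
```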
In another alternative embodiment, the word with the maximum paragraph beginning probability may be selected as the beginning position of the answer paragraph, the word with the maximum paragraph ending probability may be selected as the end position of the answer paragraph, and the words between the beginning position and the end position in the answer document may be taken as the answer paragraph related to the asked Buddhist question.
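A minimal sketch of this maximum-probability rule follows. The function name and the probability values are hypothetical. The text does not say what should happen when the chosen end precedes the chosen start, so the sketch simply returns both indices and leaves that decision to the caller.

```python
from typing import List, Tuple


def select_span_by_argmax(
    start_probs: List[float], end_probs: List[float]
) -> Tuple[int, int]:
    """Pick the word with the highest beginning probability as the start
    and the word with the highest ending probability as the end."""
    if not start_probs or len(start_probs) != len(end_probs):
        raise ValueError("probability lists must be non-empty and equal in length")
    start = max(range(len(start_probs)), key=start_probs.__getitem__)
    end = max(range(len(end_probs)), key=end_probs.__getitem__)
    return start, end


# Toy data (hypothetical): word 1 scores high on both probabilities.
start_probs = [0.60, 0.70, 0.10, 0.10]
end_probs = [0.10, 0.65, 0.20, 0.60]
span = select_span_by_argmax(start_probs, end_probs)
```

In this example the same word is chosen as both start and end, because each probability is maximized independently of the other.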
At this point, the extraction process of the answer paragraph shown in fig. 2 is complete. Using the reading comprehension model to perform binary probability prediction on each word in the answer document can improve the extraction accuracy of the answer paragraph.
Example three:
Fig. 3 is a schematic flow chart of determining topic keywords belonging to the field of Buddhism according to the embodiment shown in fig. 1. On the basis of the embodiments shown in fig. 1 to fig. 2, the process in step 104 of determining topic keywords belonging to the field of Buddhism based on the historical Buddhist questions asked by the user includes the following steps:
Step 301: acquiring historical Buddhist questions asked by the user.
Wherein, the historical Buddhism question of the user can reflect the historical behavior of the user.
Step 302: inputting the obtained historical Buddhist questions into a trained multi-label classification model, which marks the user with Buddhism-related category labels according to the historical Buddhist questions.
The category labels output by the multi-label classification model belong to category labels in the field of Buddhism, such as textbook, fill-in interest, emotional tendency and the like.
Step 303: in the preset correspondence between category labels and topic keywords, looking up the topic keywords associated with the marked category labels and using them as the topic keywords belonging to the field of Buddhism.
Before step 303 is executed, the correspondence between each category label and its topic keywords must be established in advance. The establishing process includes: collecting various historical Buddhist questions; performing topic distribution calculation on the collected historical Buddhist questions to obtain the topic keywords contained in each topic; determining the category labels contained in the multi-label classification model; outputting a prompt showing the topic keywords contained in each topic together with the determined category labels, so that a user can configure topic keywords for each category label; and receiving the topic keywords that the user configures for each category label.
Optionally, an LDA algorithm may be used to perform topic distribution calculation to generate some topic keywords for each topic.
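As an illustration of how an LDA topic distribution calculation can yield keywords per topic, the following is a minimal sketch that fits LDA with collapsed Gibbs sampling using only the Python standard library. The choice of Gibbs sampling, the hyperparameters `alpha` and `beta`, the iteration count, and the pre-tokenized toy questions are all assumptions; the text does not specify an inference method or a tokenizer.

```python
import random
from typing import List


def lda_topic_keywords(
    docs: List[List[str]],
    num_topics: int,
    top_n: int = 3,
    iterations: int = 200,
    alpha: float = 0.1,
    beta: float = 0.01,
    seed: int = 0,
) -> List[List[str]]:
    """Fit LDA by collapsed Gibbs sampling; return top_n keywords per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wid, z = word_id[w], assignments[d][n]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional topic weights for this word occurrence.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    # Rank the words of each topic by their topic-word count.
    keywords = []
    for k in range(num_topics):
        ranked = sorted(range(vocab_size), key=lambda i: topic_word[k][i], reverse=True)
        keywords.append([vocab[i] for i in ranked[:top_n]])
    return keywords


# Toy, pre-tokenized historical questions (hypothetical data).
questions = [
    ["meditation", "breath", "posture", "meditation"],
    ["breath", "meditation", "sitting", "posture"],
    ["sutra", "chanting", "scripture", "sutra"],
    ["scripture", "sutra", "chanting", "reading"],
]
topics = lda_topic_keywords(questions, num_topics=2)
```

The returned list holds one keyword list per topic. In the workflow described above, these lists would be shown to a user, who then assigns keywords to each category label.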
At this point, the process of determining topic keywords belonging to the field of Buddhism shown in fig. 3 is complete. The historical Buddhist questions asked by the user are input into the multi-label classification model, which marks the user's historical behavior with category labels; the topic keywords belonging to the field of Buddhism are then obtained by looking up the pre-established correspondence between category labels and topic keywords.
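Steps 301 to 303 can be sketched as a classification call followed by a table lookup. Everything named here is hypothetical: the label names, the keyword table, and the rule-based `toy_classifier`, which only stands in for the trained multi-label classification model so that the example runs.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical correspondence between category labels and topic keywords.
LABEL_TO_KEYWORDS: Dict[str, List[str]] = {
    "scripture": ["sutra", "chanting"],
    "practice": ["meditation", "precepts"],
}


def topic_keywords_for_user(
    history: Sequence[str],
    classify: Callable[[Sequence[str]], List[str]],
    label_to_keywords: Dict[str, List[str]],
) -> List[str]:
    """Steps 302 and 303: label the user's history, then look up keywords."""
    labels = classify(history)  # step 302: multi-label classification
    keywords: List[str] = []
    for label in labels:  # step 303: lookup in the preset correspondence
        for keyword in label_to_keywords.get(label, []):
            if keyword not in keywords:  # keep order, drop duplicates
                keywords.append(keyword)
    return keywords


def toy_classifier(questions: Sequence[str]) -> List[str]:
    """Stand-in for the trained model: simple substring rules."""
    text = " ".join(questions).lower()
    labels = []
    if "sutra" in text:
        labels.append("scripture")
    if "meditat" in text:
        labels.append("practice")
    return labels


# Step 301: historical questions asked by the user (toy data).
history = ["Which sutra should a beginner read?", "How long should I meditate?"]
keywords = topic_keywords_for_user(history, toy_classifier, LABEL_TO_KEYWORDS)
```

Passing the classifier in as a callable keeps the lookup logic independent of whichever model is actually trained.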
Corresponding to the embodiment of the method for generating the Buddhist question-answer abstract, the invention also provides an embodiment of a device for generating the Buddhist question-answer abstract.
Fig. 4 is a schematic structural diagram of a device for generating a Buddhist question-answer abstract according to an exemplary embodiment of the present invention. The device is used for executing the method for generating a Buddhist question-answer abstract provided in any of the above embodiments. As shown in fig. 4, the device for generating a Buddhist question-answer abstract includes:
an answer paragraph extraction module 410, configured to obtain multiple answer documents corresponding to the Buddhist question asked by the user, and extract an answer paragraph related to the asked Buddhist question from each answer document;
a candidate sentence set determining module 420, configured to perform sentence segmentation on each answer paragraph, and add the sentences obtained by the sentence segmentation to a candidate sentence set;
a topic keyword determination module 430, configured to determine topic keywords belonging to the field of Buddhism based on historical Buddhist questions asked by the user;
a matching module 440, configured to determine, for each sentence in the candidate sentence set, the semantic similarity of the sentence to the asked Buddhist question according to the topic keywords;
an abstract generating module 450, configured to select sentences from the candidate sentence set according to the semantic similarity to generate a Buddhist answer abstract.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
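The way the five modules fit together can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class name `SummaryGenerator`, the punctuation-based sentence splitter, the word-overlap `overlap_similarity` stand-in for semantic similarity, the identity paragraph extractor, and the greedy character budget `max_length` are all hypothetical and are not specified by the text.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SummaryGenerator:
    extract_paragraph: Callable[[str, str], str]          # module 410
    topic_keywords: Callable[[Sequence[str]], List[str]]  # module 430
    similarity: Callable[[str, str, List[str]], float]    # module 440
    max_length: int = 70  # assumed abstract length budget, in characters

    def split_sentences(self, paragraph: str) -> List[str]:  # module 420
        # Split after Western or Chinese sentence-ending punctuation.
        parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", paragraph)
        return [p.strip() for p in parts if p.strip()]

    def generate(self, question: str, documents: Sequence[str],
                 history: Sequence[str]) -> str:  # module 450
        candidates: List[str] = []
        for doc in documents:
            paragraph = self.extract_paragraph(question, doc)
            candidates.extend(self.split_sentences(paragraph))
        keywords = self.topic_keywords(history)
        # Rank candidate sentences from most to least similar.
        ranked = sorted(candidates,
                        key=lambda s: self.similarity(s, question, keywords),
                        reverse=True)
        chosen, used = [], 0
        for sentence in ranked:  # greedy fill within the length budget
            if used + len(sentence) <= self.max_length:
                chosen.append(sentence)
                used += len(sentence)
        return " ".join(chosen)


def overlap_similarity(sentence: str, question: str, keywords: List[str]) -> float:
    """Stand-in score: share of sentence words found in question or keywords."""
    words = set(re.findall(r"\w+", sentence.lower()))
    targets = set(re.findall(r"\w+", question.lower())) | {k.lower() for k in keywords}
    return len(words & targets) / (len(words) or 1)


generator = SummaryGenerator(
    extract_paragraph=lambda question, doc: doc,  # identity stand-in
    topic_keywords=lambda history: ["meditation"],
    similarity=overlap_similarity,
)
abstract = generator.generate(
    "How do I start meditation?",
    ["Meditation starts with the breath. Temples open at dawn.",
     "Start meditation by sitting still."],
    ["What is meditation?"],
)
```

Injecting the extractor, keyword source, and similarity function as callables mirrors the module boundaries in fig. 4, so each stand-in can be replaced by a trained model without changing the pipeline.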
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The embodiment of the invention also provides electronic equipment corresponding to the method for generating the Buddhist question-answer abstract provided by the embodiment, so as to execute the method for generating the Buddhist question-answer abstract.
Fig. 5 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present invention, the electronic device including: a communication interface 601, a processor 602, a memory 603, and a bus 604; the communication interface 601, the processor 602 and the memory 603 communicate with each other via a bus 604. The processor 602 may execute the above-described method for generating a Buddhism question-answer abstract by reading and executing machine-executable instructions corresponding to the control logic of the method for generating a Buddhism question-answer abstract in the memory 603, and the details of the method are described in the above embodiments and will not be described again here.
The memory 603 referred to in this disclosure may be any electronic, magnetic, optical, or other physical storage device that can contain stored information, such as executable instructions, data, and so forth. Specifically, the Memory 603 may be a RAM (Random Access Memory), a flash Memory, a storage drive (e.g., a hard disk drive), any type of storage disk (e.g., an optical disk, a DVD, etc.), or similar storage medium, or a combination thereof. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 601 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used.
Bus 604 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 603 is used for storing a program, and the processor 602 executes the program after receiving the execution instruction.
The processor 602 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 602 or by instructions in the form of software. The processor 602 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor.
The electronic device provided by the embodiment of the application and the generation method of the Buddhist question-answer abstract provided by the embodiment of the application have the same inventive concept and the same beneficial effects as the method adopted, operated or realized by the electronic device.
The embodiment of the present application further provides a computer-readable storage medium corresponding to the method for generating the Buddhist question-answer abstract provided by the foregoing embodiment. Referring to fig. 6, the computer-readable storage medium is shown as an optical disc 30 on which a computer program (i.e., a program product) is stored; when executed by a processor, the computer program performs the method for generating the Buddhist question-answer abstract provided by any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above-mentioned embodiment of the present application and the method for generating the Buddhist question-answer abstract provided by the embodiment of the present application have the same inventive concept, and have the same beneficial effects as the method adopted, operated or implemented by the application program stored in the computer-readable storage medium.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for generating a Buddhist question-answer abstract, which is characterized by comprising the following steps:
acquiring a plurality of answer documents corresponding to the Buddhist questions asked by the user, and extracting answer paragraphs related to the Buddhist questions asked from each answer document;
performing sentence segmentation on each answer paragraph, and adding the sentences obtained by the sentence segmentation into a candidate sentence set;
determining topic keywords belonging to the field of Buddhism based on historical Buddhism questions asked by the user;
for each sentence in the candidate sentence set, determining the semantic similarity of the sentence to the asked Buddhist question according to the topic keywords;
and selecting sentences from the candidate sentence set according to the semantic similarity and generating the Buddhist answer abstract.
2. The method of claim 1, wherein extracting answer paragraphs from each answer document that are relevant to the asked Buddhism question comprises:
for each answer document, predicting the paragraph beginning probability and the paragraph ending probability corresponding to each word in the answer document according to the answer document and the questioning Buddhism question;
and determining the answer paragraphs related to the asked Buddhist questions in the answer document according to the paragraph beginning probability and the paragraph ending probability corresponding to each word.
3. The method according to claim 2, wherein determining the answer paragraphs in the answer document that are related to the asked Buddhist question according to the paragraph beginning probability and the paragraph end probability corresponding to each word comprises:
for each word in the answer document, acquiring a first difference value obtained by subtracting a corresponding paragraph ending probability from a paragraph starting probability corresponding to the word, and acquiring a second difference value obtained by subtracting the corresponding paragraph starting probability from a paragraph ending probability corresponding to the word;
selecting a word corresponding to the largest first difference value from the answer document as the beginning position of the answer paragraph, and selecting a word corresponding to the largest second difference value as the end position of the answer paragraph;
and taking the words positioned between the beginning position and the end position in the answer document as the answer paragraph related to the asked Buddhist question.
4. The method according to claim 2, wherein determining the answer paragraphs in the answer document that are related to the asked Buddhist question according to the paragraph beginning probability and the paragraph end probability corresponding to each word comprises:
selecting a word corresponding to the maximum value from the paragraph beginning probability as the beginning position of the answer paragraph;
selecting a word corresponding to the maximum value from the paragraph ending probability as the ending position of the answer paragraph;
and taking the words positioned between the beginning position and the end position in the answer document as the answer paragraph related to the asked Buddhist question.
5. The method of claim 1, wherein determining topic keywords belonging to the field of Buddhism based on historical Buddhist questions asked by the user comprises:
acquiring historical Buddhist questions asked by the user;
inputting the obtained historical Buddhist question into a trained multi-label classification model, and marking a category label related to Buddhist for the user according to the historical Buddhist question by the multi-label classification model;
and in the preset correspondence between category labels and topic keywords, looking up the topic keywords associated with the category label and using them as the topic keywords belonging to the field of Buddhism.
6. The method according to claim 5, wherein the configuration process of the correspondence between the category labels and the topic keywords comprises:
collecting various historical Buddhist questions, and performing topic distribution calculation by using the collected various historical Buddhist questions to obtain topic keywords contained in each topic;
and determining the category labels contained in the multi-label classification model, outputting a prompt showing the topic keywords contained in each topic and the determined category labels, and receiving the topic keywords configured by a user for each category label in response to the prompt.
7. The method of claim 1, wherein selecting sentences from the set of candidate sentences and generating a Buddhist answer summary based on semantic similarity comprises:
sequencing sentences in the candidate sentence set according to the sequence of the semantic similarity from high to low to obtain a sequencing result;
and selecting sentences meeting preset conditions from the sequencing results according to preset abstract length to generate a Buddhist answer abstract.
8. An apparatus for generating a Buddhist question-answer summary, the apparatus comprising:
the answer paragraph extraction module is used for acquiring a plurality of answer documents corresponding to the Buddhist questions asked by the user and extracting the answer paragraphs related to the Buddhist questions asked from each answer document;
the candidate sentence set determining module is used for carrying out sentence segmentation on each answer paragraph and adding sentences obtained by the sentence segmentation into the candidate sentence set;
the topic keyword determining module is used for determining topic keywords belonging to the field of Buddhism based on historical Buddhist questions asked by the user;
the matching module is used for determining the semantic similarity of each sentence in the candidate sentence set to the asked Buddhist question according to the topic keywords;
and the abstract generating module is used for selecting sentences from the candidate sentence set according to the semantic similarity to generate the Buddhist answer abstract.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1-7 are implemented when the processor executes the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111146330.8A 2021-09-28 2021-09-28 Method, device, equipment and storage medium for generating Buddhist question-answer abstract Pending CN113836296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111146330.8A CN113836296A (en) 2021-09-28 2021-09-28 Method, device, equipment and storage medium for generating Buddhist question-answer abstract

Publications (1)

Publication Number Publication Date
CN113836296A true CN113836296A (en) 2021-12-24

Family

ID=78967223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111146330.8A Pending CN113836296A (en) 2021-09-28 2021-09-28 Method, device, equipment and storage medium for generating Buddhist question-answer abstract

Country Status (1)

Country Link
CN (1) CN113836296A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163229A (en) * 2011-04-13 2011-08-24 北京百度网讯科技有限公司 Method and equipment for generating abstracts of searching results
CN104375977A (en) * 2013-08-14 2015-02-25 腾讯科技(深圳)有限公司 Answer message processing method and device for question-answer communities
CN103902652A (en) * 2014-02-27 2014-07-02 深圳市智搜信息技术有限公司 Automatic question-answering system
CN110020009A (en) * 2017-09-29 2019-07-16 阿里巴巴集团控股有限公司 Online answering method, apparatus and system
CN110162778A (en) * 2019-04-02 2019-08-23 阿里巴巴集团控股有限公司 The generation method and device of text snippet

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880449A (en) * 2022-05-17 2022-08-09 平安科技(深圳)有限公司 Reply generation method and device of intelligent question answering, electronic equipment and storage medium
CN114880449B (en) * 2022-05-17 2024-05-10 平安科技(深圳)有限公司 Method and device for generating answers of intelligent questions and answers, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40062838; Country of ref document: HK)