CN113111159A - Question and answer record generation method and device, electronic equipment and storage medium - Google Patents


Publication number
CN113111159A
Authority
CN
China
Prior art keywords
question
answer
chat
corpus
participles
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110429297.3A
Other languages
Chinese (zh)
Inventor
朱章春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Application filed by Kangjian Information Technology Shenzhen Co Ltd filed Critical Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202110429297.3A priority Critical patent/CN113111159A/en
Publication of CN113111159A publication Critical patent/CN113111159A/en
Priority to PCT/CN2022/087818 priority patent/WO2022222942A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention relates to data analysis technology and discloses a question and answer record generation method, which comprises: extracting participles from a chat record and calculating the occurrence frequency of each participle; collecting the participles whose frequency is greater than a preset threshold to obtain a popular participle set, sorting the participles in the popular participle set, and generating a hot word list according to the sorted order; selecting the participles in the hot word list one at a time in order, and using each selected participle as a search word to search the chat record for the question corresponding to that search word; and classifying the retrieved questions by question intention, answering the questions, generating a question and answer record, and pushing the question and answer record to the user side. The invention also relates to blockchain technology: the popular participle set can be stored in a node of a blockchain. The invention further provides a question and answer record generation device, electronic equipment and a computer-readable storage medium. The invention can solve the problem of low efficiency when question and answer records are generated by matching questions against a question-answer database.

Description

Question and answer record generation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data analysis technologies, and in particular, to a method and an apparatus for generating a question and answer record, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of science and technology, asking and answering questions plays an important role in live broadcast rooms that rely heavily on manual operation, in teachers' classrooms, and in everyday communication, and the resulting question-answer records can serve as a reference for subsequent communication.
Existing question-answer record generation methods generally match each acquired question directly against an existing question-answer database to produce a reply, without considering how frequently the question is asked. As a result, identical or similar questions are matched repeatedly, and generating question-answer records in this way is inefficient.
Disclosure of Invention
The invention provides a question and answer record generation method and device and a computer readable storage medium, and mainly aims to solve the problem of low efficiency in generating question and answer records by matching questions with a question and answer database.
In order to achieve the above object, the present invention provides a method for generating a question and answer record, comprising:
performing word segmentation processing on the obtained chat records, and counting the occurrence frequency of each word segmentation;
summarizing the participles with the frequency larger than a preset threshold value to obtain a popular participle set;
sorting the participles in the popular participle set according to the frequency to generate a hot word list;
sequentially selecting one participle according to the arrangement order of the participles in the hot word list, and using the selected participle as a search word to search the chat records to obtain a question corresponding to the search word;
acquiring an original question-answer data set, extracting each flow node in the original question-answer data set and corpus data corresponding to the flow node, and marking and combining the flow nodes of the corpus data to obtain a training corpus;
performing feature coding on the training corpus to obtain a training corpus vector, and training a preset multi-classification model by using the training corpus to obtain a question intention classification model;
inputting the retrieved questions into the question intention classification model to obtain question intentions, answering the questions according to the question intentions to generate question and answer records, and pushing the question and answer records to a client.
Optionally, the performing word segmentation processing on the obtained chat records includes:
preprocessing the chat records according to a preset rule to obtain initial chat records;
performing word segmentation processing on the initial chat records by using a word segmentation tool to obtain a word segmentation chat set;
and screening out the participles from the participle chat set according to a preset keyword dictionary.
Optionally, before performing word segmentation processing on the obtained chat records, the method further includes:
identifying a user corresponding to the chat record;
judging whether the user passes identity verification;
if the user does not pass the identity verification, deleting the chat record of the user;
and if the user passes the identity verification, reserving the chat record corresponding to the user.
Optionally, the sequentially selecting one participle according to the arrangement order of the participles in the hot word list, and using the selected participle as a search word to search the chat records to obtain a question corresponding to the search word, includes:
sequentially selecting one participle according to the arrangement order of the participles in the hot word list through a traversal operation, taking the selected participle as a search word, and performing vectorization processing on the search word to obtain a search word vector;
extracting chat keywords from the initial chat records, and performing vectorization processing on the chat keywords to obtain keyword vectors;
calculating the similarity between the search word vector and each keyword vector, and selecting the question corresponding to a keyword whose similarity is greater than or equal to a preset similarity threshold as the question corresponding to the search word.
Optionally, the performing feature coding on the training corpus to obtain a training corpus vector includes:
counting the training corpora in the original question-answer data set to obtain the total corpus count;
constructing an initial matrix vector by taking the number of training corpora as the number of rows of a preset matrix and the total corpus count as the number of columns of the preset matrix;
and setting, in each row of the initial matrix vector, the column corresponding to the training corpus to a first value and the remaining columns to a second value, to obtain the training corpus vector.
Optionally, the training a preset multi-classification model by using the training corpus to obtain a question intention classification model includes:
classifying the training corpus by using the preset multi-classification model to obtain one or more classification intentions;
labeling the intention categories in the original question-answer data set, and calculating the repetition degree between the intention categories and the classification intentions;
when the repetition degree is smaller than a preset classification threshold, iteratively updating the preset multi-classification model and classifying the final expression vector again;
and when the repetition degree is greater than or equal to the preset classification threshold, obtaining the question intention classification model.
Optionally, the answering the question according to the question intention to generate a question-answer record includes:
selecting a corresponding preset question-answer database according to the question intention, performing matching processing on the question by using the question-answer database, and judging whether the question is matched with a question in the question-answer database;
if the question is matched with the question in the question-answer database, taking the answer corresponding to the question in the question-answer database as the answer of the question, and generating a question-answer record according to the question and the answer;
if the question is not matched with the question in the question-answer database, marking the question as an unanswered question and performing question answering on the unanswered question to obtain an answer of the unanswered question;
and generating a question-answer record according to the unanswered question and the answer of the unanswered question.
In order to solve the above problem, the present invention also provides a question-answer record generating apparatus, including:
the word segmentation extraction module is used for performing word segmentation processing on the obtained chat records and counting the occurrence frequency of each participle;
the hot word list generating module is used for summarizing the participles with the frequency greater than a preset threshold to obtain a popular participle set, and sorting the participles in the popular participle set according to the frequency to generate a hot word list;
the record retrieval module is used for sequentially selecting one participle according to the arrangement order of the participles in the hot word list, and using the selected participle as a search word to search the chat records to obtain a question corresponding to the search word;
the corpus generating module is used for acquiring an original question-answer data set, extracting each flow node in the original question-answer data set and corpus data corresponding to the flow node, and marking and combining the flow nodes of the corpus data to obtain a training corpus;
the model training module is used for performing feature coding on the training corpus to obtain a training corpus vector, and training a preset multi-classification model by using the training corpus to obtain a question intention classification model;
and the question-answer record generating module is used for inputting the retrieved questions into the question intention classification model to obtain question intentions, answering the questions according to the question intentions to generate question-answer records, and pushing the question-answer records to the client.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instruction stored in the memory to implement the above question and answer record generation method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one instruction is stored, and the at least one instruction is executed by a processor in an electronic device to implement the above question and answer record generating method.
In the embodiment of the invention, the participles whose frequency in the chat records is higher than a preset threshold are collected and sorted by frequency to generate a hot word list, and a question intention classification model is trained to classify the intention of the questions retrieved with those participles. Because the hot word list contains the most frequently mentioned participles, questions can be answered in a targeted manner; the hot word list also reflects the questions that people generally care about. The question intention gives a rough estimate of the scope and field of a question, and answering according to the different user intentions improves the efficiency of answering questions. Therefore, the question and answer record generation method, the question and answer record generation device, the electronic equipment and the computer-readable storage medium provided by the invention can solve the problem of low efficiency when question and answer records are generated by matching questions against a question-answer database.
Drawings
FIG. 1 is a schematic flow chart of a method for generating a question and answer record according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of one step in the question and answer record generation method shown in FIG. 1;
FIG. 3 is a functional block diagram of a question and answer record generating device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device for implementing the method for generating a question and answer record according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a question and answer record generation method. The execution subject of the question and answer record generation method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiment of the present application. In other words, the question-answer record generating method may be executed by software or hardware installed in the terminal device or the server device, and the software may be a block chain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a method for generating a question and answer record according to an embodiment of the present invention. In this embodiment, the method for generating a question and answer record includes:
S1, performing word segmentation processing on the obtained chat records, and counting the occurrence frequency of each participle.
In the embodiment of the present invention, the chat records may be acquired from a preset IM (instant messaging) system, where the IM system is a customizable communication system that integrates a multi-person video conference function.
In one embodiment of the present invention, the performing word segmentation processing on the obtained chat records includes:
preprocessing the chat records according to a preset rule to obtain initial chat records;
performing word segmentation processing on the initial chat records by using a word segmentation tool to obtain a word segmentation chat set;
and screening out the participles from the participle chat set according to a preset keyword dictionary.
In the embodiment of the present invention, the preset rule refers to removing special symbols and stop words from the chat records. The special symbols are symbols that are used infrequently and are difficult to input directly, such as certain mathematical symbols, unit symbols, and tab characters, as well as symbols such as @ and #. The stop words are words without practical meaning, such as "o" and "yaho". Stop words can be removed from the chat records by referring to a preset stop word list, which may be, for example, the Harbin Institute of Technology stop word list or the stop word list of the Machine Intelligence Laboratory of Sichuan University.
In one embodiment of the invention, a preset Jieba word segmenter can be used to segment the initial chat records to obtain the word segmentation chat set.
Further, in the embodiment of the present invention, the preset keyword dictionary contains professional terms of a preset field. The participles are screened out of the word segmentation chat set in order to extract keywords that fit the actual application scenario, which avoids extracting useless data and causing data redundancy.
For example, the chat records include: "# Excuse me, can hyperlipidemia cause cerebral infarction?" and "Doctor, how should cerebral infarction be treated?". Removing special symbols and stop words deletes the two items "#" and "rah" from the chat records, and word segmentation of the processed chat records yields the word segmentation chat set: "ask / hyperlipidemia / can / cause / cerebral infarction / ?" and "doctor / cerebral infarction / should / how / treat / ?". The occurrence frequency of each participle in the word segmentation chat set is then counted, and each participle is checked against the preset keyword dictionary. Here "cerebral infarction" occurs 2 times and "hyperlipidemia" occurs 1 time.
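The preprocessing, screening and counting in step S1 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the messages are assumed to be already segmented (in practice a word segmentation tool such as Jieba would produce the tokens), and the names SEGMENTED_CHAT, SPECIAL_SYMBOLS, STOP_WORDS, KEYWORD_DICTIONARY and count_participles are hypothetical.

```python
from collections import Counter

# Hypothetical input: each chat message is already split into participles,
# as a word segmentation tool would produce.
SEGMENTED_CHAT = [
    ["#", "ask", "hyperlipidemia", "can", "cause", "cerebral infarction", "?"],
    ["doctor", "cerebral infarction", "should", "how", "treat", "?"],
]
SPECIAL_SYMBOLS = {"#", "@"}                      # assumed special symbols
STOP_WORDS = {"ask", "can", "should", "how"}      # assumed stop word list
KEYWORD_DICTIONARY = {"hyperlipidemia", "cerebral infarction"}  # assumed domain terms


def count_participles(messages, symbols, stop_words, dictionary):
    """Drop symbols and stop words, keep dictionary terms, count occurrences."""
    counts = Counter()
    for message in messages:
        for token in message:
            if token in symbols or token in stop_words:
                continue  # preprocessing: remove special symbols and stop words
            if token in dictionary:
                counts[token] += 1  # screening: keep only dictionary keywords
    return counts


frequencies = count_participles(
    SEGMENTED_CHAT, SPECIAL_SYMBOLS, STOP_WORDS, KEYWORD_DICTIONARY
)
print(frequencies)
```

With these toy inputs, "cerebral infarction" is counted twice and "hyperlipidemia" once, and tokens outside the keyword dictionary (such as "doctor") are not counted.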
In addition, in another embodiment of the present invention, before performing the word segmentation process on the obtained chat records, the method further includes:
identifying a user corresponding to the chat record;
judging whether the user passes identity verification;
if the user does not pass the identity verification, deleting the chat record of the user;
and if the user passes the identity verification, reserving the chat record corresponding to the user.
In detail, identity verification checks whether the user corresponding to a chat record is on a predefined user list. Only the speech of users on the user list is used as chat records for subsequent processing; the speech of users not on the list is not adopted, because chat records obtained from such users have no practical reference value.
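A minimal sketch of this filtering step is given below, assuming that identity verification reduces to membership in a predefined user list. The user names, messages and the function keep_verified are hypothetical and serve only to illustrate the check.

```python
# Hypothetical predefined user list and raw chat records as (user, message) pairs.
VERIFIED_USERS = {"alice", "bob"}
RAW_RECORDS = [
    ("alice", "Can hyperlipidemia cause cerebral infarction?"),
    ("mallory", "Unrelated advertisement"),
    ("bob", "How should cerebral infarction be treated?"),
]


def keep_verified(records, verified_users):
    """Keep only the chat records whose sender is on the predefined user list."""
    return [(user, text) for user, text in records if user in verified_users]


kept_records = keep_verified(RAW_RECORDS, VERIFIED_USERS)
print(kept_records)
```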
S2, summarizing the participles with the frequency greater than the preset threshold to obtain a popular participle set.
In the embodiment of the present invention, the preset threshold is the criterion for deciding whether a participle is a popular participle: if the frequency of a participle is greater than the preset threshold, the participle is a popular participle; if the frequency is less than or equal to the preset threshold, the participle occurs relatively rarely and is not treated as a popular participle.
S3, sorting the participles in the popular participle set according to the frequency to generate a hot word list.
In one embodiment of the present invention, the sorting the participles in the popular participle set according to the frequency to generate a hot word list includes:
acquiring the participles in the popular participle set and the corresponding frequency of the participles;
arranging the participles in descending order of their corresponding frequencies to obtain an initial list; and intercepting a preset number of participles from the initial list to generate the hot word list.
The preset number may be, for example, 10.
In detail, intercepting the preset number of participles further screens the initial list. Although the initial list is built only from participles whose frequency is greater than the preset threshold, it may still contain many participles of comparatively low frequency, so only the preset number of participles at the top of the initial list are kept to generate the hot word list. In one application scenario of the invention, the hot word list lets the anchor of a live broadcast room see more intuitively which topics and questions the audience wants to know about, and the anchor can answer the questions on the hot word list from top to bottom.
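Steps S2 and S3 can be illustrated with the short sketch below. The function name build_hot_word_list, the example frequencies and the chosen threshold are assumptions made for illustration; only the logic (strictly-greater-than threshold, descending sort, truncation to a preset number) follows the description.

```python
def build_hot_word_list(frequencies, threshold, top_n=10):
    """Return up to top_n participles whose frequency exceeds the threshold,
    ordered from most to least frequent."""
    # S2: keep participles whose frequency is strictly greater than the threshold.
    popular = {word: freq for word, freq in frequencies.items() if freq > threshold}
    # S3: sort in descending order of frequency, then keep the first top_n entries.
    ranked = sorted(popular.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]


# Hypothetical frequency counts.
example_frequencies = {
    "cerebral infarction": 12,
    "hyperlipidemia": 7,
    "heart disease": 5,
    "diet": 2,
}
hot_word_list = build_hot_word_list(example_frequencies, threshold=4, top_n=2)
print(hot_word_list)  # ['cerebral infarction', 'hyperlipidemia']
```

Here "diet" is dropped by the threshold, and "heart disease" passes the threshold but is cut off by the preset number, which is the further screening described above.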
S4, sequentially selecting one participle according to the arrangement order of the participles in the hot word list, and using the selected participle as a search word to search the chat records to obtain the question corresponding to the search word.
In the embodiment of the present invention, referring to FIG. 2, the sequentially selecting one participle according to the arrangement order of the participles in the hot word list, and using the selected participle as a search word to search the chat records to obtain a question corresponding to the search word, includes:
S401, sequentially selecting one participle according to the arrangement order of the participles in the hot word list through a traversal operation, taking the selected participle as a search word, and performing vectorization processing on the search word to obtain a search word vector;
S402, extracting chat keywords from the initial chat records, and performing vectorization processing on the chat keywords to obtain keyword vectors;
S403, calculating the similarity between the search word vector and each keyword vector, and selecting the question corresponding to a keyword whose similarity is greater than or equal to a preset similarity threshold as the question corresponding to the search word.
Specifically, in the embodiment of the present invention, the initial chat records may include a plurality of chat keywords, where the chat keywords are medical terms. The similarity between the search word and each chat keyword is calculated, and if any chat keyword in a sentence of the initial chat records matches the search word, that sentence is taken as a question corresponding to the search word.
In detail, when the question corresponding to a keyword whose similarity is greater than or equal to the preset similarity threshold is selected as the question corresponding to the search word, the question is a chat record phrased as a question. If the matched keyword appears in a chat record that is not phrased as a question, that chat record is not classified as a question corresponding to the search word.
For example, the search word is "cerebral infarction", and the initial chat records are: "Cerebral infarction and heart disease are very common in modern society. Is there a truly effective method for treating cerebral infarction? The following three methods are currently the most effective." The chat keywords in the initial chat records are "cerebral infarction" and "heart disease". By calculating the similarity between the search word vector and the keyword vectors, the question in the initial chat records that contains "cerebral infarction" is taken as the question corresponding to the search word.
The similarity between the search term vector and the keyword vector can be calculated by adopting a plurality of calculation methods, including, but not limited to, calculating by adopting a cosine similarity formula, calculating by adopting an euclidean distance, and the like.
Optionally, in an embodiment of the present invention, the calculating a similarity between the search term vector and the keyword vector includes:
calculating the similarity between the search term vector and the keyword vector by using the following formula:
$$\cos(a, b) = \frac{a \cdot b}{|a|\,|b|}$$
where cos(a, b) is the similarity, a is the search word vector, b is the keyword vector, and |a| and |b| are the moduli (norms) of the search word vector and the keyword vector, respectively.
The embodiment of the invention can carry out vectorization processing on the search word and the chat keyword according to a preset word2vec algorithm to obtain the search word vector and the keyword vector.
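The retrieval in steps S401 to S403 can be sketched as follows. This is a hedged illustration only: the three-dimensional vectors in TOY_VECTORS are hand-made stand-ins for word2vec embeddings, the keyword "stroke", the sentences, the threshold of 0.9 and the rule that a question ends with "?" are all assumptions, and retrieve_questions is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings standing in for word2vec output.
TOY_VECTORS = {
    "cerebral infarction": [0.9, 0.1, 0.0],
    "stroke": [0.8, 0.2, 0.1],
    "heart disease": [0.1, 0.9, 0.2],
}

# Hypothetical chat sentences paired with their extracted chat keywords.
CHAT_SENTENCES = [
    ("Is there an effective treatment for stroke?", ["stroke"]),
    ("Heart disease is common nowadays.", ["heart disease"]),
    ("How can heart disease be prevented?", ["heart disease"]),
]


def retrieve_questions(search_word, sentences, vectors, threshold):
    """Return sentences phrased as questions whose keywords are similar enough."""
    query = vectors[search_word]
    matches = []
    for text, keywords in sentences:
        if not text.rstrip().endswith("?"):
            continue  # only chat records phrased as questions qualify
        if any(cosine_similarity(query, vectors[k]) >= threshold for k in keywords):
            matches.append(text)
    return matches


questions = retrieve_questions("cerebral infarction", CHAT_SENTENCES, TOY_VECTORS, 0.9)
print(questions)
```

With these toy vectors, only the sentence about "stroke" is both phrased as a question and above the similarity threshold; the two "heart disease" sentences are rejected, one because it is not a question and the other because its similarity is too low.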
S5, obtaining an original question-answer data set, extracting each flow node in the original question-answer data set and corpus data corresponding to the flow nodes, and marking and combining the flow nodes of the corpus data to obtain a training corpus.
In the embodiment of the present invention, the original question-answer data set may be medical questions and corresponding answers included in a service scenario of an intelligent question-answer.
In detail, each flow node in the original question-answer data set may be one round of dialogue between a medical expert and a questioner, and the corresponding corpus data is the medical expert's answer in that round of dialogue.
The embodiment of the invention marks the flow nodes of the corpus data and then combines the flow nodes to obtain the training corpus.
S6, performing feature coding on the training corpus to obtain a training corpus vector, and training a preset multi-classification model by using the training corpus to obtain a question intention classification model.
In the embodiment of the present invention, the feature coding of the training corpus may use methods such as one-hot encoding, target encoding, or Bayesian target encoding. The present invention uses preset one-hot encoding to perform feature coding on the training corpus to obtain the training corpus vector.
Specifically, the performing feature coding on the training corpus to obtain a training corpus vector includes:
counting the training corpora in the original question-answer data set to obtain the total corpus count;
constructing an initial matrix vector by taking the number of training corpora as the number of rows of a preset matrix and the total corpus count as the number of columns of the preset matrix;
and setting, in each row of the initial matrix vector, the column corresponding to the training corpus to a first value and the remaining columns to a second value, to obtain the training corpus vector.
For example, if the total corpus count is 5, that is, five rounds of dialogue are assumed, the training corpus vector of the first round of corpus is [1, 0, 0, 0, 0].
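The one-hot feature coding described above can be sketched in a few lines. The function name one_hot_encode and the placeholder corpus entries are hypothetical; the first and second values are assumed to be 1 and 0, matching the example vector.

```python
def one_hot_encode(corpus, first_value=1, second_value=0):
    """Build one row per corpus entry, with as many columns as there are entries.

    In row i, column i is set to first_value and all other columns to second_value.
    """
    total = len(corpus)  # total corpus count = number of columns
    vectors = []
    for row in range(total):
        vector = [second_value] * total
        vector[row] = first_value
        vectors.append(vector)
    return vectors


# Hypothetical training corpus with five rounds of dialogue.
rounds = ["round 1", "round 2", "round 3", "round 4", "round 5"]
corpus_vectors = one_hot_encode(rounds)
print(corpus_vectors[0])  # [1, 0, 0, 0, 0]
```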
In an embodiment of the present invention, the preset multi-classification model may be a shallow neural network or a support vector machine model. In the embodiment of the invention, the intention corresponding to each question can be identified by the question intention classification model obtained by training the multi-classification model.
In detail, the training the preset multi-classification model by using the training corpus to obtain a question intention classification model includes:
classifying the training corpus by using the preset multi-classification model to obtain one or more classification intentions;
labeling the intention categories in the original question-answer data set, and calculating the repetition degree between the intention categories and the classification intentions;
when the repetition degree is smaller than a preset classification threshold, iteratively updating the preset multi-classification model and classifying the final expression vector again;
and when the repetition degree is greater than or equal to the preset classification threshold, obtaining the question intention classification model.
In detail, the embodiment of the present invention labels the intention categories in the original question-answer data set by judging the actual intention of the medical expert's answer in each round of dialogue between the medical expert and the questioner. For example, the first round of dialogue is: questioner: "Can you answer questions about cerebral infarction?", medical expert: "Yes, I can.", and the intention category of the first round of dialogue is confirming the field in which the medical expert answers questions. The second round of dialogue is: questioner: "What are the effective treatment regimens for cerebral infarction?", medical expert: "The following are currently the most common treatment options for cerebral infarction. First … … …", and the intention category of the second round of dialogue is answering a specific question.
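The train-until-threshold loop can be illustrated with the sketch below. Two points are assumptions rather than part of the description: a simple multi-class perceptron is substituted for the shallow neural network or support vector machine, and the repetition degree is interpreted as the fraction of predicted intentions that agree with the labeled intention categories. All names and the intent labels are hypothetical.

```python
def predict(weights, vector):
    """Return the intent whose weight vector scores highest for this input."""
    scores = {
        intent: sum(w * x for w, x in zip(ws, vector))
        for intent, ws in weights.items()
    }
    return max(scores, key=scores.get)


def repetition_degree(predicted, labeled):
    """Assumed definition: share of predictions equal to the labeled intents."""
    agree = sum(p == l for p, l in zip(predicted, labeled))
    return agree / len(labeled)


def train_intent_classifier(vectors, labels, threshold, max_iterations=100):
    """Iteratively update the model until the repetition degree reaches the threshold."""
    intents = sorted(set(labels))
    weights = {intent: [0.0] * len(vectors[0]) for intent in intents}
    for _ in range(max_iterations):
        predictions = [predict(weights, v) for v in vectors]
        if repetition_degree(predictions, labels) >= threshold:
            break  # model accepted as the question intention classification model
        # Perceptron update: pull the correct intent up and the wrong guess down.
        for vector, label, guess in zip(vectors, labels, predictions):
            if guess != label:
                weights[label] = [w + x for w, x in zip(weights[label], vector)]
                weights[guess] = [w - x for w, x in zip(weights[guess], vector)]
    return weights


# Hypothetical one-hot training corpus vectors (four rounds) and labeled intents.
train_vectors = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
train_labels = ["confirm_field", "answer_question", "confirm_field", "answer_question"]

model = train_intent_classifier(train_vectors, train_labels, threshold=1.0)
final_predictions = [predict(model, v) for v in train_vectors]
print(final_predictions)
```

On this toy data, one round of updates is enough for the predictions to agree with every labeled intention, at which point the loop stops.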
S7, inputting the retrieved questions into the question intention classification model to obtain question intentions, answering the questions according to the question intentions to generate question and answer records, and pushing the question and answer records to a client.
In the embodiment of the invention, the retrieved questions are input into the question intention classification model to obtain the question intentions, the question intentions can roughly estimate the range and the field of the questions, and the corresponding answers are carried out according to different user intentions, so that the question answering efficiency can be improved.
And solving the retrieved questions according to the question intentions to obtain corresponding answers, and summarizing the questions and the corresponding answers to generate the question-answer records.
Specifically, the solving the question according to the question intention and generating a question-answer record includes:
selecting a corresponding preset question-answer database according to the question intention, performing matching processing on the question by using the question-answer database, and judging whether the question is matched with a question in the question-answer database;
if the question is matched with the question in the question-answer database, taking the answer corresponding to the question in the question-answer database as the answer of the question, and generating a question-answer record according to the question and the answer;
if the question is not matched with the question in the question-answer database, marking the question as an unanswered question and performing question answering on the unanswered question to obtain an answer of the unanswered question;
and generating a question-answer record according to the unanswered question and the answer of the unanswered question.
Different question intentions correspond to different question-answer databases, and each question-answer database contains common questions related to its intention and the answers corresponding to those common questions.
In detail, if the question does not match any question in the question-answer database, the embodiment of the present invention marks the question as an unanswered question. Because no matching question was found for it in the question-answer database, the answer to an unanswered question may be obtained through manual answering.
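The matching-and-fallback flow above can be sketched as follows. This is a minimal illustration, not the patented implementation: the database contents, the `answer_question` name, the `manual_answer` callback, and the use of exact normalized string matching are all assumptions made for the example.

```python
# Hypothetical question-answer databases, one per question intention.
QA_DATABASES = {
    "treatment": {
        "how should cerebral infarction be treated?":
            "Placeholder answer stored in the treatment database.",
    },
}


def answer_question(question, intent, manual_answer):
    """Return a question-answer record for `question` under `intent`.

    Matching is exact after lowercasing and trimming (an assumption);
    unmatched questions are marked unanswered and passed to `manual_answer`.
    """
    database = QA_DATABASES.get(intent, {})
    key = question.strip().lower()
    if key in database:
        return {"question": question, "answer": database[key], "source": "database"}
    # Unanswered question: obtain the answer manually.
    return {"question": question, "answer": manual_answer(question), "source": "manual"}


record = answer_question(
    "How should cerebral infarction be treated?",
    "treatment",
    manual_answer=lambda q: "Answer supplied by a human expert.",
)
```

A real system would likely replace the exact-match lookup with a fuzzier comparison; the record structure stays the same either way.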
Specifically, pushing the question-answer record to the client includes:
transmitting the question-answer record to a data push engine according to the transmission file of the question-answer record;
and pushing the question-answer record to the client by using the data push engine.
In the embodiment of the invention, the participles whose frequency in the chat records is higher than a preset threshold are collected and sorted by frequency to generate a hot word list, and a question intention classification model is trained to classify the intention of the questions retrieved according to those participles. Because the hot word list contains the participles that are mentioned most frequently, questions can be answered in a targeted manner, and the list also reflects the questions that users commonly care about. A question intention gives a rough estimate of the scope and field of a question, and answering according to the different intentions improves question-answering efficiency. Therefore, the question-answer record generation method provided by the invention can solve the problem of low efficiency when questions are matched against a question-answer database to generate question-answer records.
Fig. 3 is a functional block diagram of a question and answer record generating device according to an embodiment of the present invention.
The question-answer record generating device 100 according to the present invention may be installed in an electronic device. According to the functions realized, the question-answer record generating device 100 may include a segmentation extraction module 101, a hot word list generating module 102, a record retrieval module 103, a corpus generating module 104, a model training module 105, and a question-answer record generating module 106. A module in the present invention, which may also be referred to as a unit, refers to a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the segmentation extraction module 101 is configured to perform word segmentation processing on the obtained chat records and count the occurrence frequency of each participle;
the hot word list generating module 102 is configured to summarize the participles whose frequency is greater than a preset threshold to obtain a hot participle set, and to sort the participles in the hot participle set according to frequency to generate a hot word list;
the record retrieval module 103 is configured to sequentially select one of the participles according to the arrangement order of the participles in the hot word list, and retrieve the selected participle in the chat records as a search word to obtain a question corresponding to the search word;
the corpus generating module 104 is configured to obtain an original question-answer data set, extract each flow node in the original question-answer data set and the corpus data corresponding to the flow node, and mark and combine the flow nodes of the corpus data to obtain a training corpus;
the model training module 105 is configured to perform feature coding on the training corpus to obtain a training corpus vector, and train a preset multi-classification model by using the training corpus to obtain a question intention classification model;
the question-answer record generating module 106 is configured to input the retrieved question into the question intention classification model to obtain a question intention, answer the question according to the question intention to generate a question-answer record, and push the question-answer record to a client.
In detail, the specific implementation of each module of the question-answer record generating device 100 is as follows:
Step one, the segmentation extraction module 101 performs word segmentation processing on the obtained chat records and counts the occurrence frequency of each participle.
In the embodiment of the present invention, the chat records may be acquired from a preset IM (instant messaging) system, where the IM system is a customizable communication system that integrates a multi-person video conference function.
In one embodiment of the present invention, the performing word segmentation processing on the obtained chat records includes:
preprocessing the chat records according to a preset rule to obtain initial chat records;
performing word segmentation processing on the initial chat records by using a word segmentation tool to obtain a word segmentation chat set;
and screening out the participles from the word segmentation chat set according to a preset keyword dictionary.
In the embodiment of the present invention, the preset rule refers to removing special symbols and stop words from the chat records. The special symbols are mathematical symbols, unit symbols, tab characters and similar symbols that are used infrequently and are difficult to input directly, such as "@" and "#". The stop words are words without practical meaning, such as "o" and "yaho". Stop words may be removed from the chat records by referring to a preset stop word list, which may be, for example, the obtained Harbin Institute of Technology stop word list or the stop word list of the Machine Intelligence Laboratory of Sichuan University.
In one embodiment of the invention, a preset Jieba word segmenter can be used to segment the initial chat records to obtain a word segmentation chat set.
Further, in the embodiment of the present invention, the preset keyword dictionary contains the professional terms of a preset field. The participles are screened out of the word segmentation chat set against this dictionary in order to extract keywords that fit the actual application scenario, which avoids extracting useless data and causing data redundancy.
For example, the chat records include: "# Excuse me, can hyperlipidemia lead to cerebral infarction?" and "Doctor, how should cerebral infarction be treated?" Special symbols and stop words are removed from the chat records, here the "#" and "rah" tokens, and the processed chat records are segmented to obtain the word segmentation chat set: "ask/hyperlipidemia/can/cause/cerebral infarction?" and "doctor/cerebral infarction/should/how/treat?". The occurrence frequency of each participle in the word segmentation chat set is then counted, and it is judged whether the participle appears in the preset keyword dictionary. Here "cerebral infarction" occurs 2 times and "hyperlipidemia" occurs 1 time.
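The preprocessing, segmentation, dictionary screening and counting described in this example can be sketched as follows. This is a minimal sketch under stated assumptions: the stop word set and keyword dictionary are hypothetical, the example uses English text, and a whitespace tokenizer stands in for a real word segmenter such as Jieba (multi-word terms are written with an underscore so the placeholder tokenizer keeps them whole).

```python
import re
from collections import Counter

# Hypothetical stand-ins for a published stop word list and a domain keyword dictionary.
STOP_WORDS = {"excuse", "me", "can", "should", "be", "how", "to", "doctor"}
KEYWORD_DICTIONARY = {"hyperlipidemia", "cerebral_infarction"}


def preprocess(record):
    """Replace special symbols such as '#' and '?' with spaces and lowercase the text."""
    return re.sub(r"[^\w\s]", " ", record).lower()


def segment(record):
    """Whitespace tokenizer used only as a placeholder for a real word segmenter."""
    return record.split()


def count_keyword_frequencies(chat_records):
    """Count how often each dictionary keyword occurs across all chat records."""
    counts = Counter()
    for record in chat_records:
        tokens = [t for t in segment(preprocess(record)) if t not in STOP_WORDS]
        counts.update(t for t in tokens if t in KEYWORD_DICTIONARY)
    return counts


records = [
    "# Excuse me, can hyperlipidemia lead to cerebral_infarction?",
    "Doctor, how should cerebral_infarction be treated?",
]
frequencies = count_keyword_frequencies(records)
```

With these two records, `frequencies` holds a count of 2 for the cerebral infarction keyword and 1 for hyperlipidemia, matching the counts in the example.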
In addition, in another embodiment of the present invention, before the segmentation process is performed on the obtained chat records by the segmentation extracting module 101, the segmentation extracting module 101 is further configured to:
identifying a user corresponding to the chat record;
judging whether the user passes identity verification;
if the user does not pass the identity verification, deleting the chat record of the user;
and if the user passes the identity verification, reserving the chat record corresponding to the user.
In detail, the identity verification checks whether the user corresponding to a chat record is on a predefined user list. Only the speech of users on the user list is kept as chat records for subsequent processing. The speech of users who are not on the user list is not adopted, because chat records obtained from such users have no practical reference value.
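The user-list check can be illustrated with a short filter. This is a sketch only: the record layout (a dictionary with `user` and `text` keys) and the contents of `VERIFIED_USERS` are assumptions for the example.

```python
# Hypothetical predefined user list.
VERIFIED_USERS = {"user_a", "user_b"}


def keep_verified_records(chat_records, verified_users=VERIFIED_USERS):
    """Keep only the chat records whose sender is on the predefined user list."""
    return [record for record in chat_records if record["user"] in verified_users]


chat_records = [
    {"user": "user_a", "text": "How should cerebral infarction be treated?"},
    {"user": "stranger", "text": "Unrelated message"},
]
retained = keep_verified_records(chat_records)
```

Records from users outside the list are dropped before word segmentation, so they never contribute to the frequency counts.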
Step two, the hot word list generating module 102 summarizes the participles whose frequency is greater than the preset threshold to obtain a hot participle set.
In the embodiment of the present invention, the preset threshold is the criterion for deciding whether a participle is a hot participle: if the frequency of a participle is greater than the preset threshold, the participle is a hot participle; if the frequency is less than or equal to the preset threshold, the participle occurs relatively infrequently and is not treated as a hot participle.
Step three, the hot word list generating module 102 sorts the participles in the hot participle set according to frequency to generate a hot word list.
In one embodiment of the present invention, sorting the participles in the hot participle set according to frequency to generate a hot word list includes:
acquiring the participles in the hot participle set and the frequency corresponding to each participle;
arranging the participles in descending order of their frequencies to obtain an initial list;
and intercepting a preset number of participles from the initial list to generate the hot word list.
The preset number may be, for example, 10.
In detail, intercepting the preset number of participles further screens the initial list. Although the initial list is built only from participles whose frequency is greater than the preset threshold, it may still contain many participles of comparatively low frequency, so only the preset number of participles at the top of the initial list are kept to generate the hot word list. In one application scenario of the invention, the hot word list lets the anchor of a live broadcast room see more intuitively the topics and questions that the audience wants to know about, and the anchor can answer the questions on the hot word list in order from top to bottom.
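The threshold filter, descending sort and truncation described above can be sketched in a few lines. The function name `build_hot_list` and the sample frequencies are illustrative assumptions; the default of 10 entries follows the preset number mentioned in the text.

```python
def build_hot_list(frequencies, threshold, top_n=10):
    """Build a hot word list from a mapping of participle -> frequency.

    Keeps participles whose frequency is strictly greater than `threshold`,
    sorts them from most to least frequent, and keeps the first `top_n`.
    """
    hot_set = {word: freq for word, freq in frequencies.items() if freq > threshold}
    initial_list = sorted(hot_set.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in initial_list[:top_n]]


# Hypothetical frequencies for illustration only.
sample = {"cerebral infarction": 12, "hyperlipidemia": 7, "heart disease": 3, "diet": 9}
hot_list = build_hot_list(sample, threshold=3)
top_two = build_hot_list(sample, threshold=3, top_n=2)
```

Because the comparison is strict, a participle whose frequency equals the threshold ("heart disease" here) is excluded, as the text specifies.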
Step four, the record retrieval module 103 sequentially selects one of the participles according to the arrangement order of the participles in the hot word list, and retrieves the selected participle in the chat records as a search word to obtain the question corresponding to the search word.
In this embodiment of the present invention, the record retrieval module 103 obtains the question corresponding to the search word by:
sequentially selecting, through a traversal operation, one participle according to the arrangement order of the participles in the hot word list, taking the selected participle as a search word, vectorizing the search word to obtain a search word vector, extracting the chat keywords in the initial chat record, and vectorizing the chat keywords to obtain keyword vectors;
calculating the similarity between the search word vector and each keyword vector, and selecting the question corresponding to a keyword whose similarity is greater than or equal to a preset similarity threshold as the question corresponding to the search word.
Specifically, in the embodiment of the present invention, the initial chat record may include a plurality of chat keywords, where the chat keywords are medical terms. The similarity between the search word and each chat keyword is calculated, and as long as one chat keyword in a sentence of the initial chat record matches the search word, that sentence is taken as a question corresponding to the search word.
In detail, the question selected in the embodiment of the present invention as corresponding to the search word is a chat record presented in question form. If the matched keyword appears in a chat record that is not phrased as a question, that chat record is not classified as a question corresponding to the search word.
For example, the search word is "cerebral infarction", and the initial chat record is: "Cerebral infarction and heart disease are very common in modern society. Is there, after all, a truly effective method for treating cerebral infarction? However, the following three are the most effective." The chat keywords in the initial chat record are "cerebral infarction" and "heart disease". By calculating the similarity between the search word vector and the keyword vectors, the question in the initial chat record that contains "cerebral infarction" is taken as the question corresponding to the search word.
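The rule that only sentences phrased as questions are returned can be sketched as follows. This sketch makes two simplifying assumptions that are not in the original: sentences are split on English end punctuation, and keyword matching is a plain case-insensitive substring test rather than the vector similarity the text describes.

```python
import re


def split_sentences(record):
    """Split a chat record into sentences, keeping the closing punctuation mark."""
    return [s.strip() for s in re.findall(r"[^.?!]+[.?!]", record)]


def questions_for_term(search_term, record):
    """Return the sentences that are questions and mention the search term."""
    term = search_term.lower()
    return [
        sentence
        for sentence in split_sentences(record)
        if sentence.endswith("?") and term in sentence.lower()
    ]


chat_record = (
    "Cerebral infarction and heart disease are common in modern society. "
    "Is there a truly effective method for treating cerebral infarction? "
    "The following three are the most effective."
)
questions = questions_for_term("cerebral infarction", chat_record)
```

The first sentence also mentions the search term but is a statement, so only the second sentence is returned.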
The similarity between the search term vector and the keyword vector can be calculated by a variety of methods, including, but not limited to, the cosine similarity formula and the Euclidean distance.
Optionally, in an embodiment of the present invention, the calculating a similarity between the search term vector and the keyword vector includes:
calculating the similarity between the search term vector and the keyword vector by using the following formula:
$$\cos(a, b) = \frac{a \cdot b}{|a|\,|b|}$$
wherein cos(a, b) is the similarity, a is the search term vector, b is the keyword vector, and |a| and |b| are the modulus (norm) of the search term vector and the modulus of the keyword vector, respectively.
The embodiment of the invention can carry out vectorization processing on the search word and the chat keyword according to a preset word2vec algorithm to obtain the search word vector and the keyword vector.
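The cosine similarity comparison against a preset threshold can be sketched as follows. The three-dimensional vectors in `EMBEDDINGS` are invented for illustration; in practice they would come from a trained word2vec model. The threshold value of 0.8 and the zero-vector handling are likewise assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings standing in for word2vec output.
EMBEDDINGS = {
    "cerebral infarction": [0.9, 0.1, 0.0],
    "stroke": [0.8, 0.2, 0.1],
    "heart disease": [0.1, 0.9, 0.2],
}


def matching_keywords(search_term, chat_keywords, threshold=0.8):
    """Return the chat keywords whose similarity to the search term meets the threshold."""
    query = EMBEDDINGS[search_term]
    return [
        keyword
        for keyword in chat_keywords
        if cosine_similarity(query, EMBEDDINGS[keyword]) >= threshold
    ]


matches = matching_keywords("cerebral infarction", ["stroke", "heart disease"])
```

With these toy vectors, "stroke" scores close to 1 and passes the threshold, while "heart disease" scores well below it.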
Step five, the corpus generating module 104 acquires an original question-answer data set, extracts each flow node in the original question-answer data set and the corpus data corresponding to the flow node, and marks and combines the flow nodes of the corpus data to obtain the training corpus.
In the embodiment of the present invention, the original question-answer data set may consist of the medical questions and corresponding answers contained in an intelligent question-answering service scenario.
In detail, each flow node in the original question-answer data set may be one round of dialogue between the medical expert and the questioner, and the corresponding corpus data is the medical expert's answer in that round of dialogue.
The embodiment of the invention marks the flow nodes of the corpus data and then combines the flow nodes to obtain the training corpus.
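One way to picture the marking and combining of flow nodes is shown below. This is a sketch under assumptions: each dialogue is represented as a list of (question, answer) rounds, and the flow-node mark is simply the 1-based round index. The text does not specify either representation.

```python
def build_training_corpus(dialogues):
    """Tag each expert answer with its flow-node (round) index and merge all dialogues."""
    corpus = []
    for dialogue in dialogues:
        for node_index, (question, answer) in enumerate(dialogue, start=1):
            corpus.append({"node": node_index, "question": question, "text": answer})
    return corpus


# Hypothetical two-round dialogue between a questioner and a medical expert.
dialogues = [
    [
        ("Can you answer questions about cerebral infarction?", "Yes, I can."),
        ("What are the effective treatment regimens?", "The common options are ..."),
    ],
]
training_corpus = build_training_corpus(dialogues)
```

Each entry keeps the expert's answer as the corpus text together with the round it came from, so later labeling can refer to the flow node.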
Step six, the model training module 105 performs feature coding on the training corpus to obtain a training corpus vector, and trains a preset multi-classification model by using the training corpus to obtain a question intention classification model.
In the embodiment of the present invention, the feature coding of the training corpus may use methods such as one-hot encoding, target encoding, and Bayesian target encoding. The present invention uses preset one-hot encoding to perform the feature coding of the training corpus and obtain the training corpus vector.
Specifically, performing feature coding on the training corpus to obtain a training corpus vector includes:
counting the corpus entries of the training corpus in the original question-answer data set to obtain the total corpus count;
constructing an initial matrix vector by taking the number of training corpus entries as the number of rows of a preset matrix and the total corpus count as the number of columns of the preset matrix;
and, for each corpus entry, setting the column corresponding to that entry in the initial matrix vector to a first value and setting the remaining columns to a second value to obtain the training corpus vector.
For example, if the total corpus count is 5, that is, assuming five rounds of dialogue, the training corpus vector of the first round is [1,0,0,0,0].
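The row-and-column construction above amounts to one-hot encoding, which can be sketched as follows. The function name is illustrative, and the first and second values are taken to be 1 and 0, consistent with the example vector.

```python
def one_hot_encode(corpus_count, first_value=1, second_value=0):
    """Return a corpus_count x corpus_count matrix with one row per corpus entry.

    Row i holds `first_value` in column i and `second_value` everywhere else.
    """
    return [
        [first_value if column == row else second_value for column in range(corpus_count)]
        for row in range(corpus_count)
    ]


vectors = one_hot_encode(5)
first_round_vector = vectors[0]
```

For five rounds of dialogue, `first_round_vector` is `[1, 0, 0, 0, 0]`, matching the example.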
In an embodiment of the present invention, the preset multi-classification model may be a shallow neural network or a support vector machine model. In the embodiment of the invention, the intention corresponding to each question can be identified by the question intention classification model obtained by training the multi-classification model.
In detail, training the preset multi-classification model by using the training corpus to obtain a question intention classification model includes:
classifying the training corpus by using the preset multi-classification model to obtain one or more classification intents;
labeling the intention category in the original question-answer data set, and calculating the repetition degree between the intention category and the classification intention;
when the repetition degree is smaller than a preset classification threshold, iteratively updating the preset multi-classification model and classifying the final expression vector again;
and when the repetition degree is greater than or equal to the preset classification threshold, obtaining the question intention classification model.
In detail, the embodiment of the present invention labels the intention categories in the original question-answer data set by judging the actual intention of the medical expert's answer in each round of dialogue between the medical expert and the questioner. For example, the first round of dialogue is: Questioner: "Can you answer questions related to cerebral infarction?" Medical expert: "Yes, I can." The intention category of the first round of dialogue is to confirm the field in which the medical expert answers questions. The second round of dialogue is: Questioner: "What are the effective treatment regimens for cerebral infarction?" Medical expert: "The following are the most common treatment options for cerebral infarction at present, first ……" The intention category of the second round of dialogue is to confirm the answer to a specific question.
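The classify, compare and update loop can be sketched as follows. Several choices here are assumptions rather than details from the text: the repetition degree is taken to be the fraction of predictions that agree with the labeled intention categories, a tiny multi-class perceptron stands in for the preset multi-classification model (the text mentions a shallow neural network or a support vector machine), and the intent label names are invented.

```python
def repetition_degree(predicted, labeled):
    """Fraction of predicted intents that agree with the labeled intention categories."""
    return sum(p == l for p, l in zip(predicted, labeled)) / len(labeled)


class PerceptronClassifier:
    """Minimal multi-class perceptron used as a stand-in multi-classification model."""

    def __init__(self, n_features, classes):
        self.classes = list(classes)
        self.weights = {c: [0.0] * n_features for c in self.classes}

    def predict(self, x):
        scores = {c: sum(w * v for w, v in zip(self.weights[c], x)) for c in self.classes}
        return max(self.classes, key=lambda c: scores[c])

    def update(self, x, label):
        guess = self.predict(x)
        if guess != label:
            for i, value in enumerate(x):
                self.weights[label][i] += value
                self.weights[guess][i] -= value


def train_until_threshold(model, vectors, labels, threshold=0.9, max_iterations=100):
    """Reclassify and update until the repetition degree reaches the threshold."""
    degree = 0.0
    for _ in range(max_iterations):
        predictions = [model.predict(x) for x in vectors]
        degree = repetition_degree(predictions, labels)
        if degree >= threshold:
            break
        for x, y in zip(vectors, labels):
            model.update(x, y)
    return model, degree


# One-hot corpus vectors with hypothetical intent labels.
vectors = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
labels = ["confirm_field", "answer_question", "confirm_field", "answer_question"]
model, degree = train_until_threshold(
    PerceptronClassifier(4, ["confirm_field", "answer_question"]), vectors, labels
)
```

The `max_iterations` cap is a safeguard added for the sketch so that the loop ends even if the threshold is never reached.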
Step seven, the question-answer record generating module 106 inputs the retrieved questions into the question intention classification model to obtain question intentions, answers the questions according to the question intentions to generate question-answer records, and pushes the question-answer records to a client.
In the embodiment of the invention, the retrieved questions are input into the question intention classification model to obtain the question intentions. A question intention gives a rough estimate of the scope and field of a question, and answering according to the different intentions improves question-answering efficiency.
The retrieved questions are answered according to the question intentions to obtain the corresponding answers, and the questions and their answers are compiled to generate the question-answer records.
Specifically, answering the question according to the question intention and generating a question-answer record includes:
selecting a corresponding preset question-answer database according to the question intention, performing matching processing on the question by using the question-answer database, and judging whether the question is matched with a question in the question-answer database;
if the question is matched with the question in the question-answer database, taking the answer corresponding to the question in the question-answer database as the answer of the question, and generating a question-answer record according to the question and the answer;
if the question is not matched with the question in the question-answer database, marking the question as an unanswered question and performing question answering on the unanswered question to obtain an answer of the unanswered question;
and generating a question-answer record according to the unanswered question and the answer of the unanswered question.
Wherein, the question-answer database comprises some common questions and answers corresponding to the common questions.
In detail, if the question does not match any question in the question-answer database, the embodiment of the present invention marks the question as an unanswered question. Because no matching question was found for it in the question-answer database, the answer to an unanswered question may be obtained through manual answering.
Specifically, pushing the question-answer record to the client includes:
transmitting the question-answer record to a data push engine according to the transmission file of the question-answer record;
and pushing the question-answer record to the client by using the data push engine.
Fig. 4 is a schematic structural diagram of an electronic device implementing a method for generating a question and answer record according to an embodiment of the present invention.
The electronic device 1 may include a processor 10, a memory 11, and a bus, and may further include a computer program, such as a question and answer record generating program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of the question and answer record generating program 12, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the Control Unit of the electronic device. It connects the various components of the electronic device by using various interfaces and lines, and executes the various functions of the electronic device 1 and processes its data by running or executing programs or modules (such as the question and answer record generating program) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 4 only shows an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The question-answer record generating program 12 stored in the memory 11 of the electronic device 1 is a combination of a plurality of instructions, which when executed in the processor 10, can realize:
performing word segmentation processing on the obtained chat records, and counting the occurrence frequency of each word segmentation;
summarizing the participles with the frequency larger than a preset threshold value to obtain a hot participle set;
sorting the participles in the hot participle set according to the frequency to generate a hot word list;
sequentially selecting one participle according to the arrangement order of the participles in the hot word list, and retrieving the selected participle as a search word in the chat records to obtain a question corresponding to the search word;
acquiring an original question-answer data set, extracting each flow node in the original question-answer data set and corpus data corresponding to the flow node, and marking and combining the flow nodes of the corpus data to obtain a training corpus;
performing feature coding on the training corpus to obtain a training corpus vector, and training a preset multi-classification model by using the training corpus to obtain a question intention classification model;
inputting the retrieved questions into the question intention classification model to obtain question intentions, answering the questions according to the question intentions to generate question-answer records, and pushing the question-answer records to a client.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiments corresponding to fig. 1 to fig. 4, which is not repeated herein.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
performing word segmentation processing on the obtained chat records, and counting the occurrence frequency of each word segmentation;
summarizing the participles with the frequency larger than a preset threshold value to obtain a hot participle set;
sorting the participles in the hot participle set according to the frequency to generate a hot word list;
sequentially selecting one participle according to the arrangement order of the participles in the hot word list, and retrieving the selected participle as a search word in the chat records to obtain a question corresponding to the search word;
acquiring an original question-answer data set, extracting each flow node in the original question-answer data set and corpus data corresponding to the flow node, and marking and combining the flow nodes of the corpus data to obtain a training corpus;
performing feature coding on the training corpus to obtain a training corpus vector, and training a preset multi-classification model by using the training corpus to obtain a question intention classification model;
inputting the retrieved questions into the question intention classification model to obtain question intentions, answering the questions according to the question intentions to generate question-answer records, and pushing the question-answer records to a client.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
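As a non-limiting illustration of data blocks associated by a cryptographic method, a minimal hash-linked chain might be sketched as follows. The block layout, the use of SHA-256, and the function names are assumptions for the example only; consensus, networking, and the platform layers mentioned above are not modeled.

```python
import hashlib
import json


def make_block(transactions, previous_hash):
    # The block hash covers both the transactions and the previous block's
    # hash, so altering an earlier block invalidates every later link.
    payload = json.dumps(
        {"transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"transactions": transactions, "previous_hash": previous_hash, "hash": digest}


def chain_is_valid(chain):
    for index, block in enumerate(chain):
        recomputed = make_block(block["transactions"], block["previous_hash"])["hash"]
        if block["hash"] != recomputed:
            return False  # block contents were modified after hashing
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False  # link to the preceding block is broken
    return True


genesis = make_block(["record A"], previous_hash="")
second = make_block(["record B"], previous_hash=genesis["hash"])
chain = [genesis, second]
```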
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for generating a question-answer record, the method comprising:
performing word segmentation processing on the obtained chat records, and counting the occurrence frequency of each participle;
summarizing the participles whose frequency is greater than a preset threshold to obtain a hot participle set;
sorting the participles in the hot participle set by frequency to generate a hot word list;
sequentially selecting one participle according to the order of the participles in the hot word list, and searching the chat records with the selected participle as a search word to obtain a question corresponding to the search word;
acquiring an original question-answer data set, extracting each flow node in the original question-answer data set and the corpus data corresponding to the flow node, and marking and combining the flow nodes of the corpus data to obtain a training corpus;
performing feature coding on the training corpus to obtain a training corpus vector, and training a preset multi-classification model by using the training corpus to obtain a question intention classification model;
inputting the retrieved questions into the question intention classification model to obtain question intentions, answering the questions according to the question intentions to generate question-answer records, and pushing the question-answer records to a client.
2. The method for generating a question-answer record according to claim 1, wherein performing word segmentation processing on the obtained chat records comprises:
preprocessing the chat records according to a preset rule to obtain initial chat records;
performing word segmentation processing on the initial chat records by using a word segmentation tool to obtain a word segmentation chat set;
and screening out the participles from the participle chat set according to a preset keyword dictionary.
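As a non-limiting illustration, the three sub-steps of claim 2 might be sketched as follows. The preprocessing rule (lowercasing and stripping punctuation), the whitespace segmentation, and the dictionary contents are hypothetical stand-ins; the claim leaves the preset rule, the segmentation tool, and the keyword dictionary open.

```python
import re

# Hypothetical preset keyword dictionary used only for this example.
KEYWORD_DICTIONARY = {"refund", "order", "prescription"}


def preprocess(record):
    # Assumed preset rule: lowercase, replace punctuation with spaces,
    # and collapse repeated whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", record.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def segment(initial_record):
    # Stand-in for a word segmentation tool (assumption).
    return initial_record.split()


def screen_participles(chat_records, dictionary):
    initial_records = [preprocess(r) for r in chat_records]   # initial chat records
    participle_chat_set = [segment(r) for r in initial_records]  # participle chat set
    # Keep only the participles that appear in the keyword dictionary.
    return [[w for w in words if w in dictionary] for words in participle_chat_set]


screened = screen_participles(["Refund, my ORDER!!", "hello there"], KEYWORD_DICTIONARY)
```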
3. The method of generating a question-answer record according to claim 1, wherein before performing the word segmentation process on the obtained chat records, the method further comprises:
identifying a user corresponding to the chat record;
judging whether the user passes identity verification;
if the user does not pass the identity verification, deleting the chat record of the user;
and if the user passes the identity verification, reserving the chat record corresponding to the user.
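As a non-limiting illustration, the verification filter of claim 3 might be sketched as follows. The record layout (a dictionary with `user` and `text` keys) and the `is_verified` callback are assumptions for the example; the claim does not state how identity verification is performed.

```python
def filter_verified_records(chat_records, is_verified):
    """Keep the chat records of users who pass identity verification and
    drop the records of users who do not."""
    return [record for record in chat_records if is_verified(record["user"])]


# Hypothetical data: a fixed set of verified user IDs stands in for the
# actual identity verification service.
verified_users = {"u1", "u3"}
chat_records = [
    {"user": "u1", "text": "How do I get a refund?"},
    {"user": "u2", "text": "spam message"},
    {"user": "u3", "text": "Where is my order?"},
]
kept = filter_verified_records(chat_records, lambda user: user in verified_users)
```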
4. The method for generating a question-answer record according to claim 1, wherein sequentially selecting one participle according to the order of the participles in the hot word list, and searching the chat records with the selected participle as a search word to obtain the question corresponding to the search word, comprises:
sequentially selecting, through a traversal operation, one participle according to the order of the participles in the hot word list, taking the selected participle as a search word, and vectorizing the search word to obtain a search word vector;
extracting chat keywords from the initial chat records, and vectorizing the chat keywords to obtain keyword vectors;
calculating the similarity between the search word vector and each keyword vector, and selecting the question corresponding to a keyword whose similarity is greater than or equal to a preset similarity threshold as the question corresponding to the search word.
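As a non-limiting illustration, the similarity-based retrieval of claim 4 might be sketched as follows. The claim does not fix a vectorization scheme or a similarity measure; character-bigram counts and cosine similarity are swapped in here as assumptions, and the keyword-to-question mapping and the 0.7 threshold are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    # Assumed vectorization: counts of character bigrams, with a space
    # padded on each side so word boundaries contribute features.
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine_similarity(a, b):
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def questions_for_search_word(search_word, keyword_to_question, threshold):
    search_vector = vectorize(search_word)
    return [
        question
        for keyword, question in keyword_to_question.items()
        if cosine_similarity(search_vector, vectorize(keyword)) >= threshold
    ]


keyword_to_question = {
    "refund": "How do I get a refund?",
    "refunds": "When are refunds paid?",
    "delivery": "Where is my delivery?",
}
matched = questions_for_search_word("refund", keyword_to_question, threshold=0.7)
```

With these inputs, "refund" and "refunds" share most bigrams and clear the threshold, while "delivery" shares none with the search word.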
5. The method for generating a question-answer record according to claim 1, wherein performing feature coding on the training corpus to obtain a training corpus vector comprises:
counting the training corpora in the original question-answer data set to obtain the total number of corpora;
constructing an initial matrix vector by taking the number of training corpora as the number of rows of a preset matrix and the total number of corpora as the number of columns of the preset matrix;
and setting the position of the column corresponding to each training corpus in the initial matrix vector to a first numerical value, and setting the remaining columns to a second numerical value, to obtain the training corpus vector.
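As a non-limiting illustration, the feature coding of claim 5 reads like a one-hot encoding, which might be sketched as follows. Treating the column count as the number of distinct corpus entries, and using 1 and 0 as the first and second numerical values, are assumptions for the example.

```python
def one_hot_encode(training_corpus, first_value=1, second_value=0):
    """Build one row per training corpus entry and one column per distinct
    entry; the matching column gets first_value, the rest second_value."""
    vocabulary = sorted(set(training_corpus))
    column_of = {entry: index for index, entry in enumerate(vocabulary)}
    # Initial matrix: every position starts at the second numerical value.
    matrix = [[second_value] * len(vocabulary) for _ in training_corpus]
    for row, entry in enumerate(training_corpus):
        matrix[row][column_of[entry]] = first_value
    return vocabulary, matrix


vocabulary, vectors = one_hot_encode(["refund", "order", "refund"])
```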
6. The method for generating question-answer records according to claim 1, wherein the training of the preset multi-classification model by using the training corpus to obtain the question intention classification model comprises:
classifying the training corpus by using the preset multi-classification model to obtain one or more classification intentions;
labeling the intention category in the original question-answer data set, and calculating the repetition degree between the intention category and the classification intention;
when the repetition degree is less than a preset classification threshold, iteratively updating the preset multi-classification model and classifying the final expression vector again;
and when the repetition degree is greater than or equal to the preset classification threshold, obtaining the question intention classification model.
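As a non-limiting illustration, the train-until-threshold loop of claim 6 might be sketched as follows. The claim does not name the multi-classification model or define the repetition degree; a multi-class perceptron is swapped in as the model, and the repetition degree is taken to be the fraction of predictions that agree with the labeled intention categories. Both are assumptions, as is the `max_rounds` safety limit.

```python
def predict(weights, vector):
    # Score each intention class and return the index of the highest score.
    scores = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def repetition_degree(predicted, labelled):
    # Assumed definition: share of predictions matching the labels.
    return sum(p == y for p, y in zip(predicted, labelled)) / len(labelled)


def train_intent_classifier(vectors, labels, num_classes, threshold, max_rounds=100):
    weights = [[0.0] * len(vectors[0]) for _ in range(num_classes)]
    for _ in range(max_rounds):
        predicted = [predict(weights, v) for v in vectors]
        if repetition_degree(predicted, labels) >= threshold:
            break  # threshold reached: the current model is accepted
        # Iterative update: reinforce the labeled class and penalize the
        # wrongly predicted class for every misclassified sample.
        for vector, p, y in zip(vectors, predicted, labels):
            if p != y:
                for i, x in enumerate(vector):
                    weights[y][i] += x
                    weights[p][i] -= x
    return weights


vectors = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [0, 1, 2]
model = train_intent_classifier(vectors, labels, num_classes=3, threshold=1.0)
```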
7. The method for generating a question-answer record according to any one of claims 1 to 5, wherein answering the questions according to the question intentions to generate question-answer records comprises:
selecting a corresponding preset question-answer database according to the question intention, performing matching processing on the question by using the question-answer database, and judging whether the question is matched with a question in the question-answer database;
if the question is matched with the question in the question-answer database, taking the answer corresponding to the question in the question-answer database as the answer of the question, and generating a question-answer record according to the question and the answer;
if the question is not matched with the question in the question-answer database, marking the question as an unanswered question and performing question answering on the unanswered question to obtain an answer of the unanswered question;
and generating a question-answer record according to the unanswered question and the answer of the unanswered question.
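As a non-limiting illustration, the matching and fallback branches of claim 7 might be sketched as follows. Exact string matching, the nested dictionary layout of the question-answer databases, and the `answer_unmatched` callback (standing in for whatever process answers an unanswered question) are assumptions for the example.

```python
def generate_qa_record(question, intention, qa_databases, answer_unmatched):
    # Select the preset question-answer database for this question intention.
    database = qa_databases.get(intention, {})
    if question in database:
        # Matched: reuse the stored answer.
        return {"question": question, "answer": database[question], "matched": True}
    # Not matched: mark as unanswered and obtain an answer by other means.
    return {"question": question, "answer": answer_unmatched(question), "matched": False}


# Hypothetical databases keyed by intention, then by question text.
qa_databases = {
    "refund": {"How do I get a refund?": "Open the order page and choose Refund."},
}


def fallback(question):
    return "Forwarded to a human agent."


hit = generate_qa_record("How do I get a refund?", "refund", qa_databases, fallback)
miss = generate_qa_record("Can I change my address?", "delivery", qa_databases, fallback)
```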
8. A question-answer record generation apparatus, characterized in that the apparatus comprises:
the word segmentation extraction module is used for performing word segmentation processing on the obtained chat records and counting the occurrence frequency of each participle;
the hot word list generating module is used for summarizing the participles whose frequency is greater than a preset threshold to obtain a hot participle set, and sorting the participles in the hot participle set by frequency to generate a hot word list;
the record retrieval module is used for sequentially selecting one participle according to the order of the participles in the hot word list, and searching the chat records with the selected participle as a search word to obtain a question corresponding to the search word;
the corpus generating module is used for acquiring an original question-answer data set, extracting each flow node in the original question-answer data set and the corpus data corresponding to the flow node, and marking and combining the flow nodes of the corpus data to obtain a training corpus;
the model training module is used for performing feature coding on the training corpus to obtain a training corpus vector, and training a preset multi-classification model by using the training corpus to obtain a question intention classification model;
and the question-answer record generating module is used for inputting the retrieved questions into the question intention classification model to obtain question intentions, answering the questions according to the question intentions to generate question-answer records, and pushing the question-answer records to the client.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the question-answer record generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the question-answer record generation method according to any one of claims 1 to 7.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110429297.3A CN113111159A (en) 2021-04-21 2021-04-21 Question and answer record generation method and device, electronic equipment and storage medium
PCT/CN2022/087818 WO2022222942A1 (en) 2021-04-21 2022-04-20 Method and apparatus for generating question and answer record, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110429297.3A CN113111159A (en) 2021-04-21 2021-04-21 Question and answer record generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113111159A (en) 2021-07-13

Family

ID=76719045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110429297.3A Pending CN113111159A (en) 2021-04-21 2021-04-21 Question and answer record generation method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113111159A (en)
WO (1) WO2022222942A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022222942A1 (en) * 2021-04-21 2022-10-27 康键信息技术(深圳)有限公司 Method and apparatus for generating question and answer record, electronic device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116662523B (en) * 2023-08-01 2023-10-20 宁波甬恒瑶瑶智能科技有限公司 Biochemical knowledge question-answering method, system and storage medium based on GPT model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106878819A (en) * 2017-01-20 2017-06-20 合网络技术(北京)有限公司 The method, system and device of information exchange in a kind of network direct broadcasting
US20180174020A1 (en) * 2016-12-21 2018-06-21 Microsoft Technology Licensing, Llc Systems and methods for an emotionally intelligent chat bot
CN110321564A (en) * 2019-07-05 2019-10-11 浙江工业大学 A kind of more wheel dialogue intension recognizing methods
CN112347760A (en) * 2020-11-16 2021-02-09 北京京东尚科信息技术有限公司 Method and device for training intention recognition model and method and device for recognizing intention
CN112650829A (en) * 2019-10-11 2021-04-13 阿里巴巴集团控股有限公司 Customer service processing method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304437B (en) * 2017-09-25 2020-01-31 腾讯科技(深圳)有限公司 automatic question answering method, device and storage medium
CN108388558B (en) * 2018-02-07 2022-04-19 平安普惠企业管理有限公司 Question matching method and device, customer service robot and storage medium
CN110232114A (en) * 2019-05-06 2019-09-13 平安科技(深圳)有限公司 Sentence intension recognizing method, device and computer readable storage medium
CN112632257A (en) * 2020-12-29 2021-04-09 深圳赛安特技术服务有限公司 Question processing method and device based on semantic matching, terminal and storage medium
CN113111159A (en) * 2021-04-21 2021-07-13 康键信息技术(深圳)有限公司 Question and answer record generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022222942A1 (en) 2022-10-27

Similar Documents

Publication Publication Date Title
Zhang et al. The gap of semantic parsing: A survey on automatic math word problem solvers
CN112035636B (en) Question and answer management method, device and equipment of medical question and answer system and storage medium
CN110929125B (en) Search recall method, device, equipment and storage medium thereof
CN113312461A (en) Intelligent question-answering method, device, equipment and medium based on natural language processing
CN111078837A (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN110134777B (en) Question duplication eliminating method and device, electronic equipment and computer readable storage medium
CN113707303A (en) Method, device, equipment and medium for solving medical problems based on knowledge graph
CN113111162A (en) Department recommendation method and device, electronic equipment and storage medium
WO2022222942A1 (en) Method and apparatus for generating question and answer record, electronic device, and storage medium
CN112906377A (en) Question answering method and device based on entity limitation, electronic equipment and storage medium
CN113886708A (en) Product recommendation method, device, equipment and storage medium based on user information
CN112035757A (en) Medical waterfall flow pushing method, device, equipment and storage medium
CN112667775A (en) Keyword prompt-based retrieval method and device, electronic equipment and storage medium
CN113722483A (en) Topic classification method, device, equipment and storage medium
CN114860887A (en) Disease content pushing method, device, equipment and medium based on intelligent association
CN111581623A (en) Intelligent data interaction method and device, electronic equipment and storage medium
CN114416939A (en) Intelligent question and answer method, device, equipment and storage medium
CN114969484A (en) Service data searching method, device, equipment and storage medium
CN113918704A (en) Question-answering method and device based on machine learning, electronic equipment and medium
CN113821622A (en) Answer retrieval method and device based on artificial intelligence, electronic equipment and medium
US20230394236A1 (en) Extracting content from freeform text samples into custom fields in a software application
CN113157867A (en) Question answering method and device, electronic equipment and storage medium
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN112199958A (en) Concept word sequence generation method and device, computer equipment and storage medium
US20210165835A1 (en) Computer driven question identification and understanding within a commerical tender document for automated bid processing for rapid bid submission and win rate enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210713