CN111159370A - Short-session new problem generation method, storage medium and man-machine interaction device - Google Patents

Info

Publication number
CN111159370A
Authority
CN
China
Prior art keywords
new
candidate new
candidate
question
short
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911321137.6A
Other languages
Chinese (zh)
Inventor
杨雷
李昱
王全礼
唐汇
蒲柯锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd
Priority to CN201911321137.6A
Publication of CN111159370A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The embodiment discloses a method for generating new questions from short sessions. The method comprises: acquiring candidate new questions; forming similar candidate new question sets by clustering the candidate new questions according to their degree of similarity; extracting label words by taking the business words that reach a preset word frequency in a similar candidate new question set as the label words of that set; generating a main sentence according to the content of the similar candidate new question set; and generating a new question from a preset number of candidate new questions in the similar candidate new question set, the main sentence and the label words. Because the embodiment takes the similarity between the contents of the candidate new questions into account, new questions that have similar business meanings but are posed from different angles can be clustered accurately and matched with accurate answers. The resulting understanding of business intent is closer to real application scenarios, and the method is applicable to a variety of business scenarios.

Description

Short-session new problem generation method, storage medium and man-machine interaction device
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a method for generating new questions based on short sessions, a storage medium and a man-machine interaction device.
Background
With the development of the internet, artificial intelligence has advanced considerably, and human-computer interaction devices are now widely used for question answering in fields such as shopping, finance, government affairs and customer service. Such a device can respond to recurring questions accurately and efficiently, which reduces the workload of customer service staff and in some cases allows intelligent customer service to replace traditional human customer service. This can greatly improve service efficiency, shorten user waiting times, provide professional customer service 7x24 hours, and overcome the inability of human customer service to operate outside working hours.
The intelligence of a human-computer interaction device is determined by the richness of its question-answer knowledge base: the richer the knowledge base, the stronger the robot's ability to converse with clients. In the related art, the question-answer knowledge base is usually preset as a set of standard questions with a corresponding set of standard answers, and only questions that match a standard question can be recognized. The knowledge base is therefore quite limited and cannot recognize the different ways in which different clients express themselves. Moreover, such devices have no self-learning capability and cannot handle questions newly raised by clients, so the user experience is poor.
To address the problems that the question-answer knowledge base of a human-computer interaction device is narrow, that its understanding of business intent is poor, and that it cannot learn and update itself, the related art generally updates the knowledge base by treating new keywords as new questions, thereby improving the device's autonomous learning and its understanding of business intent. However, entries added in this way are often unrepresentative or cannot be matched accurately to a business intent, so the device may respond incorrectly when a user raises a new question. This not only fails to improve the user experience but also risks misleading the user. A reliable method is therefore needed that improves the autonomous learning and recognition of the device and generates new questions for updating the question-answer knowledge base, so as to effectively improve the device's understanding of business intent.
Disclosure of Invention
To overcome these shortcomings of the related art, the invention aims to provide a reliable method that improves the autonomous learning and recognition of the human-computer interaction device and generates new questions for updating the question-answer knowledge base, so as to effectively improve the device's understanding of business intent.
The short-session new question generation method comprises:
acquiring candidate new questions;
forming similar candidate new question sets by clustering the candidate new questions according to their degree of similarity;
extracting label words by taking the business words that reach a preset word frequency in a similar candidate new question set as the label words of that set;
generating a main sentence according to the content of the similar candidate new question set;
and generating a new question from a preset number of candidate new questions in the similar candidate new question set, the main sentence and the label words.
Further, the main sentence is either a candidate new question that occurs with a predetermined frequency in the similar candidate new question set or a sentence summarized from that set.
Further, the label words are business words that occur with a predetermined frequency in the similar candidate new question set.
Further, acquiring candidate new questions comprises: reading the short-session content; calculating the ratio of the comprehensive similarity of a new question in the short-session content to the comprehensive similarity of the questions in the standard question library; and, if the ratio meets a first preset condition, taking the new question as a candidate new question.
Further, forming the similar candidate new question set comprises: selecting a candidate new question as a first candidate new question and generating a first candidate new question main sentence and a first candidate new question set; reading a next candidate new question; calculating the ratio of the comprehensive similarity of the first candidate new question to that of the next candidate new question; and, if the ratio meets a second preset condition, adding the next candidate new question to the first candidate new question set.
Further, the first preset condition is that the ratio is less than 0.8, and the second preset condition is that the ratio is greater than or equal to 0.8.
Further, the comprehensive similarity calculation method is as follows:
Q=C·α+W·β+S·γ
where Q is the comprehensive similarity score;
C is the similarity calculated based on content, and α is the corresponding weight, taking a value in [0.7, 0.9];
W is the similarity calculated based on keyword hits, and β is the corresponding weight, taking a value in [0.05, 0.15];
S is the similarity calculated based on word order, and γ is the corresponding weight, taking a value in [0.05, 0.15].
Specifically, α is 0.9, β is 0.05 and γ is 0.05.
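The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the range checks are assumptions, and the text does not say how the component similarities C, W and S are themselves computed, so they are taken here as precomputed inputs.

```python
def composite_similarity(c: float, w: float, s: float,
                         alpha: float = 0.9,
                         beta: float = 0.05,
                         gamma: float = 0.05) -> float:
    """Return Q = C*alpha + W*beta + S*gamma.

    c, w and s are the content, keyword-hit and word-order similarities.
    How they are computed is not specified in the text, so they are
    supplied by the caller (assumption).
    """
    # Weight ranges as stated in the text.
    if not 0.7 <= alpha <= 0.9:
        raise ValueError("alpha is expected in [0.7, 0.9]")
    if not 0.05 <= beta <= 0.15:
        raise ValueError("beta is expected in [0.05, 0.15]")
    if not 0.05 <= gamma <= 0.15:
        raise ValueError("gamma is expected in [0.05, 0.15]")
    return c * alpha + w * beta + s * gamma
```

With the preferred weights, the content similarity dominates the score, and the keyword and word-order terms act as small corrections.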
Further, before the candidate new questions are acquired, everyday small talk, modal particles (tone words), special characters, single-byte content, numbers and repeated content are filtered out of the short-session content.
According to the scheme of the invention, the candidate new questions are clustered by similarity into similar candidate new question sets, and a new question is then generated from the main sentence, selected similar candidate new questions and the label words. Because the similarity between the contents of the candidate new questions is taken into account, new questions that have similar business meanings but are posed from different angles can be clustered accurately and matched with accurate answers. The resulting understanding of business intent is closer to real application scenarios, and the method is applicable to many business scenarios. In addition, applying the method improves the autonomous learning capability of the question-answer knowledge base and is easy to put into practice.
Another aspect of the invention provides a computer storage medium storing a computer program which, when executed by a processor, implements a method as described above.
Yet another aspect of the present invention provides a human-computer interaction device comprising a memory, a processor and a short-session new question generation program stored on the memory and executable on the processor, the short-session new question generation program, when executed by the processor, implementing the method as defined in any one of the above.
The storage medium and the man-machine interaction device of the invention achieve the same technical effects as the method when they run it, and these effects are not described again.
Drawings
FIG. 1 is a flow chart of short-session new question generation according to the present invention.
FIG. 2 is a flow chart of human-computer interaction content cleaning according to the present invention.
FIG. 3 is a flow chart of label word generation according to the present invention.
FIG. 4 is a flow chart of forming a similar candidate new question set according to the present invention.
FIG. 5 is a schematic diagram of a human-computer interaction device according to the present invention.
Detailed Description
The embodiments of the invention are described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims. In addition, the steps described in this embodiment and shown in the drawings are not limited to a particular order; sub-steps may be performed concurrently or in reverse order within the spirit of the invention.
Example 1:
Generating new questions from short sessions of human-computer interaction is one of the core functional modules of a human-computer interaction system. By analyzing a large volume of client interaction streams, newly emerging business vocabulary is extracted and the business lexicon and question-answer knowledge base are enriched, which effectively improves the system's understanding of business intent and enhances the user experience. To overcome the shortcomings of the related art, as shown in fig. 1, this embodiment provides a method that improves the autonomous learning and recognition of the human-computer interaction device and generates new questions for updating the question-answer knowledge base. The human-computer interaction device in this embodiment includes a customer service robot, and in some passages the term "customer service robot" is used in place of "human-computer interaction device".
The method for generating short-session new questions comprises acquiring candidate new questions. The candidate new questions in this embodiment are new questions that are not included in the original question-answer knowledge base, which is also called the standard question-answer knowledge base or, for short, the standard question-answer base. If, after comparison in the manner provided by this embodiment, the similarity between a new question in the short session and a question in the original question-answer knowledge base exceeds a threshold, the new question may be added to the corresponding question set of the standard question-answer base as a new element of that set.
Reading the short-session content means analyzing a large volume of incoming human-computer interaction streams, extracting the sentences that have business meaning, and comparing them with the related questions in the standard question-answer knowledge base in terms of comprehensive similarity. First, the ratio of the comprehensive similarity of a new question in the short-session content to the comprehensive similarity of the questions in the standard question-answer knowledge base is calculated; if the ratio meets a first preset condition, the new question is taken as a candidate new question. The first preset condition serves to identify questions that are not included in the standard question-answer knowledge base, so that they can be used as material for subsequent processing. A question has several aspects: the business field, the context and the order of the words all affect its business meaning, and a change in any of them changes the specific business meaning. For this reason, the inventors propose a comprehensive similarity calculation that jointly considers sentence content, keywords and word order, so as to screen out candidate new questions that point to nearly the same business meaning. The comprehensive similarity Q is calculated as follows:
Q=C·α+W·β+S·γ
where Q is the comprehensive similarity score;
C is the similarity calculated based on content, and α is the corresponding weight, taking a value in [0.7, 0.9]; α is preferably set to 0.9 and may be set to 0.8 as required;
W is the similarity calculated based on keyword hits, and β is the corresponding weight, taking a value in [0.05, 0.15]; β is preferably set to 0.05 and may be set to 0.10 or 0.15 as required;
S is the similarity calculated based on word order, and γ is the corresponding weight, taking a value in [0.05, 0.15]; γ is preferably set to 0.05 and may be set to 0.10 or 0.15 as required.
In this embodiment, the first preset condition is that the ratio of the comprehensive similarity value of the new question in the short session to the similarity value of the question in the standard question-answer knowledge base is less than 0.8. When the condition is met, the new question is considered not to be included in the standard question-answer base and is taken as a candidate new question.
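The screening step can be illustrated with a short Python sketch. It rests on two assumptions that are not in the text: the "ratio" is read as a similarity score in [0, 1] between a new question and each standard question, and a question counts as new only if its best score against the whole standard base is below the threshold. The default `toy_similarity` built on `difflib` is a stand-in for the comprehensive similarity Q, not the method described here, and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List


def toy_similarity(a: str, b: str) -> float:
    # Stand-in for the comprehensive similarity Q (assumption):
    # a character-based ratio in [0, 1] from the standard library.
    return SequenceMatcher(None, a, b).ratio()


def select_candidates(questions: Iterable[str],
                      standard_questions: List[str],
                      similarity: Callable[[str, str], float] = toy_similarity,
                      threshold: float = 0.8) -> List[str]:
    """Keep the questions whose best match in the standard base is below threshold."""
    candidates = []
    for question in questions:
        # Best score against any standard question; 0.0 if the base is empty.
        best = max((similarity(question, std) for std in standard_questions),
                   default=0.0)
        if best < threshold:  # first preset condition: less than 0.8
            candidates.append(question)
    return candidates
```

Passing the similarity function as a parameter keeps the screening logic independent of how Q is actually computed.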
The next step is forming similar candidate new question sets: the candidate new questions are clustered according to their degree of similarity to obtain similar candidate new question sets. The human-computer interaction stream consists of exchanges between many users and the customer service robot, and different clients ask about the same business matter with different questions and wording. To normalize similar questions, questions that express the same or a similar business meaning need to be clustered and treated as one question type, which improves the robustness of the human-computer interaction system.
In this embodiment, the clustering of questions arising in human-computer interaction also uses the comprehensive similarity, as shown in fig. 4. Specifically, a candidate new question that was determined from the short-session content and has not yet been compared is taken as the first candidate new question Ri, and a first candidate new question main sentence and a first candidate new question set are generated with Ri at the center. The next candidate new question Rj is then read, and it is determined whether Rj has already been compared. If it has not, Rj is compared with Ri. If their comprehensive similarity Q meets a second preset condition, Ri and Rj are considered to express the same business meaning and to belong to the same class of question; Rj is added to the first candidate new question set and is given a label, such as "processed" or "judged", indicating that it has been compared with Ri. The (j+1)-th candidate new question Rj+1 is then read and the same comparison is performed. If Ri and Rj do not meet the second preset condition, Rj is likewise given a label, such as "processed" or "judged", indicating that it has been compared with Ri, but it is not added to the first candidate new question set.
After the first candidate new question Ri has been compared with all other candidate new questions to be compared, a second candidate new question Ri+1 is selected and the same comparison logic is executed. The comprehensive similarity in this embodiment is calculated as follows:
Q=C·α+W·β+S·γ
where Q is the comprehensive similarity score;
C is the similarity calculated based on content, and α is the corresponding weight, taking a value in [0.7, 0.9]; α is preferably set to 0.9 and may be set to 0.8 as required;
W is the similarity calculated based on keyword hits, and β is the corresponding weight, taking a value in [0.05, 0.15]; β is preferably set to 0.05 and may be set to 0.10 or 0.15 as required;
S is the similarity calculated based on word order, and γ is the corresponding weight, taking a value in [0.05, 0.15]; γ is preferably set to 0.05 and may be set to 0.10 or 0.15 as required.
In this step, the aim is to group together as many candidate new questions that express the same or nearly the same business meaning as possible. To that end, the inventors set the second preset condition as follows: the ratio of the comprehensive similarity of the first candidate new question to that of the next candidate new question to be compared is greater than or equal to 0.8. This effectively aggregates similar candidate new questions.
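The seed-based clustering described for fig. 4 can be sketched as a greedy single pass in Python. Several points are assumptions rather than statements from the text: a question that has joined a set is not reused as a later seed, a question that fails the comparison stays available for later seeds, and the default `toy_similarity` is only a stand-in for the comprehensive similarity Q. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def toy_similarity(a: str, b: str) -> float:
    # Stand-in for the comprehensive similarity Q (assumption).
    return SequenceMatcher(None, a, b).ratio()


def cluster_candidates(candidates: List[str],
                       similarity: Callable[[str, str], float] = toy_similarity,
                       threshold: float = 0.8) -> List[List[str]]:
    """Group candidate questions around seed questions Ri."""
    assigned = [False] * len(candidates)  # True once a question is in a set
    clusters: List[List[str]] = []
    for i, seed in enumerate(candidates):
        if assigned[i]:
            continue  # already belongs to an earlier set
        assigned[i] = True
        cluster = [seed]  # the seed Ri is the center of the new set
        for j in range(i + 1, len(candidates)):
            if assigned[j]:
                continue
            # Second preset condition: similarity of at least 0.8.
            if similarity(seed, candidates[j]) >= threshold:
                assigned[j] = True
                cluster.append(candidates[j])
        clusters.append(cluster)
    return clusters
```

Because each question is compared only with the seed of a set, not with every member, the result depends on the order in which the candidates are read.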
The next step is extracting label words: the business words that reach a preset word frequency in a similar candidate new question set are taken as the label words of that set. The candidate new questions in each similar candidate new question set are read and segmented into words, and the keywords that have business meaning are extracted. In this embodiment, a predetermined number of candidate new questions may be read from the set, or the candidate new questions may be read one by one; a word segmentation scheme is applied to each question read, and the words with business meaning are extracted. Word frequency statistics are then computed over the extracted business words, the words are sorted, and the business words that reach the preset word frequency are taken as the label words of the set. For example, the three most frequent business words may be selected as the label words. The number of label words is not particularly limited, and several label words that represent the key features of the set may be selected.
For example, as shown in fig. 3, a similar candidate new question set Ci is first selected and its candidate new questions Rj are taken in turn. Rj is segmented into words, the business words in the segmentation result are included in the word frequency statistics and sorting, and it is determined whether Rj contains a business word that represents its features. If so, that business word becomes one of the candidate label words of Ci and is counted in the subsequent word frequency statistics over all candidate label words; the next candidate new question Rj+1 is then read and processed in the same way. If no such business word is found after segmentation, the next candidate new question Rj+1 is read directly. Finally, the business words that reach the preset word frequency are selected as the label words of Ci. After Ci has been processed, the next similar candidate new question set Ci+1 is processed in the same way, and the details are not repeated here.
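The label word step amounts to a word frequency count restricted to business words, which can be sketched as follows. The `business_vocabulary` set and the `tokenize` parameter are hypothetical inputs: the text does not name a word segmentation scheme, so whitespace splitting is used as a placeholder, and Chinese text would need a real word segmenter in its place.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def extract_label_words(questions: Iterable[str],
                        business_vocabulary: Set[str],
                        tokenize: Callable[[str], List[str]] = str.split,
                        top_n: int = 3) -> List[str]:
    """Return the top_n most frequent business words in one question set."""
    counts: Counter = Counter()
    for question in questions:
        for token in tokenize(question):
            # Only words with business meaning become candidate label words.
            if token in business_vocabulary:
                counts[token] += 1
    # Sort by frequency and keep the most frequent words as labels.
    return [word for word, _ in counts.most_common(top_n)]
```

The default `top_n=3` follows the example of taking the three most frequent business words; a minimum frequency could be used instead if the preset condition is a count rather than a rank.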
The next step is generating a main sentence according to the content of the similar candidate new question set. The main sentence is a sentence that summarizes most or even all of the candidate new questions in the set. It may be a candidate new question that occurs with high frequency, or a sentence that summarizes the candidate new questions in the set as a whole. In this embodiment the summary may be produced by natural language processing techniques or written manually, and is not particularly limited.
Finally, a new question is generated from a preset number of candidate new questions in the similar candidate new question set, the main sentence and the label words.
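As a rough illustration of this final step, the sketch below assembles a new question record from one similar candidate new question set. The `NewQuestion` record, its field names and the `max_similar` parameter are hypothetical. The main sentence is chosen as the most frequent candidate, which is only one of the two options the text allows; a summarized sentence would have to be supplied by other means.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class NewQuestion:
    main_sentence: str            # sentence that summarizes the set
    label_words: List[str]        # label words of the set
    similar_questions: List[str]  # preset number of candidate questions


def build_new_question(similar_set: List[str],
                       label_words: List[str],
                       max_similar: int = 5) -> NewQuestion:
    """Combine main sentence, label words and sample questions into one record."""
    if not similar_set:
        raise ValueError("similar_set must not be empty")
    # Main sentence: the candidate that occurs most often (assumption);
    # ties resolve to the first one encountered.
    main_sentence = Counter(similar_set).most_common(1)[0][0]
    # Keep distinct questions in first-seen order, up to the preset number.
    distinct = list(dict.fromkeys(similar_set))
    return NewQuestion(main_sentence, list(label_words), distinct[:max_similar])
```

If repeated content has already been filtered out upstream, every candidate occurs once and the first question in the set becomes the main sentence.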
Because human-computer interaction is uncertain and diverse, the interaction stream contains a large amount of interfering session content that is unrelated to the business, such as everyday small talk, extremely short content of 1-2 words, or strings of special characters or numbers. The request content in client sessions therefore needs to be cleaned: interfering and repeated requests are removed, and only request content with some business meaning is retained. As shown in fig. 2, this embodiment filters out everyday small talk, modal particles (tone words), special characters, single-byte content, numbers and repeated content from the short-session content before the candidate new questions are acquired.
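A cleaning pass of this kind might look like the following Python sketch. The `SMALL_TALK` and `FILLER_WORDS` lists are hypothetical English placeholders for the small talk and tone words that a real deployment would configure, and the minimum length of three words is an assumption derived from the "1-2 words" example. The function returns normalized (lowercased, punctuation-free) text.

```python
import re
from typing import Iterable, List

# Hypothetical word lists; a real deployment would supply its own.
SMALL_TALK = {"hello", "hi", "thanks", "thank you", "bye"}
FILLER_WORDS = {"um", "uh", "oh", "hmm"}


def clean_requests(requests: Iterable[str], min_words: int = 3) -> List[str]:
    """Drop interfering or repeated requests and return normalized text."""
    seen = set()
    kept = []
    for raw in requests:
        text = raw.strip().lower()
        # Replace special characters (anything but word chars and spaces).
        text = re.sub(r"[^\w\s]", " ", text)
        # Remove filler (tone) words.
        words = [w for w in text.split() if w not in FILLER_WORDS]
        normalized = " ".join(words)
        if normalized in SMALL_TALK:
            continue  # everyday small talk
        if len(words) < min_words:
            continue  # extremely short content (1-2 words or empty)
        if all(w.isdigit() for w in words):
            continue  # strings of numbers only
        if normalized in seen:
            continue  # repeated content
        seen.add(normalized)
        kept.append(normalized)
    return kept
```

Deduplicating on the normalized form means that two requests differing only in case, punctuation or filler words are treated as repeats.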
In this embodiment, the candidate new questions are clustered by similarity into similar candidate new question sets, and a new question is then generated from the main sentence, selected similar candidate new questions and the label words. Because the similarity between the contents of the candidate new questions is taken into account, new questions that have similar business meanings but are posed from different angles can be clustered accurately and matched with accurate responses. The resulting understanding of business intent is closer to real application scenarios, and the method is applicable to many business scenarios.
In conclusion, the method improves the autonomous learning capability of the question-answer knowledge base and is easy to put into practice. Through the steps of interaction content cleaning, comparison of candidate new questions with the standard question-answer knowledge base, cluster analysis of candidate new questions, label word extraction for each similar candidate new question set, and new question generation, each new question is given several attributes: the similar candidate new question set it belongs to, its label words and its associated similar questions. The applicable business scenarios are therefore more diverse and the method is more practical. The comprehensive similarity calculation, the cluster analysis and the label word extraction are flexible and computationally simple, and their parameters are configurable.
Example 2
The present embodiment provides a computer storage medium storing a computer program which, when executed by a processor, implements the method according to embodiment 1. Since the storage medium in the present embodiment stores the executable program for implementing the method of embodiment 1, the same technical effects as those of embodiment 1 are obtained.
Example 3
The embodiment provides a human-computer interaction device, which includes a memory, a processor, and a short-session new question generation program stored in the memory and executable on the processor, and when executed by the processor, the short-session new question generation program implements the method according to embodiment 1.
The human-computer interaction device in this embodiment may be a customer service robot, such as a service robot in the financial field or a human-computer interaction system that answers inquiries and provides business handling guidance in the government field. The form of the device is not particularly limited: it may be a device with an independent physical structure, or an application running on smart equipment such as a smartphone, a tablet (PAD) or a computer. It may also be a human-computer interaction system that interacts remotely over the internet. The embodiments do not exclude systems that are separated in form but substantially implement this solution; for example, a system whose user interface is local while its computing device is remote or in the cloud also falls within the protection scope of this solution.
Since the storage medium of the human-computer interaction device in this embodiment stores the executable program for implementing the method in embodiment 1, the same technical effects as those in embodiment 1 are achieved, and details are not described herein again.
It should be particularly noted that any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and that the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments. In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. In this description, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents. The above are merely preferred embodiments of the invention and are not intended to limit it; various changes and modifications may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A short-session new question generation method, characterized by comprising:
acquiring candidate new questions;
forming a similar candidate new question set: clustering the candidate new questions according to their degree of similarity to obtain the similar candidate new question set;
extracting tag words: extracting service words having a preset word frequency in the similar candidate new question set as the tag words of the similar candidate new question set;
generating a main sentence: generating the main sentence according to the content of the similar candidate new question set; and
generating a new question according to a preset number of candidate new questions in the similar candidate new question set, the main sentence, and the tag words.
2. The short-session new question generation method of claim 1, characterized in that: the main sentence is a candidate new question having a preset frequency in the candidate new question set, or a sentence formed by induction from the candidate new question set.
3. The short-session new question generation method of claim 2, characterized in that: the tag words are service words having a preset frequency in the candidate new question set.
4. The short-session new question generation method of claim 1, characterized in that acquiring candidate new questions comprises: reading short-session content; calculating the ratio of the comprehensive similarity of a new question in the short-session content to the comprehensive similarity of the questions in the standard question library; and, if the ratio meets a first preset condition, taking the new question in the short-session content as a candidate new question.
5. The short-session new question generation method of claim 1, characterized in that forming a similar candidate new question set by clustering the candidate new questions according to their degree of similarity comprises: selecting a candidate new question; generating a first candidate new question main sentence and a first candidate new question set; reading a next candidate new question; calculating the ratio of the comprehensive similarity of the first candidate new question to the comprehensive similarity of the next candidate new question; and, if the ratio meets a second preset condition, adding the next candidate new question to the first candidate new question set.
6. The short-session new question generation method of claim 4 or 5, characterized by: the first preset condition is that the ratio is less than 0.8, and the second preset condition is that the ratio is greater than or equal to 0.8.
7. The short-session new question generation method of claim 4 or 5, characterized in that the comprehensive similarity is calculated as follows:
Q = C·α + W·β + S·γ
wherein Q is the comprehensive similarity score;
C is the similarity calculated based on content, α is the weight corresponding to the similarity calculated based on content, and α takes a value in [0.7, 0.9];
W is the similarity calculated based on keyword hits, β is the weight corresponding to the similarity calculated based on keyword hits, and β takes a value in [0.05, 0.15];
S is the similarity calculated based on word order, γ is the weight corresponding to the similarity calculated based on word order, and γ takes a value in [0.05, 0.15].
8. The short-session new question generation method of claim 7, wherein said α is 0.9, said β is 0.05, and said γ is 0.05.
9. The short-session new question generation method of claim 1, characterized in that: daily conversation, modal particles (tone words), special characters, single-byte content, numbers, and repeated content in the short-session content are filtered out before the candidate new questions are acquired.
10. A computer storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 9.
11. A human-computer interaction device, characterized in that the human-computer interaction device comprises a memory, a processor and a short-session new question generation program stored on the memory and executable on the processor, which when executed by the processor implements the method according to any one of claims 1 to 9.
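As a non-limiting illustration only, the flow recited in claims 1 to 8 might be sketched in Python as follows. Several points are assumptions that the claims do not specify: questions are treated as pre-tokenized word lists (whitespace splitting on English examples stands in for word segmentation); the content similarity C and keyword-hit similarity W are approximated by Jaccard overlap, and the word-order similarity S by `difflib.SequenceMatcher`; and the "ratio" of claims 4 and 5 is read as the comprehensive similarity score Q computed between two questions. All function names, the service-word list, and the sample questions are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

# Weights from claim 8 and the 0.8 threshold from claim 6.
ALPHA, BETA, GAMMA = 0.9, 0.05, 0.05
THRESHOLD = 0.8


def jaccard(a, b):
    """Overlap of two sets; 0.0 when either set is empty."""
    return len(a & b) / len(a | b) if a and b else 0.0


def comprehensive_similarity(q1, q2, service_words):
    """Q = C*alpha + W*beta + S*gamma for two tokenized questions.

    The three component measures are illustrative stand-ins (assumptions),
    not definitions taken from the patent.
    """
    c = jaccard(set(q1), set(q2))                                  # content
    w = jaccard(set(q1) & service_words, set(q2) & service_words)  # keyword hits
    s = SequenceMatcher(None, q1, q2).ratio()                      # word order
    return c * ALPHA + w * BETA + s * GAMMA


def is_candidate(question, standard_questions, service_words):
    """Claim 4, as read here: every standard question scores below 0.8."""
    return all(
        comprehensive_similarity(question, std, service_words) < THRESHOLD
        for std in standard_questions
    )


def cluster(candidates, service_words):
    """Claim 5, as read here: greedy grouping against each set's first member."""
    sets = []
    for cand in candidates:
        for group in sets:
            if comprehensive_similarity(group[0], cand, service_words) >= THRESHOLD:
                group.append(cand)
                break
        else:  # no existing set was similar enough, so start a new one
            sets.append([cand])
    return sets


def build_new_question(group, service_words, min_count=2, sample_size=3):
    """Claims 1 to 3: main sentence, tag words, and a sample of candidates."""
    sentences = Counter(" ".join(q) for q in group)
    words = Counter(t for q in group for t in q if t in service_words)
    return {
        "main_sentence": sentences.most_common(1)[0][0],
        "tag_words": sorted(t for t, n in words.items() if n >= min_count),
        "examples": [" ".join(q) for q in group[:sample_size]],
    }


# Hypothetical data for illustration only.
service_words = {"card", "password", "loan", "rate"}
standard = ["how do i open an account".split()]
session = [
    "how do i reset my card password".split(),
    "how do i reset my card password please".split(),
    "what is the loan interest rate".split(),
]

candidates = [q for q in session if is_candidate(q, standard, service_words)]
groups = cluster(candidates, service_words)
new_questions = [build_new_question(g, service_words) for g in groups]
```

In this sketch each similar set is compared only against its first member, which keeps the grouping to a single pass but makes the result depend on the order in which candidates are read; other clustering strategies would also be consistent with claim 1.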
CN201911321137.6A 2019-12-20 2019-12-20 Short-session new problem generation method, storage medium and man-machine interaction device Pending CN111159370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911321137.6A CN111159370A (en) 2019-12-20 2019-12-20 Short-session new problem generation method, storage medium and man-machine interaction device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911321137.6A CN111159370A (en) 2019-12-20 2019-12-20 Short-session new problem generation method, storage medium and man-machine interaction device

Publications (1)

Publication Number Publication Date
CN111159370A (en) 2020-05-15

Family

ID=70557411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911321137.6A Pending CN111159370A (en) 2019-12-20 2019-12-20 Short-session new problem generation method, storage medium and man-machine interaction device

Country Status (1)

Country Link
CN (1) CN111159370A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000339314A (en) * 1999-05-25 2000-12-08 Nippon Telegr & Teleph Corp <Ntt> Automatic answering method, dialog analyzing method, answer sentence generating method and their device and medium with their program recorded thereon
CN103744889A (en) * 2013-12-23 2014-04-23 百度在线网络技术(北京)有限公司 Method and device for clustering problems
CN107153639A (en) * 2016-03-04 2017-09-12 北大方正集团有限公司 Intelligent answer method and system
CN108345644A (en) * 2018-01-15 2018-07-31 阿里巴巴集团控股有限公司 A kind of method and device of data processing
CN108804567A (en) * 2018-05-22 2018-11-13 平安科技(深圳)有限公司 Improve method, equipment, storage medium and the device of intelligent customer service response rate
CN110134777A (en) * 2019-05-29 2019-08-16 三角兽(北京)科技有限公司 Problem De-weight method, device, electronic equipment and computer readable storage medium
CN110309377A (en) * 2018-03-22 2019-10-08 阿里巴巴集团控股有限公司 Semanteme normalization puts question to generation, the response of mode to determine method and device
CN110555101A (en) * 2019-09-09 2019-12-10 浙江诺诺网络科技有限公司 customer service knowledge base updating method, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737440A (en) * 2020-07-31 2020-10-02 支付宝(杭州)信息技术有限公司 Question generation method and device
CN112287069A (en) * 2020-10-29 2021-01-29 平安科技(深圳)有限公司 Information retrieval method and device based on voice semantics and computer equipment
CN112287069B (en) * 2020-10-29 2023-07-25 平安科技(深圳)有限公司 Information retrieval method and device based on voice semantics and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220908

Address after: 25 Financial Street, Xicheng District, Beijing 100033

Applicant after: CHINA CONSTRUCTION BANK Corp.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200515