CN113535926A - Active dialogue method, device and voice terminal - Google Patents


Info

Publication number
CN113535926A
CN113535926A (application CN202110855181.6A; granted publication CN113535926B)
Authority
CN
China
Prior art keywords
topic
target
standby
user
sentence
Prior art date
Legal status
Granted
Application number
CN202110855181.6A
Other languages
Chinese (zh)
Other versions
CN113535926B (en)
Inventor
熊为星 (Xiong Weixing)
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202110855181.6A priority Critical patent/CN113535926B/en
Publication of CN113535926A publication Critical patent/CN113535926A/en
Application granted granted Critical
Publication of CN113535926B publication Critical patent/CN113535926B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3343 Query execution using phonetics
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 16/338 Presentation of query results
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

An embodiment of the application provides an active dialogue method, an active dialogue device and a voice terminal. The method comprises the following steps: searching for a target topic corresponding to the user's topic of interest according to a preset topic switching table; generating standby sentences corresponding to the target topic; finding, among all the standby sentences, the target sentence with the highest semantic-matching score; and outputting the target sentence as voice. In this active dialogue scheme, the user's topic of interest is obtained in advance, and the target sentence for an active dialogue between the voice terminal and the user is generated from that topic of interest. This effectively solves the technical problem that conventional passive dialogue has low relevance to the user and therefore yields low user stickiness and low usability of the voice terminal.

Description

Active dialogue method, device and voice terminal
Technical Field
The invention relates to the field of computer technology, and in particular to an active dialogue method, an active dialogue device and a voice terminal.
Background
Most voice terminals currently on the market are passive-style dialogue robots. When such a robot does initiate conversation, its "active" content is drawn from a series of pre-listed items from which it chooses when opening a chat with the user. Dialogue content produced by a purely generative dialogue robot, in turn, lacks topical continuity with the user's chat history; the generated content feels stiff and abrupt, gives the user a sense of detachment, and makes it difficult to build the user's trust in the robot.
Therefore, the existing dialogue robot suffers from the technical problem that the dialogue it generates has little actual relevance to the user.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present invention provide an active dialogue method, an active dialogue device, and a voice terminal.
In a first aspect, an embodiment of the present invention provides an active dialogue method, which is applied to a voice terminal, and the method includes:
searching a target topic corresponding to the topic of interest of the user according to a preset topic switching table;
generating a standby sentence corresponding to the target topic;
finding out a target sentence with the highest score value of semantic matching from all the standby sentences;
and outputting the target sentence as voice.
According to a specific embodiment of the present disclosure, before the step of searching for a target topic corresponding to a topic of interest of a user according to a preset topic switching table, the method further includes:
acquiring a first type of historical dialogue between the user and the voice terminal;
extracting topics with the highest frequency in the first type of historical conversation according to a preset topic statistical method;
and taking the topic with the highest frequency as the topic of interest of the user.
According to a specific embodiment of the present disclosure, the step of generating the standby sentence corresponding to the target topic includes:
acquiring a second type of history dialogue between the user and the voice terminal;
and splicing the target topic and the corresponding ontology word with the second type of history dialogue of the user respectively to generate a standby sentence corresponding to the target topic.
According to a specific embodiment of the present disclosure, the step of generating a standby sentence corresponding to the target topic by splicing the target topic and the corresponding ontology word with the second type of history dialog of the user includes:
keeping a plurality of latest generated dialogues with time meeting preset requirements in the second type of historical dialogues;
splicing the target topic to each newly generated dialogue through a random strategy to generate a random statement;
calculating probability values of all the random statements;
and reserving a random statement with the probability value meeting a preset value as the standby statement corresponding to the target topic.
According to a specific embodiment of the present disclosure, the step of finding out the target sentence with the highest semantic-matching score from all the standby sentences includes:
extracting actual topics of each standby statement according to a preset topic statistical method;
if there exist first-type standby sentences whose actual topics belong to any of the target topics:
obtaining a semantic-matching score value between each first-type standby sentence and each target topic;
and screening the standby statement with the largest score value as the target statement.
According to a specific embodiment of the present disclosure, after the step of extracting the actual topic of each standby sentence, the method further includes:
if none of the actual topics belongs to any target topic, screening out second-type standby sentences whose actual topics belong to the topic of interest;
obtaining a semantic-matching score value between each second-type standby sentence and the topic of interest;
and screening the standby statement with the largest score value as the target statement.
According to a specific embodiment of the present disclosure, after the step of extracting the actual topic of each standby sentence, the method further includes:
if none of the actual topics belongs to the target topics or the topic of interest, obtaining a semantic-matching score value between each standby sentence and the first-type historical dialogue;
and screening the standby statement with the largest score value as the target statement.
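The three-stage screening cascade in the embodiments above can be sketched as follows. This is an illustrative sketch only: `extract_topic` stands in for the preset topic statistical method and `match_score` for the semantic-matching model, neither of which is specified at this level of detail in the disclosure.

```python
def select_target_sentence(standby, target_topics, interest_topic,
                           history, extract_topic, match_score):
    """Three-stage fallback selection of the target sentence.

    standby:       all generated standby sentences
    target_topics: topics the topic of interest may switch to
    extract_topic: stand-in for the preset topic statistical method
    match_score:   stand-in semantic-matching scorer returning a float
    """
    topics = {s: extract_topic(s) for s in standby}

    # Stage 1: first-type standby sentences whose actual topic is a target topic.
    first = [s for s in standby if topics[s] in target_topics]
    if first:
        return max(first,
                   key=lambda s: max(match_score(s, t) for t in target_topics))

    # Stage 2: second-type standby sentences matching the topic of interest.
    second = [s for s in standby if topics[s] == interest_topic]
    if second:
        return max(second, key=lambda s: match_score(s, interest_topic))

    # Stage 3: fall back to matching against the first-type historical dialogue.
    return max(standby, key=lambda s: match_score(s, history))
```

Each stage only runs when the previous one yields no candidates, mirroring the order of the embodiments above.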
According to a specific embodiment of the present disclosure, the step of semantically matching a standby sentence with a topic to obtain the score value includes:
respectively carrying out coding processing and pooling processing on the standby sentences and the topics to obtain first feature vectors corresponding to the standby sentences and second feature vectors corresponding to the topics;
and inputting the first feature vector and the second feature vector into a pre-trained vector correlation analysis model to obtain a score value of semantic matching.
According to a specific embodiment of the present disclosure, the preset topic statistical method includes:
segmenting words of the conversation, and recording basic topics to which the segmented words belong;
counting the occurrence frequency of each basic topic;
and taking the basic topic with the highest occurrence frequency as the topic of the conversation.
According to a specific embodiment of the present disclosure, the step of outputting the target sentence by voice includes:
determining a preset moment of the target statement;
and outputting the target sentence as voice at the preset moment.
In a second aspect, an embodiment of the present invention provides an active dialog apparatus, including:
the searching module is used for searching a target topic corresponding to the topic of interest of the user according to a preset topic switching table;
the generating module is used for generating a standby sentence corresponding to the target topic;
the matching module is used for finding out a target sentence with the highest score value of semantic matching from all the standby sentences;
and the output module is used for outputting the target sentence as voice.
In a third aspect, an embodiment of the present invention provides a voice terminal, including a memory and a processor, where the memory is used to store a computer program which, when run by the processor, performs the active dialogue method of any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program runs on a processor, the computer program performs the active dialogue method according to any one of the first aspects.
In order to realize active dialogue between the voice terminal and the user, the active dialogue method, active dialogue device and voice terminal provided by the application first search for a target topic corresponding to the user's topic of interest according to a preset topic switching table, then generate standby sentences corresponding to the target topic, then find, among all the standby sentences, the target sentence with the highest semantic-matching score, and finally output the target sentence as voice. Because the user's topic of interest is obtained in advance and the target sentence for the active dialogue between the voice terminal and the user is generated from it, the scheme effectively solves the technical problem that the low relevance of conventional passive dialogue to the user leads to low user stickiness and low usability of the voice terminal.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
Fig. 1 is a schematic flow chart illustrating an active dialogue method according to an embodiment of the present application;
FIG. 2 is a diagram illustrating model processing involved in an active dialogue method provided by an embodiment of the present application;
fig. 3 is a schematic diagram illustrating the Transformer-based decoder training process involved in the active dialog method provided in the embodiment of the present application;
fig. 4 is a schematic diagram illustrating the Transformer-based decoder prediction process involved in the active dialog method provided in the embodiment of the present application;
FIG. 5 is a flow chart illustrating sentence screening according to semantic matching in the active dialogue method provided in the embodiment of the present application;
FIG. 6 is a block diagram of an active dialog device according to an embodiment of the present application;
fig. 7 is a diagram illustrating a hardware device of a voice terminal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as first excluding the existence of, or adding to, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
Referring to fig. 1, a schematic flowchart of an active dialog method according to an embodiment of the present application is provided. As shown in fig. 1, the active dialogue method mainly includes the following processes:
s101, searching a target topic corresponding to the topic of interest of the user according to a preset topic switching table.
The active dialogue method provided by the embodiment is applied to a voice terminal, such as a smart sound box and a voice robot, and is used for realizing active dialogue between the voice terminal and a user. It should be noted that the active dialog here may be a dialog actively initiated by the voice terminal to the user, or an active reply with high relevance to the user may be performed by the voice terminal according to the voice actively initiated by the user, instead of a conventional question-and-answer passive dialog. The application scene of the scheme can be a family or a fixed place, namely a scene that the voice terminal has multiple conversations with one or a few users, and the voice terminal can record historical conversations between the users and the voice terminal.
The voice terminal can determine the topic of interest of the user according to the historical conversation between the user and the voice terminal, and in order to ensure the timeliness of the topic of interest of the user, the topic of interest can be determined from the latest period of time or a period of number of historical conversations between the user and the voice terminal. Of course, the voice terminal may also determine the topic of interest of the user through personalized settings input in advance, which is not limited.
According to a specific embodiment of the present disclosure, the manner of obtaining the topic of interest of the user is further defined. Specifically, before the step of searching for the target topic corresponding to the topic of interest of the user according to the preset topic switching table, the method may further include:
acquiring a first type of historical dialogue between the user and the voice terminal;
extracting topics with the highest frequency in the first type of historical conversation according to a preset topic statistical method;
and taking the topic with the highest frequency as the topic of interest of the user.
The voice terminal records historical dialogues between the user and the terminal, where a historical dialogue means two or more rounds of chat data. When the terminal is to initiate an active dialogue, the content of that dialogue is critical: it must attract the user's attention and share a degree of topical unity with the user's historical chat content, so topic analysis is first performed on the current user's historical dialogue. If the user has too many historical dialogues, then to prevent the topic of interest from being derived from the distant past, only the user's latest 30 rounds of chat are retained; the historical dialogue used to determine the topic of interest is defined as the first-type historical dialogue.
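A minimal sketch of that sliding window over the first-type historical dialogue (the 30-round figure comes from the embodiment; the buffer interface itself is an assumption for illustration):

```python
from collections import deque

MAX_ROUNDS = 30  # the embodiment retains only the user's latest 30 chat rounds


class DialogueHistory:
    """First-type historical dialogue buffer with a sliding window."""

    def __init__(self, max_rounds=MAX_ROUNDS):
        # deque with maxlen silently discards the oldest round on overflow
        self.rounds = deque(maxlen=max_rounds)

    def record(self, user_utterance, terminal_reply):
        self.rounds.append((user_utterance, terminal_reply))

    def latest(self):
        return list(self.rounds)
```

Topic analysis for the topic of interest would then run over `latest()` only, never the full archive.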
The voice terminal is loaded with a preset topic statistical method, and the voice terminal can extract the topic with the highest frequency in the first-class history conversation according to the preset topic statistical method to serve as the topic of interest of the user.
According to a specific embodiment of the present disclosure, the preset topic statistical method includes:
segmenting words of the conversation, and recording basic topics to which the segmented words belong;
counting the occurrence frequency of each basic topic;
and taking the basic topic with the highest occurrence frequency as the topic of the conversation.
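The statistical method above can be sketched as follows. The word-topic classification table and the whitespace segmenter are illustrative stand-ins: a real deployment would use a Chinese word segmenter and the terminal's preset word-topic classification table.

```python
from collections import Counter

# Hypothetical word-topic classification table; the patent's table maps each
# segmented word to the basic topic category it belongs to.
WORD_TOPIC_TABLE = {
    "Sichuan": "geography-Asia-China",
    "Shaanxi": "geography-Asia-China",
    "food":    "cuisine",
}


def segment(sentence):
    # Stand-in for a real word segmenter (e.g. a Chinese tokenizer);
    # here we simply split on whitespace for illustration.
    return sentence.split()


def conversation_topic(dialogue):
    """Return the most frequent basic topic across all sentences, or None."""
    counts = Counter()
    for sentence in dialogue:
        for word in segment(sentence):
            topic = WORD_TOPIC_TABLE.get(word.strip(".,?"))
            if topic:
                counts[topic] += 1
    return counts.most_common(1)[0][0] if counts else None
```

Applied to the first-type historical dialogue, the returned topic is taken as the user's topic of interest.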
In a dialogue, topics are usually embodied by the topic categories to which words belong, so whether different topics appear is counted by analyzing the topic categories of the words in the dialogue. After a dialogue containing several chat rounds is obtained, the chat sentences are segmented into words in dialogue order, and the subsequent topic statistics are then performed on the segmented words.
When segmenting, the text can be split according to the byte length of the words covered by each topic, or with a conventional word segmentation method, i.e. split into units of 1 to 4 strongly related characters, so that the segmented words have finer granularity and the statistics higher accuracy.
Different words belong to different topic categories; for example, among words segmented from a multi-turn dialogue, "Xishuangbanna" and "Yunnan" are place names, "brother" is a kinship term, and "marriage" and "travel" are event topics. A word-topic classification table is a data table indicating the topic category to which each word belongs; the computer device may store a preset word-topic classification table, or call a word-topic classification table on another device, and look up the topic category of each segmented word in it.
All the chat sentences of a dialogue can thus be divided into many words. After the topic categories of the words are determined, note that the same word may recur within the dialogue, and different words may belong to the same topic category.
In this embodiment, topic analysis is performed only on the user's utterances, excluding the voice terminal's side of the dialogue, so the result fits the user more closely, and the most frequent topic found is directly taken as the user's topic of interest.
Suppose the current user's chat history with the voice terminal is as follows:
A: I am from Sichuan.
B: A nice place, with plenty of good food.
A: Where are you from?
B: Shaanxi.
According to the above analysis method, the topic of this dialogue content is D0512, and the labeled topic result is ["D0512", "Sichuan-Shaanxi", "geography-Asia-China"].
In addition, people usually switch topics during real chat, i.e. move from one topic to another. To fit actual chatting, the voice terminal loads a preset topic switching table, which stores the mapping from a topic to the other topics it may switch to. After the user's topic of interest is determined, the target topics corresponding to it are looked up in the preset topic switching table, where "corresponding" means the several target topics to which the topic of interest, as a source topic, may switch. To reduce unnecessary computation, only the 20 most probable target topics among all conversions may be retained, i.e. 20 topic conversion relations from the same topic of interest as source topic to the 20 most probable target topics.
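A sketch of the topic switching table lookup; the table contents and probabilities below are purely illustrative, and the `top_k=20` default mirrors the embodiment's choice of retaining the 20 most probable target topics.

```python
# Hypothetical topic switching table: each source topic maps to candidate
# target topics with switch probabilities (values are illustrative only).
TOPIC_SWITCH_TABLE = {
    "travel": [("cuisine", 0.31), ("weather", 0.22), ("photography", 0.14)],
    "sports": [("health", 0.40), ("news", 0.25)],
}


def target_topics(interest_topic, top_k=20):
    """Look up target topics for the topic of interest, keeping only the
    top_k most probable conversions to limit later computation."""
    candidates = TOPIC_SWITCH_TABLE.get(interest_topic, [])
    candidates = sorted(candidates, key=lambda x: x[1], reverse=True)
    return [topic for topic, _ in candidates[:top_k]]
```

An unknown source topic simply yields no target topics, leaving later stages to fall back on the topic of interest itself.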
S102, generating a standby sentence corresponding to the target topic;
after the interesting topic of the user and the conversion relations of the topics participated by the interesting topic are obtained according to the steps, the obtained sentence corresponding to the target topic to be converted can be combined to serve as a standby sentence for active conversation.
The main words with larger relevance can be found directly according to the main words contained in the target topic, and the standby sentences can be automatically generated. Of course, the target topic and the ontology word can be merged and spliced with the historical dialogue of the user to generate a standby sentence with a larger relevance.
S103, finding out a target sentence with the highest score value of semantic matching from all the standby sentences;
after a plurality of standby sentences are generated, semantic matching needs to be performed on the standby sentences, and sentences which are most matched with the interesting topics or the target topics of the user are found out as target sentences. When the specific semantics are matched, the semantics can be matched for each standby statement to obtain a score value, and the statement with the highest score value is selected as the most matched target statement.
And S104, outputting the target sentence voice.
According to the above steps, a target sentence with high relevance to the user is generated and output as voice, thereby carrying out an active dialogue with the user.
In addition, according to an embodiment of the present disclosure, timing selection for the active dialogue is added. The step of outputting the target sentence as voice includes:
determining a preset moment of the target statement;
and outputting the target sentence voice at the preset moment.
In this embodiment, when the target sentence is generated, a suitable dialogue time for it is also determined as the preset moment. The preset moment can be determined from the content of the target sentence, or from the times at which the user habitually talks with the voice terminal. After the target sentence is determined, it can then be output as voice at its preset moment.
For example, if the target sentence is "You said last time that you wanted to get a vaccine; have you gotten it yet?", the preset moment of this sentence may be set to any time at which the user can be addressed. For another target sentence such as "The weather is fine today, just right for enjoying the view", the determined preset moment should be the evening of a clear day.
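The preset-moment output step could be realized with the standard-library scheduler as sketched below; the `speak` callback standing in for the terminal's voice output is an assumed interface.

```python
import datetime
import sched
import time


def output_at(target_sentence, preset_moment, speak):
    """Schedule voice output of the target sentence at its preset moment.

    speak is the terminal's voice-output callback (an assumed interface);
    the caller invokes .run() on the returned scheduler when ready.
    """
    scheduler = sched.scheduler(time.time, time.sleep)
    # Clamp to zero so a preset moment already in the past fires immediately.
    delay = max(0.0, (preset_moment - datetime.datetime.now()).total_seconds())
    scheduler.enter(delay, 1, speak, argument=(target_sentence,))
    return scheduler
```

In practice the preset moment would come from the sentence content or the user's habitual chat times, as described above.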
In the active dialogue method provided by the application, in order to realize active dialogue between the voice terminal and the user, a target topic corresponding to the user's topic of interest is searched according to a preset topic switching table, standby sentences corresponding to the target topic are generated, the target sentence with the highest semantic-matching score is found among all the standby sentences, and finally the target sentence is output as voice. Because the user's topic of interest is obtained in advance and the target sentence for the active dialogue is generated from it, the scheme effectively solves the technical problem that the low relevance of conventional passive dialogue to the user leads to low user stickiness and low usability of the voice terminal.
On the basis of the above-mentioned embodiment, the generation and matching process of the alternative sentence will be further defined below with reference to several embodiments.
According to a specific embodiment of the present disclosure, the step of generating the standby sentence corresponding to the target topic, described in S102, may include:
acquiring a second type of history dialogue between the user and the voice terminal;
and splicing the target topic and the corresponding ontology word with the second type of history dialogue of the user respectively to generate a standby sentence corresponding to the target topic.
The present embodiment defines a scheme of generating a standby sentence from a history dialogue of a user, and defines the history dialogue used here as a second type history dialogue. It should be noted that the second type of history dialog may be the same as or different from the first type of history dialog, and one or more newly generated history dialogs may be selected.
And splicing the determined target topic and the corresponding ontology word with the second type of history dialogue of the user respectively to generate a standby sentence corresponding to the target topic.
Specifically, the step of splicing the target topic and the corresponding ontology word with the second type of history dialog of the user to generate the standby sentence corresponding to the target topic may include:
keeping a plurality of latest generated dialogues with time meeting preset requirements in the second type of historical dialogues;
splicing the target topic to each newly generated dialogue through a random strategy to generate a random statement;
calculating probability values of all the random statements;
and reserving a random statement with the probability value meeting a preset value as the standby statement corresponding to the target topic.
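The splice-sample-filter steps above can be sketched as follows, with `generate` standing in for the generative decoder and `score` for its sequence probability (both assumed interfaces); the `per_topic=5` default mirrors the 5-per-topic figure used later in the embodiment.

```python
import heapq


def generate_standby(target_topics, recent_dialogues, generate, score,
                     per_topic=5):
    """Splice each target topic onto the recent dialogues, sample candidate
    replies, and keep the per_topic highest-probability ones per topic.

    generate(topic, dialogue): stand-in for the generative decoder
    score(sentence):           stand-in for its sequence probability
    """
    standby = []
    for topic in target_topics:
        candidates = [generate(topic, d) for d in recent_dialogues]
        # Retain only the most probable candidates for this topic.
        standby.extend(heapq.nlargest(per_topic, candidates, key=score))
    return standby
```

With three target topics and five retained candidates each, this yields the fifteen standby sentences the embodiment goes on to screen.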
According to the several conversion relations obtained in the above step, corpus sets having these 20 conversion relations are screened out of all rounds of the chat data set.
Semantic retrieval is then performed over these corpus sets against the labeled source-topic result to find the three most semantically relevant dialogue corpus sets. The semantic retrieval method is shown in fig. 2, and the specific retrieval process is illustrated with an example in fig. 2:
the method comprises the steps of coding Sichuan-Shaanxi by using BERT to obtain a vector u, simultaneously using the coded BERT and pooling firing to obtain a vector v for representing and interacting the two vectors, and then adding the two vectors into a feed-forward neural network to perform semantic classification Softmax classifier.
The model shown in fig. 2 needs to be trained in advance. Training may use LCQMC, an open-source short-text semantic matching classification task dataset from Harbin Institute of Technology, or other corpus datasets collected by the voice terminal, without limitation.
Finally, the ontology words of the three source topics with the largest prediction scores are obtained, together with the target topics corresponding to those source topics (first-level, second-level and third-level) and the ontology words corresponding to those target topics.
The model used for training may be a generative chat model based on the decoder mechanism of the Transformer, which is also the model adopted by the GPT series of algorithms. This embodiment mainly explains how the decoder part of the Transformer is used in multi-round chat dialogues. Fig. 3 shows the training process for multi-round chatting: the training set is built by splicing the multi-round dialogue dataset, while the target topic and target ontology word are spliced on as supplementary additional information. The spliced content is encoded and input into the decoder model architecture, semantic information is processed through multiple layers of self-attention and feed-forward networks, and prediction is then performed; fig. 4 is a schematic diagram of the prediction process of the Transformer-based decoder. The predicted result is compared with the standard output to obtain a loss, and each parameter in the decoder model is continuously updated through back-propagation so that the loss keeps converging, finally yielding the parameters of the model.
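The training loop this paragraph describes (splice context plus topic, predict the reply, compare with the standard output, back-propagate the loss) can be sketched as follows; the `model` interface and splicing format are illustrative assumptions, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, context_ids, target_ids, pad_id=0):
    """One training step for topic-conditioned dialogue generation.

    `context_ids` holds the spliced history + target topic + ontology word;
    `target_ids` is the reference reply. `model` is assumed to be a
    GPT-style causal LM returning (batch, seq, vocab) logits.
    """
    input_ids = torch.cat([context_ids, target_ids], dim=1)
    logits = model(input_ids)
    # Each reply token is predicted from the tokens before it.
    ctx_len = context_ids.size(1)
    pred = logits[:, ctx_len - 1:-1, :]          # positions predicting the reply
    loss = nn.functional.cross_entropy(
        pred.reshape(-1, pred.size(-1)), target_ids.reshape(-1),
        ignore_index=pad_id)
    optimizer.zero_grad()
    loss.backward()                               # back-propagate the loss
    optimizer.step()                              # update decoder parameters
    return loss.item()
```

Repeating this step over the spliced multi-round dataset drives the loss toward convergence, matching the procedure the paragraph outlines.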
When generating the standby sentences, the three target topics obtained in the steps above and their corresponding ontology words are each spliced onto the second type of historical corpus. As shown in fig. 5, each target topic retains, through a certain random strategy, the 5 active dialogue contents with the highest probability, finally yielding 15 active dialogue contents related to the user's historical chat content as standby sentences; these 15 standby sentences are then screened.
In addition, according to another specific embodiment of the present disclosure, the step of finding the target sentence with the highest semantic-matching score value from all the standby sentences includes:
extracting the actual topic of each standby sentence according to a preset topic statistical method;
if there exist first-type standby sentences whose actual topics belong to any target topic,
obtaining the semantic-matching score value between each first-type standby sentence and each target topic;
and screening out the standby sentence with the largest score value as the target sentence.
In this embodiment, actual-topic labeling is first performed on the 15 generated standby sentences, which are compared against the first three target topics retained in the previous step; standby sentences whose actual topics are not among the three target topics are removed.
After this removal, semantic matching can be performed on the remaining standby sentences. Assuming n standby sentences remain, semantic matching against the ontology words of the three target topics yields 3n semantic-matching score values. For each standby sentence, the maximum of its three scores against the target topics is kept as that sentence's score value, so n score values are finally obtained for the n standby sentences. The standby sentence corresponding to the largest of these n score values is screened and retained as the final output target sentence.
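The filtering and max-over-topics scoring described above (label each sentence, filter by target topic, take each remaining sentence's maximum score over the three topics, then the overall maximum) can be sketched as follows; `topic_of` and `match_score` are hypothetical hooks standing in for the topic labeler and the semantic matcher:

```python
def select_target_sentence(standby, target_topics, topic_of, match_score):
    """Pick the best standby sentence, following the screening steps above.

    `topic_of(sentence)` labels a sentence's actual topic; `match_score(s, t)`
    returns a semantic-match score. Both are assumed callables.
    """
    # Keep only sentences whose actual topic is one of the target topics.
    kept = [s for s in standby if topic_of(s) in target_topics]
    if not kept:
        return None  # caller falls back to interest-topic / history matching
    # 3n scores: each sentence against every target topic; keep its maximum.
    best = {s: max(match_score(s, t) for t in target_topics) for s in kept}
    # Overall maximum across the n per-sentence scores.
    return max(best, key=best.get)
```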
In addition, considering that no standby sentences may remain after the target-topic screening, a scheme of screening according to the interested topic (serving as the source topic) may be added. As shown in fig. 5, according to a specific embodiment of the present disclosure, after the step of extracting the actual topic of each standby sentence, the method further includes:
if none of the actual topics belongs to any target topic, screening out second-type standby sentences whose actual topics belong to the interested topic;
obtaining the semantic-matching score value between each second-type standby sentence and the interested topic;
and screening out the standby sentence with the largest score value as the target sentence.
In this embodiment, semantic matching is further performed between the 15 standby sentences and the interested topic to obtain corresponding score values, and the standby sentence with the largest score value is selected as the output target sentence.
In addition, considering the situation in which no standby sentences remain after both the target-topic and interested-topic screening, a scheme of screening according to historical dialogue may be added. As shown in fig. 5, according to a specific embodiment of the present disclosure, after the step of extracting the actual topic of each standby sentence, the method further includes:
if none of the actual topics belongs to the target topics or the interested topic, obtaining the semantic-matching score value between each standby sentence and the first type of historical dialogue;
and screening out the standby sentence with the largest score value as the target sentence.
All 15 standby sentences are semantically matched against the user's historical dialogue to obtain corresponding score values; the standby sentence with the largest score value is selected as the final target sentence and output by voice.
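The three-stage screening described across the preceding paragraphs (target topics first, then the interested topic, then the historical dialogue) could be organized as a fallback cascade like the following sketch, where `topic_of` and `match_score` are hypothetical hooks standing in for the topic labeler and the semantic matcher:

```python
def choose_with_fallback(standby, target_topics, interest_topic,
                         history, topic_of, match_score):
    """Three-stage selection: target topics, then the interested topic,
    then the user's historical dialogue (all hooks are assumptions)."""
    def best(sentences, references):
        # Score each sentence against every reference; keep its maximum.
        scored = {s: max(match_score(s, r) for r in references)
                  for s in sentences}
        return max(scored, key=scored.get) if scored else None

    # Stage 1: sentences whose actual topic is one of the target topics.
    on_target = [s for s in standby if topic_of(s) in target_topics]
    if on_target:
        return best(on_target, target_topics)
    # Stage 2: sentences whose actual topic is the interested topic.
    on_interest = [s for s in standby if topic_of(s) == interest_topic]
    if on_interest:
        return best(on_interest, [interest_topic])
    # Stage 3 (last resort): match every standby sentence against history.
    return best(standby, history)
```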
Further, the step of semantically matching the standby sentence with the topic to obtain the score value may include:
respectively carrying out coding processing and pooling processing on the standby sentences and the topics to obtain first feature vectors corresponding to the standby sentences and second feature vectors corresponding to the topics;
and inputting the first feature vector and the second feature vector into a pre-trained vector correlation analysis model to obtain a score value of semantic matching.
Meanwhile, the expression words of the source topic to be semantically retrieved are also encoded with BERT to obtain a feature vector; after the two vectors are represented and interacted, they are fed into a feed-forward neural network to perform the semantic classification task. Of course, other semantic matching methods may also be used, without limitation.
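Among those "other semantic matching methods", the score between two pooled feature vectors can be approximated by a plain cosine similarity — a simplified stand-in for the pre-trained vector-correlation model, not the patented model itself:

```python
import math

def pooled_score(vec_a, vec_b):
    """Cosine similarity between two pooled feature vectors.

    A simple stand-in for the trained vector-correlation model: 1.0 means
    identical direction, 0.0 means orthogonal (no semantic overlap).
    """
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = (math.sqrt(sum(a * a for a in vec_a))
            * math.sqrt(sum(b * b for b in vec_b)))
    return dot / norm if norm else 0.0
```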
In summary, the active dialogue method provided in this embodiment generates chat sentences based on the decoder mechanism of the Transformer, the model also adopted by the GPT series of algorithms, applying the decoder part of the Transformer to multi-round chat dialogues. The chat content generated by this scheme has a certain topical correlation with the user's historical chat content, so the actively generated content stays closer to the user's chat topics, shortens the distance to the user, improves user stickiness, establishes an emotional companionship effect, and gives the robot a humanized character.
Example 2
Referring to fig. 6, a block diagram of an active dialog device 600 is provided according to an embodiment of the present invention. As shown in fig. 6, the active dialog device 600 mainly includes:
the searching module 601 is configured to search a target topic corresponding to a topic of interest of a user according to a preset topic switching table;
a generating module 602, configured to generate a standby statement corresponding to the target topic;
a matching module 603, configured to find a target sentence with the highest score value of semantic matching from all the standby sentences;
an output module 604, configured to output the target sentence in voice.
According to a specific embodiment of the present disclosure, the search module 601 is further configured to:
acquiring a first type of historical dialogue between the user and the voice terminal;
extracting topics with the highest frequency in the first type of historical conversation according to a preset topic statistical method;
and taking the topic with the highest frequency as the topic of interest of the user.
According to a specific embodiment of the present disclosure, the generating module 602 is configured to:
acquiring a second type of history dialogue between the user and the voice terminal;
and splicing the target topic and the corresponding ontology word with the second type of history dialogue of the user respectively to generate a standby sentence corresponding to the target topic.
According to a specific embodiment of the present disclosure, the generating module 602 is configured to:
keeping a plurality of latest generated dialogues with time meeting preset requirements in the second type of historical dialogues;
splicing the target topic to each newly generated dialogue through a random strategy to generate a random statement;
calculating probability values of all random conversations;
and reserving a random statement with the probability value meeting a preset value as the standby statement corresponding to the target topic.
According to a specific embodiment of the present disclosure, the matching module 603 is configured to:
extracting actual topics of each standby statement according to a preset topic statistical method;
if there exist first-type standby sentences whose actual topics belong to any target topic;
obtaining score values of semantic matching between each first type of standby statement and each target topic;
and screening the standby statement with the largest score value as the target statement.
According to a specific embodiment of the present disclosure, the matching module 603 is further configured to:
if none of the actual topics belongs to any target topic, screening out second-type standby sentences whose actual topics belong to the interested topic;
obtaining score values of semantic matching between the second type of standby sentences and the interesting topics;
and screening the standby statement with the largest score value as the target statement.
According to a specific embodiment of the present disclosure, the matching module 603 is further configured to:
if all the actual topics do not belong to the target topic and the interesting topic, obtaining score values of semantic matching between each standby statement and the first type of historical conversation;
and screening the standby statement with the largest score value as the target statement.
According to a specific embodiment of the present disclosure, the matching module 603 is further configured to: respectively carrying out coding processing and pooling processing on the standby sentences and the topics to obtain first feature vectors corresponding to the standby sentences and second feature vectors corresponding to the topics;
and inputting the first feature vector and the second feature vector into a pre-trained vector correlation analysis model to obtain a score value of semantic matching.
According to a specific embodiment of the present disclosure, the preset topic statistics manner includes:
segmenting words of the conversation, and recording basic topics to which the segmented words belong;
counting the occurrence frequency of each basic topic;
and taking the basic topic with the highest occurrence frequency as the topic of the conversation.
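The counting procedure above amounts to a frequency vote over segmented words; a sketch follows, with a hypothetical word-to-topic lookup standing in for a real word segmenter and topic lexicon:

```python
from collections import Counter

def topic_of_dialogue(dialogue_words, word_to_topic):
    """Return the most frequent basic topic among the segmented words.

    `dialogue_words` is the already-segmented word list; `word_to_topic`
    maps each word to its basic topic. Both are assumed inputs — a real
    system would use a Chinese word segmenter and a topic lexicon.
    """
    counts = Counter(word_to_topic[w] for w in dialogue_words
                     if w in word_to_topic)
    if not counts:
        return None  # no word mapped to any basic topic
    topic, _ = counts.most_common(1)[0]
    return topic
```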
According to a specific embodiment of the present disclosure, the output module 604 is configured to:
determining a preset moment of the target statement;
and outputting the target sentence voice at the preset moment.
For the specific implementation of the active dialog apparatus provided here, reference may be made to the specific implementation of the active dialog method shown in fig. 1; details are not repeated.
Furthermore, an embodiment of the present invention provides a voice terminal, which includes a memory and a processor, where the memory is used to store a computer program which, when run by the processor, performs the active dialog method.
Specifically, as shown in fig. 7, the voice terminal 700 provided in this embodiment includes:
a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, a processor 710, a power supply 711, and the like. Those skilled in the art will appreciate that the computer device architecture illustrated in FIG. 7 is not intended to be limiting of computer devices, which may include more or fewer components than those illustrated, or some of the components may be combined, or a different arrangement of components. In the embodiment of the present application, the computer device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
It should be understood that, in the embodiment of the present application, the radio frequency unit 701 may be used to receive and send signals during message transmission and reception or during a call. Specifically, it receives downlink data from a base station and forwards it to the processor 710 for processing, and it transmits uplink data to the base station. In general, the radio frequency unit 701 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 701 may also communicate with a network and other devices through a wireless communication system.
The computer device provides wireless broadband internet access to the user via the network module 702, such as to assist the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 703 may convert audio data received by the radio frequency unit 701 or the network module 702 or stored in the memory 709 into an audio signal and output as sound. Also, the audio output unit 703 may also provide audio output related to a specific function performed by the computer apparatus 700 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 703 includes a speaker, a buzzer, a receiver, and the like.
The input unit 704 is used to receive audio or video signals. The input unit 704 may include a graphics processing unit (GPU) 7041 and a microphone 7042. The graphics processor 7041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 706, stored in the memory 709 (or other storage medium), or transmitted via the radio frequency unit 701 or the network module 702. The microphone 7042 may receive sounds and process them into audio data. In a phone call mode, the processed audio data may be converted into a format transmittable to a mobile communication base station and output via the radio frequency unit 701.
The computer device 700 further comprises at least one sensor 705, including at least the barometer mentioned in the above embodiments. In addition, the sensor 705 may include light sensors, motion sensors, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 7061 according to ambient light, and a proximity sensor that can turn off the display panel 7061 and/or the backlight when the computer device 700 is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in various directions (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the computer device (such as horizontal/vertical screen switching, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer or tapping). The sensors 705 may also include a fingerprint sensor, pressure sensor, iris sensor, molecular sensor, gyroscope, barometer, hygrometer, thermometer, infrared sensor, and the like, which are not described in detail herein.
The display unit 706 is used to display information input by the user or information provided to the user. The display unit 706 may include a display panel 7061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The user input unit 707 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device. Specifically, the user input unit 707 includes a touch panel 7071 and other input devices 7072. The touch panel 7071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations using a finger, a stylus, or any other suitable object or attachment). The touch panel 7071 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the user's touch direction, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into touch-point coordinates, sends the coordinates to the processor 710, and receives and executes commands from the processor 710. In addition, the touch panel 7071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 7071, the user input unit 707 may include other input devices 7072, which may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick; these are not described again here.
Further, the touch panel 7071 may be overlaid on the display panel 7061. When the touch panel 7071 detects a touch operation on or near it, the operation is transmitted to the processor 710 to determine the type of the touch event, and the processor 710 then provides a corresponding visual output on the display panel 7061 according to the type of the touch event. Although the touch panel 7071 and the display panel 7061 are shown in fig. 7 as two separate components implementing the input and output functions of the computer device, in some embodiments they may be integrated to implement those functions, which is not limited herein.
The interface unit 708 is an interface for connecting an external computer device to the computer device 700. For example, the external computer device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a computer device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 708 may be used to receive input (e.g., data information, power, etc.) from an external computer device and transmit the received input to one or more elements within the computer device 700 or may be used to transmit data between the computer device 700 and an external computer device.
The memory 709 may be used to store software programs as well as various data. The memory 709 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 709 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 710 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by operating or executing software programs and/or modules stored in the memory 709 and calling data stored in the memory 709, thereby monitoring the computer device as a whole. Processor 710 may include one or more processing units; preferably, the processor 710 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 710.
The computer device 700 may also include a power supply 711 (e.g., a battery) for powering the various components, and preferably, the power supply 711 may be logically coupled to the processor 710 via a power management system that may enable managing charging, discharging, and power consumption management functions.
In addition, the computer device 700 includes some functional modules that are not shown, and are not described in detail herein.
The memory is used for storing a computer program which, when the processor is running, executes the active dialog method described above.
In addition, the present application provides a computer-readable storage medium storing a computer program which, when run on a processor, performs the active dialogue method described above.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (13)

1. An active dialogue method, applied to a voice terminal, the method comprising:
searching a target topic corresponding to the topic of interest of the user according to a preset topic switching table;
generating a standby sentence corresponding to the target topic;
finding out a target sentence with the highest score value of semantic matching from all the standby sentences;
and outputting the target sentence by voice.
2. The method according to claim 1, wherein before the step of searching for the target topic corresponding to the topic of interest of the user according to the preset topic switching table, the method further comprises:
acquiring a first type of historical dialogue between the user and the voice terminal;
extracting topics with the highest frequency in the first type of historical conversation according to a preset topic statistical method;
and taking the topic with the highest frequency as the topic of interest of the user.
3. The method of claim 2, wherein the step of generating the alternative sentence corresponding to the target topic comprises:
acquiring a second type of history dialogue between the user and the voice terminal;
and splicing the target topic and the corresponding ontology word with the second type of history dialogue of the user respectively to generate a standby sentence corresponding to the target topic.
4. The method according to claim 3, wherein the step of generating the standby sentence corresponding to the target topic by splicing the target topic and the corresponding ontology word with the second type of history dialogue of the user comprises:
keeping a plurality of latest generated dialogues with time meeting preset requirements in the second type of historical dialogues;
splicing the target topic to each newly generated dialogue through a random strategy to generate a random statement;
calculating probability values of all random conversations;
and reserving a random statement with the probability value meeting a preset value as the standby statement corresponding to the target topic.
5. The method according to claim 2, wherein the step of finding the target sentence with the highest score value of semantic matching from all the alternative sentences comprises:
extracting actual topics of each standby statement according to a preset topic statistical method;
if a first type of standby sentences of which the actual topics belong to any target topic exist;
obtaining score values of semantic matching between each first type of standby statement and each target topic;
and screening the standby statement with the largest score value as the target statement.
6. The method of claim 5, wherein after the step of extracting the actual topic of each alternative sentence, the method further comprises:
if all the actual topics do not belong to any target topic, screening second standby sentences of which the actual topics belong to the interested topic;
obtaining score values of semantic matching between the second type of standby sentences and the interesting topics;
and screening the standby statement with the largest score value as the target statement.
7. The method of claim 6, wherein after the step of extracting the actual topic of each alternative sentence, the method further comprises:
if all the actual topics do not belong to the target topic and the interesting topic, obtaining score values of semantic matching between each standby statement and the first type of historical conversation;
and screening the standby statement with the largest score value as the target statement.
8. The method of any one of claims 5 to 7, wherein the step of semantically matching the alternative sentence with the topic to obtain the score value comprises:
respectively carrying out coding processing and pooling processing on the standby sentences and the topics to obtain first feature vectors corresponding to the standby sentences and second feature vectors corresponding to the topics;
and inputting the first feature vector and the second feature vector into a pre-trained vector correlation analysis model to obtain a score value of semantic matching.
9. The method of claim 5, wherein the preset topic statistics method comprises:
segmenting words of the conversation, and recording basic topics to which the segmented words belong;
counting the occurrence frequency of each basic topic;
and taking the basic topic with the highest occurrence frequency as the topic of the conversation.
10. The method of claim 1, wherein the step of speech-outputting the target sentence comprises:
determining a preset moment of the target statement;
and outputting the target sentence voice at the preset moment.
11. An active dialog device, comprising:
the searching module is used for searching a target topic corresponding to the topic of interest of the user according to a preset topic switching table;
the generating module is used for generating a standby sentence corresponding to the target topic;
the matching module is used for finding out a target sentence with the highest score value of semantic matching from all the standby sentences;
and the output module is used for outputting the target sentence voice.
12. A speech terminal, characterized in that it comprises a memory and a processor, the memory being used to store a computer program which, when run by the processor, performs the active dialog method of any of claims 1 to 10.
13. A computer-readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the active dialog method of any of claims 1 to 10.
CN202110855181.6A 2021-07-26 2021-07-26 Active dialogue method and device and voice terminal Active CN113535926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110855181.6A CN113535926B (en) 2021-07-26 2021-07-26 Active dialogue method and device and voice terminal


Publications (2)

Publication Number Publication Date
CN113535926A true CN113535926A (en) 2021-10-22
CN113535926B CN113535926B (en) 2023-11-10

Family

ID=78089367


Country Status (1)

Country Link
CN (1) CN113535926B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005157602A * 2003-11-25 2005-06-16 Aruze Corp Conversation control device, conversation control method, and programs therefor
CN106649405A * 2015-11-04 2017-05-10 陈包容 Method and device for acquiring reply prompt content for a chat-initiating sentence
CN109086329A * 2018-06-29 2018-12-25 出门问问信息科技有限公司 Multi-turn dialogue method and device based on topic-keyword guidance
CN109582970A * 2018-12-12 2019-04-05 科大讯飞华南人工智能研究院(广州)有限公司 Semantic measurement method, apparatus, device, and readable storage medium
CN111026932A * 2019-12-20 2020-04-17 北京百度网讯科技有限公司 Human-machine dialogue interaction method and device, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
WO2021196981A1 (en) Voice interaction method and apparatus, and terminal device
CN110418208B (en) Subtitle determining method and device based on artificial intelligence
CN107943860B (en) Model training method, text intention recognition method and text intention recognition device
CN109145303B (en) Named entity recognition method, device, medium and equipment
CN110598046B (en) Artificial intelligence-based identification method and related device for title party
CN110096580B (en) FAQ conversation method and device and electronic equipment
CN110890093B (en) Intelligent equipment awakening method and device based on artificial intelligence
CN110570840B (en) Intelligent device awakening method and device based on artificial intelligence
CN111402866B (en) Semantic recognition method and device and electronic equipment
WO2021159877A1 (en) Question answering method and apparatus
CN111159338A (en) Malicious text detection method and device, electronic equipment and storage medium
CN112749252A (en) Text matching method based on artificial intelligence and related device
CN111314771B (en) Video playing method and related equipment
CN111522592A (en) Intelligent terminal awakening method and device based on artificial intelligence
WO2022227507A1 (en) Wake-up degree recognition model training method and speech wake-up degree acquisition method
CN109063076B (en) Picture generation method and mobile terminal
CN113822038A (en) Abstract generation method and related device
CN112328783A (en) Abstract determining method and related device
CN113505596B (en) Topic switching marking method and device and computer equipment
CN111723783B (en) Content identification method and related device
CN112307198B (en) Method and related device for determining abstract of single text
CN111062200B (en) Speaking generalization method, speaking recognition device and electronic equipment
CN113535926B (en) Active dialogue method and device and voice terminal
CN114065168A (en) Information processing method, intelligent terminal and storage medium
CN113569043A (en) Text category determination method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant