CN118398009A - Calling method, device and equipment of intelligent assistant - Google Patents
- Publication number
- CN118398009A CN118398009A CN202410566901.0A CN202410566901A CN118398009A CN 118398009 A CN118398009 A CN 118398009A CN 202410566901 A CN202410566901 A CN 202410566901A CN 118398009 A CN118398009 A CN 118398009A
- Authority
- CN
- China
- Prior art keywords
- information
- call information
- call
- terminal
- intelligent assistant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The application discloses a calling method, device and equipment for an intelligent assistant, and relates to the technical field of communications. The method is applied to an intelligent assistant capability platform device and comprises the following steps: acquiring call information between a first terminal and a second terminal while the two terminals are in a call; acquiring problem intention information of the first terminal according to the call information, the first terminal being a terminal that has enabled an intelligent assistant service; selecting a target intelligent assistant corresponding to the problem intention information; and calling the target intelligent assistant to obtain the service information provided by the target intelligent assistant. With the scheme provided by the application, the intelligent assistant can be called during a call without the user issuing any explicit command, so that the service provided by the intelligent assistant is obtained, calling the intelligent assistant becomes more intelligent and convenient, and the call experience of the user is improved.
Description
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method, an apparatus, and a device for calling an intelligent assistant.
Background
Existing intelligent (voice) assistants are mainly invoked through explicit voice instructions: the user first wakes the assistant in a prescribed way and then issues a specific voice command to invoke the corresponding assistant capability. In a call scenario (such as a fifth-generation mobile communication technology (5th Generation Mobile Communication Technology, 5G) video call), this manner is neither intelligent nor convenient enough, because the user must interrupt the call and perform an explicit voice instruction operation before the related service of the intelligent (voice) assistant can be invoked.
Disclosure of Invention
The application aims to provide a calling method, device and equipment for an intelligent assistant, so as to solve the problem that calling an existing intelligent assistant is insufficiently intelligent and convenient.
In order to achieve the above object, an embodiment of the present application provides a method for calling an intelligent assistant, including:
Acquiring call information between a first terminal and a second terminal in the process of communicating between the first terminal and the second terminal;
acquiring problem intention information of the first terminal according to the call information; the first terminal is a terminal for starting an intelligent assistant service;
selecting a target intelligent assistant corresponding to the problem intention information according to the problem intention information;
and calling the target intelligent assistant to obtain service information provided by the target intelligent assistant.
Optionally, acquiring the problem intention information of the first terminal according to the call information includes:
carrying out semantic analysis on the acquired call information, and dividing the call information into N sections of call information, wherein the similarity of semantic analysis results of two adjacent sections of call information is smaller than a similarity threshold value, each section of call information contains a semantic meaning, and N is a positive integer;
Determining first section of call information in the N sections of call information as target call information, wherein the time corresponding to the first section of call information is earlier than the time corresponding to other sections of call information except the first section of call information;
and carrying out intention recognition on the target call information to acquire the problem intention information.
Optionally, performing semantic analysis on the acquired call information, and dividing the call information into N segments of call information, including:
Dividing the call information into M sections of call information according to the time length, wherein M is a positive integer;
Starting from the first section of call information in the M sections of call information, carrying out semantic analysis on the current section of call information, wherein the current section of call information is the first section of call information in all sections of call information which are not subjected to semantic analysis in the M sections of call information;
If the current section of call information contains P kinds of semantics, splitting the current section of call information into P sections of call information according to the P kinds of semantics, and carrying out semantic analysis on the next section of call information adjacent to the current section of call information to obtain semantic similarity between the last section of call information in the P sections of call information and the adjacent next section of call information; or if the current section of call information contains a semantic meaning, carrying out semantic meaning analysis on the adjacent next section of call information to obtain semantic meaning similarity between the current section of call information and the adjacent next section of call information; p is an integer greater than or equal to 2;
Under the condition that the obtained semantic similarity is greater than or equal to the similarity threshold, merging the two sections of call information corresponding to the semantic similarity; or, under the condition that the obtained semantic similarity is smaller than the similarity threshold, continuing to perform semantic analysis on the next adjacent section of call information, until segmentation and/or merging of the M sections of call information are completed;
and obtaining the N-section call information according to the segmentation and/or combination result of the M-section call information.
Optionally, performing intention recognition on the target call information to obtain the problem intention information, including:
Inputting the target call information into an intention recognition model, and obtaining first information output by the intention recognition model, wherein the first information is used for indicating a potential problem or existing intention of the first terminal;
Under the condition that the potential problem or the existing intention belongs to a preset field, inputting the target call information into a topic generation model to obtain a call topic output by the topic generation model;
And carrying out entity identification on the first information and the conversation theme to generate the problem intention information.
Optionally, performing entity identification on the first information and the call theme, and generating the problem intention information includes:
calculating the association degree of the first information and the conversation theme;
under the condition that the association degree is larger than an association degree threshold value, carrying out text entity identification on the first information and the conversation theme to obtain entity information;
and filling slots based on the entity information to generate the problem intention information.
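The association-degree check and slot filling described above can be sketched as follows; word-overlap (Jaccard) similarity stands in for the association degree, and a toy dictionary stands in for text entity identification. The threshold value, entity dictionary and all names are illustrative assumptions:

```python
# Hedged sketch: association degree between first information and call topic,
# then entity recognition and slot filling into problem intention information.

ENTITY_DICT = {"paris": "city", "friday": "date", "hotel": "place"}  # toy NER

def association(first_info, topic):
    """Jaccard word overlap as a stand-in for the association degree."""
    a, b = set(first_info.split()), set(topic.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def extract_entities(text):
    """Dictionary lookup as a stand-in for text entity identification."""
    return {ENTITY_DICT[w]: w for w in text.split() if w in ENTITY_DICT}

def build_intent(first_info, topic, threshold=0.2):
    if association(first_info, topic) <= threshold:
        return None               # weakly related: no intention info produced
    slots = extract_entities(first_info + " " + topic)
    return {"intent": first_info, "slots": slots}

print(build_intent("book hotel in paris", "hotel in paris on friday"))
```

Only when the first information and the call topic are sufficiently associated are the recognized entities filled into slots to form the problem intention information.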
Optionally, the method further comprises:
Converting the service information into a target media format, wherein the target media format is adapted to the format of call information between the first terminal and the second terminal;
And sending second information to an intelligent assistant service application server AS according to the converted service information, wherein the second information is used for indicating that the converted service information is synthesized with the current call information between the first terminal and the second terminal.
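A minimal sketch of this media-adaptation step follows; the message fields are invented for illustration, since the actual interface to the intelligent assistant service AS is not specified at this level of detail:

```python
# Hedged sketch: convert service information to the call's media format, then
# build the "second information" asking the AS to synthesize it into the call.

def convert(service_info, target_format):
    """Stand-in for text-to-speech / media transcoding (assumption)."""
    return {"format": target_format, "payload": service_info}

def make_synthesis_request(converted, call_id):
    """Illustrative 'second information' message for the AS (assumption)."""
    return {"type": "synthesize_with_call",
            "call_id": call_id,
            "media": converted}

req = make_synthesis_request(
    convert("route: turn left at Main St", "audio/amr-wb"),
    call_id="call-42")
print(req["media"]["format"])
```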
In order to achieve the above object, an embodiment of the present application provides a calling device of an intelligent assistant, including:
The first acquisition module is used for acquiring call information between the first terminal and the second terminal in the process of the call between the first terminal and the second terminal;
The determining module is used for acquiring the problem intention information of the first terminal according to the call information; the first terminal is a terminal for starting an intelligent assistant service;
The second acquisition module is used for selecting a target intelligent assistant corresponding to the problem intention information according to the problem intention information;
and the calling module is used for calling the service of the target intelligent assistant to obtain the service information provided by the target intelligent assistant.
In order to achieve the above object, an embodiment of the present application provides a communication device including: a transceiver, a processor, a memory, and a program or instruction stored on the memory and executable on the processor, which when executed by the processor, implements a method of invoking a smart assistant as described in the first aspect.
In order to achieve the above object, an embodiment of the present application provides a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement the method for invoking an intelligent assistant according to the first aspect.
In order to achieve the above object, an embodiment of the present application provides a computer program product comprising computer instructions which, when executed by a processor, implement the method for invoking an intelligent assistant according to the first aspect.
The technical scheme of the application has at least the following beneficial effects:
The calling method of the intelligent assistant is applied to the intelligent assistant capability platform device, and comprises the following steps: firstly, in the process of a call between a first terminal and a second terminal, acquiring call information between the first terminal and the second terminal; secondly, acquiring problem intention information of the first terminal according to the call information, the first terminal being a terminal that has enabled an intelligent assistant service; in this way, the problem intention information of the user is intelligently identified based on the call information, and the user is spared an explicit voice instruction operation; thirdly, selecting a target intelligent assistant corresponding to the problem intention information; and finally, calling the target intelligent assistant to obtain the service information provided by the target intelligent assistant. The intelligent assistant is thus called during the user's conversation to obtain the service information it provides, which improves the intelligence and convenience of calling the intelligent assistant.
Drawings
FIG. 1 is a flow chart of a method for calling an intelligent assistant according to an embodiment of the present application;
FIG. 2 is a second flowchart of a method for calling an intelligent assistant according to an embodiment of the present application;
FIG. 3 is a workflow diagram of an intelligent assistant capability platform device in an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a potential instruction analysis of call content according to an embodiment of the present application;
FIG. 5 is a second schematic diagram of a potential instruction analysis of call content according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a calling device of an intelligent assistant according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a communication device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the application may be practiced otherwise than as specifically illustrated or described herein. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The method, the device and the equipment for calling the intelligent assistant provided by the embodiment of the application are described in detail through specific embodiments and application scenes thereof by combining the attached drawings.
An embodiment of the present application provides a method for calling an intelligent assistant, where the method may be applied to an intelligent assistant capability platform device, as shown in fig. 1, and the method includes:
Step 101, acquiring call information between a first terminal and a second terminal in the process of the call between the first terminal and the second terminal. As a specific example, this step may be obtaining call information sent by a network element device (such as the Voice over New Radio (VoNR+) media plane network element in fig. 2), wherein the call information comprises uplink media stream data of the first terminal (call information sent by the first terminal to the second terminal) and uplink media stream data of the second terminal (call information sent by the second terminal to the first terminal);
step 102, acquiring problem intention information of the first terminal according to the call information; the first terminal is a terminal that has enabled an intelligent assistant service. For example, this step may determine the problem intention information through operations such as semantic analysis of the call information; in this way, the user intention is determined from the call content, the user no longer needs to invoke the assistant through an explicit voice instruction operation during the call, and the fluency of the call is improved;
Here, it should be noted that, in the embodiment of the present application, the call of the intelligent assistant is only performed on the terminal that signs up and starts the intelligent assistant service; therefore, before executing the method of the embodiment of the present application, as shown in fig. 2, it is required to first determine whether the terminal signs up and starts the intelligent assistant service, that is, step 101 of the embodiment of the present application may specifically be: acquiring call information between a first terminal and a second terminal when the first terminal starts an intelligent assistant service in the process of communicating with the second terminal;
step 103, selecting a target intelligent assistant corresponding to the problem intention information according to the problem intention information; for example, this step may be extracting keywords of the problem intention information, matching/mapping the keywords with the keywords of each intelligent assistant configured in advance, and determining the intelligent assistant with the highest matching degree as the target intelligent assistant;
And 104, calling the target intelligent assistant to obtain service information provided by the target intelligent assistant. Here, the intelligent assistant capability platform device may obtain service information fed back by the target intelligent assistant according to the problem intention information by invoking a service of the target intelligent assistant.
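Steps 101 to 104 can be sketched end to end as follows. Everything here is a stand-in: the call information is plain text, intent detection is a toy heuristic, and assistant selection uses the keyword matching mentioned in step 103. None of the names come from the application itself:

```python
# Hedged end-to-end sketch of steps 101-104 on the capability platform side.

def acquire_call_info(media_stream):
    """Step 101: collect uplink media from both terminals (here: plain text)."""
    return " ".join(media_stream)

def derive_intent(call_info):
    """Step 102: toy stand-in for semantic analysis + intent recognition."""
    for sentence in call_info.split("."):
        if "where" in sentence or "how" in sentence or "?" in sentence:
            return sentence.strip()   # treat a question-like span as the intent
    return None

def select_assistant(intent, assistants):
    """Step 103: pick the assistant whose configured keywords best match."""
    def score(a):
        return sum(1 for kw in a["keywords"] if kw in intent)
    best = max(assistants, key=score)
    return best if score(best) > 0 else None

def invoke(assistant, intent):
    """Step 104: call the selected assistant's service with the intent."""
    return assistant["service"](intent)

ASSISTANTS = [  # illustrative pre-configured assistants and keywords
    {"name": "navigation", "keywords": ["where", "route", "address"],
     "service": lambda q: f"route info for: {q}"},
    {"name": "weather", "keywords": ["weather", "rain", "temperature"],
     "service": lambda q: f"forecast for: {q}"},
]

stream = ["So where is the new office located?", "I can pick you up there."]
intent = derive_intent(acquire_call_info(stream))
assistant = select_assistant(intent, ASSISTANTS)
print(assistant["name"])          # navigation
print(invoke(assistant, intent))
```

Note the absence of any wake word: the pipeline is driven entirely by the call content, which is the core claim of the method.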
Here, it should be noted that the intelligent assistant is, for example, an intelligent (voice) assistant, and the intelligent assistant capability platform device is, for example, a voice recognition intelligent assistant capability platform device; the method provided by the embodiment of the application is suitable for a 5G new call scene, namely, the user can call the intelligent assistant based on the method provided by the embodiment of the application in the process of the 5G new call so as to provide service information for the user by the intelligent assistant.
In the calling method of the intelligent assistant in the embodiment of the application, firstly, in the process of communicating between a first terminal and a second terminal, an intelligent assistant capability platform device acquires communication information between the first terminal and the second terminal; secondly, acquiring problem intention information of the first terminal according to the call information; the first terminal is a terminal for starting an intelligent assistant service; therefore, the problem intention information of the user is intelligently identified based on the call information, the operation that the user actively inputs an explicit voice instruction is avoided, and the operation flow of the user is simplified; thirdly, selecting a target intelligent assistant corresponding to the problem intention information according to the problem intention information; and finally, calling the target intelligent assistant to obtain service information provided by the target intelligent assistant. Therefore, the service of the required intelligent assistant can be accurately invoked based on the call information without interrupting the call process in the user call process, so that the service information provided by the intelligent assistant is obtained, the intelligent and convenience of the intelligent assistant invocation are improved, the smoothness of the user call is ensured, and the user experience is improved.
It should be noted here that, during a call, the intention and the theme may change frequently due to the continuous interaction of the two parties, which requires an efficient dialog management system to handle this situation, wherein the dialog management system needs to be able to accurately track each dialog round and make a reasonable decision according to the change of the context. Based on this, as an alternative implementation, step 102 includes:
Carrying out semantic analysis on the acquired call information, and dividing the call information into N sections of call information, wherein the similarity of the semantic analysis results of two adjacent sections of call information is smaller than a similarity threshold, each section of call information contains one semantic, and N is a positive integer; that is, the acquired call information is segmented according to the semantics it contains, so that each section of call information contains only one semantic and two adjacent sections of call information contain different semantics. One section of call information containing one semantic indicates that no topic change occurs within that section, so intention recognition of the user can be performed on it;
Determining first section of call information in the N sections of call information as target call information, wherein the time corresponding to the first section of call information is earlier than the time corresponding to other sections of call information except the first section of call information; that is, this step extracts the earliest piece of call information in time as target call information (call information currently required for intention recognition) in time sequence;
and carrying out intention recognition on the target call information to acquire the problem intention information.
By adopting the implementation mode, the acquired call information is subjected to segmentation processing, and semantic analysis and processing are carried out on each segment of call information, so that the problem intention information of the user is determined based on the call information corresponding to the same semantic, the problem intention information of the user is intelligently identified on the basis of not interrupting the call process of the user, so that the intelligent assistant is conveniently called based on the problem intention information, accurate service information is provided for the user by the intelligent assistant, the intelligent degree of the intelligent assistant is improved, the smoothness of the call of the user is maintained, and the user experience is improved.
As a specific implementation manner, performing semantic analysis on the acquired call information, and dividing the call information into N segments of call information, including:
(1) Dividing the call information into M sections of call information according to the time length, wherein M is a positive integer; for example, the time length is 30s, that is, the step uses the time length as a segmentation standard to segment the acquired call information;
(2) Starting from the first section of call information in the M sections of call information, carrying out semantic analysis on the current section of call information, wherein the current section of call information is the first section of call information in all sections of call information which are not subjected to semantic analysis in the M sections of call information; that is, semantic recognition is sequentially performed on the M-segment call information;
(3) If the current section of call information contains P kinds of semantics, splitting the current section of call information into P sections of call information according to the P kinds of semantics, and carrying out semantic analysis on the next section of call information adjacent to the current section of call information to obtain semantic similarity between the last section of call information in the P sections of call information and the adjacent next section of call information; or if the current section of call information contains a semantic meaning, carrying out semantic meaning analysis on the adjacent next section of call information to obtain semantic meaning similarity between the current section of call information and the adjacent next section of call information; p is an integer greater than or equal to 2;
(4) Under the condition that the obtained semantic similarity is greater than or equal to the similarity threshold, merging the two sections of call information corresponding to the semantic similarity; or, under the condition that the obtained semantic similarity is smaller than the similarity threshold, continuing to perform semantic analysis on the next adjacent section of call information, until segmentation and/or merging of the M sections of call information are completed. Here, when the semantic similarity is smaller than the similarity threshold, it indicates that a topic change has occurred between the two sections of call information corresponding to the semantic similarity, and therefore the two sections cannot be combined;
Here, the above steps (2), (3) and (4) may be understood as steps performed in a loop; one specific example is as follows. Suppose the acquired call information is divided into 3 (M is 3) sections of call information according to the time length. First, semantic recognition is performed on the first section, and it is determined that the first section contains one semantic. Second, semantic recognition continues on the second section (the next section adjacent to the first section); it is determined that the second section contains 3 semantics, and the second section is split into 3 subsections according to these 3 semantics. Third, the semantic similarity between the first section and the first subsection is calculated; it is determined to be greater than the similarity threshold, so the first section and the first subsection are merged, the merged call information being the first target call information. Fourth, on the one hand, intention recognition is performed on the first target call information to obtain the corresponding problem intention information, and the second subsection is taken as the second target call information for intention recognition; on the other hand, semantic recognition continues on the third section, and it is determined that the third section contains only one semantic; in this case, the semantic similarity between the third subsection and the third section is calculated and determined to be smaller than the similarity threshold, so the third subsection is taken as the third target call information and the third section as the fourth target call information. Fifth, after intention recognition is performed on the second target call information, intention recognition is performed on the third and fourth target call information in sequence, and the corresponding problem intention information is obtained respectively.
(5) Obtaining the N sections of call information according to the segmentation and/or combination result of the M sections of call information; namely: segmenting or combining the M sections of call information into N sections of call information according to semantics.
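A minimal sketch of steps (1) to (5) follows, under the simplifying assumption that each utterance carries a pre-assigned topic label standing in for "one semantic", and that two pieces of call information are similar above the threshold exactly when their labels match; a real implementation would use semantic-similarity models instead:

```python
# Hedged sketch of the duration-then-semantics segmentation loop.

def split_by_duration(utterances, chunk):
    """Step (1): cut the call into M fixed-length sections."""
    return [utterances[i:i + chunk] for i in range(0, len(utterances), chunk)]

def split_by_semantics(section):
    """Step (3): split one section into runs that share a single topic."""
    runs = []
    for utt, topic in section:
        if runs and runs[-1][0] == topic:
            runs[-1][1].append(utt)
        else:
            runs.append([topic, [utt]])
    return runs

def segment_call(utterances, chunk=3):
    """Steps (2)-(5): after splitting, merge adjacent runs with matching
    topics, yielding N sections whose neighbours always differ in topic."""
    merged = []
    for section in split_by_duration(utterances, chunk):
        for topic, utts in split_by_semantics(section):
            if merged and merged[-1][0] == topic:   # similarity >= threshold
                merged[-1][1].extend(utts)          # -> merge the two pieces
            else:                                   # similarity < threshold
                merged.append([topic, utts])        # -> topic change, new piece
    return merged

call = [("where is the office", "location"),
        ("near the station", "location"),
        ("also what time tomorrow", "schedule"),
        ("nine sharp", "schedule"),
        ("back to the address", "location")]
print([topic for topic, _ in segment_call(call, chunk=2)])
# -> ['location', 'schedule', 'location']
```

Each of the N resulting sections contains a single topic, so the earliest one can be handed directly to intention recognition as the target call information.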
The implementation of step 102 (the alternative implementations and the specific implementations described above) is described in another way below:
The acquired call information is subjected to segmentation processing, wherein each segment of call information is call information in a first duration; that is, the step uses the time length as the segmentation standard to segment the acquired call information, for example, the call information in every 30s is a segment of call information;
carrying out semantic analysis and processing on at least one section of call information to obtain target call information; after the semantic analysis is performed on a section of call information, when the section of call information comprises two kinds of semantics, the semantic analysis and the processing are performed on the section of call information; when the section of call information only comprises one semantic meaning, the step is to perform semantic meaning analysis and processing on at least two sections of call information; for example, in this step, semantic analysis and processing may be performed on the first K sections of call information in the stored multiple sections of call information, where K is specifically 1, 2, 3, etc.; here, the target call information includes at least part of the first section of call information;
And determining the problem intention information according to the target call information.
Specifically, semantic analysis and processing are performed on at least one section of call information to obtain target call information, including:
(1) Semantic analysis is carried out on the first section of call information and the second section of call information respectively, and a first semantic analysis result and a second semantic analysis result are obtained; the first semantic analysis result corresponds to the first section of call information, and the second semantic analysis result corresponds to the second section of call information; here, the first section of call information and the second section of call information are two sections of adjacent call information, the first section of call information is the call information with the earliest time in the call information stored in the intelligent assistant capability platform device, and the second section of call information is the call information after the first section of call information;
(2) When the first semantic analysis result includes one semantic and the similarity between the first semantic analysis result and the second semantic analysis result is greater than or equal to a similarity threshold, merging the second section of call information into the first section of call information to obtain an updated first section of call information, and then, based on the updated first section of call information, executing again the step of performing semantic analysis on the first section of call information and the second section of call information respectively to obtain a first semantic analysis result and a second semantic analysis result;
Here, the first semantic analysis result including one semantic means that no topic change occurs within the first section of call information; the similarity between the first semantic analysis result and the second semantic analysis result being greater than or equal to the similarity threshold indicates that the first section and the second section of call information concern the same topic, so the two sections can be merged into one section of call information. Semantic analysis is then performed again on the updated first section of call information and the updated second section of call information (the next adjacent section, i.e., the original third section of call information), and this repeats until the similarity of the semantic analysis results of two adjacent sections of call information is smaller than the similarity threshold (that is, until a topic change occurs between the two adjacent sections of call information);
(3) Determining that the target call information is the first section of call information when the first semantic analysis result comprises a semantic and the similarity between the first semantic analysis result and the second semantic analysis result is smaller than the similarity threshold; that is, in the case where the first piece of call information corresponds to only one topic and is different from the topic of the adjacent next piece of call information, the first piece of call information is target call information for determining problem intention information;
(4) And under the condition that the first semantic analysis result comprises multiple semantics, determining that the target call information is the call information corresponding to a first semantic among the multiple semantics. Here, the first semantic analysis result including multiple semantics indicates that a topic switch occurs within the first section of call information (each semantic corresponds to one topic), so the call information corresponding to the first semantic (i.e., the earliest part of the first section of call information) can be determined as the target call information.
That is, by tracking the context information of the conversation, changes and transitions made by the user during the conversation can be observed. If the user switches from one specific topic to another, the semantic similarity between the current dialogue turn and the previous turn can be calculated with a semantic similarity analysis algorithm; if the two semantics differ greatly, a switch of user intention or topic can be inferred.
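The adjacent-turn similarity check described above can be sketched as follows. A real system would use sentence embeddings for the semantic similarity analysis algorithm; the bag-of-words cosine similarity and the 0.2 threshold below are simplifying assumptions.

```python
# Toy stand-in for the semantic similarity analysis algorithm: adjacent
# dialogue turns with low similarity are taken as a topic switch.
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def topic_switched(prev_turn: str, cur_turn: str, threshold: float = 0.2) -> bool:
    """Infer a topic/intention switch when adjacent turns are dissimilar."""
    return cosine_sim(prev_turn, cur_turn) < threshold
```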
As a specific example of the above implementation, a dictionary structure may be used to dynamically store the dialogue text (call information). Specifically, the dialogue is first stored in manually defined 30 s windows with corresponding labels; the similarity of adjacent dialogue texts (call information) is then computed. If the two are similar, a text-merging operation is performed and the labels are adjusted dynamically; if a stored section of dialogue text (call information) contains different semantics, it is split into two sections of text (call information) and the text labels are adjusted; otherwise, comparison continues with the next round of dialogue (the next section of call information). If two adjacent sections of text (call information) are dissimilar, the problem intention information is determined from the first section of text (call information).
In the above specific implementation, semantic analysis is performed on the multiple sections of call information stored in the intelligent assistant capability platform device to obtain corresponding semantic analysis results, so that the multiple sections of call information are split or combined according to the semantic analysis results to obtain the call information corresponding to the first semantic (the call information corresponding to the first topic), which is determined as the target call information. In this way, the problem intention information at different moments is determined sequentially in chronological order, and the service information corresponding to each piece of problem intention information is provided to the user in turn.
Here, it should be noted that, after determining the first section of call information among the N sections of call information as the target call information, or after performing semantic analysis and processing on at least one section of call information to obtain the target call information, the method further includes: deleting the target call information. This realizes dynamic management of the call information, avoids the performance impact of the intelligent assistant capability platform device storing too much call information, and ensures that the intelligent assistant capability platform device always processes the earliest call information when acquiring the target call information, thereby simplifying the information processing flow.
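The delete-after-use management of stored call information can be sketched as a simple FIFO buffer; the `CallBuffer` class and its method names are illustrative assumptions, not part of the embodiment.

```python
# Illustrative sketch: stored call sections as a FIFO queue, so the earliest
# section is always processed next and deleted after use.
from collections import deque

class CallBuffer:
    def __init__(self):
        self._segments = deque()

    def store(self, segment: str) -> None:
        self._segments.append(segment)

    def pop_target(self) -> str:
        """Return the earliest stored section and delete it from the buffer."""
        return self._segments.popleft()
```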
As another specific implementation manner, determining the problem intention information according to the target call information includes:
(A) Inputting the target call information into an intention recognition model, and obtaining first information output by the intention recognition model, wherein the first information is used for indicating a potential problem or existing intention of the first terminal;
Specifically, the intention recognition model is a pre-trained model, and the first information can be obtained by the intention recognition model through word vector conversion and semantic similarity analysis, wherein the working process of the intention recognition model comprises the following steps: firstly, converting the text content corresponding to the target call information (the text representation of the user sentence) into a sentence vector representation by using a word vector model (such as word2vec); secondly, calculating the semantic similarity between the sentence vector and predefined question templates; finally, selecting the most similar question template as the potential problem or existing intention based on the semantic similarity scores. If the user sentence is a statement rather than a direct question, the user intention can be identified by extracting keywords; for example, by extracting the keywords "Sanya" and "weather", it can be determined that the user may want to know the weather condition of Sanya.
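The template-matching-plus-keyword-fallback working process described above can be sketched as follows. The question templates, keyword table, and token-overlap similarity are toy stand-ins (assumptions) for the word2vec sentence vectors and semantic similarity scoring of the real model.

```python
# Hedged sketch of the intention recognition model's working process:
# 1) match against predefined question templates by similarity,
# 2) fall back to keyword extraction for declarative sentences.
QUESTION_TEMPLATES = {
    "weather_query": "what is the weather in {place}",
    "ticket_query": "how much is the air ticket to {place}",
}
INTENT_KEYWORDS = {"weather": "weather_query", "ticket": "ticket_query"}

def overlap(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def recognize_intent(sentence: str) -> str:
    # 1) pick the most similar question template
    best, score = max(
        ((name, overlap(sentence, tpl)) for name, tpl in QUESTION_TEMPLATES.items()),
        key=lambda x: x[1],
    )
    if score > 0.3:
        return best
    # 2) statement rather than a direct question: fall back to keywords
    for word in sentence.lower().split():
        if word in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[word]
    return "unknown"
```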
(B) Under the condition that the potential problem or the existing intention belongs to a preset field, inputting the target call information into a topic generation model to obtain a call topic output by the topic generation model; here, the preset domain is, for example, a public domain; that is, this step includes the following two sub-steps:
The method comprises the following sub-steps. Sub-step one: judging whether the potential problem or the existing intention information belongs to the preset field; one example of a specific implementation of sub-step one is: a TextCNN+SENet network model is adopted to extract the text features of the intention sentence I (namely the first information), a fully connected layer flattens the n feature values into a 1×n-dimensional feature map, and finally the feature values are normalized with an activation function and their category, namely public domain/private domain, is output.
Sub-step two: if the potential problem or the existing intention information does not belong to the preset field, interrupting the process of determining the problem intention information according to the target call information, and if the potential problem or the existing intention information belongs to the preset field, inputting the target call information into a topic generation model to obtain a call topic output by the topic generation model. Namely: performing problem classification judgment on the extracted potential problems or the existing intentions, if the problems are classified as public domain problems, continuing to analyze and execute, otherwise interrupting the analysis of the latest conversation potential instructions of the first terminal; for example, "I don't know the weather of three kinds of information", "I can check the price of the air ticket to three kinds of information at night" is a public domain problem, and information can be known through public channels; "I return to confirm my vacation" is a private domain question, which cannot be queried by public questions, and therefore, the method of embodiments of the present application can continue only if the potential question or existing question is intended to be a public domain question.
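The two sub-steps above reduce to a classify-then-gate control flow, sketched below. The keyword rule is a trivial stand-in (an assumption) for the TextCNN+SENet classifier; only the continue/interrupt logic mirrors the source.

```python
# Toy stand-in for sub-steps one and two: classify the intent sentence as
# public or private domain, and only let public-domain intents proceed to
# topic generation. The marker words are illustrative assumptions.
PRIVATE_MARKERS = {"my", "vacation", "salary"}

def classify_domain(intent_sentence: str) -> str:
    words = set(intent_sentence.lower().split())
    return "private" if words & PRIVATE_MARKERS else "public"

def should_generate_topic(intent_sentence: str) -> bool:
    """Sub-step two's gate: only public-domain intents proceed."""
    return classify_domain(intent_sentence) == "public"
```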
Here, it should be noted that context topic generation methods can generally be divided into two kinds, extractive and generative. The extractive kind generally uses an algorithm to extract ready-made keywords and sentences from the dialogue text as topic sentences, but introduces excessive redundant information; the generative kind uses natural language generation (Natural Language Generation, NLG) technology, in which an algorithm model generates a natural language description according to the dialogue text content instead of directly extracting sentences from the original text. The topic generation model in this sub-step combines the extractive and generative methods: on top of the multi-round dialogue management mechanism, it uses a Pointer-Generator Network (PGN) based on Fastformer, where Fastformer is a neural network model designed for natural language processing tasks. The topic generation model first uses the Fastformer model to obtain an embedded vector carrying the context information, and then uses the PGN network to generate the context topic; the most important point of this network in the embodiment of the application is introducing a beam search (Beam Search) optimization algorithm in the decoding stage, so that the decoder can obtain a more accurate conversation topic. That is, the working procedure of the topic generation model in this step is: inputting the text content corresponding to the target call information into the Fastformer-based PGN to obtain the embedded vector of the text content, and decoding the embedded vector with a decoding method that introduces the beam search optimization algorithm to obtain the final call topic.
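The beam search used in the decoding stage can be sketched independently of the Fastformer/PGN encoder; the `toy_lm` next-token table below is an assumption standing in for the real decoder's output distributions.

```python
# Minimal beam search decoder: keep the `beam_width` best partial topic
# sequences at each step, scored by cumulative log-probability.
import math

def beam_search(next_scores, start="<s>", max_len=3, beam_width=2, end="</s>"):
    """next_scores(seq) -> {token: log_prob}; returns best-scoring sequence."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:               # finished hypothesis: keep as-is
                candidates.append((seq, score))
                continue
            for tok, logp in next_scores(seq).items():
                candidates.append((seq + [tok], score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

def toy_lm(seq):
    # Hypothetical next-token distribution standing in for the PGN decoder.
    table = {
        ("<s>",): {"go": math.log(0.6), "stay": math.log(0.4)},
        ("<s>", "go"): {"sanya": math.log(0.9), "</s>": math.log(0.1)},
        ("<s>", "go", "sanya"): {"</s>": math.log(1.0)},
        ("<s>", "stay"): {"</s>": math.log(1.0)},
    }
    return table[tuple(seq)]
```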
(C) Generating the problem intention information according to the first information and the conversation topic, namely, generating the user's complete problem intention information by fusing the first information and the conversation topic; for example: the first information indicates "check the price of the air ticket from Chongqing to there in the evening", the conversation topic is "going to Sanya during the National Day holiday", and the generated problem intention information is: "query the price of air tickets from Chongqing to Sanya during the National Day holiday". Compared with generating the problem intention information based on the first information alone, the user problem intention information generated in this step based on both the first information and the conversation topic is more accurate, so that the user experience is better.
As a more specific implementation manner, generating the problem intention information according to the first information and the conversation topic includes:
Calculating the association degree of the first information and the conversation topic; a specific example of this step is: firstly, extracting the text embedded vectors of the first information and the call topic respectively, and secondly, calculating their association degree using a cosine similarity measure, where the closer the cosine similarity is to 1, the higher the association degree. This step may specifically be implemented by a pre-trained relevance judgment model, that is: inputting the first information and the conversation topic into the pre-trained relevance judgment model and acquiring the association degree output by the relevance judgment model, but the implementation is not limited to this method;
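The cosine-similarity relevance computation can be sketched as follows; the example vectors and the 0.8 threshold stand in (as assumptions) for real text embedding vectors and the association degree threshold.

```python
# Sketch of the association-degree step: cosine similarity of two embedding
# vectors, with values close to 1 meaning the texts are highly related.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def is_related(first_info_vec, topic_vec, threshold=0.8):
    """True when the first information and call topic concern the same topic."""
    return cosine(first_info_vec, topic_vec) > threshold
```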
Under the condition that the association degree is larger than an association degree threshold value, carrying out text entity identification on the first information and the conversation topic to obtain entity information; that is, when the association degree of the first information and the call topic reaches the preset association degree threshold, it indicates that the first information and the call topic concern the same topic; at this time, text entity identification is further performed on the first information and the call topic to identify the entities associated with them. Specifically, this step may use a Bidirectional Long Short-Term Memory (Bi-LSTM) + Conditional Random Field (Conditional Random Field, CRF) method to perform contextual entity identification on the first information and the call topic, so as to extract entity information such as person names, place names, and organizations;
Performing slot filling based on the entity information to generate the problem intention information; for example, in connection with the example in step (C) above, the "there" in the sentence "I will check the price of the air ticket from Chongqing to there in the evening" may be intelligently filled in as "Sanya".
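The slot-filling step can be sketched as follows. The place list and the string-replacement mechanics are illustrative assumptions; in a real system the entity step would be done by the Bi-LSTM+CRF model described above.

```python
# Toy sketch of slot filling: a place entity recognized in the call topic
# fills the unresolved "there" slot in the first information.
KNOWN_PLACES = {"Sanya", "Chongqing"}

def extract_place(topic: str):
    for word in topic.split():
        if word.strip(".,") in KNOWN_PLACES:
            return word.strip(".,")
    return None

def fill_slots(sentence: str, topic: str) -> str:
    place = extract_place(topic)
    if place and "there" in sentence:
        return sentence.replace("there", place)
    return sentence
</n```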
As an alternative implementation, step 104 includes:
sending a service request to the target intelligent assistant according to the problem intention information; here, the service request may carry the problem intention information or a problem related to the problem intention information, so that the target intelligent assistant generates service information according to the problem intention information, wherein the service information may be of a type, such as voice, picture or text, that can be combined with the call information interacted between the first terminal and the second terminal; for example, if the problem intention information or the related problem is "query air tickets from Chongqing to Sanya during the National Day holiday", the service information generated by the target intelligent assistant may be pictures of Chongqing-to-Sanya air tickets for different dates during the National Day holiday;
And receiving service information sent by the target intelligent assistant.
Further, as an optional implementation manner, after receiving the service information sent by the target intelligent assistant, the method further includes:
Converting the service information into a target media format, wherein the target media format is adapted to the format of call information between the first terminal and the second terminal; specifically, the intelligent assistant capability platform device may perform native interaction information conversion according to session description protocol (Session Description Protocol, SDP) information of the media stream between calls, and convert the service information into media format content that may be superimposed on the media stream between calls, for example, the format of the call information between the first terminal and the second terminal is a video format, and then the target media format may be a picture or the like;
According to the converted service information, sending second information to an intelligent assistant service application server (Application Server, AS), wherein the second information is used for indicating that the converted service information is synthesized with current call information between the first terminal and the second terminal; the second information carries the converted service information or information such as a download address of the converted service information.
Further, after receiving the second information, the intelligent assistant service AS sends first indication information to the VoNR+ capability network element according to the second information, where the first indication information is used to indicate video synthesis of the specified content and carries information such as the content download address of the converted service information (the intelligent interactive media content). The VoNR+ capability network element instructs the VoNR+ media plane to perform media element synthesis for the specified content; the VoNR+ media plane network element downloads the intelligent interactive media content in real time according to the content download address in the request, synthesizes it with the downlink media streams of the first terminal and the second terminal respectively, and finally sends the processed media streams to the corresponding first terminal and second terminal, so that the users of the first terminal and the second terminal can obtain the service information provided by the target intelligent assistant without interrupting the conversation, which improves the users' experience.
As an alternative implementation, step 103 includes:
performing text entity recognition on the problem intention information to obtain a first keyword of the problem intention information;
And matching the first keywords with second keywords of each intelligent assistant to obtain the target intelligent assistant, wherein the matching degree of the first keywords and the second keywords is larger than a matching degree threshold.
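The two-step keyword matching above can be sketched as follows; the assistant registry, the Jaccard "matching degree", and the 0.2 threshold are illustrative assumptions rather than the patented method.

```python
# Sketch of step 103: match the intent's first keywords against each
# configured assistant's second keywords and pick the best match above a
# matching-degree threshold.
ASSISTANTS = {
    "weather-assistant": {"weather", "forecast", "temperature"},
    "ticket-assistant": {"ticket", "flight", "price"},
}

def match_assistant(first_keywords, threshold=0.2):
    first = set(first_keywords)
    best_name, best_deg = None, 0.0
    for name, kws in ASSISTANTS.items():
        degree = len(first & kws) / len(first | kws)   # toy matching degree
        if degree > best_deg:
            best_name, best_deg = name, degree
    return best_name if best_deg > threshold else None
```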
Here, it should be noted that in the embodiment of the present application, the intelligent assistants need to be configured in advance, specifically according to the assistant capability requirement parameters, where the configured assistant capability is registered with the voice recognition intelligent assistant capability platform, and one assistant capability interface corresponds to one intelligent assistant scene. The configuration flow comprises the following steps:
Carrying out assistant capability scene application configuration (scene name, corresponding interface address, etc.) on the voice recognition intelligent assistant capability platform;
Configuring the complete assistant capability parameters (keywords of the intelligent assistant) and the corresponding parameter association dictionary. Here, the parameters represent all the content required to initiate assistant capability content generation, while the parameter association dictionary is used to identify the user's intention and possible parameter keywords during the context-topic guidance process.
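A hypothetical registration entry for one assistant capability scene, following the configuration items listed above; all field names, the placeholder address, and the keyword values are assumptions for illustration only.

```python
# Illustrative configuration record for one intelligent assistant scene:
# scene name, interface address, required parameters, and the parameter
# association dictionary (keyword hints) used during topic guidance.
ticket_assistant_config = {
    "scene_name": "air-ticket-query",
    "interface_address": "https://example.invalid/assistant/ticket",  # placeholder
    "required_parameters": ["departure", "destination", "date"],
    "parameter_keywords": {
        "destination": ["to", "go"],
        "date": ["holiday", "tomorrow"],
    },
}
```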
That is, as shown in fig. 3, the processing procedure of the intelligent assistant capability platform device in the embodiment of the present application includes:
firstly, assistant capability integration configuration; namely: configuring each (speech recognition) intelligent assistant of the third party;
Secondly, call context potential instruction analysis, namely: the (speech recognition) intelligent assistant capability platform implements analysis of the call's potential instructions. Specifically, during a user call, when a potential problem or intention of the user is identified, the opportunity for the intelligent assistant to intervene is confirmed, and a user problem intention instruction is generated in combination with the topic analysis result of the call. The user problem intention instruction is matched against the configured third-party assistant capability configurations, and if matched, an assistant capability call is initiated;
Again, helper capability service invocation; namely: and initiating a request call to a third party assistant capability service to acquire corresponding assistant information result content.
When the (voice recognition) intelligent assistant capability platform performs analysis of the conversation's potential instructions, semantic analysis is carried out in real time on sentences newly produced by the user. When it is judged that the user has encountered a problem or may have an intention, topic scene analysis is performed on the user's conversation, the correlation importance of the user problem and the topic scene is calculated, and when the correlation importance is high, a corresponding user problem intention instruction is generated. Finally, the user problem intention instruction is matched with the configured assistant capability services, and a corresponding assistant capability service call is initiated.
Specifically, as shown in fig. 4, in an example of analyzing potential instructions according to call information, the call information interacted between the first terminal and the second terminal is analyzed to obtain the problems to be analyzed, for example: "What do you suggest for our travel plan today? When shall we go?", "I don't know the weather in Sanya", "I will check the price of the air ticket from Chongqing to there in the evening"; thereafter, dialogue topic context analysis is combined to derive the user's potential problem or existing intention. The overall implementation steps of call context potential instruction analysis are shown in fig. 5, and include:
Firstly, after the call starts, the latest dialogue/sentence of user A (who has enabled the intelligent assistant service) is extracted in real time.
Secondly, extracting user problems; the method comprises the following steps: extracting potential user problems by a user problem extraction algorithm through semantic analysis;
Thirdly, classifying and judging public domain problems; the method comprises the following steps: the extracted user questions or intentions are subjected to question classification judgment, if the questions are classified into public domain questions, analysis and execution can be continued, otherwise, the analysis of the latest conversation potential instructions of the user A is interrupted;
Fourthly, latest dialogue topic analysis; namely: after it is analyzed and judged that the user has a public-domain problem or intention, the preceding content of the two-party call is comprehensively analyzed, and the latest relevant context T that completely contains one topic is determined;
Fifthly, problem-and-topic importance analysis; namely: correlation importance judgment is carried out between the extracted potential problem or intention and the latest analyzed topic context T; if the correlation is high, execution continues, otherwise the latest conversation potential instruction analysis for user A is interrupted;
Sixthly, the potential problem or intention is fused with the related topic context T to regenerate a complete user problem intention instruction M;
Finally, docking the assistant capability to generate information; namely: the assistant capability most likely to match M is determined by analysis, that is: the key parameters in M and the configured intelligent assistant capability requirement parameters are extracted, and the relevant assistant information is generated by interfacing with the assistant capability service.
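The steps above can be strung together in a hedged end-to-end sketch; every rule used here (the cue words, the private-domain marker, the token-overlap relevance) is a toy assumption standing in for the extraction, classification, topic-analysis, and relevance models named in the source.

```python
# End-to-end toy pipeline for call context potential instruction analysis:
# extract a question, gate on public domain, check relevance to the topic
# context T, and fuse into a complete problem intention instruction M.
def analyze_potential_instruction(latest_sentence, topic_context):
    # 2. extract a potential user question (toy cue-word rule)
    if "check" not in latest_sentence and "know" not in latest_sentence:
        return None
    question = latest_sentence
    # 3. public-domain classification gate (toy keyword rule)
    if "my" in latest_sentence.split():
        return None                      # private domain: interrupt analysis
    # 4-5. relevance of the question and topic context T (toy token overlap)
    qs = set(question.lower().split())
    ts = set(topic_context.lower().split())
    if not qs & ts:
        return None                      # unrelated topic: interrupt analysis
    # 6. fuse into a complete user problem intention instruction M
    return f"{question} [topic: {topic_context}]"
```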
The following describes, with reference to fig. 2, the interaction procedure of the terminal, the intelligent assistant service AS, the VoNR+ capability network element, the VoNR+ media plane network element, the intelligent assistant capability platform device, and the intelligent assistant involved in the calling method of the intelligent assistant according to an embodiment of the present application:
1. the intelligent assistant service AS completes service pre-triggering according to service logic in the call of the first terminal and the second terminal;
2. the intelligent assistant service AS judges that a user starts an intelligent assistant function;
3. The intelligent assistant service AS sends a media stream copying request to the VoNR+ capability network element;
4. The VoNR+ capability network element, the VoNR+ media plane network element, and the intelligent assistant capability platform device interact with one another to complete the media stream copying application;
5. The VoNR+ capability network element reports the response (carrying media control Uniform Resource Locator (URL) 1) to the intelligent assistant service AS; here, the response is used to indicate the copy result (such as copy success or copy failure), and URL1 is the address of the intelligent assistant capability platform device;
6. The VoNR+ media plane network element copies the uplink audio stream of the first terminal to the intelligent assistant capability platform device;
7. The intelligent assistant service AS transmits an intelligent assistant control instruction to the intelligent assistant capability platform equipment; here, in this step, the intelligent assistant service AS issues the intelligent assistant control instruction to the intelligent assistant capability platform device based on URL 1; wherein the control instruction carries intent and context scene control, URL2 (URL of intelligent helper service AS) for processing result notification, and the like; the control instruction instructs the intelligent assistant capability platform to start an intelligent assistant function (such as a voice intelligent assistant function) in the communication process;
8. The intelligent assistant capability platform device feeds back a response to the intelligent assistant service AS;
9. The intelligent assistant capability platform device starts voice monitoring; namely: the intelligent assistant capability platform equipment starts to monitor the voice of the audio stream copied to the local in real time according to the received control instruction;
10. The intelligent assistant service AS starts ticket recording; namely: the intelligent assistant service AS starts ticket recording after receiving the result from the intelligent assistant capability platform (the result may refer to the result of the media stream copying application);
11. The first terminal acquires the user call media stream of the first user;
12. The first terminal sends the uplink media stream of the first terminal to the second terminal; here, the uplink media stream of the first terminal is the user call media stream of the first user in step 11;
13. The VoNR+ media plane network element copies the uplink media stream of the first terminal to the intelligent assistant capability platform device;
14. The intelligent assistant capability platform equipment performs potential instruction analysis and user intention instruction generation; namely: in the process of user communication, the intelligent assistant capability platform equipment identifies user problems from the copied audio stream, confirms the theme scene of communication in combination with the communication context content, and finally generates user problem intention instructions;
15. The intelligent assistant capability platform device initiates a request to the related third-party assistant capability; namely: after identifying the user problem intention instruction, the intelligent assistant capability platform device matches it against all the configured third-party assistant capability services, and interfaces with the matched third-party assistant capability service to request it to generate service information;
16. The intelligent assistant sends a service information reply to the intelligent assistant capability platform device; namely: the third-party assistant capability service generates corresponding service information according to the service request (the generated service information may be voice, picture, text, or other information) and feeds the generated service information back to the voice recognition intelligent assistant capability platform device;
17. The intelligent assistant capability platform device performs intelligent assistant information calling; namely: the intelligent assistant capability platform device performs native interaction information conversion according to the SDP information of the media stream between the calls, and converts the service information into media format content that can be superimposed on the call's media stream;
18. the intelligent assistant capability platform device sends intelligent assistant information call (graph, text, audio information, etc.) to the intelligent assistant service AS;
19. the intelligent assistant service AS feeds back a response to the intelligent assistant capability platform equipment; here, the response is directed to the information received in step 18;
20. The intelligent assistant service AS sends a media element synthesis request to the VoNR+ capability network element (requesting video synthesis of the specified content and carrying the intelligent assistant information address); here, the intelligent assistant information address is the download address of the converted service information content (the intelligent interactive media content);
21. the VoNR + capacity network element sends a response to the intelligent helper service AS;
22. A media element composition request; namely: the VoNR + capacity network element interacts with the VoNR + media plane network element, and requests the VoNR + media plane network element for synthesizing the media elements;
23. VoNR + media surface network element downloads the content to the intelligent assistant capability platform in real time according to the content download address;
24. VoNR + media surface network element makes content synthesis treatment for the downlink media streams of the first terminal and the second terminal; namely: voNR + media face the downlink audio and video streams of the first terminal and the second terminal, and corresponding content synthesis processing is carried out according to the format of the intelligent media interactive content;
25. VoNR + media surface network element forwards the uplink audio/video stream of the first terminal to the second terminal; namely: voNR + media surface network element sends the processed user B downlink video stream to the second terminal;
26. VoNR + media surface network element forwards the uplink audio/video stream of the second terminal to the first terminal; namely: voNR + media face the first terminal to send the processed user A downlink video stream.
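The synthesis flow in steps 17-26 above can be compressed into a short sketch: the platform converts the service information into overlayable content, the assistant service AS requests synthesis with a content address, and the media plane downloads the content and overlays it on each party's downlink stream. All class, function, and address names below are illustrative assumptions, not part of the embodiment.

```python
# Minimal sketch of steps 17-26: convert service information, then have a
# media-plane stand-in overlay it on the downlink streams of both terminals.

class MediaPlane:
    def __init__(self, content_store):
        self.content_store = content_store  # platform-hosted converted content

    def synthesize(self, address, downlink_streams):
        content = self.content_store[address]          # step 23: real-time download
        # steps 24-26: overlay the content on each terminal's downlink stream
        return {terminal: stream + [content]
                for terminal, stream in downlink_streams.items()}

def run_synthesis(service_info, downlink_streams):
    # step 17: convert service information into overlayable media content
    address = "content://assistant/42"                 # hypothetical download address
    store = {address: {"type": "overlay", "payload": service_info}}
    # steps 20-22: the AS requests synthesis, carrying the content address
    plane = MediaPlane(store)
    return plane.synthesize(address, downlink_streams)
```

The real media plane would operate on RTP audio/video streams negotiated via SDP; lists of frames stand in for those streams here.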
It should be noted here that, in the call continuation stage, steps 14-26 of the above procedure are performed continuously in a loop. Because the calling user's question or intent and the call topic keep changing, the user's question or intent must be recognized in real time and combined with the topic scene confirmed by the call context, so as to intelligently interface with the corresponding third-party intelligent assistant (assistant capability service) and generate service information. Specifically, in combination with the above steps: during the user's call, the intelligent assistant capability platform device recognizes the user's question from the copied audio stream (via speech recognition), confirms the call topic from the call content, generates an intent instruction for the user's question, and finally generates the service information through the third-party assistant capability service. The intelligent assistant service AS is responsible for triggering the service and for generating and scheduling the assistant-service media information. Through coordination with the intelligent assistant service AS, the timeliness and accuracy of answers and services are ensured, and multi-modal intelligent interaction information is scheduled and pushed to the terminal. In step 20 of the above business process, the interactive media information pushing capability at the terminal side is realized by overlaying and synthesizing the multi-modal interactive media information onto the VoNR+ media stream.
Therefore, the intelligent assistant capability platform device implementing the method of the embodiment of the application cooperates with the VoNR+ capability network element, the VoNR+ media plane network element, the intelligent assistant service AS, and the third-party intelligent assistants, so as to realize the process of providing service information to each terminal based on the interaction information between the terminals.
In short, the method of the embodiment of the application analyzes the latest semantics of the user in real time by means of deep learning and machine learning algorithms to determine whether the user has encountered a question that cannot be answered or has expressed an obvious intent, and thereby identifies an opportunity at which the user may need an intelligent assistant. At that opportunity, it identifies the related topic by combining the call context of the two parties, and decides whether to initiate the corresponding intelligent assistant service call according to how strongly the user's question or intent relates to that topic. For example, real-time semantic recognition of the latest user utterances finds that the user has encountered a question that cannot be resolved, such as being unsure of the weather at a destination; the latest call context of the two parties shows that they are chatting about a trip to Sanya; and since the relevance of the question or intent to this topic is judged to be high, a Sanya weather query is initiated to a weather query service and the result is displayed to the user.
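The trigger logic described above can be illustrated with a toy decision function: watch the latest utterance for an unanswered question, derive the current topic from recent context, and only invoke an assistant service when the two are sufficiently related. The detectors below are crude keyword stand-ins for the deep-learning models the embodiment describes; every name and threshold here is an assumption for illustration.

```python
# Toy trigger: question detection + topic detection + relatedness gate.
import re

def detect_question(utterance):
    """Crude check for an unresolved question or uncertainty marker."""
    return bool(re.search(r"\?|not sure|no idea", utterance.lower()))

def detect_topic(context):
    """Pick the most frequent content word in recent context as the topic."""
    words = re.findall(r"[a-z]+", " ".join(context).lower())
    content = [w for w in words if len(w) > 3]
    return max(set(content), key=content.count) if content else ""

def should_invoke(utterance, context, min_relatedness=1):
    if not detect_question(utterance):
        return False  # no unanswered question or obvious intent detected
    topic = detect_topic(context)
    # relatedness: does the topic word occur in the triggering utterance?
    return int(topic in utterance.lower()) >= min_relatedness
```

A production system would replace all three functions with trained models, but the gating structure — question, topic, relatedness threshold — mirrors the decision flow in the paragraph above.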
The method of the embodiment of the application can be applied to provide intelligent assistant services in various call scenarios, and the application effect is obvious. First, the user can be served without explicit voice-command operations during the call: the system understands the user's question or intent and, combined with the topic identified from the call context, provides the corresponding service, which greatly simplifies the user's operation flow and improves the user experience. Second, the scheme can accurately identify related topics from the call context, provide more accurate services, and meet the user's personalized requirements. That is, the method of the present application provides a more intelligent and convenient way of interacting with an intelligent (voice) assistant. By analyzing the context to call up the relevant assistant directly, the user does not need to perform explicit voice-command operations but is naturally supported by the corresponding assistant's capabilities during the call. The method reduces the user's active operations and improves the degree of intelligence of the intelligent (voice) assistant while maintaining the fluency of the call and the user experience. In addition, the method provided by the embodiment of the application can also handle scenarios without an obvious voice start command, accurately identify the user's intent and requirements, and provide more accurate services.
The embodiment of the application also provides a calling device of the intelligent assistant, as shown in fig. 6, the device comprises:
A first obtaining module 601, configured to obtain call information between a first terminal and a second terminal during a call between the first terminal and the second terminal;
a determining module 602, configured to determine problem intention information of the first terminal according to the call information; the first terminal is a terminal for starting an intelligent assistant service;
A second obtaining module 603, configured to select, according to the problem intention information, a target intelligent assistant corresponding to the problem intention information;
A calling module 604, configured to call the service of the target intelligent assistant to obtain the service information provided by the target intelligent assistant.
Optionally, the determining module 602 includes:
The first processing sub-module is used for carrying out semantic analysis on the acquired call information and dividing the call information into N sections of call information, wherein the similarity of semantic analysis results of two adjacent sections of call information is smaller than a similarity threshold value, each section of call information contains a semantic meaning, and N is a positive integer;
The second processing sub-module is used for determining first section of call information in the N sections of call information as target call information, wherein the time corresponding to the first section of call information is earlier than the time corresponding to other sections of call information except the first section of call information;
And the intention recognition sub-module is used for carrying out intention recognition on the target call information and acquiring the problem intention information.
Optionally, the second processing sub-module includes:
the segmentation unit is used for dividing the call information into M sections of call information according to the time length, wherein M is a positive integer;
the analysis unit is used for carrying out semantic analysis on current section of call information from the first section of call information in the M sections of call information, wherein the current section of call information is the first section of call information in all sections of call information which are not subjected to semantic analysis in the M sections of call information;
The first processing unit is used for splitting the current section of call information into P sections of call information according to P kinds of semantics if the current section of call information contains P kinds of semantics, and carrying out semantic analysis on the next section of call information adjacent to the current section of call information to obtain semantic similarity between the last section of call information in the P sections of call information and the adjacent next section of call information; or if the current section of call information contains a semantic meaning, carrying out semantic meaning analysis on the adjacent next section of call information to obtain semantic meaning similarity between the current section of call information and the adjacent next section of call information; p is an integer greater than or equal to 2;
The second processing unit is used for merging the two sections of call information corresponding to the semantic similarity when the obtained semantic similarity is greater than or equal to a similarity threshold value, or continuing to perform semantic analysis on the next adjacent section of call information when the obtained semantic similarity is smaller than the similarity threshold value, until segmentation and/or merging of the M sections of call information are completed;
and the N sections of call information are obtained according to the segmentation and/or combination result of the M sections of call information.
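The split-and-merge procedure performed by the sub-module above (and claim 3) can be sketched as follows. The semantic analysis and similarity functions are hypothetical stand-ins: each utterance carries a topic label, and similarity is 1.0 for matching label sets, 0.0 otherwise; a real system would use a semantic model.

```python
# Sketch of claim 3: turn M time-based segments into N single-semantic
# segments whose adjacent similarity is below the threshold.

SIM_THRESHOLD = 0.5

def analyze(segment):
    """Return the distinct semantics (topic labels) found in a segment."""
    seen = []
    for _, label in segment:
        if label not in seen:
            seen.append(label)
    return seen

def split_by_semantics(segment, labels):
    """Split one segment into P sub-segments, one per semantic label."""
    return [[u for u in segment if u[1] == lab] for lab in labels]

def similarity(seg_a, seg_b):
    return 1.0 if analyze(seg_a) == analyze(seg_b) else 0.0

def segment_call(m_segments):
    result = []
    for seg in m_segments:
        labels = analyze(seg)
        # if the segment contains P >= 2 semantics, split it into P parts
        parts = split_by_semantics(seg, labels) if len(labels) > 1 else [seg]
        for part in parts:
            if result and similarity(result[-1], part) >= SIM_THRESHOLD:
                result[-1] = result[-1] + part   # merge with previous segment
            else:
                result.append(part)              # below threshold: new segment
    return result
```

For example, a first segment mixing a greeting with a weather question is split in two, and the weather part then merges with a following weather-only segment, leaving adjacent segments that differ in semantics.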
Optionally, the intention recognition submodule includes:
The first acquisition unit is used for inputting the target call information into an intention recognition model and acquiring first information output by the intention recognition model, wherein the first information is used for indicating a potential problem or existing intention of the first terminal;
the second obtaining unit is used for inputting the target call information into a topic generation model to obtain a call topic output by the topic generation model under the condition that the potential problem or the existing intention belongs to a preset field;
And the generating unit is used for generating the problem intention information according to the first information and the conversation theme.
Optionally, the generating unit includes:
The calculating subunit is used for calculating the association degree between the first information and the conversation theme;
The identification subunit is used for carrying out text entity identification on the first information and the conversation topic under the condition that the association degree is larger than an association degree threshold value, and obtaining entity information;
And the generating subunit is used for carrying out slot filling based on the entity information and generating the problem intention information.
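The generating unit described above — relevance check, entity recognition, slot filling — can be illustrated with a small sketch. The relevance score, the entity table, and the intent name are toy assumptions (a keyword matcher standing in for the text-entity recognition the embodiment describes).

```python
# Sketch of the generating unit: gate on relevance, then fill slots from
# recognized entities to produce the question intention information.

RELEVANCE_THRESHOLD = 0.6
KNOWN_ENTITIES = {"Sanya": "location", "tomorrow": "date"}  # toy entity table

def relevance(question, topic):
    """Toy relevance: fraction of topic words appearing in the question."""
    t_words = topic.lower().split()
    hits = sum(1 for w in t_words if w in question.lower())
    return hits / len(t_words) if t_words else 0.0

def recognize_entities(text):
    return {role: name for name, role in KNOWN_ENTITIES.items()
            if name.lower() in text.lower()}

def build_intention(question, topic):
    if relevance(question, topic) <= RELEVANCE_THRESHOLD:
        return None  # not related enough to the call topic: no service call
    slots = recognize_entities(question + " " + topic)
    return {"intent": "weather_query", "slots": slots}
```

Only when the question clears the relevance gate is the slot structure built; otherwise no assistant call is triggered, matching the association-degree condition in the unit above.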
Optionally, the apparatus further comprises:
The conversion module is used for converting the service information into a target media format, wherein the target media format is adapted to the format of call information between the first terminal and the second terminal;
And the sending module is used for sending second information to the intelligent assistant service application server AS according to the converted service information, wherein the second information is used for indicating to synthesize the converted service information with the current call information between the first terminal and the second terminal.
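The conversion and sending modules above can be sketched together: choose a target media format matching the call (audio-only versus audio plus video) and build the "second information" message for the assistant service AS. All field and format names here are assumptions for illustration, not defined by the embodiment.

```python
# Sketch of the conversion module and the second-information message.

def convert_service_info(service_info, call_media):
    """Adapt service information to the media format of the call."""
    target = "video_overlay" if "video" in call_media else "audio_prompt"
    return {"format": target, "content": service_info}

def build_second_information(converted, call_id):
    """Message instructing the AS to synthesize content into the call."""
    return {
        "call_id": call_id,
        "action": "synthesize_with_call",
        "media": converted,
    }
```

In practice the format choice would come from the negotiated SDP of the call rather than a simple media-list check.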
It should be noted that the calling device of the intelligent assistant provided by the embodiment of the present application can implement all the method steps of the above embodiment of the calling method of the intelligent assistant and can achieve the same technical effects; the details and beneficial effects that are the same as those of the method embodiment are not repeated here.
Embodiments of the present application also provide a communication device comprising a transceiver 710, a processor 700, a memory 720, and a program stored on the memory 720 and executable on the processor 700; wherein the processor 700 implements the method of invoking the intelligent assistant as described above when executing the program.
The transceiver 710 is configured to receive and transmit data under the control of the processor 700.
In fig. 7, the bus architecture may comprise any number of interconnected buses and bridges, linking together one or more processors, represented by processor 700, and various memory circuits, represented by memory 720. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 710 may be a number of elements, i.e., comprising a transmitter and a receiver, providing a means for communicating with various other apparatuses over a transmission medium. The processor 700 is responsible for managing the bus architecture and general processing, and the memory 720 may store data used by the processor 700 in performing operations.
It should be noted that the communication device provided by the embodiment of the present application can implement all the method steps of the above embodiment of the calling method of the intelligent assistant and can achieve the same technical effects; the details and beneficial effects that are the same as those of the method embodiment are not repeated here.
The embodiment of the application also provides a readable storage medium storing a program which, when executed by a processor, implements the processes of the above embodiment of the calling method of the intelligent assistant and can achieve the same technical effects; to avoid repetition, the description is omitted here. The readable storage medium is, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. With this understanding, the aspects of the present application that are essential or that contribute to the prior art may be embodied in the form of a software product stored on a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and comprising instructions for performing the methods described in the various embodiments of the present application.
Therefore, the embodiments of the present application further provide a computer program product, which includes computer instructions, where the computer instructions, when executed by a processor, implement steps in a method for invoking an intelligent assistant as described above, and achieve the same technical effects, and are not described herein again for avoiding repetition.
Finally, it should also be noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the foregoing is directed to the preferred embodiments of the present application, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.
Claims (10)
1. A method for invoking an intelligent assistant, the method comprising:
Acquiring call information between a first terminal and a second terminal in the process of communicating between the first terminal and the second terminal;
acquiring problem intention information of the first terminal according to the call information; the first terminal is a terminal for starting an intelligent assistant service;
selecting a target intelligent assistant corresponding to the problem intention information according to the problem intention information;
and calling the target intelligent assistant to obtain service information provided by the target intelligent assistant.
2. The method of claim 1, wherein obtaining problem intention information of the first terminal based on the call information comprises:
carrying out semantic analysis on the acquired call information, and dividing the call information into N sections of call information, wherein the similarity of semantic analysis results of two adjacent sections of call information is smaller than a similarity threshold value, each section of call information contains a semantic meaning, and N is a positive integer;
Determining first section of call information in the N sections of call information as target call information, wherein the time corresponding to the first section of call information is earlier than the time corresponding to other sections of call information except the first section of call information;
and carrying out intention recognition on the target call information to acquire the problem intention information.
3. The method of claim 2, wherein performing semantic analysis on the acquired call information to divide the call information into N pieces of call information comprises:
Dividing the call information into M sections of call information according to the time length, wherein M is a positive integer;
Starting from the first section of call information in the M sections of call information, carrying out semantic analysis on the current section of call information, wherein the current section of call information is the first section of call information in all sections of call information which are not subjected to semantic analysis in the M sections of call information;
If the current section of call information contains P kinds of semantics, splitting the current section of call information into P sections of call information according to the P kinds of semantics, and carrying out semantic analysis on the next section of call information adjacent to the current section of call information to obtain semantic similarity between the last section of call information in the P sections of call information and the adjacent next section of call information; or if the current section of call information contains a semantic meaning, carrying out semantic meaning analysis on the adjacent next section of call information to obtain semantic meaning similarity between the current section of call information and the adjacent next section of call information; p is an integer greater than or equal to 2;
After the obtained semantic similarity is greater than or equal to a similarity threshold, merging two sections of call information corresponding to the semantic similarity, or under the condition that the obtained semantic similarity is smaller than the similarity threshold, continuing to perform semantic analysis on the next adjacent section of call information until segmentation and/or merging of the M sections of call information are completed;
and obtaining the N-section call information according to the segmentation and/or combination result of the M-section call information.
4. The method of claim 2, wherein performing intent recognition on the target call information to obtain the question intent information comprises:
Inputting the target call information into an intention recognition model, and obtaining first information output by the intention recognition model, wherein the first information is used for indicating a potential problem or existing intention of the first terminal;
Under the condition that the potential problem or the existing intention belongs to a preset field, inputting the target call information into a topic generation model to obtain a call topic output by the topic generation model;
And carrying out entity identification on the first information and the conversation theme to generate the problem intention information.
5. The method of claim 4, wherein performing entity identification on the first information and the call topic to generate the question intention information comprises:
calculating the association degree of the first information and the conversation theme;
under the condition that the association degree is larger than an association degree threshold value, carrying out text entity identification on the first information and the conversation theme to obtain entity information;
and filling slots based on the entity information to generate the problem intention information.
6. The method according to claim 1, wherein the method further comprises:
Converting the service information into a target media format, wherein the target media format is adapted to the format of call information between the first terminal and the second terminal;
And sending second information to an intelligent assistant service application server AS according to the converted service information, wherein the second information is used for indicating that the converted service information is synthesized with the current call information between the first terminal and the second terminal.
7. A calling device of an intelligent assistant, the device comprising:
The first acquisition module is used for acquiring call information between the first terminal and the second terminal in the process of the call between the first terminal and the second terminal;
The determining module is used for acquiring the problem intention information of the first terminal according to the call information; the first terminal is a terminal for starting an intelligent assistant service;
The second acquisition module is used for selecting a target intelligent assistant corresponding to the problem intention information according to the problem intention information;
and the calling module is used for calling the service of the target intelligent assistant to obtain the service information provided by the target intelligent assistant.
8. A communication device, comprising: a transceiver, a processor, a memory and a program or instruction stored on the memory and executable on the processor, which when executed by the processor, implements a method of invoking the intelligent assistant of any of claims 1-6.
9. A readable storage medium, wherein a program or instructions is stored on the readable storage medium, which when executed by a processor, implements a method of invoking an intelligent assistant according to any of claims 1-6.
10. A computer program product comprising computer instructions which, when executed by a processor, implement a method of invoking a smart assistant as claimed in any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410566901.0A CN118398009A (en) | 2024-05-09 | 2024-05-09 | Calling method, device and equipment of intelligent assistant |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118398009A true CN118398009A (en) | 2024-07-26 |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |