CN111949240A - Interaction method, storage medium, service program, and device - Google Patents

Interaction method, storage medium, service program, and device

Info

Publication number
CN111949240A
Authority
CN
China
Prior art keywords
application program
interaction
voice
user interaction
intention information
Prior art date
Legal status
Pending
Application number
CN201910406508.4A
Other languages
Chinese (zh)
Inventor
沈浩翔
姜飞俊
风翮
王鹏程
张增明
孙尧
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910406508.4A
Publication of CN111949240A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the invention provide an interaction method, a storage medium, a service program, and a device. The method comprises: a service program that provides a voice interaction service for an application program receives voice information, extracts keywords from the voice information, generates an interaction instruction according to the extracted keywords and user interaction intention information corresponding to the application program, and controls the application program to respond to the interaction instruction. In this scheme, a service program with a voice interaction service capability and various kinds of user interaction intention information are provided for the application program, so that an application program that originally supported only touch interaction can also support voice interaction; the interaction modes of the application program are thereby expanded, and interaction becomes more convenient for the user.

Description

Interaction method, storage medium, service program, and device
Technical Field
The present invention relates to the field of internet technologies, and in particular, to an interaction method, a storage medium, a service program, and a device.
Background
Various human-computer interaction modes are already widely used in different human-computer interaction scenarios, such as touch interaction with view components displayed in an interface, voice interaction with an application program, and somatosensory or gesture interaction in scenarios such as virtual reality.
In the prior art, these interaction modes are independent of one another, and an application program often supports only a single interaction mode. For example, an application program that supports touch interaction displays an interface on a screen; in response to a user's touch operation on an interface element in that interface, the operating system notifies the application program that the element has been triggered, the application program calls the corresponding callback function to respond, and the response result is presented, for example, by jumping to another interface on the screen.
Disclosure of Invention
Embodiments of the invention provide an interaction method, a storage medium, a service program, and a device for expanding the interaction modes of an application program.
In a first aspect, an embodiment of the present invention provides an interaction method, executed by a service program that provides a voice interaction service for an application program, where the method includes:
receiving voice information;
extracting keywords from the voice information;
generating an interaction instruction according to the extracted keywords and user interaction intention information corresponding to the application program;
and controlling the application program to respond to the interaction instruction.
In a second aspect, an embodiment of the present invention provides an interaction apparatus, where the apparatus includes:
the receiving module is used for receiving voice information;
the extraction module is used for extracting keywords from the voice information;
the generating module is used for generating an interaction instruction according to the extracted keywords and the user interaction intention information corresponding to the application program;
and the control module is used for controlling the application program to respond to the interactive instruction.
In a third aspect, an embodiment of the present invention provides an electronic device, including a first processor and a first memory, where the first memory stores executable code, and when the executable code is executed by the first processor, the first processor is caused to implement at least the interaction method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to implement at least the interaction method in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a service program, where the service program provides a voice interaction service for an application program, and the service program includes:
the input/output interface is coupled with the application program and is used for receiving the voice information corresponding to the application program and controlling the application program to respond to the interactive instruction;
and the interaction engine is used for extracting keywords from the voice information and generating an interaction instruction according to the extracted keywords and the user interaction intention information corresponding to the application program.
In a sixth aspect, an embodiment of the present invention provides an interaction method, which may be executed by a service program that provides a voice interaction service for an application program, where the method includes:
acquiring user interaction intention information corresponding to an application program, wherein the user interaction intention information comprises interaction behavior information and at least one behavior object parameter;
and sending the user interaction intention information to the application program or a server corresponding to the application program, so that the application program or the server establishes the corresponding relation between the multiple interfaces and the user interaction intention information according to the data categories corresponding to the multiple interfaces contained in the application program and the data categories corresponding to the at least one behavior object parameter.
In a seventh aspect, an embodiment of the present invention provides an interaction apparatus, where the apparatus includes:
the acquisition module is used for acquiring user interaction intention information corresponding to the application program, wherein the user interaction intention information comprises interaction behavior information and at least one behavior object parameter;
and the sending module is used for sending the user interaction intention information to the application program or a server corresponding to the application program, so that the application program or the server establishes the corresponding relation between the multiple interfaces and the user interaction intention information according to the data categories corresponding to the multiple interfaces contained in the application program and the data categories corresponding to the at least one behavior object parameter.
In an eighth aspect, an embodiment of the present invention provides an electronic device, which includes a second processor and a second memory, where the second memory stores executable code, and when the executable code is executed by the second processor, the second processor is caused to implement at least the interaction method according to the sixth aspect.
In a ninth aspect, the present invention provides a non-transitory machine-readable storage medium, on which an executable code is stored, and when the executable code is executed by a processor of an electronic device, the processor is caused to implement at least the interaction method in the sixth aspect.
In a tenth aspect, an embodiment of the present invention provides an interaction method, which may be executed by a service program that provides a voice interaction service for an application program, where the method includes:
receiving first voice information in response to an operation of starting a voice interaction function of an application program;
controlling the application program to respond to the first voice information;
receiving second voice information;
and if the second voice message conforms to the set continuous conversation characteristic, controlling the application program to respond to the second voice message.
In an eleventh aspect, an embodiment of the present invention provides an interaction apparatus, where the apparatus includes:
the receiving module is used for responding to the operation of starting the voice interaction function of the application program and receiving first voice information;
the control module is used for controlling the application program to respond to the first voice information;
the receiving module is further configured to receive second voice information;
the control module is further configured to control the application program to respond to the second voice message if the second voice message meets the set continuous conversation feature.
In a twelfth aspect, an embodiment of the present invention provides an electronic device, including a third processor and a third memory, where the third memory stores executable code, and when the executable code is executed by the third processor, the third processor is caused to implement at least the interaction method according to the tenth aspect.
In a thirteenth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium, having executable code stored thereon, which, when executed by a processor of an electronic device, causes the processor to implement at least the interaction method in the tenth aspect.
In a fourteenth aspect, an embodiment of the present invention provides an interaction method, which may be executed by an application program or a server corresponding to the application program, where the method includes:
receiving user interaction intention information which is sent by a service program corresponding to an application program and corresponds to the application program, wherein the user interaction intention information comprises interaction behavior information and at least one behavior object parameter, and the service program provides voice interaction service for the application program;
and establishing the corresponding relation between the plurality of interfaces and the user interaction intention information according to the data types corresponding to the plurality of interfaces contained in the application program and the data types corresponding to the at least one behavior object parameter.
In a fifteenth aspect, an embodiment of the present invention provides an interaction apparatus, where the apparatus includes:
the system comprises a receiving module, a processing module and a processing module, wherein the receiving module is used for receiving user interaction intention information which is sent by a service program corresponding to an application program and corresponds to the application program, the user interaction intention information comprises interaction behavior information and at least one behavior object parameter, and the service program provides voice interaction service for the application program;
and the creating module is used for establishing the corresponding relation between the plurality of interfaces and the user interaction intention information according to the data types corresponding to the plurality of interfaces contained in the application program and the data types corresponding to the at least one behavior object parameter.
In a sixteenth aspect, an embodiment of the present invention provides an electronic device, including a fourth processor and a fourth memory, where the fourth memory stores executable code, and when the executable code is executed by the fourth processor, the fourth processor is caused to implement at least the interaction method of the fourteenth aspect.
In a seventeenth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having executable code stored thereon, which, when executed by a processor of an electronic device, causes the processor to implement at least the interaction method in the fourteenth aspect.
In embodiments of the invention, a scheme for controlling an application program through voice interaction is provided, in which the voice uttered by the user can be responded to visually, that is, through an interface response, so that voice and visual interaction are fused. To implement this scheme, a service program is designed to provide voice interaction services for applications; the service program can be implemented as a software package or a toolkit.
Specifically, when a user wants to interact with an application program by voice, the user utters voice information. The voice information is collected and transmitted to the service program. Through voice processing such as speech recognition and semantic understanding, the service program extracts keywords representing the user's current interaction intention, generates an interaction instruction according to the extracted keywords and the user interaction intention information corresponding to the application program, and controls the application program to respond to the interaction instruction. The application program's final response to the interaction instruction may be, for example, performing an operation on an interface element in the current interface or jumping to another interface.
In this scheme, a service program with a voice interaction service capability and various kinds of user interaction intention information are provided for the application program, so that an application program that originally supported only touch interaction can also support voice interaction; the interaction modes of the application program are thereby expanded, and interaction becomes more convenient for the user.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an interaction method according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a change of an interface state in a voice interaction process according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a change of an interface state in another voice interaction process according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a change of an interface state in another voice interaction process according to an embodiment of the present invention;
FIG. 5 is a flow chart of another interaction method provided by the embodiments of the present invention;
FIG. 6 is a flowchart of another interaction method provided by the embodiments of the present invention;
FIG. 7 is a flowchart of another interaction method provided by the embodiments of the present invention;
FIG. 8 is a flowchart of another interaction method provided by the embodiments of the present invention;
FIG. 9 is a schematic diagram illustrating a service program according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an interaction apparatus according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of an electronic device corresponding to the interaction apparatus provided in the embodiment shown in fig. 10;
FIG. 12 is a schematic structural diagram of another interactive apparatus according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of an electronic device corresponding to the interaction apparatus provided in the embodiment shown in fig. 12;
fig. 14 is a schematic structural diagram of another interaction apparatus according to an embodiment of the present invention;
fig. 15 is a schematic structural diagram of an electronic device corresponding to the interaction apparatus provided in the embodiment shown in fig. 14;
FIG. 16 is a schematic structural diagram of another interaction apparatus according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of an electronic device corresponding to the interaction apparatus provided in the embodiment shown in fig. 16.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and "a plurality of" generally means at least two, unless the context clearly indicates otherwise.
The word "if", as used herein, may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)", depending on the context.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a product or system. Without further limitation, an element preceded by "comprising a(n) ..." does not exclude the presence of other identical elements in the product or system that comprises the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
Fig. 1 is a flowchart of an interaction method according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
101. Receive voice information.
102. Extract keywords from the voice information.
103. Generate an interaction instruction according to the extracted keywords and the user interaction intention information corresponding to the application program.
104. Control the application program to respond to the interaction instruction.
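A minimal Python sketch of this four-step flow is given below. All names are illustrative assumptions rather than part of the embodiments; the placeholder helpers are elaborated in later sketches.

```python
# Illustrative sketch of steps 101-104. The helpers below are placeholders;
# real keyword extraction and intent matching are sketched further on.

def extract_keywords(voice_info: str) -> list[str]:
    # Placeholder for the ASR + NLU pipeline of step 102.
    return voice_info.split()

def generate_instruction(keywords: list[str], intention_infos: list) -> dict:
    # Placeholder for matching keywords against user interaction intention info (step 103).
    return {"behavior": keywords[0] if keywords else "", "params": keywords[1:]}

def handle_voice(app, voice_info: str, intention_infos: list) -> None:
    # Step 101: voice_info has already been received from the voice collection
    # device or forwarded by the application program.
    keywords = extract_keywords(voice_info)                         # step 102
    instruction = generate_instruction(keywords, intention_infos)   # step 103
    app.respond(instruction)                                        # step 104: control the app to respond
```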
The interaction method can be executed by a service program that provides a voice interaction service for the application program; the service program can be implemented as a software package, toolkit, or plug-in, such as a Software Development Kit (SDK). The service program gives the user the ability to operate the application program by voice.
It should be noted that the application program herein may be a client application program under a client/server architecture; in that case, the service program herein should not be understood as the server-side application program corresponding to the client application program. The role of the service program herein is to help the user use the application program by voice.
In addition, in practical applications, the service program may provide voice interaction services for one or more application programs; that is, the service program may be shared by multiple application programs.
Conventionally, after a user opens an application program, the application program displays an interface in which the user can perform touch operations. In response to the user's touch operations, the interface state of the application program changes, for example by jumping to another interface or changing the display state of some interface elements in the current interface. That is, conventionally, the user uses the application program through touch interaction.
However, if the application program is used in, for example, a vehicle-mounted environment, it may be inconvenient and unsafe for the user to perform touch operations while driving. The interaction method provided by the embodiments of the invention overcomes the limitation that such an application program can only be used through touch interaction: the user can operate the application program by voice, and when the user controls the application program by voice, the application program's response can still be the original interface response, so that voice control and interface response are fused.
In short, after the user opens an application program and wants to use it by voice, the user can speak voice information. A voice collection device, such as a microphone or an audio sensor, collects the voice information uttered by the user, and the collected voice information is ultimately transmitted to the service program.
In the embodiment of the present invention, there are two manners in which the service program can receive the voice information that the user utters for a certain application program.
First, when the service program is shared by several application programs, each application program may send a notification to the service program when it is started, so that the service program starts a thread for that application program; the priorities of the started threads are determined by the order in which the application programs were started. Based on this, after a user speaks voice information, the voice collection device sends the collected voice information directly to the service program, and the service program determines, according to the priorities of the currently started threads, that the voice information corresponds to the thread with the highest priority, that is, that the voice information is directed at the application program corresponding to that thread. A sketch of this routing is given after the second manner below.
Second, if application program A is started later than application program B, the most recently started application program A obtains control of the voice collection device. In that case, the voice collection device sends the collected voice information to application program A, and application program A forwards it to the service program.
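As an illustration of the first manner, the sketch below keeps one entry per started application program, ordered by start time, and routes incoming voice information to the most recently started one. The class and method names are hypothetical assumptions, not part of the embodiments.

```python
# Hypothetical sketch of the first manner: the service program tracks which
# application programs have been started and routes incoming voice information
# to the one whose "thread" has the highest priority (the most recently started).

class VoiceRouter:
    def __init__(self):
        self._started_apps: list[str] = []   # ordered by start time; last entry has highest priority

    def on_app_started(self, app_id: str) -> None:
        # Notification sent by an application program when it is started.
        if app_id in self._started_apps:
            self._started_apps.remove(app_id)
        self._started_apps.append(app_id)

    def route(self, voice_info: str) -> tuple[str, str] | None:
        # Voice information from the collection device goes to the highest-priority app.
        if not self._started_apps:
            return None
        return self._started_apps[-1], voice_info
```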
It should be noted that, after a user opens an application program, the user may not intend to operate it by voice; the user may instead be making a phone call or chatting with people nearby. Voice is produced in these situations as well, but it should not be sent to the service program, because it is not voice intended to operate the application program.
Therefore, in practice, when a user wants to operate an application program by voice, a trigger signal should be produced to inform the application program that it is about to be operated by voice; this trigger signal corresponds to an operation that starts the voice interaction function. Accordingly, the voice information herein refers to voice information uttered after the user has started the voice interaction function, and the "starting" of an application program in the two manners described above may refer to the application program being started with its voice interaction function enabled.
Optionally, a specific button may be provided in each interface of the application program, so that no matter which interface is currently displayed, when the user wants to operate the application program by voice, the user can trigger the voice interaction function by clicking the specific button displayed in the current interface, thereby informing the application program that the user wants to operate it by voice.
Or, optionally, when the user wants to manipulate the application program in a voice interaction manner, the user may speak a set wakeup word to wake up the application program. For example, if the wake-up word of an application is "hello, smart speaker", the user needs to speak the wake-up voice of "hello, smart speaker" first, and at this time, the application starts the voice interaction function, and then the user can speak the voice information corresponding to the application that is actually used.
Regardless of the manner in which the voice interaction function is activated, the application program may notify the service program of the information about the activation of the voice interaction function, so that the service program opens a thread for the corresponding application program to provide the voice interaction service for the corresponding application program.
Optionally, after the voice interaction function of an application is started, the service program may output information for guiding the user to use the voice interaction function, and the output mode may be a voice output mode or a visual output mode. Taking the visual output manner as an example, the service program may control the application program to display information for guiding the user to use the voice interaction function on the currently displayed interface.
For example, referring to fig. 2, assume that the currently displayed interface of a music application is interface 1, in which the specific button and a singer list are displayed. The user clicks the specific button to enable the voice interaction function of the music application, and the service program can then output the guidance information illustrated in fig. 2 to the application program: "Try saying to me: hello, smart speaker"; "I want to listen to Zhou Jielun's songs"; "I want to listen to folk songs". The first piece of guidance information tells the user what wake-up phrase to speak to start the voice interaction function, so that next time the user can start it by speaking that phrase. The second and third pieces of guidance information tell the user what kinds of voice information can be spoken during normal voice interaction with the music application, that is, what kinds of voice information the music application can respond to.
Next, the user may speak voice information indicating the user's interaction intention, as shown in fig. 2, for example "I want to listen to Liu De Hua's songs". The voice collection device sends the collected voice information to the music application program, which then sends it to the service program.
After receiving the voice information, the service program first extracts keywords from it, that is, keywords indicating the user's interaction intention, so as to identify what operation the user specifically wants to perform on the application program. The keyword extraction step can be completed locally in the service program, or the service program can send the voice information to the cloud and have a cloud node complete the extraction.
Extracting keywords from the voice information can be realized with Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) techniques. Specifically, the voice information is first converted into text by ASR, and the text is then subjected to NLU processing such as word segmentation, part-of-speech tagging, and entity category (i.e., data category) tagging, so as to extract the keywords contained in the voice information. For example, for the voice information "I want to listen to Liu De Hua's songs", the extracted keywords may include: listen, Liu De Hua, song.
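The sketch below illustrates this ASR-then-NLU pipeline in Python. The functions speech_to_text and tag_entities are hypothetical stand-ins for whatever ASR and NLU components are actually used (local or cloud-side); they are assumptions, not real library calls.

```python
# Illustrative ASR + NLU keyword extraction; the two components below are
# placeholders for real speech recognition and language understanding modules.

def speech_to_text(audio: bytes) -> str:
    # Placeholder for a real ASR component.
    raise NotImplementedError

def tag_entities(text: str) -> list[tuple[str, str | None]]:
    # Placeholder for a real NLU component: word segmentation plus
    # part-of-speech and entity (data category) tagging.
    raise NotImplementedError

def extract_keywords(audio: bytes) -> list[tuple[str, str]]:
    text = speech_to_text(audio)     # e.g. "I want to listen to Liu De Hua's songs"
    tagged = tag_entities(text)      # e.g. [("listen", "behavior"), ("Liu De Hua", "singer name"), ("song", None)]
    # Keep only the words that indicate an interaction behavior or carry a data category.
    return [(word, category) for word, category in tagged if category]
```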
After extracting each keyword contained in the voice information, an interactive instruction can be generated according to the extracted keyword and the various user interaction intention information corresponding to the application program, and then the application program is controlled to respond to the interactive instruction. The process of providing various user interaction intention information for the application program will be described in the following embodiment, and only the generation process of the interaction instruction is described in this embodiment.
Generating an interaction instruction according to the extracted keyword and the user interaction intention information corresponding to the application program, which can be realized as follows: and determining target user interaction intention information matched with the keywords from the user interaction intention information corresponding to the application program, and generating an interaction instruction according to the target user interaction intention information.
The user interaction intention information is explained first. User interaction intention information can be provided in advance for all or some of the interfaces in the application program; it indicates which interaction behaviors the user can perform in the corresponding interface and the parameter information associated with performing those behaviors.
Essentially, a piece of user interaction intention information can be expressed as a structure composed of interaction behavior information and at least one behavior object parameter, where the interaction behavior information is generally expressed as an interaction behavior name.
For example, for a music application, common interaction behaviors include play, pause, mute, previous, next, and the like.
For the pause behavior, the corresponding behavior object parameter may be the song name. The user interaction intention information formed by this interaction behavior information and behavior object parameter indicates that the user can pause the playing of a particular song by voice. It may be expressed as: pause(song name), where pause corresponds to the interaction behavior information and the content in parentheses corresponds to the behavior object parameter.
For the play behavior, the corresponding behavior object parameters may include: song name, singer name, language, heat, style, album name, and so on. The corresponding user interaction intention information may be expressed as: play(song name, singer name, language, heat, style, album name), where play corresponds to the interaction behavior information and the content in parentheses corresponds to the behavior object parameters.
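Expressed as a data structure, a piece of user interaction intention information might look like the following sketch; the class and field names are illustrative assumptions.

```python
# Hypothetical representation of user interaction intention information:
# an interaction behavior name plus one or more behavior object parameters,
# each parameter tied to a data category.

from dataclasses import dataclass, field

@dataclass
class BehaviorObjectParameter:
    data_category: str            # e.g. "song name", "singer name", "style"
    value: str | None = None      # filled in later from the user's keywords

@dataclass
class UserInteractionIntention:
    behavior: str                                       # e.g. "play", "pause"
    parameters: list[BehaviorObjectParameter] = field(default_factory=list)

# Examples corresponding to pause(song name) and
# play(song name, singer name, language, heat, style, album name):
PAUSE_INTENTION = UserInteractionIntention(
    "pause", [BehaviorObjectParameter("song name")])
PLAY_INTENTION = UserInteractionIntention(
    "play", [BehaviorObjectParameter(c) for c in
             ("song name", "singer name", "language", "heat", "style", "album name")])
```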
Given this structure of user interaction intention information, generating an interaction instruction according to the keywords extracted from the voice information and the user interaction intention information corresponding to the application program can be implemented as follows: first, from the user interaction intention information corresponding to the application program, determine target user interaction intention information whose interaction behavior information matches a first word among the keywords, the first word being the word among the keywords that indicates the interaction behavior; second, determine the parameter value of at least one behavior object parameter contained in the target user interaction intention information according to at least one second word among the keywords, the extracted keywords consisting of the first word and the at least one second word; finally, generate the interaction instruction from the target user interaction intention information with the determined parameter values.
In actual application, a database containing information about each user interaction intention corresponding to a certain application program may be maintained in the service program. Therefore, in an optional embodiment, after obtaining the keyword included in the voice information of the user, the service program may compare the keyword with each piece of user interaction intention information in the database, respectively, to find out the target user interaction intention information matching the keyword.
In the comparison process, a first word indicating the interaction behavior, which is a verb, is first screened out from the keywords. For example, assuming the extracted keywords are listen, Liu De Hua, and song, the first word indicating the interaction behavior is "listen". The interaction behavior information matching the first word is then searched for among the interaction behavior information of the various pieces of user interaction intention information corresponding to the application program. Specifically, the similarity between the first word and the interaction behavior information (i.e., the interaction behavior name) of each piece of user interaction intention information may be calculated to find the interaction behavior information whose similarity with the first word meets a requirement; the user interaction intention information containing that interaction behavior information is the target user interaction intention information. Taking play, pause, previous, and next as the interaction behavior information contained in the various pieces of user interaction intention information, the similarity calculation shows that the interaction behavior information "play" matches the first word "listen", so the user interaction intention information play(song name, singer name, language, heat, style, album name) is the target user interaction intention information.
The similarity between two words can be calculated with related techniques, for example by converting the two words into word vectors and calculating the distance between them. In addition, for convenience of calculation, when the language of the voice information uttered by the user differs from the language of the user interaction intention information, the two can be unified through translation before the similarity is calculated.
After the target user interaction intention information is obtained, the parameter value of at least one behavior object parameter contained in it is determined according to the remaining keywords other than the first word (that is, the at least one second word). In the above example, the at least one second word is: Liu De Hua and song. These words can be assigned to the behavior object parameters of the corresponding data categories in the target user interaction intention information according to the data categories to which the words themselves correspond. The target user interaction intention information play(song name, singer name, language, heat, style, album name) contains several behavior object parameters; "Liu De Hua" is the second word corresponding to the singer name parameter, so the value of that parameter is "Liu De Hua".
Finally, an interaction instruction is generated from the target user interaction intention information with the determined parameter values; in the above example, the generated interaction instruction may be expressed as: play(Liu De Hua).
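Putting the matching and parameter-filling steps together, a simplified sketch (reusing the hypothetical dataclasses above, with a trivial similarity function standing in for word-vector similarity) could look like this:

```python
# Simplified matching of keywords to a target intention and filling of parameter
# values. word_similarity() is a stand-in for a word-vector based similarity measure.

from copy import deepcopy

def word_similarity(a: str, b: str) -> float:
    # Placeholder; in practice word vectors / embeddings would be used.
    return 1.0 if a == b or (a, b) in {("listen", "play"), ("play", "listen")} else 0.0

def generate_instruction(keywords, intentions, threshold=0.8):
    # keywords: list of (word, data_category); the verb carries the category "behavior".
    if not intentions:
        return None
    first_word = next(w for w, c in keywords if c == "behavior")
    second_words = [(w, c) for w, c in keywords if c != "behavior"]

    # 1. Find the target intention whose behavior name best matches the first word.
    target = max(intentions, key=lambda i: word_similarity(first_word, i.behavior))
    if word_similarity(first_word, target.behavior) < threshold:
        return None

    # 2. Assign each second word to the parameter of the matching data category.
    instruction = deepcopy(target)
    for word, category in second_words:
        for param in instruction.parameters:
            if param.data_category == category:
                param.value = word
    return instruction   # e.g. the play intention with singer name = "Liu De Hua"
```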
It should be noted that, as the above example shows, when a piece of user interaction intention information contains multiple behavior object parameters, the parameters are in an "or" relationship: the user's voice information need not contain words corresponding to all of the behavior object parameters, as long as it contains a word corresponding to any one of them. In addition, which piece of user interaction intention information the user's voice information hits is determined by the word in the voice information that indicates the interaction behavior.
For ease of understanding, consider the following two pieces of voice information:
Voice A: "I want to listen to Liu De Hua's songs." Voice B: "I want to listen to Liu De Hua's Forgetting Water." The keywords extracted from voice A are: listen, Liu De Hua, song. The keywords extracted from voice B are: listen, Liu De Hua, Forgetting Water. Because the word indicating the user interaction behavior in both is "listen", which matches the interaction behavior information play, both pieces of voice information hit the user interaction intention information play(song name, singer name, language, heat, style, album name). The difference is that the keywords of voice A contain only one behavior object parameter of this intention information, the singer name Liu De Hua, whereas the keywords of voice B contain two, the singer name Liu De Hua and the song name Forgetting Water. Therefore, the interaction instruction corresponding to voice A is play(Liu De Hua), and the interaction instruction corresponding to voice B is play(Liu De Hua, Forgetting Water).
In the foregoing embodiment, since the service program maintains the database containing the user interaction intention information corresponding to the application program, the service program may directly compare the keyword with the user interaction intention information in the database, respectively, to find the target user interaction intention information matching the keyword.
In addition, the service program can also determine the target user interaction intention information matched with the keyword through the following optional modes:
acquiring the interface identifier of the interface currently displayed by the application program;
determining, according to the created correspondence between interface identifiers and user interaction intention information, whether the currently displayed interface is associated with target user interaction intention information matching the keywords;
and if the currently displayed interface is not associated with target user interaction intention information matching the keywords, determining the target user interaction intention information matching the keywords from the database of user interaction intention information corresponding to the application program.
In brief, in this alternative, the service program first checks whether the interface currently displayed by the application program has associated target user interaction intention information matching the keywords, and only if not does it look for the target user interaction intention information in the database. This is because, in practice, users often speak voice information about data content they see in the currently displayed interface in order to interact with that content. For example, if the song "Blue and White Porcelain" by Zhou Jielun appears in the currently displayed interface, the user may simply speak the voice information "I want to listen to Zhou Jielun's Blue and White Porcelain".
Based on this, in this alternative, after receiving the voice information, the service program may request from the application program the interface identifier of the interface the application program is currently displaying; alternatively, when the application program sends the voice information to the service program, it may send the interface identifier of the currently displayed interface along with it.
The user interaction intention information corresponding to the application program actually corresponds to the interfaces contained in the application program; that is, corresponding user interaction intention information is provided for one or more of those interfaces. Therefore, when user interaction intention information is provided for the application program, a correspondence between interface identifiers and user interaction intention information can be created and maintained in the service program, and the service program can use this correspondence to determine whether the currently displayed interface is associated with target user interaction intention information matching the keywords. The determination is made by comparing the keywords with the user interaction intention information associated with the interface, as described above.
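The correspondence between interface identifiers and user interaction intention information, together with the "current interface first, then database" lookup just described, might be kept as simply as in the following sketch; the structure and names are assumptions for illustration only.

```python
# Hypothetical registry of the correspondence between interface identifiers and
# user interaction intention information, with the lookup order described above.

class IntentionRegistry:
    def __init__(self):
        self.by_interface: dict[str, list] = {}   # interface id -> intention info associated with it
        self.database: list = []                   # all intention info for the application program

    def register(self, interface_id: str, intention) -> None:
        self.by_interface.setdefault(interface_id, []).append(intention)
        self.database.append(intention)

    def find_target(self, current_interface_id: str, keywords, matcher):
        # matcher(keywords, intentions) returns the matching intention or None.
        target = matcher(keywords, self.by_interface.get(current_interface_id, []))
        if target is None:
            target = matcher(keywords, self.database)   # fall back to the whole database
        return target
```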
After the service program generates the interaction instruction from the target user interaction intention information, it can control the application program to respond to the instruction. Regardless of which of the above alternatives is used to determine the target user interaction intention information and generate the interaction instruction, the response to the interaction instruction may be implemented in any of the following ways.
In an alternative embodiment, controlling the application program to respond to the interaction instruction may be implemented as: sending the interaction instruction to a response object so that the response object responds to it, where the response object is the application program or a server corresponding to the application program.
That is, optionally, the service program may send the interaction instruction directly to the application program, and the application program responds to it. For example, if the interaction instruction is play(Liu De Hua), the application program may need to search for Liu De Hua's songs, add the found songs to a playlist, jump to the playlist interface, and play the songs in order to respond to the instruction.
Optionally, the service program may instead send the interaction instruction directly to the server corresponding to the application program, and the server responds to it. For example, if the interaction instruction is play(Liu De Hua), the server may need to search for Liu De Hua's songs and add the found songs to the playlist in order to respond; the server then sends a response instruction to the application program through the service program, instructing the application program to jump to the playlist interface and play the songs.
Assuming the interaction instruction is play(Liu De Hua), in the above alternative embodiment the result of the application program or server responding to the instruction may be, as shown in fig. 2, jumping to interface 2, which contains several of Liu De Hua's songs, and playing the songs in sequence.
In another alternative embodiment, controlling the application program to respond to the interaction instruction may be implemented as:
acquiring the interface identifier of the interface currently displayed by the application program and the data content contained in that interface;
if it is determined, according to the created correspondence between interface identifiers and user interaction intention information, that the currently displayed interface is associated with the target user interaction intention information, and the data content includes target data content matching the keywords, generating a response instruction according to the target data content;
and sending the response instruction to the application program so that the application program executes the response instruction.
In this embodiment, after the service program generates the interaction instruction, it first checks whether data content matching the interaction instruction can be found in the currently displayed interface of the application program; if so, the interaction instruction can be responded to directly in the currently displayed interface. That is, the response can be completed by performing a control-level operation in the currently displayed interface, where a control-level operation corresponds to performing one or more touch operations in the currently displayed interface.
Specifically, the service program may first obtain an interface identifier of an interface currently displayed by the application program and data content included in the interface, where the data content may refer to data content included in the interface and supporting voice interaction. The interface identifier and the data content may be obtained by referring to the above-mentioned interface identifier obtaining method, which is not described in detail.
For example, as shown in fig. 3, assume that the currently displayed interface is interface a, which contains several of Liu De Hua's song names, with a playback-state graphic displayed beside each song name; interface a also contains a search control and an operation entry associated with several operation items. In interface a, the data content supporting voice interaction is the song names.
After the service program obtains the interface identifier of the currently displayed interface, it may first determine, according to the created correspondence between interface identifiers and user interaction intention information, whether the currently displayed interface is associated with the target user interaction intention information corresponding to the interaction instruction. Specifically, the interaction instruction carries the interaction behavior information of the target user interaction intention information, so the service program can determine this by comparing the interaction behavior information.
For example, if, from interface a, the user speaks the voice information "listen to Liu De Hua's Ice Rain", as shown in fig. 3, the interaction instruction generated by the service program is: play(Liu De Hua, Ice Rain), and the interaction behavior information contained in the corresponding target user interaction intention information is: play. Therefore, if the currently displayed interface is associated with the user interaction intention information play(song name, singer name, language, heat, style, album name), it is determined that the target user interaction intention information corresponding to the interaction instruction is registered on the currently displayed interface.
If the currently displayed interface is determined to be associated with the target user interaction intention information corresponding to the interaction instruction, the service program may further determine whether the data content contained in the currently displayed interface includes target data content matching the extracted keywords. Specifically, the data content matching the keywords is the data content matching the behavior object parameter values contained in the interaction instruction, because those parameter values are obtained from the keywords. In the above example, this means determining whether the currently displayed interface contains the song "Ice Rain".
If the currently displayed interface is associated with the target user interaction intention information corresponding to the interaction instruction and contains the target data content matching the extracted keywords, a response instruction can be generated according to the target data content and sent to the application program, which executes it. In the above example, the response instruction generated from the target data content instructs the application program to play the song "Ice Rain", so that, as shown in fig. 3, the interface state after the application program executes the response instruction is interface b, with the song "Ice Rain" switched to the playing state.
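The decision logic of this alternative can be sketched as follows: check whether the current interface is associated with the target intention and contains matching data content, and only then generate a response instruction for the application program. The names are illustrative, and the registry refers to the hypothetical IntentionRegistry sketched earlier.

```python
# Sketch of the control-level response path: respond inside the currently
# displayed interface when it is associated with the target intention and
# already contains the target data content (e.g. the song "Ice Rain").

def respond_in_current_interface(registry, interface_id, interface_contents,
                                 target_intention, instruction):
    # interface_contents: voice-enabled data content of the current interface,
    # e.g. the list of song names shown in interface a.
    associated = target_intention in registry.by_interface.get(interface_id, [])
    wanted_values = [p.value for p in instruction.parameters if p.value]
    matches = [c for c in interface_contents if c in wanted_values]

    if associated and matches:
        # Response instruction for the app, e.g. "play the item 'Ice Rain' in the
        # current interface" (equivalent to one or more touch operations on it).
        return {"action": instruction.behavior, "target_content": matches[0]}
    return None   # fall back to sending the interaction instruction to the app or server
```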
The alternative embodiment shown in fig. 3 assumes that the currently displayed interface is associated with the target user interaction intention information and contains target data content matching the keywords. When it is determined that the interface is not associated with the target user interaction intention information, or does not contain target data content matching the keywords, the service program may send the interaction instruction to the application program or to the server corresponding to the application program, and the application program or the server responds to the interaction instruction.
For example, as shown in fig. 4, it is assumed that the currently displayed interface is interface a in which a singer list including a plurality of singer names is displayed. For example, the speech spoken by the user is "forgetting water by playing liu de hua", and the interactive instruction generated by the service program is as follows: play (Liudebua, forgetting water). At this time, although the interface a is associated with the target user interaction intention information corresponding to the interaction instruction: play (song name, singer name, language, heat, style, album name), but since the interface a does not contain data content corresponding to the song name "forgetting water", the service program sends the interactive instruction to the application program or the server. And if the interactive instruction is sent to the server, based on the interactive instruction, the server determines a target interface supporting the interactive instruction in a plurality of interfaces contained in the application program, and responds to the interactive instruction according to the target interface. The target interface supporting the interactive instruction refers to that target user interactive intention information corresponding to the interactive instruction is associated on the target interface, and the data category corresponding to the target interface is matched with the data category corresponding to the keyword.
Specifically, the server traverses a plurality of interfaces associated with user interaction intention information in the application program, and firstly screens out the interfaces associated with target user interaction intention information, wherein the assumption comprises an interface B and an interface C. Interface B and interface C are both interfaces for carrying song playlists, with the difference that the paths of the two are different, assuming that interface B corresponds to a "favorite list" in the application and interface C corresponds to a "recent playlist" of the application. It can be known that both interface B and interface C correspond to the data category of the singer's name and song's name according to the data category attribute associated when interface B and interface C were initially designed. The data type corresponding to the keyword, that is, the data type corresponding to the interactive object parameter value included in the interactive instruction, is, under the assumption that the interactive instruction is play (liu de hua, forget about water), the interactive object parameter value is "liu de hua" and "forget about water", and the corresponding data type is the name of the singer and the name of the song, so the interface B and the interface C are the target interface corresponding to the interactive instruction. When a plurality of target interfaces exist, one of the target interfaces can be selected according to the priority of the target interface, and if the priority of the interface B is higher than that of the interface C, the interface B is finally selected, and the interaction instruction is responded according to the interface B.
Responding to the interaction instruction according to interface B may, for example, proceed as follows: if the song "Forgetting Water" by Liu Dehua does not exist in interface B, the song is searched for and added to interface B; the identifier of the song in interface B is determined; and the application program is notified to jump to interface B and play the song according to that identifier. The final output interface state of the application program is shown in fig. 4.
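As a rough illustration of this server-side selection, the following Python sketch filters the interfaces registered with the target intention by data category and then picks the one with the highest priority. The interface identifiers, category sets, and priority values are assumptions invented for this example, not values defined by the embodiments.

# Interfaces registered with the play(...) intention, with their data categories
# and priorities (all values are assumptions for illustration).
INTERFACES = {
    "interface_B": {"intents": {"play"}, "categories": {"singer name", "song name"}, "priority": 2},
    "interface_C": {"intents": {"play"}, "categories": {"singer name", "song name"}, "priority": 1},
    "interface_A": {"intents": {"play"}, "categories": {"singer name"}, "priority": 3},
}

def pick_target_interface(behavior, keyword_categories):
    """Keep the interfaces registered with the intention whose data categories
    cover the keyword categories, then take the one with the highest priority."""
    candidates = [
        (info["priority"], name)
        for name, info in INTERFACES.items()
        if behavior in info["intents"] and keyword_categories <= info["categories"]
    ]
    return max(candidates)[1] if candidates else None

# play(Liu Dehua, Forgetting Water): categories of the parameter values
print(pick_target_interface("play", {"singer name", "song name"}))   # interface_B

Interface A drops out because it does not cover the song name category, and interface B wins over interface C on priority.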
To sum up, the above embodiments provide, for the application program, a service program having a voice interaction service function together with a variety of user interaction intention information, so that an application program that originally only supported a touch interaction mode can also support a voice interaction mode; the interaction modes of the application program are thereby expanded, and the convenience of user interaction is improved. Moreover, the voice interaction mode is not limited to responding only by voice; a response can also be given in the interface, that is, visually, so that fusion of voice and vision is achieved.
Fig. 5 is a flowchart of another interaction method provided in an embodiment of the present invention, and as shown in fig. 5, the method may include the following steps:
501. Receiving voice information.
502. Extracting effective keywords from the voice information.
The interaction method provided by the embodiment of the present invention can also realize a continuous conversation function. Continuous conversation means that, in a voice interaction scenario, the wake-up word or a specific button is needed only to enter an effective conversation state at the beginning of the conversation; afterwards, whether the user is still conversing with the application program is monitored automatically, and when certain conditions are met, the pieces of voice information successively uttered by the user are regarded as belonging to one continuous conversation, so that the user does not need to re-trigger the voice interaction function by speaking the wake-up word again or the like.
Therefore, extracting effective keywords from the voice information can be understood as performing a validity check on the voice information and, if the voice information is valid, extracting the keywords contained in it.
Checking the validity of the voice information means checking whether the voice information and the previous voice information belong to one continuous conversation; if they do, the voice information is considered valid. If the voice information is considered valid, the subsequent response processing is carried out; otherwise, an error prompt can be output through the interface or by voice.
It will be appreciated that the voice information referred to here does not mean the wake-up voice in which a wake-up word is spoken.
Optionally, if the time difference between the receiving time of the voice information currently spoken by the user and the receiving time of the previous voice information is less than a set time length (for example, 10 seconds), the currently spoken voice information may be determined to be valid. That is, after the user has started the voice interaction function, two adjacent pieces of voice information are considered to belong to the same continuous conversation as long as the difference between their receiving times does not exceed 10 seconds.
Optionally, if the time difference between the receiving time of the voice information currently spoken by the user and the receiving time of the previous voice information is less than the set time length, and the sound source direction corresponding to the voice information matches the sound source direction of the previous valid voice information, the voice information is determined to be valid. In this method, not only the difference in receiving time between two adjacent pieces of voice information but also the sound source direction of the voice information is taken into account.
The sound source direction is considered in order to avoid the following interference: during voice interaction with the application program, the user says a few words to a friend nearby, and the words spoken to the friend should not be treated as voice information intended for the application program. When interacting with the application program, the user usually faces the device on which the application program runs, whereas when speaking to the friend the user usually faces the friend; since the two directions may differ, interfering speech can be filtered out based on the sound source direction, and the subsequent steps are not performed on it.
In addition, the following situation often occurs in a continuous conversation scenario: the response to the user's previous voice information requires the application program to output an audio signal, for example to play a certain song. Or, while processing the current voice information, say the user saying "I want to listen to Liu Dehua's songs", the service program may on the one hand perform the above processing steps and on the other hand also output a response voice such as "OK, recommendations coming right away", so as to improve the human-computer interaction experience. Or, other sound sources emit audio signals while the user is outputting the voice information.
At this time, the collected voice information may contain, in addition to the speech of the user actually performing voice interaction with the application program, interfering audio signals emitted by the application program, the service program, or other sound sources; the collected voice information can therefore be processed to remove the interfering audio signals. Optionally, echo cancellation can be used to remove the interfering audio signals currently being output by the application program, the service program, and so on. The echo cancellation procedure can be implemented with reference to the related art and is not described here.
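Purely as an illustration of that related art, the sketch below removes a known playback signal from the microphone capture with a normalized LMS adaptive filter. The filter length, step size, and the assumption that the two signals are equally long and time-aligned are choices made for this example, not requirements of the embodiments.

import numpy as np

def nlms_echo_cancel(mic, ref, filter_len=128, mu=0.5, eps=1e-6):
    """Subtract an estimate of the played-back signal (ref) from the microphone
    capture (mic) using a normalized LMS adaptive filter.
    Assumes mic and ref are 1-D float arrays of the same length, roughly time-aligned."""
    w = np.zeros(filter_len)                               # adaptive filter taps
    out = np.zeros(len(mic))
    padded_ref = np.concatenate([np.zeros(filter_len - 1), np.asarray(ref, dtype=float)])
    for n in range(len(mic)):
        x = padded_ref[n:n + filter_len][::-1]             # most recent reference samples
        echo_estimate = np.dot(w, x)
        e = mic[n] - echo_estimate                         # residual: user speech + noise
        w += (mu / (np.dot(x, x) + eps)) * e * x           # NLMS tap update
        out[n] = e
    return out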
Combining the above, extracting effective keywords from the voice information proceeds as follows: if the time difference between the receiving time of the voice information and the receiving time of the previous voice information is less than the set time length, the keywords contained in the voice information are extracted as effective keywords; or, if the time difference between the receiving time of the voice information and the receiving time of the previous voice information is less than the set time length and the sound source direction of the voice information matches the sound source direction of the previous valid voice information, the keywords contained in the voice information are extracted as effective keywords.
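This validity check can be summarized in a short sketch. The 10-second gap, the angle tolerance, the VoiceInfo fields, and the extract_keywords stub are illustrative assumptions standing in for the set time length, the direction-matching rule, and the NLU keyword extractor described above.

from dataclasses import dataclass
from typing import List, Optional

MAX_GAP_SECONDS = 10.0     # assumed "set time length"
MAX_ANGLE_DIFF = 30.0      # assumed tolerance for matching sound source directions

@dataclass
class VoiceInfo:
    text: str
    received_at: float      # receiving time, in seconds
    source_angle: float     # estimated sound source direction, in degrees

def extract_keywords(text: str) -> List[str]:
    # placeholder for the NLU keyword extraction of the embodiments
    return text.split()

def extract_effective_keywords(current: VoiceInfo,
                               previous: Optional[VoiceInfo],
                               check_direction: bool = True) -> Optional[List[str]]:
    """Extract keywords only if the current utterance continues the conversation."""
    if previous is not None:
        if current.received_at - previous.received_at >= MAX_GAP_SECONDS:
            return None     # the continuous conversation has lapsed
        if check_direction and abs(current.source_angle - previous.source_angle) > MAX_ANGLE_DIFF:
            return None     # likely addressed to someone other than the application
    return extract_keywords(current.text)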
503. Correcting the extracted keywords according to the established entity word association relationship.
In practical application, in the process of extracting keywords, the voice information is first converted into text information, and the converted text may contain errors. For example, the voice information actually spoken by the user is "I want to listen to Zhou Jielun's Blue and White Porcelain", but the converted text may read: "I want to listen to Zhou Jie's Blue and White Porcelain". The keywords then extracted from that text include "Zhou Jie" and "Blue and White Porcelain", which obviously does not match the user's real intention. For this reason, the embodiment of the present invention provides a scheme for correcting the extracted keywords based on an entity word association relationship, which can be obtained by knowledge graph mining.
Continuing the example, the established entity word association relationship is searched with the entity words "Zhou Jie" and "Blue and White Porcelain". It is found that the entity word associated with "Blue and White Porcelain" is "Zhou Jielun" rather than "Zhou Jie", so the entity word "Zhou Jie" can be replaced with "Zhou Jielun".
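A minimal sketch of this correction follows. The association table, the prefix heuristic, and the function name are assumptions made for illustration; the embodiments do not prescribe a concrete data structure or matching rule for the knowledge-graph-derived relationships.

# Association relationship mined from a knowledge graph (contents are assumptions).
ENTITY_ASSOCIATIONS = {
    "Blue and White Porcelain": {"Zhou Jielun"},   # song -> associated singers
    "Forgetting Water": {"Liu Dehua"},
}

def correct_keywords(keywords):
    """Replace a misrecognized entity word with the entity associated with
    another keyword in the same utterance, when the two are close in form."""
    corrected = list(keywords)
    for anchor in keywords:
        for candidate in ENTITY_ASSOCIATIONS.get(anchor, ()):
            for i, word in enumerate(corrected):
                # e.g. "Zhou Jie" is a prefix of the associated "Zhou Jielun"
                if word != candidate and candidate.startswith(word):
                    corrected[i] = candidate
    return corrected

print(correct_keywords(["Zhou Jie", "Blue and White Porcelain"]))
# ['Zhou Jielun', 'Blue and White Porcelain']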
504. Replacing the original keyword with the corrected keyword.
In an optional embodiment, after keyword correction is completed, the corrected keyword may replace the original keyword and be substituted back into the converted text information, so that the text information is corrected. In addition, the service program can display the corrected text information on the interface currently displayed by the application program, so that the user can perceive that his or her voice information has been correctly understood. Furthermore, in order to improve the human-computer interaction experience, the service program may optionally also output voice response information according to the keyword; in the above example, for instance, it may output "OK, playing Blue and White Porcelain for you right away".
505. Generating an interaction instruction according to the corrected keywords and the user interaction intention information corresponding to the application program.
506. Controlling the application program to respond to the interaction instruction.
The above generation process of the interactive instruction and the response process of the interactive instruction may refer to the description in the foregoing embodiments, which are not described herein again.
The following describes a process of registering a user interaction intention for an application in connection with the embodiment shown in fig. 6.
Fig. 6 is a flowchart of another interaction method provided in an embodiment of the present invention, and as shown in fig. 6, the method includes the following steps:
601. the service program obtains user interaction intention information corresponding to the application program, wherein the user interaction intention information comprises interaction behavior information and at least one behavior object parameter.
602. The service program sends the user interaction intention information to the application program or to a server corresponding to the application program, so that the application program or the server establishes the correspondence between the plurality of interfaces and the user interaction intention information according to the data categories corresponding to the plurality of interfaces contained in the application program and the data categories corresponding to the at least one behavior object parameter.
For any interface among the plurality of interfaces, if the data category set corresponding to the at least one behavior object parameter contains the data category corresponding to that interface, the correspondence between that interface and the user interaction intention information is established, that is, the user interaction intention information is associated with that interface; the data category set is composed of the data categories corresponding to the at least one behavior object parameter.
In an optional embodiment, the establishment, by the application program or the server, of the correspondence between the plurality of interfaces of the application program and the user interaction intention information may be implemented as registering the user interaction intention information for the plurality of interfaces. On this basis, an interface being associated with certain user interaction intention information may specifically mean that the user interaction intention information is registered on that interface.
For example, assume that an interface was designed to be associated with the two data categories of singer name and song name, and that the user interaction intention information play(song name, singer name, language, popularity, genre, album name) contains interaction object parameters of the two data categories song name and singer name; the user interaction intention information play(song name, singer name, language, popularity, genre, album name) can then be registered on that interface.
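A minimal sketch of this registration rule is given below. The intent structure, the interface identifiers, and their category sets are assumptions chosen to mirror the example, not data defined by the embodiments.

# User interaction intention information: interaction behavior + behavior object parameters.
PLAY_INTENT = {
    "behavior": "play",
    "object_params": ["song name", "singer name", "language",
                      "popularity", "genre", "album name"],
}

# Data categories associated with each interface at design time (assumed).
INTERFACE_CATEGORIES = {
    "favorites_list": {"singer name", "song name"},
    "recent_play_list": {"singer name", "song name"},
    "settings_page": {"volume level"},
}

def register_intent(intent, interface_categories):
    """Associate the intent with every interface whose data categories are all
    contained in the intent's behavior-object parameter categories."""
    param_set = set(intent["object_params"])
    registry = {}
    for interface_id, categories in interface_categories.items():
        if categories <= param_set:            # the interface's categories are covered
            registry.setdefault(interface_id, []).append(intent["behavior"])
    return registry

print(register_intent(PLAY_INTENT, INTERFACE_CATEGORIES))
# {'favorites_list': ['play'], 'recent_play_list': ['play']}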
As mentioned above, the service program may be used by one or more application programs, and the functions provided by different application programs often differ, so the developer of an application program can set targeted user interaction intention information for it.
Specifically, the service program may store a plurality of kinds of interaction domain information and, for each interaction domain, the plurality of kinds of user interaction intention information that can be supported in it, so that a user (here, the developer) can select the interaction domain information and the user interaction intention information according to the functions the application program can provide. The interaction domain often reflects the functions the application program can provide, that is, in which domain voice interaction can be supported. For example, for a music application program the user may select the music interaction domain; for a navigation application program the user may select the navigation interaction domain. Further, for the music interaction domain, a plurality of pieces of user interaction intention information may be provided, for example corresponding to interaction behaviors such as playing, pausing playback, and muting.
Based on this, the service program may obtain the user interaction intention information corresponding to the application program according to the developer's selection operation on the user interaction intention information.
It should be noted that, in another optional embodiment, the user interaction intention information corresponding to the application program may also be obtained by the application program or the server through learning from the historical voice interaction behaviors of a large number of users, and the learned user interaction intention information is then provided to the service program. Voice samples generated under different interfaces of the application program can be collected separately, and the user interaction intention information corresponding to the different interfaces can be obtained by learning from these voice samples.
Fig. 7 is a flowchart of another interaction method provided in an embodiment of the present invention, and as shown in fig. 7, the method includes the following steps:
701. Receiving first voice information in response to an operation of starting a voice interaction function of the application program.
702. Controlling the application program to respond to the first voice information.
703. Receiving second voice information.
704. If the second voice information conforms to the set continuous conversation feature, controlling the application program to respond to the second voice information.
This embodiment may be executed by the service program in the foregoing embodiments, but is not limited thereto; it may also be executed, for example, by the operating system of the electronic device running the application program, or by the application program itself.
The core of the interaction method provided by this embodiment is to realize the continuous conversation function while the user interacts with the application program by voice.
In this embodiment, it is assumed that the first voice information is valid voice information uttered after the user starts the voice interaction function of the application program, where valid means that the voice is really speech output by the user for voice interaction with the application program.
Based on this, when the next piece of voice information, namely the second voice information, is subsequently received, if the second voice information conforms to the continuous conversation feature, the second voice information and the preceding first voice information are considered to belong to one continuous conversation, and the application program is therefore controlled to respond to the second voice information. Otherwise, if the second voice information does not conform to the continuous conversation feature, an error prompt can be output to prompt the user that the voice interaction function of the application program needs to be restarted.
Optionally, the continuous conversation feature may be that the time difference between the receiving time of the second voice information and the receiving time of the first voice information is less than the set time length; or that the time difference between the two receiving times is less than the set time length and the sound source direction of the second voice information matches the sound source direction of the first voice information.
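A rough sketch of this two-stage flow is given below. The class name, the field choices, the 10-second gap, the angle tolerance, and the respond_to/prompt_error stubs are illustrative assumptions, with respond_to standing in for the voice-information processing described earlier.

import time

MAX_GAP_SECONDS = 10.0     # assumed set time length
MAX_ANGLE_DIFF = 30.0      # assumed tolerance for matching sound source directions

class ContinuousDialog:
    """Decides whether a newly received utterance still belongs to the
    conversation opened by the wake-up operation (fig. 7 flow)."""

    def __init__(self):
        self.last_time = None
        self.last_angle = None

    def on_wake(self, first_text, source_angle):
        # 701/702: voice interaction started, respond to the first voice information
        self.last_time, self.last_angle = time.time(), source_angle
        respond_to(first_text)

    def on_voice(self, second_text, source_angle):
        # 703/704: respond only if the continuous conversation feature is met
        now = time.time()
        in_time = self.last_time is not None and now - self.last_time < MAX_GAP_SECONDS
        same_dir = self.last_angle is not None and abs(source_angle - self.last_angle) <= MAX_ANGLE_DIFF
        if in_time and same_dir:
            self.last_time, self.last_angle = now, source_angle
            respond_to(second_text)
        else:
            prompt_error("Please restart the voice interaction function.")

def respond_to(text):          # stands in for keyword extraction + instruction response
    print("responding to:", text)

def prompt_error(msg):         # stands in for the interface/voice error prompt
    print(msg)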
Optionally, the process of controlling the application program to respond to the first voice information and the second voice information may refer to the voice information processing described above.
For parts not described in detail in this embodiment, reference may be made to the related descriptions in the foregoing other embodiments, which are not described herein again.
Fig. 8 is a flowchart of another interaction method provided in an embodiment of the present invention, where the interaction method may be executed by the application or a server corresponding to the application mentioned in the foregoing embodiments. As shown in fig. 8, the method may include the steps of:
801. receiving user interaction intention information which is sent by a service program corresponding to the application program and corresponds to the application program, wherein the user interaction intention information comprises interaction behavior information and at least one behavior object parameter, and the service program provides voice interaction service for the application program.
802. Establishing the correspondence between the plurality of interfaces and the user interaction intention information according to the data categories corresponding to the plurality of interfaces contained in the application program and the data categories corresponding to the at least one behavior object parameter.
Specifically, for any one of the plurality of interfaces, if the data category set corresponding to the at least one behavior object parameter contains the data category corresponding to that interface, the correspondence between that interface and the user interaction intention information is established; the data category set is composed of the data categories corresponding to the at least one behavior object parameter.
803. Responding to an interaction instruction generated by the service program, where the interaction instruction is generated by the service program according to the keywords extracted from the received voice information and the user interaction intention information corresponding to the application program.
Optionally, responding to the interaction instruction generated by the service program may be implemented as: receiving the interaction instruction sent by the service program; determining, among the plurality of interfaces contained in the application program, a target interface that is associated with the user interaction intention information corresponding to the interaction instruction and whose data category matches the data category corresponding to the keyword; and responding to the interaction instruction according to the target interface.
If at least two target interfaces are determined, the target interface with the higher priority is selected from the at least two target interfaces according to their priorities.
For parts not described in detail in this embodiment, reference may be made to the related descriptions in the foregoing other embodiments, which are not described herein again.
As can be seen from the foregoing, in order to implement the interaction method provided by the embodiments of the present invention, a service program providing a voice interaction function for the application program needs to be designed. At a macro level, divided by function as shown in fig. 9, the service program may include: an input/output interface coupled with the application program, and an interaction engine.
The input and output interface coupled with the application program is used for receiving the voice information corresponding to the application program and controlling the application program to respond to the interactive instruction.
And the interaction engine is used for extracting keywords from the voice information and generating an interaction instruction according to the extracted keywords and the user interaction intention information corresponding to the application program.
As can be seen from the operations performed by the interaction engine, the interaction engine is subdivided, and may include the following functional modules:
an Automatic Speech Recognition (ASR) module and a Natural Language Understanding (NLU) module. The ASR module converts speech information into text information; the NLU module can correct the text information, extract keywords from it, and so on.
In addition, as can be seen from the foregoing embodiments, the service program may also implement a man-machine interaction function with the user, that is, output response information to the user in a manner of voice and interface response, and therefore, optionally, a dialog question-answering system may also be included in the interaction engine.
In addition, in order to communicate with the application program or the server corresponding to the application program, the dialogue question-answering system or the interaction engine may further include a communication interface coupled with the server.
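The chaining of these modules can be pictured with the short sketch below; the stubbed ASR/NLU outputs and the function names are assumptions used only to show how speech flows into an interaction instruction such as play(Liu Dehua, Forgetting Water).

# Illustrative sketch of how the interaction engine modules of fig. 9 might be chained.
def asr(audio: bytes) -> str:
    """ASR module: converts speech into text (stubbed with a fixed result)."""
    return "play Liu Dehua's Forgetting Water"

def nlu(text: str) -> dict:
    """NLU module: corrects the text and extracts keywords (stubbed)."""
    return {"behavior": "play", "objects": ["Liu Dehua", "Forgetting Water"]}

def interaction_engine(audio: bytes) -> str:
    keywords = nlu(asr(audio))
    # assemble the interaction instruction from the matched intention template
    return f"{keywords['behavior']}({', '.join(keywords['objects'])})"

print(interaction_engine(b"..."))   # play(Liu Dehua, Forgetting Water)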
The interaction apparatuses of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these interaction apparatuses can each be constructed using commercially available hardware components configured through the steps taught by the present solution.
Fig. 10 is a schematic structural diagram of an interaction apparatus according to an embodiment of the present invention, and as shown in fig. 10, the apparatus includes: the device comprises a receiving module 11, an extracting module 12, a generating module 13 and a control module 14.
The receiving module 11 is configured to receive voice information.
The extraction module 12 is configured to perform keyword extraction on the voice information.
The generating module 13 is configured to generate an interaction instruction according to the extracted keywords and the user interaction intention information corresponding to the application program.
The control module 14 is configured to control the application program to respond to the interaction instruction.
The generating module 13 may specifically be configured to: generating an interaction instruction according to the extracted keywords and the registered user interaction intention information in the application program.
Optionally, the extracting module 12 is specifically configured to: extracting effective keywords from the voice information.
In particular, the extraction module 12 may be configured to: if the time difference between the receiving time of the voice message and the receiving time of the previous voice message is less than the set time length, extracting keywords contained in the voice message as effective keywords; or if the time difference between the receiving time of the voice information and the receiving time of the previous voice information is smaller than the set time length, and the sound source direction of the voice information is matched with the sound source direction of the previous effective voice information, extracting the keywords contained in the voice information as effective keywords.
Optionally, the apparatus may further include: and the noise elimination module is used for removing the interference audio signal in the voice information.
Optionally, the apparatus may further include: and the correction module is used for correcting the extracted keywords according to the established entity word association relation.
Optionally, the generating module 13 may specifically be configured to: determining target user interaction intention information matched with the keywords from user interaction intention information corresponding to the application program; and generating an interaction instruction according to the target user interaction intention information.
Optionally, the generating module 13 may specifically be configured to: acquiring an interface identifier of a currently displayed interface of the application program; determining whether the interface is associated with target user interaction intention information matched with the keyword or not according to the corresponding relation between the created interface identification and the user interaction intention information; and if the target user interaction intention information matched with the keyword is not associated with the interface, determining the target user interaction intention information matched with the keyword from a user interaction intention information database corresponding to the application program.
The user interaction intention information comprises interaction behavior information and at least one behavior object parameter. Thus, the generating module 13 may be specifically configured to: determining target user interaction intention information which is matched with first words in the keywords and contains interaction behavior information from user interaction intention information corresponding to the application program, wherein the first words are words which indicate interaction behaviors in the keywords; determining a parameter value of at least one behavior object parameter contained in the target user interaction intention information according to at least one second word in the keywords, wherein the keywords are composed of the first word and the at least one second word; and generating the interaction instruction according to the target user interaction intention information with the determined parameter values.
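As a rough sketch of this matching-and-filling logic, the snippet below treats the first keyword as the interaction behavior and maps the remaining keywords to behavior object parameters by data category; the intention table and the entity-to-category lookup are assumptions made up for this example rather than structures defined by the embodiments.

# User interaction intention information: interaction behavior + behavior object parameters.
INTENTIONS = {
    "play": ["song name", "singer name", "language", "popularity", "genre", "album name"],
    "pause": [],
}

# Assumed lookup from an entity word to its data category.
ENTITY_CATEGORY = {"Liu Dehua": "singer name", "Forgetting Water": "song name"}

def generate_instruction(keywords):
    """keywords[0] is the first word (indicates the behavior); the rest are the
    second words that fill in the behavior object parameter values."""
    first, seconds = keywords[0], keywords[1:]
    if first not in INTENTIONS:
        return None                                    # no matching target intention
    params = {p: None for p in INTENTIONS[first]}
    for word in seconds:
        category = ENTITY_CATEGORY.get(word)
        if category in params:
            params[category] = word                    # determined parameter value
    filled = [v for v in params.values() if v is not None]
    return f"{first}({', '.join(filled)})"

print(generate_instruction(["play", "Liu Dehua", "Forgetting Water"]))
# play(Forgetting Water, Liu Dehua) -- parameters follow the template order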
Optionally, the control module 14 may specifically be configured to: sending the interactive instruction to a response object so that the response object responds to the interactive instruction; the response object is the application program or a server corresponding to the application program.
Optionally, the control module 14 may specifically be configured to: acquiring an interface identifier of the interface currently displayed by the application program and the data content contained in that interface; and if it is determined, according to the created correspondence between interface identifiers and user interaction intention information, that the interface is not associated with the target user interaction intention information, or the data content does not include target data content matching the keyword, sending the interaction instruction to the response object.
The interactive instruction is used for enabling the response object to determine a target interface supporting the interactive instruction in a plurality of interfaces contained in the application program, and responding to the interactive instruction according to the target interface; the target interface supporting the interactive instruction means that the target interface is associated with the target user interaction intention information corresponding to the interactive instruction, and the data category corresponding to the target interface is matched with the data category corresponding to the keyword.
Optionally, the control module 14 may specifically be configured to: acquiring an interface identifier of the interface currently displayed by the application program and the data content contained in that interface; if the interface is determined to be associated with the target user interaction intention information according to the created correspondence between interface identifiers and user interaction intention information, and the data content includes target data content matching the keyword, generating a response instruction according to the target data content; and sending the response instruction to the application program so that the application program executes the response instruction.
The user interaction intention information includes at least one behavior object parameter, and based on this, the apparatus further includes: a creating module, configured to send the user interaction intention information to the application program or to a server corresponding to the application program, so that the application program or the server establishes the correspondence between the plurality of interfaces and the user interaction intention information according to the data categories corresponding to the plurality of interfaces contained in the application program and the data categories corresponding to the at least one behavior object parameter.
The apparatus shown in fig. 10 may perform the interaction method provided in the embodiments shown in fig. 1 to fig. 5, and a part not described in detail in this embodiment may refer to the related description of the foregoing embodiments, which is not described herein again.
In one possible design, the structure of the interaction device shown in fig. 10 may be implemented as an electronic device, which may be, for example, a mobile phone, a PC, a notebook computer, a smart wearable device, or the like. As shown in fig. 11, the electronic device may include: a first processor 21 and a first memory 22. The first memory 22 has stored thereon executable code which, when executed by the first processor 21, causes the first processor 21 to perform at least the interaction method provided in the embodiments of fig. 1 to 5.
In practice, the electronic device may also include a first communication interface 23 for communicating with other devices.
In addition, the embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to perform at least the interaction method in the embodiments shown in fig. 1 to 5.
Fig. 12 is a schematic structural diagram of another interaction apparatus according to an embodiment of the present invention, and as shown in fig. 12, the apparatus includes: an acquisition module 31 and a sending module 32.
The obtaining module 31 is configured to obtain user interaction intention information corresponding to an application, where the user interaction intention information includes interaction behavior information and at least one behavior object parameter.
A sending module 32, configured to send the user interaction intention information to the application program or a server corresponding to the application program, so that the application program or the server establishes a correspondence between the multiple interfaces and the user interaction intention information according to data categories corresponding to the multiple interfaces included in the application program and data categories corresponding to the at least one behavior object parameter.
For any interface in the plurality of interfaces, if a data type set corresponding to the at least one behavior object parameter includes a data type corresponding to the any interface, establishing a corresponding relationship between the any interface and the user interaction intention information, where the data type set is composed of data types corresponding to the at least one behavior object parameter.
The apparatus shown in fig. 12 may perform the interaction method provided in the embodiment shown in fig. 6, and a part not described in detail in this embodiment may refer to the related description of the foregoing embodiment, which is not described again here.
In one possible design, the structure of the interaction device shown in fig. 12 may be implemented as an electronic device, which may be, for example, a mobile phone, a PC, a notebook computer, a smart wearable device, or the like. As shown in fig. 13, the electronic device may include: a second processor 41, a second memory 42. Wherein the second memory 42 has stored thereon executable code, which when executed by the second processor 41, causes the second processor 41 to at least perform the interaction method as provided in the embodiment of fig. 6 described above.
In practice, the electronic device may also include a second communication interface 43 for communicating with other devices.
In addition, the embodiment of the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of an electronic device, the processor is caused to perform at least the interaction method in the embodiment shown in fig. 6.
Fig. 14 is a schematic structural diagram of another interaction apparatus according to an embodiment of the present invention, as shown in fig. 14, the apparatus includes: a receiving module 51 and a control module 52.
The receiving module 51 is configured to receive the first voice message in response to an operation of starting a voice interaction function of the application program.
A control module 52 for controlling the application program to respond to the first voice message.
The receiving module 51 is further configured to receive second voice information.
The control module 52 is further configured to control the application program to respond to the second voice message if the second voice message meets the set continuous session characteristic.
Wherein the continuous conversation feature comprises: the time difference between the receiving time of the second voice message and the receiving time of the first voice message is less than the set time length; or the time difference between the receiving time of the second voice message and the receiving time of the first voice message is smaller than the set time length, and the sound source direction of the second voice message is matched with the sound source direction of the first voice message.
The apparatus shown in fig. 14 may perform the interaction method provided in the embodiment shown in fig. 7, and a part not described in detail in this embodiment may refer to the related description of the foregoing embodiment, which is not described again here.
In one possible design, the structure of the interaction device shown in fig. 14 may be implemented as an electronic device, which may be, for example, a mobile phone, a PC, a notebook computer, a smart wearable device, or the like. As shown in fig. 15, the electronic device may include: a third processor 61 and a third memory 62. The third memory 62 has stored thereon executable code which, when executed by the third processor 61, causes the third processor 61 to perform at least the interaction method provided in the embodiment of fig. 7.
In practice, the electronic device may also include a third communication interface 63 for communicating with other devices.
In addition, the embodiment of the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of an electronic device, the processor is caused to perform at least the interaction method in the embodiment shown in fig. 7.
Fig. 16 is a schematic structural diagram of another interaction apparatus according to an embodiment of the present invention, as shown in fig. 16, the apparatus includes: a receiving module 71 and a creating module 72.
The receiving module 71 is configured to receive user interaction intention information, corresponding to the application program, sent by a service program corresponding to the application program, where the user interaction intention information includes interaction behavior information and at least one behavior object parameter, and the service program provides a voice interaction service for the application program.
The creating module 72 is configured to establish a correspondence between the multiple interfaces and the user interaction intention information according to the data categories corresponding to the multiple interfaces included in the application program and the data categories corresponding to the at least one behavior object parameter.
Optionally, the creating module 72 may specifically be configured to: for any interface in the plurality of interfaces, if the data type set corresponding to the at least one behavior object parameter comprises the data type corresponding to the any interface, establishing a corresponding relation between the any interface and the user interaction intention information, wherein the data type set is composed of the data types corresponding to the at least one behavior object parameter.
Optionally, the apparatus may further include: and the response processing module is used for responding to an interactive instruction generated by the service program, and the interactive instruction is generated by the service program according to the keyword extracted from the received voice information and the user interaction intention information corresponding to the application program.
Wherein the response processing module may be configured to: receiving the interactive instruction sent by the service program; determining a target interface which is associated with user interaction intention information corresponding to the interaction instruction and matched with a data category corresponding to the keyword in a plurality of interfaces contained in the application program; and responding to the interaction instruction according to the target interface.
Wherein the response processing module is further configured to: if the number of the determined target interfaces is at least two, selecting the target interface with the higher priority from the at least two target interfaces according to the priorities of the at least two target interfaces.
The apparatus shown in fig. 16 may perform the interaction method provided in the embodiment shown in fig. 8, and a part not described in detail in this embodiment may refer to the related description of the foregoing embodiment, which is not described again here.
In one possible design, the structure of the interaction device shown in fig. 16 may be implemented as an electronic device, which may be, for example, a mobile phone, a PC, a notebook computer, a smart wearable device, a server, or the like. As shown in fig. 17, the electronic device may include: a fourth processor 81 and a fourth memory 82. The fourth memory 82 has stored thereon executable code which, when executed by the fourth processor 81, causes the fourth processor 81 to perform at least the interaction method provided in the embodiment of fig. 8.
In practice, the electronic device may also include a fourth communication interface 83 for communicating with other devices.
In addition, the embodiment of the present invention provides a non-transitory machine-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of an electronic device, the processor is caused to perform at least the interaction method in the embodiment shown in fig. 8.
The above-described apparatus embodiments are merely illustrative, wherein the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by adding a necessary general hardware platform, and of course can also be implemented by a combination of hardware and software. Based on this understanding, the above technical solutions, or the parts thereof that in essence contribute over the prior art, may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (30)

1. An interactive method, characterized in that the method comprises:
receiving voice information;
extracting keywords from the voice information;
generating an interaction instruction according to the extracted keywords and user interaction intention information corresponding to the application program;
and controlling the application program to respond to the interaction instruction.
2. The method according to claim 1, wherein the generating of the interactive instruction according to the extracted keyword and the user interaction intention information corresponding to the application program comprises:
and generating an interaction instruction according to the extracted keywords and the registered user interaction intention information in the application program.
3. The method of claim 1, wherein the extracting keywords from the voice information comprises:
and extracting effective keywords from the voice information.
4. The method of claim 3, wherein the extracting the effective keywords from the voice information comprises:
if the time difference between the receiving time of the voice information and the receiving time of the previous voice information is less than a set time length, extracting keywords contained in the voice information as effective keywords; or,
and if the time difference between the receiving time of the voice information and the receiving time of the previous voice information is less than the set time length, and the sound source direction of the voice information is matched with the sound source direction of the previous effective voice information, extracting the keywords contained in the voice information as effective keywords.
5. The method of claim 1, further comprising:
and removing the interference audio signal in the voice information.
6. The method of claim 1, wherein, after the extracting keywords from the voice information, the method further comprises:
and correcting the extracted keywords according to the established entity word association relation.
7. The method according to claim 1, wherein the generating of the interactive instruction according to the extracted keyword and the user interaction intention information corresponding to the application program comprises:
determining target user interaction intention information matched with the keywords from user interaction intention information corresponding to the application program;
and generating an interaction instruction according to the target user interaction intention information.
8. The method of claim 7, wherein the determining target user interaction intention information matching the keyword from the user interaction intention information corresponding to the application program comprises:
acquiring an interface identifier of a currently displayed interface of the application program;
determining whether the interface is associated with target user interaction intention information matched with the keyword or not according to the corresponding relation between the created interface identification and the user interaction intention information;
and if the target user interaction intention information matched with the keyword is not associated with the interface, determining the target user interaction intention information matched with the keyword from a user interaction intention information database corresponding to the application program.
9. The method according to claim 7, wherein the user interaction intention information comprises interaction behavior information and at least one behavior object parameter;
the determining target user interaction intention information matched with the keyword from the user interaction intention information corresponding to the application program comprises the following steps:
determining target user interaction intention information which is matched with first words in the keywords and contains interaction behavior information from user interaction intention information corresponding to the application program, wherein the first words are words which indicate interaction behaviors in the keywords;
the generating of the interaction instruction according to the target user interaction intention information comprises:
determining a parameter value of at least one behavior object parameter contained in the target user interaction intention information according to at least one second word in the keywords, wherein the keywords are composed of the first word and the at least one second word;
and generating the interaction instruction according to the target user interaction intention information with the determined parameter values.
10. The method of claim 7, wherein controlling the application to respond to the interactive instructions comprises:
sending the interactive instruction to a response object so that the response object responds to the interactive instruction;
the response object is the application program or a server corresponding to the application program.
11. The method of claim 10, wherein sending the interaction instruction to a response object comprises:
acquiring an interface identifier of an interface currently displayed by the application program and data content contained in the interface;
and if the interface is determined not to be associated with the target user interaction intention information according to the corresponding relation between the created interface identification and the user interaction intention information, or the data content does not comprise the target data content matched with the keyword, sending the interaction instruction to the response object.
12. The method according to claim 11, wherein the interactive instruction is configured to enable the response object to determine a target interface supporting the interactive instruction among a plurality of interfaces included in the application program, and to respond to the interactive instruction according to the target interface;
the target interface supporting the interactive instruction means that the target interface is associated with the target user interaction intention information corresponding to the interactive instruction, and the data category corresponding to the target interface is matched with the data category corresponding to the keyword.
13. The method of claim 7, wherein controlling the application to respond to the interactive instructions comprises:
acquiring an interface identifier of an interface currently displayed by the application program and data content contained in the interface;
if the interface is determined to be associated with the target user interaction intention information according to the corresponding relation between the created interface identification and the user interaction intention information, and the data content comprises target data content matched with the keyword, generating a response instruction according to the target data content;
and sending the response instruction to the application program so as to enable the application program to execute the response instruction.
14. The method according to any one of claims 1 to 13, wherein the user interaction intention information includes at least one behavior object parameter, and the method further comprises:
and sending the user interaction intention information to the application program or to a server corresponding to the application program, so that the application program or the server establishes the corresponding relation between the multiple interfaces and the user interaction intention information according to the data types corresponding to the multiple interfaces contained in the application program and the data types corresponding to the at least one behavior object parameter.
15. The method according to any one of claims 1 to 13, further comprising:
and outputting information for guiding a user to use the voice interaction function in response to an operation of starting the voice interaction function of the application program.
16. The method of claim 15, wherein the operations comprise: and clicking a set button in the application program, or receiving voice containing a set awakening word.
17. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the interaction method of any one of claims 1 to 16.
18. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the interaction method of any one of claims 1 to 16.
19. A service program, wherein the service program provides a voice interaction service for an application program, and the service program comprises:
the input/output interface is coupled with the application program and is used for receiving the voice information corresponding to the application program and controlling the application program to respond to the interactive instruction;
and the interaction engine is used for extracting keywords from the voice information and generating an interaction instruction according to the extracted keywords and the user interaction intention information corresponding to the application program.
20. An interaction method, comprising:
acquiring user interaction intention information corresponding to an application program, wherein the user interaction intention information comprises interaction behavior information and at least one behavior object parameter;
and sending the user interaction intention information to the application program or a server corresponding to the application program, so that the application program or the server establishes the corresponding relation between the multiple interfaces and the user interaction intention information according to the data categories corresponding to the multiple interfaces contained in the application program and the data categories corresponding to the at least one behavior object parameter.
21. The method according to claim 20, wherein for any interface in the plurality of interfaces, if a data category set corresponding to the at least one behavior object parameter includes a data category corresponding to the any interface, a correspondence between the any interface and the user interaction intention information is established, and the data category set is composed of data categories corresponding to the at least one behavior object parameter.
22. An interaction method, comprising:
receiving first voice information in response to an operation of starting a voice interaction function of an application program;
controlling the application program to respond to the first voice information;
receiving second voice information;
and if the second voice message conforms to the set continuous conversation characteristic, controlling the application program to respond to the second voice message.
23. The method of claim 22, wherein the continuous conversation feature comprises:
the time difference between the receiving time of the second voice message and the receiving time of the first voice message is less than the set time length, or,
the time difference between the receiving time of the second voice message and the receiving time of the first voice message is smaller than the set time length, and the sound source direction of the second voice message is matched with the sound source direction of the first voice message.
24. An interaction method, comprising:
receiving user interaction intention information which is sent by a service program corresponding to an application program and corresponds to the application program, wherein the user interaction intention information comprises interaction behavior information and at least one behavior object parameter, and the service program provides voice interaction service for the application program;
and establishing the corresponding relation between the plurality of interfaces and the user interaction intention information according to the data types corresponding to the plurality of interfaces contained in the application program and the data types corresponding to the at least one behavior object parameter.
25. The method according to claim 24, wherein the establishing correspondence between the plurality of interfaces and the user interaction intention information according to the data categories corresponding to the interfaces included in the application program and the data categories corresponding to the at least one behavior object parameter comprises:
for any interface in the plurality of interfaces, if the data type set corresponding to the at least one behavior object parameter comprises the data type corresponding to the any interface, establishing a corresponding relation between the any interface and the user interaction intention information, wherein the data type set is composed of the data types corresponding to the at least one behavior object parameter.
26. The method of claim 24, further comprising:
and responding to an interactive instruction generated by the service program, wherein the interactive instruction is generated by the service program according to the keyword extracted from the received voice information and the user interaction intention information corresponding to the application program.
27. The method of claim 26, wherein responding to the generated instructions of the service comprises:
receiving the interactive instruction sent by the service program;
determining a target interface which is associated with user interaction intention information corresponding to the interaction instruction and matched with a data category corresponding to the keyword in a plurality of interfaces contained in the application program;
and responding to the interaction instruction according to the target interface.
28. The method of claim 27, further comprising:
and if the number of the determined target interfaces is at least two, selecting a target interface with a high priority from the at least two target interfaces according to the priorities of the at least two target interfaces.
29. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the interaction method of any of claims 24 to 28.
30. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the interaction method of any one of claims 24 to 28.
CN201910406508.4A 2019-05-16 2019-05-16 Interaction method, storage medium, service program, and device Pending CN111949240A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910406508.4A CN111949240A (en) 2019-05-16 2019-05-16 Interaction method, storage medium, service program, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910406508.4A CN111949240A (en) 2019-05-16 2019-05-16 Interaction method, storage medium, service program, and device

Publications (1)

Publication Number Publication Date
CN111949240A (en) 2020-11-17

Family

ID=73336601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910406508.4A Pending CN111949240A (en) 2019-05-16 2019-05-16 Interaction method, storage medium, service program, and device

Country Status (1)

Country Link
CN (1) CN111949240A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018102980A1 (en) * 2016-12-06 2018-06-14 吉蒂机器人私人有限公司 Speech interaction method, device and system
CN107145329A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 Apparatus control method, device and smart machine
CN107204185A (en) * 2017-05-03 2017-09-26 深圳车盒子科技有限公司 Vehicle-mounted voice exchange method, system and computer-readable recording medium
CN107680589A (en) * 2017-09-05 2018-02-09 百度在线网络技术(北京)有限公司 Voice messaging exchange method, device and its equipment
CN108305626A (en) * 2018-01-31 2018-07-20 百度在线网络技术(北京)有限公司 The sound control method and device of application program
CN108320742A (en) * 2018-01-31 2018-07-24 广东美的制冷设备有限公司 Voice interactive method, smart machine and storage medium
CN109346078A (en) * 2018-11-09 2019-02-15 泰康保险集团股份有限公司 Voice interactive method, device and electronic equipment, computer-readable medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669838A (en) * 2020-12-17 2021-04-16 合肥飞尔智能科技有限公司 Intelligent sound box audio playing method and device, electronic equipment and storage medium
CN112863478A (en) * 2020-12-30 2021-05-28 东风汽车有限公司 Chat interaction display method in driving process, electronic equipment and storage medium
CN112885347A (en) * 2021-01-22 2021-06-01 海信电子科技(武汉)有限公司 Voice control method of display device, display device and server
CN112764620B (en) * 2021-01-25 2022-06-21 北京三快在线科技有限公司 Interactive request processing method and device, electronic equipment and readable storage medium
CN112764620A (en) * 2021-01-25 2021-05-07 北京三快在线科技有限公司 Interactive request processing method and device, electronic equipment and readable storage medium
CN115048161A (en) * 2021-02-26 2022-09-13 华为技术有限公司 Application control method, electronic device, apparatus, and medium
US11550702B1 (en) 2021-11-04 2023-01-10 T-Mobile Usa, Inc. Ensuring that computer programs are accessible to users with disabilities, such as for use with mobile phones
US11860767B2 (en) 2021-11-04 2024-01-02 T-Mobile Usa, Inc. Testing computer program accessibility for users with disabilities, such as for use with mobile phones
CN114115790A (en) * 2021-11-12 2022-03-01 上汽通用五菱汽车股份有限公司 Voice conversation prompting method, device, equipment and computer readable storage medium
WO2023082649A1 (en) * 2021-11-12 2023-05-19 上汽通用五菱汽车股份有限公司 Voice conversation prompting method, apparatus and device, and computer-readable storage medium
WO2023098467A1 (en) * 2021-11-30 2023-06-08 华为技术有限公司 Voice parsing method, electronic device, readable storage medium, and chip system
CN115482821A (en) * 2022-09-13 2022-12-16 成都赛力斯科技有限公司 Voice control method and device, electronic equipment and storage medium
CN116564316A (en) * 2023-07-11 2023-08-08 北京边锋信息技术有限公司 Voice man-machine interaction method and device
CN116564316B (en) * 2023-07-11 2023-11-03 北京边锋信息技术有限公司 Voice man-machine interaction method and device

Similar Documents

Publication Publication Date Title
CN111949240A (en) Interaction method, storage medium, service program, and device
KR101870934B1 (en) Provides suggested voice-based action queries
JP6789320B2 (en) Providing a state machine personal assistant module that can be traced selectively
KR102309540B1 (en) Server for seleting a target device according to a voice input, and controlling the selected target device, and method for operating the same
CN107146612B (en) Voice guidance method and device, intelligent equipment and server
JP6588637B2 (en) Learning personalized entity pronunciation
US11217230B2 (en) Information processing device and information processing method for determining presence or absence of a response to speech of a user on a basis of a learning result corresponding to a use situation of the user
KR101418163B1 (en) Speech recognition repair using contextual information
US9666192B2 (en) Methods and apparatus for reducing latency in speech recognition applications
US10649727B1 (en) Wake word detection configuration
CN106250474B (en) Voice control processing method and system
US10504513B1 (en) Natural language understanding with affiliated devices
US11687526B1 (en) Identifying user content
US20170195737A1 (en) Method for video search and electronic device
CN109192212B (en) Voice control method and device
JP2014109897A (en) Information processing device and content retrieval method
CN113412514A (en) On-device speech synthesis of text segments for training of on-device speech recognition models
US11756544B2 (en) Selectively providing enhanced clarification prompts in automated assistant interactions
CN111142993A (en) Information acquisition method, terminal and computer storage medium
JPWO2019155717A1 (en) Information processing equipment, information processing systems, information processing methods, and programs
JP2018063271A (en) Voice dialogue apparatus, voice dialogue system, and control method of voice dialogue apparatus
KR20210036527A (en) Electronic device for processing user utterance and method for operating thereof
CN106371905B (en) Application program operation method and device and server
KR20130125064A (en) Method of processing voice communication and mobile terminal performing the same
US20220161131A1 (en) Systems and devices for controlling network applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination