CN113488033A

CN113488033A - User passive voice interaction method, device, terminal, server and medium

Info

Publication number: CN113488033A
Application number: CN202010188104.5A
Authority: CN
Inventors: 黄佳滢
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Apollo Zhilian Beijing Technology Co Ltd
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2021-10-08

Abstract

The application discloses a user passive voice interaction method, a device, a terminal, a server and a medium, and relates to the technical field of voice recognition. The specific implementation scheme is as follows: acquiring state information of a terminal, wherein the state information comprises service providing state information and/or page display state information; if the state information of the terminal meets the set triggering condition, determining target guide content matched with the state information of the terminal; and actively playing the voice information of the target guide content. The embodiment can actively play the voice information of the guide content to the user, improves the flexibility and intelligence of the voice interaction mode, and improves the user experience.

Description

User passive voice interaction method, device, terminal, server and medium

Technical Field

The embodiment of the application relates to the computer technology, in particular to the technical field of voice recognition.

Background

Existing terminals are generally developed with a voice interaction function, and provide services such as music, news, navigation and the like through conversation with a user.

The existing voice interaction process of the terminal is as follows: the user utters a voice query, such as "I want to listen to music" or "I want to navigate home"; after recognizing the speech as text, the terminal analyzes the user's intention, matches appropriate recommended content, and plays it.

As described above, the existing voice interaction mode requires the user to actively issue a voice query, to which the terminal then responds. In other words, existing voice interaction must be initiated by the user, and may be referred to as user active voice interaction or terminal passive voice interaction. This interaction mode is rigid, and the user experience has room for improvement.

Disclosure of Invention

The embodiment of the application provides a user passive voice interaction method, a device, a terminal, a server and a medium, so as to provide a user passive voice interaction mode, improve the flexibility and intelligence of the voice interaction mode and improve the user experience.

In a first aspect, an embodiment of the present application discloses a user passive voice interaction method, which is applicable to a terminal, and includes:

acquiring state information of a terminal, wherein the state information comprises service providing state information and/or page display state information;

if the state information of the terminal meets the set triggering condition, determining target guide content matched with the state information of the terminal;

and actively playing the voice information of the target guide content.

In the embodiment of the application, the state information of the terminal comprises service providing state information and/or page display state information. This information indirectly reflects which services the user is consuming and which terminal pages the user favors, and therefore reflects the user's requirements and points of interest. When the state information meets the set trigger condition, the target guide content matched with the state information is determined, so that guide content of interest to the user is determined accurately, and the timing of that determination can be flexibly controlled by configuring the trigger condition. After the target guide content is determined, its voice information is actively played to the user; that is, the terminal, rather than the user, initiates the voice interaction, which improves the flexibility and intelligence of the voice interaction mode and improves the user experience.

Optionally, if the state information of the terminal meets the set trigger condition, determining the target guidance content matched with the state information of the terminal includes:

if the service providing state information comprises the set service content being provided, determining the target guide content matched with the set service content being provided; and/or,

and if the page display state information comprises the currently displayed set page content, determining the target guide content matched with the currently displayed set page content.

One embodiment in the above application has the following advantages or benefits: because the service content being provided, namely the service content being consumed by the user reflects the current requirement of the user, the target guide content is matched with the service content being provided by the terminal at present, the guide content is positioned as required, and the acceptance and satisfaction of the user to the target guide content can be improved. Similarly, the currently displayed page content is also the page being viewed by the user, and the current requirements of the user are also reflected, so that the target guide content is matched with the page content, and the acceptance and satisfaction of the user on the target guide content can be improved.

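Optionally, if the state information of the terminal meets the set trigger condition, determining the target guidance content matched with the state information of the terminal includes:

if the state information of the terminal meets the set triggering condition, determining the service to which the set triggering condition belongs;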

and acquiring target guide content matched with the state information of the terminal from the service to which the set trigger condition belongs.

One embodiment in the above application has the following advantages or benefits: the embodiment organizes the guide content by taking the service as a unit, so that the embodiment can provide the guide content of various services for the user, and improves the diversity of the guide content; different services are provided with corresponding trigger conditions, if the state information meets the set trigger conditions, the service to which the set trigger conditions belong needs to be determined at first, then the target guide content is obtained from the service, the matching degree of the state information and the target guide content is improved, and then the acceptance and satisfaction of the user on the target guide content are improved.

Optionally, the obtaining, from the service to which the set trigger condition belongs, the target guidance content matched with the state information of the terminal includes:

if the number of the services to which the set triggering conditions belong is at least two, determining a target service according to the priorities of the at least two services;

and acquiring target guide content matched with the state information of the terminal from the target service.

One embodiment in the above application has the following advantages or benefits: priorities are set for the services, and the target service and the target guide content are determined according to those priorities, so that guide content is screened by priority at the service level; if different services provide different types of guide content, this screening selects one type of guide content to be played preferentially.

Optionally, the actively playing the voice information of the target guidance content includes:

acquiring current voice interaction information, and judging a current voice interaction type according to the current voice interaction information, wherein the current voice interaction information comprises current guide content;

if the current voice interaction type is the passive voice interaction of the user, acquiring the source service of the current guide content;

and if the priority of the source service of the current guide content is lower than that of the source service of the target guide content, stopping the current voice interaction operation and actively playing the voice information of the target guide content.

One embodiment in the above application has the following advantages or benefits: when the current voice interaction type is judged to be the passive voice interaction of the user, the real-time performance of the actively played voice is improved by judging the priority of the source service of the target guide content and the current guide content; meanwhile, the guide contents from different services can be intelligently adjusted according to the requirements of the user by setting the priority of the source service, so that the service use experience of the user is improved.

Optionally, after the determining the current voice interaction type according to the current voice interaction information, the method further includes:

and if the current voice interaction type is the user active voice interaction, continuing to execute the current voice interaction operation.

One embodiment in the above application has the following advantages or benefits: when the current voice interaction type is user active voice interaction, the user's ongoing active voice interaction is protected and continues to execute. This makes the terminal's active playing of guide voice more intelligent and prevents actively played voice from interfering with the user's normal voice interaction.
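The interruption rule described in the preceding paragraphs can be illustrated with a minimal sketch. The priority table, the `Interaction` type, and the function name `should_preempt` are hypothetical names introduced here for illustration. The sketch also assumes that the target guide content is played when no interaction is in progress and that a current guide content of equal or higher priority is left running; the text does not specify either case.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed priority table (larger number = higher priority); illustrative only.
PRIORITY = {"map": 3, "news": 2, "music": 1}


@dataclass
class Interaction:
    kind: str                              # "user_active" or "user_passive"
    source_service: Optional[str] = None   # source service of the current guide content


def should_preempt(current: Optional[Interaction], target_service: str) -> bool:
    """Decide whether the target guide content may interrupt the current interaction."""
    if current is None:
        # Assumption: nothing is playing, so the target guide content can be played.
        return True
    if current.kind == "user_active":
        # User-initiated interaction is protected and continues to execute.
        return False
    # User passive interaction: interrupt only if its source service has lower priority.
    return PRIORITY[current.source_service] < PRIORITY[target_service]
```

For example, a music guide prompt that is currently playing would be stopped in favor of a map guide prompt, while any interaction the user started would not be interrupted.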

In a second aspect, an embodiment of the present application discloses a passive voice interaction method, which is applied to a server, and includes:

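acquiring state information of a terminal from the terminal, wherein the state information comprises service providing state information and/or page display state information;

if the state information of the terminal meets the set triggering condition, determining target guide content matched with the state information of the terminal;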

and sending the target guide content to the terminal so that the terminal can actively play the voice information of the target guide content.

In the embodiment of the application, the server acquires the state information sent by the terminal. The state information comprises service providing state information and/or page display state information, which indirectly reflects which services the user is consuming and which terminal pages the user favors, and therefore reflects the user's requirements and points of interest. When the server determines that the terminal's state information meets the set trigger condition, it determines the target guide content matched with that state information, so that the server accurately determines guide content of interest to the user, and the timing of that determination can be flexibly controlled by configuring the trigger condition. After the target guide content is determined, the server actively sends it to the terminal, which ensures interactivity between the terminal and the user.

In a third aspect, an embodiment of the present application discloses a passive voice interaction method, which is applicable to a terminal, and includes:

sending state information of a terminal to a server so that the server can judge that the state information of the terminal meets a set trigger condition, determining target guide content matched with the state information of the terminal, and returning the target guide content; the state information comprises service providing state information and/or page display state information;

receiving target guide content returned by the server;

and actively playing the voice information of the target guide content.

In the embodiment of the application, the terminal sends its state information to the server in order to obtain from the server the target guide content that the server determines according to that state information; the target guide content reflects the requirements and points of interest of the user. Because the terminal receives the target guide content only after the server judges that the set trigger condition is met, the timing of determining the target guide content is flexible. After receiving the target guide content, the terminal actively plays its voice information to the user; that is, the terminal, rather than the user, initiates the voice interaction, which improves the flexibility and intelligence of the voice interaction mode and improves the user experience.
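The exchange between the terminal and the server in the second and third aspects can be sketched as follows. This is only an in-process illustration: the JSON message shape, the single trigger condition, and the names `server_handle` and `terminal_cycle` are assumptions, and a real deployment would carry the messages over a network connection that the text does not specify.

```python
import json
from typing import Callable, Optional


def server_handle(request_json: str) -> str:
    """Server side: check the trigger condition and return matching guide content."""
    state = json.loads(request_json)
    # Assumed trigger condition: a city map page is currently displayed.
    if state.get("page") == "city_map":
        return json.dumps({"guide": "Navigate the route?"})
    return json.dumps({"guide": None})


def terminal_cycle(state: dict, play: Callable[[str], None]) -> Optional[str]:
    """Terminal side: send state, receive guide content, actively play it if present."""
    reply = json.loads(server_handle(json.dumps(state)))
    guide = reply["guide"]
    if guide is not None:
        play(guide)  # stands in for speech synthesis and playback
    return guide


played = []
terminal_cycle({"page": "city_map"}, played.append)
terminal_cycle({"page": "desktop"}, played.append)
```

Here `played.append` stands in for the voice player, so after both calls `played` holds only the prompt for the city map page.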

Optionally, before the actively playing the voice information of the target guidance content, the method further includes:

if there are at least two target guide contents and they originate from at least two servers, determining a target server according to the priorities of the at least two servers;

and screening, from the at least two target guide contents, the target guide content that originates from the target server.

One embodiment in the above application has the following advantages or benefits: priorities are set for the servers, and the target server and the target guide content are determined according to those priorities, so that guide content is screened by priority at the server level; if different servers provide different types of guide content, this screening selects one type of guide content to be played preferentially.

In a fourth aspect, an embodiment of the present application discloses a passive voice interaction apparatus, which is applicable to a terminal, and includes:

a state information acquisition module, configured to acquire the state information of a terminal, wherein the state information comprises service providing state information and/or page display state information;

a target guiding content determining module, configured to determine the target guiding content matched with the state information of the terminal if the state information of the terminal meets the set triggering condition;

and a voice information active playing module, configured to actively play the voice information of the target guide content.

In a fifth aspect, an embodiment of the present application discloses a passive voice interaction apparatus, which is applied to a server, and includes:

a state information acquisition module, configured to acquire the state information of a terminal from the terminal, wherein the state information comprises service providing state information and/or page display state information;

and a target guiding content sending module, configured to send the target guiding content to the terminal so that the terminal can actively play the voice information of the target guiding content.

In a sixth aspect, an embodiment of the present application discloses a passive voice interaction apparatus, which is suitable for a terminal, and includes:

a state information sending module, configured to send the state information of the terminal to a server so that the server can judge that the state information of the terminal meets a set trigger condition, determine target guide content matched with the state information of the terminal, and return the target guide content; the state information comprises service providing state information and/or page display state information;

a target guiding content receiving module, configured to receive the target guiding content returned by the server;

and a voice information active playing module, configured to actively play the voice information of the target guide content.

In a seventh aspect, an embodiment of the present application discloses a terminal, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any embodiment of the first aspect or the third aspect of the present application.

In an eighth aspect, an embodiment of the present application discloses a server, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method according to any embodiment of the second aspect of the present application.

In a ninth aspect, embodiments of the present application disclose a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform a method according to any of the embodiments of the present application.

Other effects of the above optional implementations will be described below with reference to specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. In the drawings:

Fig. 1 is a schematic flowchart of a passive voice interaction method according to a first embodiment of the present application;

Fig. 2 is a flowchart illustrating a passive voice interaction method according to a second embodiment of the present application;

Fig. 3 is a flowchart illustrating a passive voice interaction method according to a third embodiment of the present application;

Fig. 4 is a flowchart illustrating a passive voice interaction method according to a fourth embodiment of the present application;

Fig. 5 is a schematic structural diagram of a passive voice interaction apparatus according to a fifth embodiment of the present application;

Fig. 6 is a schematic structural diagram of a passive voice interaction apparatus according to a sixth embodiment of the present application;

Fig. 7 is a schematic structural diagram of a passive voice interaction apparatus according to a seventh embodiment of the present application;

Fig. 8 is a block diagram of a terminal or a server for implementing the passive voice interaction method according to the ninth embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

First embodiment

Fig. 1 is a schematic flowchart of a passive voice interaction method according to a first embodiment of the present application, where the present embodiment is suitable for a case where a terminal performs voice interaction with a user. The method can be executed by a passive voice interaction device, the device can be realized in a software and/or hardware mode, and can be integrated in a terminal, and the terminal can independently execute the passive voice interaction method of the embodiment of the invention. As shown in fig. 1, the passive voice interaction method provided in this embodiment may include:

S110, acquiring state information of the terminal, wherein the state information comprises service providing state information and/or page display state information.

The terminal is a device with a voice interaction function, such as a smart phone, a car machine, or both the smart phone and the car machine. When the terminal comprises the smart phone and the car machine, the car machine can be used as a recorder of the smart phone.

When acquiring the state information of the terminal: if the terminal is a smart phone or a car machine alone, its state information can be acquired directly. If the terminal comprises both a smart phone and a car machine, the smart phone can act as the execution subject, acquiring its own state information directly and acquiring the state information of the car machine through a communication connection with the car machine; alternatively, the car machine can act as the execution subject, acquiring its own state information directly and acquiring the state information of the smart phone through a communication connection with the smart phone.

In this embodiment, the state information includes service providing state information and/or page display state information. The service providing state information reflects the current service providing situation of the terminal, and may include whether each service is in a providing state or a non-providing state, and the service content being provided. Illustratively, the terminal includes a music service, a map service, and a news service. The service providing state information then indicates that the music service is in the providing state, that the service content is a song by singer A being played, and that the map service and the news service are in the non-providing state. The page display state information includes whether a page is displayed and the content of the currently displayed page. Illustratively, the terminal displays a page whose content includes the music being played, a navigation route, news, or the like.
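One possible in-memory representation of this state information is sketched below. The class and field names (`ServiceState`, `TerminalState`, `providing`, `page_content`) are hypothetical and are not taken from the application; the values reproduce the music, map, and news example above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServiceState:
    providing: bool                 # True if the service is in the providing state
    content: Optional[str] = None   # service content being provided, if any


@dataclass
class TerminalState:
    # Service providing state information, keyed by service name.
    services: Dict[str, ServiceState] = field(default_factory=dict)
    # Page display state information.
    page_displayed: bool = False
    page_content: Optional[str] = None


# The music service is playing a song by singer A; map and news are not being provided.
state = TerminalState(
    services={
        "music": ServiceState(providing=True, content="song by singer A"),
        "map": ServiceState(providing=False),
        "news": ServiceState(providing=False),
    },
    page_displayed=True,
    page_content="music being played",
)

# Names of the services currently in the providing state.
providing_now = [name for name, s in state.services.items() if s.providing]
```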

S120, if the state information of the terminal meets the set triggering condition, determining the target guiding content matched with the state information of the terminal.

The set trigger condition is a condition on the terminal state information that is configured in advance. Optionally, the set trigger condition may be that a service is in the providing state and is providing set service content, and/or that a page is displayed and shows set page content. For example, the set trigger condition includes: the music service is in the providing state and is playing a nostalgic song; or the map service is in the providing state and the current time is a commuting time point; or the news service is in the providing state and the latest news has been released. The set trigger condition may further include: a lyrics page is displayed, the desktop is displayed, a map page is displayed, and the like. The specific trigger condition may be set according to the services actually available in the terminal and the requirements of the user, and is not limited herein. The target guidance content is service content that the terminal determines according to the state information, that the user is likely to be interested in, and that is used to guide the user; for example, asking the user whether to play a certain song by a certain singer.

Optionally, the number of set trigger conditions is at least one. Specifically, it is judged whether the service providing state information and/or the page display state information in the current state information meets at least one set trigger condition; if any one set trigger condition is met, the target guidance content matched with the current state information can be determined.
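The check against a set of trigger conditions might be sketched as a list of named predicates over the state information. Everything concrete here is an assumption made for illustration: the dictionary layout of the state, the trigger names, the `genre` and `hour` keys, and the commuting hours are not specified in the text.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]

COMMUTE_HOURS = {8, 18}  # assumed commuting hours, for illustration only


def svc(state: State, name: str) -> Dict[str, Any]:
    """Return the (possibly empty) state record of one service."""
    return state.get("services", {}).get(name, {})


# Each set trigger condition is a (name, predicate) pair.
TRIGGERS: List[Tuple[str, Callable[[State], bool]]] = [
    ("music_nostalgic",
     lambda s: bool(svc(s, "music").get("providing"))
     and svc(s, "music").get("genre") == "nostalgic"),
    ("map_commute",
     lambda s: bool(svc(s, "map").get("providing"))
     and s.get("hour") in COMMUTE_HOURS),
    ("page_lyrics", lambda s: s.get("page") == "lyrics"),
]


def satisfied_triggers(state: State) -> List[str]:
    """Return the names of all set trigger conditions that the state meets."""
    return [name for name, check in TRIGGERS if check(state)]


example = {
    "services": {
        "music": {"providing": True, "genre": "nostalgic"},
        "map": {"providing": False},
    },
    "page": "lyrics",
    "hour": 8,
}
matched = satisfied_triggers(example)
```

In this example the map trigger is not met because the map service is not in the providing state, even though the hour falls within the assumed commuting hours.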

Optionally, if the state information of the terminal meets the set trigger condition, determining the target guidance content matched with the state information of the terminal includes: if the service providing state information comprises the set service content in the providing, determining the target guide content matched with the set service content in the providing; and/or if the page display state information comprises the currently displayed set page content, determining the target guide content matched with the currently displayed set page content.

Specifically, if the service providing state information includes the set service content being provided, the state information of the terminal satisfies the set trigger condition, and the target guidance content matching the set service content being provided is then determined. For example, if the service content being provided is video news in a news service, guidance content for other news related to that video news is determined as the target guidance content, for example, "whether to play another news item".

If the page display state information includes the currently displayed set page content, the state information of the terminal satisfies the set trigger condition, and the target guidance content matching the currently displayed set page content is then determined. For example, if the currently displayed set page content is a city map, the guidance content of the navigation route matching the city map is determined as the target guidance content, for example, "whether to navigate the route". For another example, if the currently displayed set page content is the home page of the map service and the current time is a commuting time point, the target guidance content matched with the home page of the map service is determined to be "whether to set the navigation destination to the company". In this way, when the user opens the map service, the terminal autonomously determines a suggested destination according to the current page state information and uses it as the target guidance content, without the user having to specify the destination actively, which improves the intelligent experience and saves the user's time.

If the service providing state information includes the set service content being provided and the page display state information includes the currently displayed set page content, the target guidance content matching either the set service content being provided or the currently displayed set page content may be determined.

Optionally, if the state information of the terminal meets the set trigger condition, determining the target guidance content matched with the state information of the terminal includes: if the state information of the terminal meets the set triggering condition, determining the service to which the set triggering condition belongs; and acquiring target guide content matched with the state information of the terminal from the service to which the set trigger condition belongs.

The present embodiment organizes the guide content in units of services. For example, the guide content of the music service includes "whether to play a song by a certain singer", and the guide content of the map service includes "whether to set the navigation destination to the company". The embodiment can therefore provide the user with guide content from various services, which improves the diversity of the guide content. Each service is provided with corresponding trigger conditions. If the state information meets a set trigger condition, the service to which that trigger condition belongs is determined first, and the target guide content is then obtained from that service. For example, if the currently displayed set page content is a city map and the set trigger condition belongs to the map service, the guide content of the navigation route matched with the city map is obtained from the map service as the target guide content. The guide content of the navigation route matches the page content of the city map well. By contrast, if guide content for the city map were acquired from the music service, such as "whether to listen to a song about the city map", it would clearly not match the page content. Determining the service first therefore improves the degree of matching between the state information and the target guide content, and in turn the user's acceptance of and satisfaction with the target guide content; it also prevents the target guide content from being matched against state information that belongs to other services, which ensures that the target guide content is reasonable.
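The service-first lookup described above can be illustrated with two small tables: one mapping each trigger condition to the service it belongs to, and one holding the guide content of each service. The table contents, the trigger names, and the function name `target_guide_content` are hypothetical.

```python
from typing import Optional

# Guide content organized per service (assumed example entries).
GUIDE_CONTENT = {
    "music": {"singer_page": "Play a song by this singer?"},
    "map": {
        "city_map": "Navigate the route?",
        "map_home_commute": "Set the navigation destination to the company?",
    },
}

# The service to which each set trigger condition belongs.
TRIGGER_SERVICE = {
    "singer_page": "music",
    "city_map": "map",
    "map_home_commute": "map",
}


def target_guide_content(trigger: str) -> Optional[str]:
    """Determine the owning service first, then fetch guide content from that service only."""
    service = TRIGGER_SERVICE.get(trigger)
    if service is None:
        return None
    return GUIDE_CONTENT[service].get(trigger)
```

Because the lookup is restricted to the owning service, a city map page can only ever yield map guide content in this sketch, never a music prompt.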

Optionally, the obtaining, from the service to which the set trigger condition belongs, the target guidance content matched with the state information of the terminal includes: if the number of the services to which the set triggering conditions belong is at least two, determining a target service according to the priorities of the at least two services; and acquiring target guide content matched with the state information of the terminal from the target service.

The priority of a service is a ranking of the importance of the service according to the user's demand for it. The priority can be determined from the inherent properties of the service, or set according to the personalized requirements of the user. Illustratively, based on the above example, the map service may take precedence over the news service, which in turn takes precedence over the music service. The target service is the service with the highest priority among the services whose set trigger conditions are met, which indicates that the user pays the most attention to that service.

Specifically, since a plurality of trigger conditions are set, at least two set trigger conditions may be satisfied at the same time according to the state information of the terminal, and the satisfied trigger conditions may accordingly belong to at least two services. In this case, the service with the highest priority among the at least two services is taken as the target service according to the predetermined priority order of the services, and the target guide content matched with the state information, such as "whether to navigate the route", is acquired from that target service.

Priorities are set for the services, and the target service and the target guide content are determined according to those priorities, so that guide content is screened by priority at the service level; if different services provide different types of guide content, this screening selects one type of guide content to be played preferentially. Moreover, determining the target service by priority ensures that the target guide content comes from the service the user needs most, which improves the user's satisfaction with the target guide content and the intelligence of its selection.
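Selecting the target service by priority can be sketched in a few lines. The priority order below follows the example in the text (map over news over music); the function name `pick_target_service` is hypothetical, and services missing from the priority list are simply ignored in this sketch.

```python
from typing import Iterable, Optional

# Priority order from the example: map takes precedence over news, news over music.
SERVICE_PRIORITY = ["map", "news", "music"]


def pick_target_service(services: Iterable[str]) -> Optional[str]:
    """Return the highest-priority service among those whose trigger conditions are met."""
    candidates = set(services)
    for service in SERVICE_PRIORITY:
        if service in candidates:
            return service
    return None
```

For instance, if trigger conditions of both the music service and the map service are met at the same time, the map service is chosen as the target service.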

S130, actively playing the voice information of the target guide content.

Here, active playing by the terminal is contrasted with passive playing by the terminal in the prior art. Passive playing means playing in response to a wake-up word or other intentional voice information from the user; in contrast, active playing means that the terminal plays the voice without having recognized a wake-up word or other intentional voice information from the user. Optionally, after the target guidance content is determined, voice synthesis is performed on the target guidance content to obtain voice information, and the voice information of the target guidance content is played through a voice player.

Optionally, after actively playing the voice information of the target guidance content, the method further includes: receiving feedback information of the user, and responding according to the feedback information. The feedback information of the user includes positive feedback, negative feedback, no feedback and instruction feedback.

Specifically, positive feedback refers to receiving an affirmative response from the user, for example determining through speech recognition that the received voice of the user includes words with affirmative intent such as 'yes' or 'play'. Negative feedback refers to receiving a negative response from the user, for example determining through speech recognition that the received voice of the user includes words with negative intent such as 'no' or 'do not play'. No feedback means that no voice information related to the target guidance content is received, for example no voice of the user is received within a set time, or speech recognition yields no related information. Instruction feedback refers to receiving an instruction to execute another command that is unrelated to the target guidance content.

After the terminal plays the voice information of the target guide content to the user, if the terminal receives positive feedback from the user, it executes the service instruction of the target guide content; for example, if the played guide content announces that the latest news is available and a play instruction is received from the user, the terminal plays the voice information of the latest news. If the terminal receives negative feedback or no feedback from the user, it keeps its current state and does not respond. If the terminal receives instruction feedback from the user, it executes the instruction; for example, if the target guide content belongs to a news service and the voice information fed back by the user is an instruction to play music, the terminal opens the music service and plays music. Because the terminal actively plays the target guide content according to the state information, the user does not need to operate the terminal, which can greatly improve safety in an in-vehicle scenario. Moreover, because the terminal actively asks the user, the interaction between the terminal and the user is more natural, the terminal appears to understand the user's behavior, and the user's experience of the terminal's intelligence is improved.
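The four kinds of feedback and the corresponding responses can be sketched as below. This is a minimal sketch under stated assumptions: the keyword sets, the `user_intent` and `guide_intent` labels (assumed to come from a separate intention recognizer), and the function names are all hypothetical and not part of the application.

```python
from enum import Enum
from typing import Optional


class Feedback(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NONE = "none"
    INSTRUCTION = "instruction"


# Hypothetical keyword lists standing in for real intention recognition.
POSITIVE_WORDS = {"yes", "ok", "play"}
NEGATIVE_WORDS = {"no", "not", "don't"}


def classify_feedback(text: Optional[str], user_intent: Optional[str],
                      guide_intent: str) -> Feedback:
    """Classify the user's reply to a piece of actively played guide content."""
    if not text or not text.strip():
        return Feedback.NONE                  # nothing heard within the set time
    if user_intent is not None and user_intent != guide_intent:
        return Feedback.INSTRUCTION           # a command unrelated to the guide content
    words = set(text.lower().split())
    if words & NEGATIVE_WORDS:                # check negation first: "don't play"
        return Feedback.NEGATIVE
    if words & POSITIVE_WORDS:
        return Feedback.POSITIVE
    return Feedback.NONE                      # speech heard, but nothing relevant


def respond(feedback: Feedback, user_intent: Optional[str],
            guide_intent: str) -> Optional[str]:
    """Return the service to execute, or None to keep the current state."""
    if feedback is Feedback.POSITIVE:
        return guide_intent
    if feedback is Feedback.INSTRUCTION:
        return user_intent
    return None
```

Negative words are checked before positive ones so that a reply such as "don't play" is not mistaken for agreement.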

According to the technical scheme provided by the embodiment of the application, the state information of the terminal comprises service providing state information and/or page display state information. These two kinds of information indirectly reflect which services the user consumes and which terminal pages the user prefers, and thus reflect the user's needs and points of interest. When the state information meets the set trigger condition, the target guide content matched with the state information is determined, so that the target guide content the user is interested in is accurately determined, and the time at which the target guide content is determined can be flexibly controlled by setting the trigger condition. After the target guide content is determined, the voice information of the target guide content is actively played to the user; that is, the terminal, rather than the user, initiates the voice interaction, which improves the flexibility and intelligence of the voice interaction mode and improves the user experience.

Furthermore, because the service content being provided, namely the service content being consumed by the user, reflects the user's current needs, matching the target guide content with the service content currently provided by the terminal delivers guide content on demand and can improve the user's acceptance of and satisfaction with the target guide content. Similarly, the currently displayed page content is the page the user is viewing and also reflects the user's current needs, so matching the target guide content with the page content can likewise improve the user's acceptance of and satisfaction with the target guide content.

Second embodiment

Fig. 2 is a flowchart of a passive speech interaction method in the second embodiment of the present application, and the second embodiment of the present application is optimized based on the technical solutions of the foregoing embodiments.

Optionally, the operation "actively playing the voice information of the target guidance content" is refined to "obtaining current voice interaction information, and judging the current voice interaction type according to the current voice interaction information, wherein the current voice interaction information includes the current guidance content; if the current voice interaction type is the passive voice interaction of the user, acquiring the source service of the current guide content; and if the priority of the source service of the current guide content is lower than that of the source service of the target guide content, stopping the current voice interaction operation and actively playing the voice information of the target guide content", so as to improve the service use experience of the user.

Optionally, after the operation "judging the current voice interaction type according to the current voice interaction information", the operation "if the current voice interaction type is the user active voice interaction, continuing to execute the current voice interaction operation" is added, so as to prevent actively played voice from interfering with the user's normal voice interaction.

A passive voice interaction method as shown in fig. 2, comprising:

S210, acquiring state information of the terminal, wherein the state information comprises service providing state information and/or page display state information.

S220, if the state information of the terminal meets the set triggering condition, determining the target guiding content matched with the state information of the terminal.

S230, obtaining current voice interaction information, and judging a current voice interaction type according to the current voice interaction information, wherein the current voice interaction information comprises current guide content.

The current voice interaction information comprises user voice information sent by the user and terminal voice information sent by the terminal. The current voice interaction type represents which party actively initiated the content currently being played by voice; for example, the current voice interaction type includes user active voice interaction and user passive voice interaction. User active voice interaction refers to a voice instruction actively initiated by the user, for example a map service or a call interaction actively initiated by the user. User passive voice interaction refers to voice interaction actively initiated by the terminal, such as the voice information of the target guidance content, or the voice information played in response to positive feedback that the user gives after the terminal plays the voice information of the target guidance content.

Optionally, judging the current voice interaction type according to the current voice interaction information includes any one of the following:

1) If the current voice interaction information is initiated by the user, that is, it includes a wake-up word, the current voice interaction type is judged to be user active voice interaction.

2) The intention of the user voice information and the intention of the terminal voice information in the current voice interaction information are acquired; if the two intentions are inconsistent, the current voice interaction type is judged to be user active voice interaction.

3) If the current voice interaction information is initiated by the terminal, that is, it does not include a wake-up word and its first sentence is terminal voice information, and the intention of the user voice information in the current voice interaction information is consistent with the intention of the terminal voice information, the current voice interaction type is judged to be user passive voice interaction.

User initiation means that the user actively issues a voice instruction to the terminal using a specific word. Illustratively, the user wakes up the voice response of the terminal with a specific wake-up word and has a corresponding voice instruction executed. For example, when opening the map service, the user actively issues a voice instruction consisting of the wake-up word followed by the destination, and the voice information that follows the wake-up word recognized by the terminal is the service content actively initiated by the user. Terminal initiation refers to voice interaction behavior that the terminal initiates on its own without having received a wake-up word from the user. Illustratively, the terminal actively playing voice information according to the target guidance content is terminal initiation.

The intention of the user voice information is obtained by performing intention recognition on the voice input by the user. Illustratively, the voice information input by the user into the terminal is acquired and intention recognition is performed on it. The intention of the terminal voice information is obtained by performing intention recognition on the voice sent by the terminal to the user.

Specifically, for case 1), if it is determined from the current voice interaction information that the current voice was initiated by the user through a wake-up word, the current voice interaction type is determined to be user active voice interaction. For example, if it is determined from the input record that the current voice interaction was initiated by "wake-up word + command content", e.g., "Xiaodu, I want to navigate to the company", the current voice interaction information is initiated by the user.

For case 2), the conversation from which the current voice interaction information originates is determined according to the voice input record in the terminal; the conversation may be a single round or multiple rounds of dialogue between the terminal and the user. The intention of each sentence of both parties in the conversation is acquired and compared. If the intention of the user voice information is inconsistent with the intention of the terminal voice information, the current voice interaction information is actively initiated by the user, and the interaction is user active voice interaction. Illustratively, the voice input record in the terminal is: terminal: 'Navigate to the company?'; user: 'Navigate to the mall' or 'Play a Jay Chou song'. Intention recognition determines that the terminal intention differs from the user intention, which indicates that the current voice interaction of navigating to the mall, or of playing Jay Chou's music, is actively initiated by the user.

For case 3), it is determined according to the voice input record in the terminal that the current voice interaction information originates from the terminal, that is, the originating conversation does not include a wake-up word and its first sentence is terminal voice information. The voice intentions of the terminal and of the user are then recognized, and if the recognition results are consistent, the current voice interaction type is determined to be user passive voice interaction. Illustratively, the voice input record in the terminal is: terminal: 'Navigate to the company?'; user: 'Navigate to the company' or 'Yes'. Intention recognition determines that the terminal intention is the same as the user intention, which indicates that the current voice interaction of navigating to the company is passively accepted by the user.
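The three judgment rules can be combined into a single routine, sketched below. The `Turn` record, the wake-up word value, and the intent labels are assumptions made for illustration; in particular, each turn's intent is assumed to be produced by a separate intention recognizer that resolves a reply such as "yes" to the intent of the question it answers. The final fallback (user speaks first, no wake-up word, intents consistent) is not covered by the text and is treated here as active interaction by assumption.

```python
from dataclasses import dataclass
from typing import List

WAKE_WORD = "xiaodu"  # hypothetical wake-up word, used only in this sketch


@dataclass
class Turn:
    speaker: str   # "user" or "terminal"
    text: str
    intent: str    # assumed output of a separate intention recognizer


def interaction_type(dialogue: List[Turn]) -> str:
    """Return "active" (user-initiated) or "passive" (terminal-initiated)."""
    if not dialogue:
        raise ValueError("dialogue must contain at least one turn")
    user_turns = [t for t in dialogue if t.speaker == "user"]
    terminal_intents = {t.intent for t in dialogue if t.speaker == "terminal"}
    # Rule 1: a wake-up word means the user initiated the interaction.
    if any(WAKE_WORD in t.text.lower() for t in user_turns):
        return "active"
    # Rule 2: a user intent that differs from every terminal intent.
    if any(t.intent not in terminal_intents for t in user_turns):
        return "active"
    # Rule 3: the terminal spoke first, no wake-up word, intents consistent.
    if dialogue[0].speaker == "terminal":
        return "passive"
    return "active"  # fallback not specified in the text; an assumption here
```

A dialogue consisting only of the terminal's guidance question is classified as passive, which matches the statement that the voice information of the target guidance content itself is user passive voice interaction.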

S240, judging the current voice interaction type. If the current voice interaction type is the user active voice interaction, proceed to step S250; if the current voice interaction type is the user passive voice interaction, proceed to step S260.

S250, continuing to execute the current voice interaction operation. The operation ends.

When the current voice interaction type is determined to be user active voice interaction according to the current voice interaction information, the current voice interaction operation is kept unchanged and the determined target guide content is not played. Illustratively, it is determined from the state information of the terminal that the user is using the navigation service and the target guidance content is 'whether to navigate home'; if it is determined from the current voice interaction type that a voice interaction operation of navigating to the mall, actively initiated by the user, is currently being executed, the operation of navigating to the mall is maintained and the target guidance content is not played.

S260, acquiring the source service of the current guide content, and then proceeding to S270.

When the current voice interaction type is determined to be user passive voice interaction according to the current voice interaction information, the content currently being played by voice is guide content actively played by the terminal, so the source service of the current guide content is acquired and identified.

S270, if the priority of the source service of the current guide content is lower than that of the source service of the target guide content, stopping the current voice interaction operation and actively playing the voice information of the target guide content.

The priority of the source service of the current guide content and the priority of the source service of the target guide content are determined. If the priority of the source service of the current guide content is lower than that of the source service of the target guide content, the current guide content is less important than the target guide content, so the current voice interaction operation is stopped and the voice information of the target guide content is actively played. The setting of the service priority is described in detail in the above embodiments and is not repeated here. For example, if the current guidance content comes from the music service and the determined target guidance content comes from the news service, then because the news service has a higher priority than the music service, the current music interaction is stopped and the voice information of the target guidance content of the news service is actively played.

And if the priority of the source service of the current guide content is higher than that of the source service of the target guide content, indicating that the importance of the current guide content is higher than that of the target guide content, continuing to execute the current voice interaction operation. For example, if the current guidance content is a news service and the determined target guidance content is a music service, because the priority of the news service is higher than that of the music service, the voice interaction operation of the guidance content of the current news service is maintained, and the voice playing of the target guidance content is not performed.
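The decision made in steps S240 to S270 can be condensed into a small function, sketched here. The priority table is hypothetical (the text only states that news ranks above music in its example), larger numbers are assumed to mean higher priority, and the equal-priority case, which the text does not address, is assumed to keep the current interaction.

```python
from typing import Dict

# Hypothetical priority table; larger number = higher priority.
PRIORITY: Dict[str, int] = {"navigation": 3, "news": 2, "music": 1}


def decide(interaction_type: str, current_service: str, target_service: str,
           priority: Dict[str, int] = PRIORITY) -> str:
    """Return "play_target" to interrupt, or "continue_current" otherwise."""
    if interaction_type == "active":
        # S250: never interrupt an interaction that the user initiated.
        return "continue_current"
    # S260/S270: compare the source services of current and target guide content.
    if priority[current_service] < priority[target_service]:
        return "play_target"
    return "continue_current"   # higher (or, by assumption, equal) priority
```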

The current voice interaction information may be the current guidance content itself, or may be the specific voice information played after the user gives positive feedback on earlier target guidance content. For example, the current voice interaction information may be the guidance content 'whether to play Jay Chou's music', or Jay Chou's music that is being played as a result of the terminal's guidance content.

According to the technical scheme provided by the embodiment of the application, when the current voice interaction type is identified as user active voice interaction according to the current voice interaction information of the terminal, the current voice interaction operation is maintained and the target guide content is not played. This makes the playing of the target guide content intelligent, avoids interfering with the user's normal use of services, and improves the user experience. When the current voice interaction type is identified as user passive voice interaction, comparing the priority of the source service of the current guide content with that of the target guide content improves the timeliness of the actively played voice; meanwhile, setting the priorities of the source services allows guide contents from different services to be intelligently arranged according to the user's needs, which improves the user's service experience.

Third embodiment

Fig. 3 is a flowchart illustrating a passive voice interaction method according to a third embodiment of the present application. The method can be executed by a passive voice interaction device, which can be implemented in software and/or hardware and can be integrated in a server. As shown in fig. 3, the passive voice interaction method provided in this embodiment may include:

S310, acquiring state information of the terminal from the terminal, wherein the state information comprises service providing state information and/or page display state information.

The terminal is a device with a voice playing function and cooperates with the server to perform passive voice interaction. The server determines the target guide content according to the state information sent by the terminal. A plurality of servers may be provided, each corresponding to a different service in the terminal; for example, a music server determines the guide content of the music service, a navigation server determines the guidance content of the navigation service, and so on. Providing a plurality of servers can improve the efficiency of determining the guidance content.

The terminal sends each piece of state information to the server corresponding to the service to which that state information belongs. Illustratively, the terminal sends the state information of the music service to the music server, so that the music server determines whether the state information meets the set triggering condition and determines the target guidance content.

The server obtains the service providing state information and/or page display state information of the terminal. Illustratively, each of the plurality of servers obtains only the state information of its corresponding service, which avoids interference between services.
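The per-service routing of state information might look like the sketch below. It assumes, purely for illustration, that the terminal keeps its state information keyed by service name and that each server is represented by an in-memory list standing in for a network endpoint; neither detail comes from the application.

```python
from typing import Dict, List


def route_state(state_by_service: Dict[str, dict],
                servers: Dict[str, List[dict]]) -> None:
    """Deliver each service's state information only to that service's server."""
    for service, service_state in state_by_service.items():
        inbox = servers.get(service)
        if inbox is not None:            # no server registered: nothing is sent
            inbox.append(service_state)  # stand-in for a network send
```

Because every server receives only the slice of state that belongs to its own service, the servers can evaluate their trigger conditions independently of one another.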

S320, if the state information of the terminal meets the set triggering condition, determining the target guiding content matched with the state information of the terminal.

The server determines whether the set triggering condition is met according to the acquired state information, and if so, determines the target guide content. Illustratively, the music server determines, according to the acquired state information of the music service in the terminal, whether the triggering condition set for the music service is met, and if so, determines the target guide content of the music service matched with the state information of the music service.

S330, the target guiding content is sent to the terminal so that the terminal can actively play the voice information of the target guiding content.

The target guide content sent by the server to the terminal is text information, and the terminal synthesizes voice information from the text information and plays it. This avoids distortion of voice information during transmission, which would affect the user experience. Illustratively, if the target guidance content includes only guidance content of the music service, the server sends that guidance content to the terminal and the terminal actively plays it by voice; for example, if the target guidance content determined by the server is 'whether to play Jay Chou's music', the terminal converts it into voice information and actively plays it.

In the embodiment of the application, the server acquires the state information sent by the terminal. The state information of the terminal comprises service providing state information and/or page display state information, which indirectly reflect which services the user consumes and which terminal pages the user prefers, and thus reflect the user's needs and points of interest. When the server determines that the state information of the terminal meets the set triggering condition, it determines the target guiding content matched with the state information, so that the server accurately determines the target guiding content the user is interested in, and the time at which the target guiding content is determined can be flexibly controlled by setting the triggering condition. After the target guiding content is determined, it is actively sent to the terminal, which ensures the interactivity between the terminal and the user.
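A minimal sketch of the server side (S310 to S330) and of the terminal synthesizing speech from the returned text is given below. The trigger condition, the message fields, and the `synthesize` and `play` callables are hypothetical placeholders; a real terminal would supply its own speech synthesizer and voice player.

```python
from typing import Callable, Optional


def music_server_handle(state: dict) -> Optional[dict]:
    """Hypothetical music server: return text guide content, or None."""
    # Assumed trigger: the music home page is shown but nothing is playing.
    if state.get("page") == "music_home" and not state.get("playing", False):
        return {"source": "music", "text": "Play some music?"}
    return None


def terminal_on_guide(message: dict,
                      synthesize: Callable[[str], object],
                      play: Callable[[object], None]) -> None:
    """Terminal side: synthesize speech locally from the text, then play it."""
    play(synthesize(message["text"]))
```

Sending text rather than audio keeps the message small and leaves speech synthesis to the terminal, in line with the stated aim of avoiding distortion in transmission.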

Fourth embodiment

Fig. 4 is a flowchart illustrating a passive voice interaction method according to a fourth embodiment of the present application. The method can be executed by a passive voice interaction device, which can be implemented in software and/or hardware and can be integrated in a terminal. As shown in fig. 4, the passive voice interaction method provided in this embodiment may include:

S410, sending state information of a terminal to a server, so that the server, upon judging that the state information of the terminal meets a set trigger condition, determines target guiding content matched with the state information of the terminal and returns the target guiding content; the state information includes service provision state information and/or page display state information.

S420, receiving the target guide content returned by the server.

S430, actively playing the voice information of the target guide content.

Optionally, before actively playing the voice information of the target guidance content, the method further includes: if the number of target guide contents is at least two and the target guide contents come from at least two servers, determining a target server according to the priorities of the at least two servers, and screening out the target guide content that comes from the target server from the at least two target guide contents.

When the terminal receives at least two target guidance contents and they come from at least two servers, for example when two servers each send target guidance content, the server with the higher priority is determined to be the target server according to the priorities of the servers; the server priorities may be set in the same way as the service priorities in the above embodiment. The target guidance content from the target server is then screened out from all the received target guidance contents, and the voice information of the screened target guidance content is actively played.
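The terminal-side screening can be sketched as follows. The message shape (a `source` field naming the server) and the priority table are assumptions for illustration, with larger numbers again meaning higher priority.

```python
from typing import Dict, List

# Hypothetical server priorities; larger number = higher priority.
SERVER_PRIORITY: Dict[str, int] = {"navigation": 3, "news": 2, "music": 1}


def screen_guides(guides: List[dict],
                  priority: Dict[str, int] = SERVER_PRIORITY) -> List[dict]:
    """Keep only the guide contents that came from the highest-priority server."""
    if not guides:
        return []
    target_server = max((g["source"] for g in guides), key=lambda s: priority[s])
    return [g for g in guides if g["source"] == target_server]
```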

According to the technical scheme provided by the embodiment of the application, the terminal actively plays the received target guide content, and the actively played target guide content is determined by the server according to the terminal state information, so that the target guide information actively played is matched with the real intention of the user, and the satisfaction degree of the user on actively played voice is improved.

Fifth embodiment

Fig. 5 is a structural diagram of a passive voice interaction apparatus in a fifth embodiment of the present application, where the present embodiment is used in a case where a terminal performs voice interaction with a user, and the apparatus is implemented by software and/or hardware and is specifically configured in the terminal with a certain data operation capability.

Fig. 5 shows a passive voice interaction apparatus 500, which includes: a status information acquisition module 51, a target guidance content determination module 52 and a voice information active playing module 53.

The status information acquiring module 51 is configured to acquire status information of the terminal, where the status information includes service providing status information and/or page display status information.

And a target guidance content determining module 52, configured to determine, if the state information of the terminal meets a set trigger condition, a target guidance content that matches the state information of the terminal.

And the voice information active playing module 53 is configured to actively play the voice information of the target guidance content.

Optionally, the target guidance content determining module 52 is specifically configured to: if the service providing state information comprises the set service content in the providing, determining the target guide content matched with the set service content in the providing; and/or if the page display state information comprises the currently displayed set page content, determining the target guide content matched with the currently displayed set page content.

Optionally, the target guidance content determining module 52 includes: the service determining unit is used for determining the service to which the set triggering condition belongs if the state information of the terminal meets the set triggering condition; and the target guiding content determining unit is used for acquiring the target guiding content matched with the state information of the terminal from the service to which the set triggering condition belongs.

Optionally, the target guidance content determining unit is specifically configured to: if the number of the services to which the set triggering conditions belong is at least two, determining a target service according to the priorities of the at least two services; and acquiring target guide content matched with the state information of the terminal from the target service.

Optionally, the voice information active playing module 53 is specifically configured to: acquiring current voice interaction information, and judging a current voice interaction type according to the current voice interaction information, wherein the current voice interaction information comprises current guide content; if the current voice interaction type is the passive voice interaction of the user, acquiring the source service of the current guide content; and if the priority of the source service of the current guide content is lower than that of the source service of the target guide content, stopping the current voice interaction operation and actively playing the voice information of the target guide content.

The device further comprises: a continuous execution module, configured to continue executing the current voice interaction operation if the current voice interaction type is the user active voice interaction.

The passive voice interaction device can execute the passive voice interaction method provided by any embodiment of the application, and has corresponding functional modules and beneficial effects for executing the passive voice interaction method.

Sixth embodiment

Fig. 6 is a structural diagram of a passive voice interaction apparatus in a sixth embodiment of the present application, where the present embodiment is used in a case where a terminal performs voice interaction with a user, and the apparatus is implemented by software and/or hardware and is specifically configured in a server with a certain data computation capability.

Fig. 6 shows a passive voice interaction device 600, comprising: a status information acquisition module 61, a target guidance content determination module 62 and a target guidance content transmission module 63.

The status information acquiring module 61 is configured to acquire status information of a terminal from the terminal, where the status information includes service providing status information and/or page display status information.

And a target guidance content determining module 62, configured to determine, if the state information of the terminal meets a set trigger condition, a target guidance content that matches the state information of the terminal.

And a target guidance content sending module 63, configured to send the target guidance content to the terminal, so that the terminal actively plays the voice information of the target guidance content.

Seventh embodiment

Fig. 7 is a structural diagram of a passive voice interaction apparatus in a seventh embodiment of the present application, where the present embodiment is used in a case where a terminal performs voice interaction with a user, and the apparatus is implemented by software and/or hardware and is specifically configured in the terminal with a certain data operation capability.

A passive voice interaction device 700, as shown in fig. 7, includes: a status information sending module 71, a target guidance content receiving module 72 and a voice information active playing module 73.

A status information sending module 71, configured to send status information of a terminal to a server, so that the server determines that the status information of the terminal meets a set trigger condition, determines a target guidance content matched with the status information of the terminal, and returns the target guidance content; the state information includes service provision state information and/or page display state information.

And a target guiding content receiving module 72, configured to receive the target guiding content returned by the server.

And the voice information active playing module 73 is configured to actively play the voice information of the target guidance content.

Optionally, the apparatus further includes a target guidance content screening module, specifically configured to: if the number of target guide contents is at least two and the target guide contents come from at least two servers, determine a target server according to the priorities of the at least two servers, and screen out the target guide content that comes from the target server from the at least two target guide contents.

Eighth embodiment

According to an embodiment of the present application, a terminal, a server and a readable storage medium are also provided.

Fig. 8 is a block diagram of a terminal of a passive voice interaction method according to an embodiment of the present application. Terminals are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Terminals may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable electronics, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 8, the terminal includes: one or more processors 801, memory 802, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the terminal, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as display electronics coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple terminals may be connected, with each terminal providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 8 takes one processor 801 as an example.

The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor performs the passive voice interaction method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the passive voice interaction method provided by the present application.

The memory 802, as a non-transitory computer readable storage medium, can be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the passive voice interaction method in the embodiments of the present application (for example, the state information acquiring module 51, the target guidance content determining module 52, and the voice information active playing module 53 shown in fig. 5, or the state information sending module 71, the target guidance content receiving module 72, and the voice information active playing module 73 shown in fig. 7). The processor 801 executes the various functional applications and data processing of the terminal by running the non-transitory software programs, instructions, and modules stored in the memory 802, that is, it implements the passive voice interaction method in the above method embodiments.

The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal for passive voice interaction, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 802 optionally includes memory located remotely from processor 801, which may be connected to a passive voice interaction terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, blockchain networks, local area networks, mobile communication networks, and combinations thereof.

The terminal for the passive voice interaction method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 8.

The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the terminal for passive voice interaction. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, and a joystick. The output device 804 may include display electronics, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display electronics may include, but are not limited to, Liquid Crystal Displays (LCDs), Light Emitting Diode (LED) displays, and plasma displays. In some implementations, the display electronics can be a touch screen.

The present embodiment further provides a server whose structure is also shown in fig. 8; the parts of fig. 8 are described in the foregoing embodiments and are not described here again. The difference is that the memory 802, as a non-transitory computer readable storage medium, is used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the passive voice interaction method in the embodiments of the present application (for example, the state information acquiring module 61, the target guidance content determining module 62, and the target guidance content transmitting module 63 shown in fig. 6).

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, electronic device, and/or apparatus (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical solution of the embodiments of the present application, the state information of the terminal is acquired, target guidance content matched with the state information is determined when the state information meets the set trigger condition, and the voice information of the target guidance content is actively played to the user. This improves the flexibility and intelligence of the voice interaction mode and improves the user experience.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A user passive voice interaction method, applicable to a terminal, comprising:

acquiring state information of the terminal, wherein the state information comprises service providing state information and/or page display state information;

if the state information of the terminal satisfies a set trigger condition, determining target guidance content matching the state information of the terminal;

and actively playing the voice information of the target guidance content.

2. The method according to claim 1, wherein the determining the target guidance content matching with the state information of the terminal if the state information of the terminal satisfies a set trigger condition comprises:

3. The method according to claim 1, wherein the determining the target guidance content matching with the state information of the terminal if the state information of the terminal satisfies a set trigger condition comprises:

4. The method according to claim 3, wherein the obtaining of the target guidance content matching with the state information of the terminal from the service to which the setting triggering condition belongs comprises:

5. The method according to any one of claims 1-4, wherein actively playing the voice information of the target guidance content comprises:

6. The method of claim 5, wherein after the determining a current voice interaction type according to the current voice interaction information, the method further comprises:

and if the current voice interaction type is the active voice interaction of the user, continuing to execute the current voice interaction operation.

7. A user passive voice interaction method, applicable to a server, comprising:

acquiring state information of the terminal from the terminal, wherein the state information comprises service providing state information and/or page display state information;

8. A user passive voice interaction method, applicable to a terminal, comprising:

sending state information of the terminal to a server, so that the server, upon determining that the state information of the terminal meets a set trigger condition, determines target guidance content matched with the state information of the terminal and returns the target guidance content; the state information comprises service providing state information and/or page display state information;

receiving the target guidance content returned by the server;

and actively playing the voice information of the target guidance content.

9. The method according to claim 8, further comprising, before the actively playing the voice information of the target guidance content:

if the number of the target guide contents is at least two, and the target guide contents are sourced from at least two servers;

10. A passive voice interaction device for a terminal, the device comprising:

a state information acquisition module, configured to acquire the state information of the terminal, wherein the state information comprises service providing state information and/or page display state information;

11. A passive voice interaction device adapted for use with a server, the device comprising:

a state information acquisition module, configured to acquire the state information of the terminal from the terminal, wherein the state information comprises service providing state information and/or page display state information;

12. A passive voice interaction device for a terminal, the device comprising:

a state information sending module, configured to send the state information of the terminal to a server, so that the server, upon determining that the state information of the terminal meets a set trigger condition, determines target guidance content matched with the state information of the terminal and returns the target guidance content; the state information comprises service providing state information and/or page display state information;

13. A terminal, comprising:

at least one processor; and

a memory connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the user passive voice interaction method of any of claims 1-6 or 8-9.

14. A server, comprising:

at least one processor; and

a memory connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the user passive voice interaction method of claim 7.

15. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the user passive voice interaction method of any of claims 1-9.