CN112735411A - Control method, client, vehicle, voice system, and storage medium - Google Patents

Info

Publication number
CN112735411A
CN112735411A (application CN202011562171.5A)
Authority
CN
China
Prior art keywords
voice assistant
state
client
voice
control method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011562171.5A
Other languages
Chinese (zh)
Inventor
易晖
杨如栋
鲍鹏丽
赵耀
翁志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Motors Technology Co Ltd
Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Guangzhou Chengxingzhidong Automotive Technology Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd, Guangzhou Chengxingzhidong Automotive Technology Co., Ltd filed Critical Guangzhou Xiaopeng Motors Technology Co Ltd
Priority to CN202011562171.5A
Publication of CN112735411A
Priority to PCT/CN2021/140569 (published as WO2022135492A1)
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 2015/223: Execution procedure of a spoken command
    • G10L 2015/226: Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics
    • G10L 2015/227: Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics of the speaker; human-factor methodology

Abstract

The application discloses a control method, a client, a vehicle, a voice system, and a storage medium. The control method is used to control a client that includes a voice assistant, and comprises the following steps: controlling the voice assistant to activate so that the voice assistant enters a listening state, in which the voice assistant acquires a voice signal and determines whether a control instruction is present; and, when the control instruction is present, controlling the voice assistant to enter an execution state, in which the voice assistant controls the client according to the control instruction and returns to the listening state after the control is finished. Because the voice assistant keeps acquiring voice signals in the listening state and determines directly whether a control instruction is present, it does not need to be reactivated; a single activation thus supports a continuous conversation, making the voice assistant more convenient to use. In the execution state, the voice assistant controls the client according to the control instruction, so the voice assistant is managed through a two-level dialog state (i.e., a listening state and an execution state), which makes it convenient for the voice assistant to perform different tasks.

Description

Control method, client, vehicle, voice system, and storage medium
Technical Field
The present application relates to the field of voice technologies, and in particular, to a control method, a client, a vehicle, a voice system, and a storage medium.
Background
In the related art, each round of voice interaction requires activating the voice assistant with a fixed wake-up word before a conversation can be completed, and the voice assistant automatically exits once the conversation ends, so this wake-up interaction is inconvenient to use.
Disclosure of Invention
Embodiments of the present application provide a control method, a client, a vehicle, a voice system, and a storage medium.
The control method of the embodiment of the application is used to control a client that includes a voice assistant, and comprises the following steps: controlling the voice assistant to activate so that the voice assistant enters a listening state, in which the voice assistant can acquire a voice signal and determine directly from the voice signal whether a control instruction is present; and, when the control instruction is present, controlling the voice assistant to enter an execution state, in which the voice assistant controls the client according to the control instruction and returns to the listening state after the control is finished.
In some embodiments, the client includes a display screen for displaying the avatar of the voice assistant, and the control method includes: when the voice assistant enters the execution state, controlling the display screen to display card information corresponding to the control instruction and a first preset action or a first preset expression of the avatar.
In some embodiments, the client includes a display screen for displaying the avatar of the voice assistant, and the control method includes: when the voice assistant enters the listening state, controlling the display screen to display a second preset action or a second preset expression of the avatar.
In some embodiments, the control instruction includes opening a preset application, and the control method includes: and controlling the voice assistant to change the execution state into the listening state when a closing signal of the preset application is detected.
In some embodiments, the client further includes a control key, and the control method includes: and controlling the voice assistant to change the execution state into the listening state according to the trigger information of the control key.
In certain embodiments, the control method comprises: controlling the voice assistant to activate to cause the voice assistant to enter the listening state when the voice assistant receives a turn-on voice instruction; controlling the voice assistant to shut down when the voice assistant receives a shut down voice instruction.
In certain embodiments, the control method comprises: creating a listening state node and pushing the listening state node onto a dialog state stack when the voice assistant is activated; creating an executing state node and pushing the executing state node onto the dialog state stack when the voice assistant changes from the listening state to the executing state; refreshing the executing state node when the voice assistant maintains the executing state; popping the executing state node from the dialog state stack when the voice assistant changes from the executing state to the listening state; and popping the listening state node from the dialog state stack when the voice assistant is closed.
In some embodiments, the listening state node includes state information and session information, and the executing state node includes state information, session information, and executing information.
In some embodiments, the client communicates with a server, and the control method includes: and sending the voice signal to the server, wherein the server is used for determining whether the control instruction exists according to the voice signal and feeding back the result to the client.
In certain embodiments, the control method comprises: and controlling the client and the server to synchronize the conversation state of the voice assistant, wherein the conversation state comprises the listening state and the executing state.
The client of the embodiment of the application comprises a voice assistant and a processor, and the processor is configured to: control the voice assistant to activate so that the voice assistant enters a listening state, in which the voice assistant can acquire a voice signal and determine directly from the voice signal whether a control instruction is present; and, when the control instruction is present, control the voice assistant to enter an execution state, in which the voice assistant controls the client according to the control instruction and returns to the listening state after the control is finished.
The vehicle of the embodiment of the application comprises a vehicle body and the client of any one of the embodiments, wherein the client is arranged on the vehicle body.
The voice system of the embodiment of the application comprises a server and the client of any one of the embodiments, wherein the server is communicated with the client.
The computer-readable storage medium of the embodiments of the present application has stored thereon a computer program that, when executed by a processor, implements the control method of any of the embodiments described above.
With the control method, client, vehicle, voice system, and storage medium of the embodiments of the application, the voice assistant enters the listening state after being activated; in the listening state, the voice assistant can acquire a voice signal and determine directly from the voice signal whether a control instruction is present, without needing to be activated again, so a single activation supports a continuous conversation and the voice assistant is more convenient to use. In addition, the voice assistant controls the client according to the control instruction in the execution state, so the voice assistant is managed through a two-level dialog state (i.e., a listening state and an execution state), which makes it convenient for the voice assistant to perform different tasks.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flow chart of a control method according to an embodiment of the present application;
FIG. 2 is a block diagram of a client according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a control method according to an embodiment of the present application;
FIG. 4 is a schematic illustration of a dialog state for an embodiment of the present application;
fig. 5 to 8 are schematic flow charts of a control method according to an embodiment of the present application;
FIGS. 9-13 are schematic diagrams of dialog states according to embodiments of the present application;
FIG. 14 is a schematic illustration of a vehicle according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a speech system according to an embodiment of the present application;
fig. 16 is a schematic diagram of a connection between a processor and a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
In the description of the embodiments of the present application, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the embodiments of the present application, "a plurality" means two or more unless specifically defined otherwise.
Referring to fig. 1, a control method according to an embodiment of the present application is used for controlling a client 100, where the client 100 includes a voice assistant 10, and the control method includes:
step 01: controlling voice assistant 10 to activate so that voice assistant 10 enters a listening state, in which voice assistant 10 can acquire a voice signal and determine directly from the voice signal whether a control instruction is present;
step 02: when the control instruction is present, controlling voice assistant 10 to enter an execution state, in which voice assistant 10 controls client 100 according to the control instruction and returns to the listening state after the control is finished.
Referring to fig. 2, the present application further discloses a client 100. Specifically, the control method of the embodiment of the present application may be implemented by the client 100 of the embodiment of the present application, and the client 100 includes the voice assistant 10 and the processor 20. Steps 01 and 02 may be implemented by the processor 20, that is, the processor 20 is configured to: control voice assistant 10 to activate so that voice assistant 10 enters the listening state, in which voice assistant 10 can acquire a voice signal and determine directly from the voice signal whether a control instruction is present; and, when the control instruction is present, control voice assistant 10 to enter the execution state, in which voice assistant 10 controls client 100 according to the control instruction and returns to the listening state after the control is finished.
With the control method and the client 100, the voice assistant 10 enters the listening state after being activated. In the listening state, the voice assistant 10 can acquire a voice signal and determine directly from it whether a control instruction is present; it does not need to be activated again, so a single activation supports a continuous conversation and the voice assistant 10 is more convenient to use. In addition, in the execution state, the voice assistant 10 controls the client 100 according to the control instruction, so the voice assistant 10 is managed through the two-level dialog state (i.e., the listening state and the execution state), which makes it convenient for the voice assistant 10 to perform different tasks.
Voice interaction in the related art requires that voice assistant 10 be activated with a fixed wake-up word before each conversation, and voice assistant 10 automatically exits after the conversation ends, which is inconvenient. The control method and client 100 of the present embodiment realize a single activation that supports continuous dialog without reactivation, making voice assistant 10 more convenient to use, and they manage voice assistant 10 through the two-level dialog state (i.e., the listening state and the execution state), so that voice assistant 10 can perform different tasks.
Specifically, the embodiment of the present application uses two levels of dialog state (i.e., a listening state and an execution state); the listening state carries dialog information, and the execution state carries execution information. The listening state provides convenience, while the execution state provides strong perceptibility. When voice assistant 10 is controlled to activate, it enters the listening state, in which it can acquire a voice signal and determine directly from that signal whether a control instruction is present. The control instruction corresponding to a voice signal may be set before the device leaves the factory or may be user-defined, which is not limited here. When a control instruction is present, voice assistant 10 is controlled to enter the execution state, in which it controls client 100 according to the control instruction and returns to the listening state after the control is finished. In this manner, voice assistant 10 is managed through the two-level dialog state: it performs the corresponding control in the execution state and resumes the listening state afterwards. Voice assistant 10 thus exits the execution state but keeps listening, which balances the strong perceptibility and the convenience of voice interaction and makes it easy for voice assistant 10 to perform different tasks.
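The patent describes this mechanism only in prose; as an illustration, the following is a minimal Python sketch of the two-level dialog state, in which the names DialogState, VoiceAssistant, and parse_instruction are hypothetical and a trivial keyword check stands in for real speech recognition and natural language understanding:

```python
from enum import Enum, auto

class DialogState(Enum):
    IDLE = auto()       # not yet awakened
    LISTENING = auto()  # continuously acquiring voice signals
    EXECUTING = auto()  # carrying out a control instruction

class VoiceAssistant:
    def __init__(self):
        self.state = DialogState.IDLE

    def activate(self):
        # A single activation suffices; the assistant then keeps listening.
        self.state = DialogState.LISTENING

    def parse_instruction(self, signal: str):
        # Placeholder for ASR + NLU: only "navigate ..." counts as meaningful here.
        return signal if signal.startswith("navigate") else None

    def execute(self, instruction: str):
        print(f"executing: {instruction}")

    def handle_voice_signal(self, signal: str):
        if self.state is not DialogState.LISTENING:
            return
        instruction = self.parse_instruction(signal)
        if instruction is None:
            return  # meaningless speech is rejected; stay in LISTENING
        self.state = DialogState.EXECUTING   # enter the execution state
        self.execute(instruction)
        self.state = DialogState.LISTENING   # resume listening when control ends

assistant = VoiceAssistant()
assistant.activate()
assistant.handle_voice_signal("haha")                      # rejected, keeps listening
assistant.handle_voice_signal("navigate to Zhongguancun")  # executed, then listening again
```

Note how no second activation happens anywhere in the flow: the assistant only leaves the listening state to execute, and falls back into it on its own.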
Referring to FIG. 3, in some embodiments, the client 100 includes a display 30 used to display the avatar of the voice assistant 10, and the control method includes:
step 03: when the voice assistant 10 enters the execution state, controlling the display 30 to display card information corresponding to the control instruction and a first preset action or a first preset expression of the avatar.
The control method of the embodiment of the present application can be implemented by the client 100 of the embodiment of the present application, and the client 100 further includes a display screen 30. Step 03 can be implemented by the processor 20, that is, the processor 20 is configured to: when the voice assistant 10 enters the execution state, control the display screen 30 to display card information corresponding to the control instruction and a first preset action or a first preset expression of the avatar.
Specifically, referring to fig. 2 and fig. 4 together, the client 100 includes a display screen 30 used to display the avatar of the voice assistant 10; when the voice assistant 10 enters the execution state, the avatar performs a first preset action or shows a first preset expression. In one example, the avatar of voice assistant 10 may be a virtual robot, and the first preset action may be the robot enlarging and opening its eyes while positioned in the middle of display 30; the first preset expression may be a pronounced blinking expression. When voice assistant 10 enters the execution state, display screen 30 is controlled to display card information corresponding to the control instruction, i.e., content matching the instruction. In one example, for the control instruction "navigate to Zhongguancun", display 30 may show card information listing several routes to Zhongguancun. In some embodiments, the virtual robot may be displayed above the card information when voice assistant 10 enters the execution state.
Referring again to FIG. 3, in some embodiments, client 100 includes a display 30 used to display the avatar of voice assistant 10, and the control method includes:
step 04: when the voice assistant 10 enters the listening state, controlling the display screen 30 to display a second preset action or a second preset expression of the avatar.
The control method of the embodiment of the present application can be implemented by the client 100 of the embodiment of the present application, and the client 100 further includes a display screen 30. Step 04 can be implemented by the processor 20, that is, the processor 20 is configured to: when the voice assistant 10 enters the listening state, control the display screen 30 to display a second preset action or a second preset expression of the avatar.
Specifically, referring to fig. 4, client 100 includes a display screen 30 used to display the avatar of voice assistant 10; when voice assistant 10 enters the listening state, the avatar performs a second preset action or shows a second preset expression. In one example, the avatar may be a virtual robot, and the second preset action may be the robot's head growing larger with ripples shown in its eyes, the robot being positioned in the middle of display screen 30; the second preset expression may be a slight blinking expression. It is worth mentioning that display screen 30 displays no card information when voice assistant 10 enters the listening state; it may instead display a small text box whose content is the recognized speech, such as "What are you doing?".
When voice assistant 10 is in the un-awakened state, its avatar may be displayed in the upper left corner of display screen 30 at a relatively small size. After voice assistant 10 is activated, it may be displayed in the middle of display screen 30, and its avatar becomes larger. When voice assistant 10 receives speech, it may enter the listening state, and when it executes a corresponding control instruction, it may enter the execution state. After voice assistant 10 exits, it returns to the un-awakened state.
When in the listening state, the server 200 may receive the voice signal transmitted by the client 100, collect the speech of a dialog turn, and perform Natural Language Understanding (NLU) on the received voice signal. At this point it can recognize whether the dialog represented by the voice signal is meaningless or meaningful: a meaningless dialog is rejected, while a meaningful dialog prepares the corresponding control instruction for execution. The client 100 and the server 200 then move from the listening state into the execution state. In the execution state, the server 200 determines whether the dialog is a multi-turn or a single-turn dialog. A multi-turn dialog corresponds to a function that can only be realized with several turns of dialog information (e.g., navigation), while a single-turn dialog corresponds to a function realized by a single turn of dialog information (e.g., setting the screen brightness to 100%). A multi-turn dialog enters a script mode (e.g., when the user names a place during navigation, the next step can be a location confirmation), whereas for a single-turn dialog the control instruction is issued to the client 100 directly.
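As a sketch of this server-side decision flow (again an illustration rather than code from the patent: run_nlu and NluResult are hypothetical names, and simple prefix checks stand in for real natural language understanding):

```python
from dataclasses import dataclass

@dataclass
class NluResult:
    is_meaningful: bool
    multi_turn: bool = False
    instruction: str = ""

def run_nlu(signal: str) -> NluResult:
    # Placeholder NLU: "navigate ..." needs several turns (script mode),
    # "set ..." is a single-turn instruction, everything else is noise.
    if signal.startswith("navigate"):
        return NluResult(is_meaningful=True, multi_turn=True, instruction=signal)
    if signal.startswith("set"):
        return NluResult(is_meaningful=True, instruction=signal)
    return NluResult(is_meaningful=False)

def handle_signal(signal: str) -> str:
    result = run_nlu(signal)
    if not result.is_meaningful:
        return "rejected; server and client stay in the listening state"
    if result.multi_turn:
        return f"script mode: ask a follow-up question for '{result.instruction}'"
    return f"single turn: issue '{result.instruction}' to the client directly"

print(handle_signal("haha"))                      # meaningless dialog is rejected
print(handle_signal("set brightness to 100%"))    # single-turn dialog
print(handle_signal("navigate to Zhongguancun"))  # multi-turn dialog
```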
Referring to fig. 5, in some embodiments, the control command includes opening a predetermined application, and the control method includes:
step 05: upon detecting a close signal of a preset application, the voice assistant 10 is controlled to change from the execution state to the listening state.
The control method of the embodiment of the present application may be implemented by the client 100 of the embodiment of the present application, and the client 100 includes the voice assistant 10 and the processor 20. Wherein step 05 can be implemented by the processor 20, that is, the processor 20 is configured to: upon detecting a close signal of a preset application, the voice assistant 10 is controlled to change from the execution state to the listening state.
Specifically, the preset application may be a navigation app, a music player, a search engine, or the like. In one example, the voice assistant 10 is controlled to activate and enters the listening state, in which it acquires the voice signal "I want to listen to Jay Chou's 'Sunny Day'"; since this voice signal carries a control instruction, the music player is opened to play Jay Chou's "Sunny Day". When the user closes the preset application, i.e. the music player, and the corresponding close signal is detected, the voice assistant 10 is controlled to change from the execution state to the listening state. In the listening state, voice assistant 10 can again acquire a voice signal and determine directly from it whether a control instruction is present; no reactivation is needed, so a single activation supports a continuous conversation and voice assistant 10 is more convenient to use.
Referring to fig. 6, in some embodiments, the client 100 further includes a control button, and the control method includes:
step 06: controlling the voice assistant 10 to change from the execution state to the listening state according to the trigger information of the control key.
The control method of the embodiment of the present application can be implemented by the client 100 of the embodiment of the present application, and the client 100 further includes a control key. Step 06 can be implemented by the processor 20, that is, the processor 20 is configured to: control the voice assistant 10 to change from the execution state to the listening state according to the trigger information of the control key.
Specifically, the control key may be a virtual key arranged on the display screen 30 (for example, when display screen 30 is a touch screen, the virtual key is an icon displayed on it) or a separately arranged physical key. The user may change voice assistant 10 from the execution state to the listening state by tapping the virtual key, or quickly exit the current execution state by pressing the physical key, which likewise changes voice assistant 10 from the execution state to the listening state. The control key thus makes it convenient to manage the voice assistant and to switch quickly between its two-level dialog states.
Referring to fig. 7, in some embodiments, the control method includes:
step 071: control voice assistant 10 to activate to put voice assistant 10 into a listening state when voice assistant 10 receives a turn-on voice command;
step 072: voice assistant 10 is controlled to shut down when voice assistant 10 receives a shut down voice command.
The control method of the embodiment of the present application may be implemented by the client 100 of the embodiment of the present application, and the client 100 includes the voice assistant 10 and the processor 20. Wherein, step 071 and step 072 can be implemented by the processor 20, that is, the processor 20 is configured to: control voice assistant 10 to activate to put voice assistant 10 into a listening state when voice assistant 10 receives a turn-on voice command; voice assistant 10 is controlled to shut down when voice assistant 10 receives a shut down voice command.
Specifically, when voice assistant 10 receives a turn-on voice instruction, voice assistant 10 is controlled to activate and enter the listening state. The turn-on voice instruction may be set before the device leaves the factory or may be user-defined, which is not limited here. In one example, the turn-on voice instruction may be "hello, small P"; when voice assistant 10 receives "hello, small P", voice assistant 10 is controlled to activate and enters the listening state. The turn-off voice instruction may likewise be set before leaving the factory or user-defined, which is not limited here. In one example, the turn-off voice instruction may be "exit"; when voice assistant 10 receives "exit", voice assistant 10 is controlled to shut down.
Referring to fig. 8, in some embodiments, the control method includes:
step 081: creating a listening state node and pushing the listening state node onto the dialog state stack upon activation of the voice assistant 10;
step 082: when the voice assistant 10 changes from the listening state to the executing state, creating an executing state node and pushing the executing state node to the dialog state stack;
step 083: while voice assistant 10 maintains the execution state, the execution state nodes are refreshed;
step 084: when the voice assistant 10 changes from the execution state to the listening state, popping the execution state node from the dialog state stack;
step 085: when the voice assistant 10 is closed, popping the listening state node from the dialog state stack.
The control method of the embodiment of the present application can be implemented by the client 100 of the embodiment of the present application, and the client 100 includes the voice assistant 10 and the processor 20. Steps 081 to 085 may all be implemented by the processor 20, that is, the processor 20 is configured to: create a listening state node and push it onto the dialog state stack when the voice assistant 10 is activated; create an execution state node and push it onto the dialog state stack when the voice assistant 10 changes from the listening state to the execution state; refresh the execution state node while the voice assistant 10 maintains the execution state; pop the execution state node from the dialog state stack when the voice assistant 10 changes from the execution state to the listening state; and pop the listening state node from the dialog state stack when the voice assistant 10 is closed.
Referring to FIG. 9, using a state stack makes it convenient to manage the two-level dialog state. Specifically, when meaningful speech is executed, voice assistant 10 switches from the listening state to the execution state; at this point an execution state node is created and pushed onto the dialog state stack. The execution state node is updated when a button on the card is clicked to trigger a new execution event, or when a multi-turn dialog script advances; at these moments the execution state node in the dialog state stack is refreshed. Voice assistant 10 changes from the execution state back to the listening state when the card is closed, when the current application is closed or the current dialog execution ends, or when the current dialog script is exited; at that point the execution state node is popped from the dialog state stack.
Referring to FIG. 10, in some embodiments, the listening state node includes state information and dialog information, and the execution state node includes state information, dialog information, and execution information. In one example, the state information of the listening state node may be a waiting state. The dialog information of the listening state node includes text information, response information, and category information: the text information may be the voice signal "haha", and because "haha" carries no control instruction, the response information is null and the category information is "rejected". The state information of the execution state node may be an executing state. The dialog information of the execution state node likewise includes text information, response information, and category information: the text information may be the voice signal "navigate to Peking University", the response information may be a Point of Interest (POI) list, and the category information is a selection state. The execution information of the execution state node includes the dialog turn, the script name, and the execution interface. The dialog turn may be the first turn, the second turn, the third turn, and so on. The script name is a Point of Interest (POI) selection, which may be common dialog information set before leaving the factory. The execution interface may be the card information. It is worth mentioning that the card information may be the content of the POI selection, which may include: Peking University South Gate, Peking University East Gate, Peking University parking lot, Peking University bus stop, and so on.
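Putting the node structure of FIG. 10 together with the stack operations of steps 081 to 085, a minimal Python sketch might look as follows; the class and field names are assumptions chosen to mirror the description, not identifiers from the patent:

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class DialogInfo:
    text: str                 # recognized speech, e.g. "navigate to Peking University"
    response: Optional[str]   # e.g. a POI list; None when the speech is rejected
    category: str             # e.g. "rejected" or "selection"

@dataclass
class ListeningNode:          # state information + dialog information
    state: str = "waiting"
    dialog: Optional[DialogInfo] = None

@dataclass
class ExecutionNode:          # state + dialog + execution information
    state: str
    dialog: DialogInfo
    turn: int                 # dialog turn within the script (first, second, ...)
    script: str               # script name, e.g. "POI selection"
    interface: str            # execution interface, e.g. the card information

class DialogStateStack:
    def __init__(self):
        self.nodes: List[Union[ListeningNode, ExecutionNode]] = []

    def on_activate(self):                              # step 081: push listening node
        self.nodes.append(ListeningNode())

    def on_enter_execution(self, node: ExecutionNode):  # step 082: push execution node
        self.nodes.append(node)

    def on_refresh(self, node: ExecutionNode):          # step 083: refresh while executing
        self.nodes[-1] = node

    def on_leave_execution(self):                       # step 084: pop execution node
        self.nodes.pop()

    def on_close(self):                                 # step 085: pop listening node
        self.nodes.pop()
```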
Referring to fig. 4 and 11 together, in some embodiments, the client 100 communicates with the server 200, and the control method includes:
sending the voice signal to the server 200, where the server 200 is configured to determine whether the control instruction is present according to the voice signal and to feed the result back to the client 100.
The control method of the embodiment of the present application can be implemented by the client 100 of the embodiment of the present application, the client 100 includes the voice assistant 10 and the processor 20, and the client 100 communicates with the server 200. The above steps may be implemented by the processor 20, that is, the processor 20 is configured to: and sending the voice signal to the server 200, wherein the server 200 is configured to determine whether a control instruction exists according to the voice signal and feed back the result to the client 100.
Referring to fig. 4 again, the client 100 communicates with the server 200 and can send the voice signal to the server 200, and the server 200 is configured to determine whether a control instruction is present according to the voice signal and to feed the result back to the client 100. In this way, the client 100 receives the voice signal and sends it to the server 200, and the server 200 can be used to determine whether a control instruction is present and to feed the result back to the client 100.
Referring to fig. 4, 11, 12 and 13 together, in some embodiments, the control method includes:
controlling the client 100 and the server 200 to synchronize the dialog state of the voice assistant 10, the dialog state including the listening state and the execution state.
The control method of the embodiment of the present application can be implemented by the client 100 of the embodiment of the present application; the client 100 includes the voice assistant 10 and the processor 20, and the client 100 communicates with the server 200. The above step may be implemented by the processor 20, that is, the processor 20 is configured to: control the client 100 and the server 200 to synchronize the dialog state of the voice assistant 10, the dialog state including the listening state and the execution state.
Specifically, the client 100 and the server 200 synchronize the dialog state of the voice assistant 10, so that the states on the client 100 and the server 200 remain highly consistent at all times.
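The patent does not specify how the synchronization is carried out; one possible sketch, assuming a simple JSON message pushed from server 200 to client 100 whenever the dialog state changes (the message format and names here are purely illustrative), is:

```python
import json

def make_state_update(dialog_state: str, turn: int) -> str:
    # Hypothetical wire format for a dialog-state update.
    assert dialog_state in ("LISTENING", "EXECUTING")
    return json.dumps({"type": "dialog_state", "state": dialog_state, "turn": turn})

class ClientSide:
    def __init__(self):
        self.state = "LISTENING"

    def on_server_message(self, payload: str):
        msg = json.loads(payload)
        if msg["type"] == "dialog_state":
            self.state = msg["state"]   # mirror the server's dialog state

client = ClientSide()
client.on_server_message(make_state_update("EXECUTING", turn=1))
print(client.state)  # EXECUTING
```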
Referring to FIG. 11, in some embodiments, the two-level dialog state may be managed entirely by voice. For example, the user may wake voice assistant 10 with the turn-on voice instruction "hello, small P"; when voice assistant 10 receives "hello, small P", voice assistant 10 is controlled to activate and enters the listening state. The user may then speak a meaningful utterance such as "navigate to Zhongguancun", upon which voice assistant 10 enters the execution state and, while listing the point-of-interest choices, may ask the user "I found these results; which one do you want to go to?" The user may answer "the first one" by voice, and may also exit voice assistant 10 with the turn-off voice instruction.
Referring to FIG. 12, in some embodiments, the two-level dialog state may be managed by a combination of voice and keys. For example, the user may wake voice assistant 10 with the turn-on voice instruction "hello, small P"; when voice assistant 10 receives "hello, small P", voice assistant 10 is controlled to activate and enters the listening state. The user may then speak a meaningful utterance such as "navigate to Zhongguancun", upon which voice assistant 10 enters the execution state and, while listing the point-of-interest choices, may ask the user "I found these results; which one do you want to go to?" The user may manually close the navigation application as desired, and may also manually turn off voice assistant 10 by pressing a key.
Referring to fig. 14, the present application discloses a vehicle 300, the vehicle 300 includes a vehicle body 301 and the client 100 of any one of the above embodiments, and the client 100 is disposed on the vehicle body 301.
In this way, with the client 100 of the vehicle 300 according to the embodiment of the present application, the voice assistant 10 enters the listening state after being activated; in the listening state, the voice assistant 10 can acquire a voice signal and determine directly from it whether a control instruction is present, without needing to be activated again, so a single activation supports a continuous conversation and the voice assistant 10 is more convenient to use. In addition, in the execution state, voice assistant 10 controls client 100 according to the control instruction, so voice assistant 10 is managed through the two-level dialog state (i.e., the listening state and the execution state), which makes it convenient for voice assistant 10 to perform different tasks. It is worth mentioning that the vehicle 300 may be connected to the client 100 through wireless communication (e.g., WIFI, a mobile communication network, etc.). The vehicle 300 includes, but is not limited to, a battery electric vehicle, a hybrid electric vehicle, an extended-range electric vehicle, a fuel vehicle, and the like.
Referring to fig. 15, the present application discloses a speech system 500, where the speech system 500 includes a server 200 and a client 100 according to any of the above embodiments, and the server 200 communicates with the client 100.
Thus, with the speech system 500 according to the embodiment of the present application, the voice assistant 10 enters the listening state after being activated; in the listening state, the voice assistant 10 can acquire a voice signal and determine directly from it whether a control instruction is present, without needing to be activated again, so a single activation supports a continuous conversation and the voice assistant 10 is more convenient to use. In addition, in the execution state, voice assistant 10 controls client 100 according to the control instruction, so voice assistant 10 is managed through the two-level dialog state (i.e., the listening state and the execution state), which makes it convenient for voice assistant 10 to perform different tasks.
Referring to fig. 16, the present application further provides a computer readable storage medium 1000, on which a computer program is stored, and when the computer program is executed by the processor 20, the processor 20 is enabled to execute the steps of the control method according to any of the above embodiments.
For example, in the case where the program is executed by the processor 20, the steps of the following control method are implemented:
step 01: controlling voice assistant 10 to activate so that voice assistant 10 enters a listening state, in which voice assistant 10 can acquire a voice signal and determine directly from the voice signal whether a control instruction is present;
step 02: when the control instruction is present, controlling voice assistant 10 to enter an execution state, in which voice assistant 10 controls client 100 according to the control instruction and returns to the listening state after the control is finished.
In this way, with the computer-readable storage medium 1000 of the embodiment of the present application, the voice assistant 10 enters the listening state after being activated; in the listening state, the voice assistant 10 can acquire a voice signal and determine directly from it whether a control instruction is present, without needing to be activated again, so a single activation supports a continuous conversation and the voice assistant 10 is more convenient to use. In addition, in the execution state, voice assistant 10 controls client 100 according to the control instruction, so voice assistant 10 is managed through the two-level dialog state (i.e., the listening state and the execution state), which makes it convenient for voice assistant 10 to perform different tasks.
It will be appreciated that the computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like.
The processor may refer to a processor included in the controller. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc.
In the description herein, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example" or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions that implement logical functions, can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device, such as a computer-based system, a system including a processing module, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the embodiments of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations of the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (14)

1. A control method for controlling a client, wherein the client comprises a voice assistant, the control method comprising:
controlling the voice assistant to activate to cause the voice assistant to enter a listening state in which the voice assistant is capable of acquiring a voice signal to determine whether control instructions are present directly from the voice signal;
and when the control instruction exists, controlling the voice assistant to enter an execution state, and in the execution state, the voice assistant can correspondingly control the client according to the control instruction and restore the listening state after the control is finished.
2. The control method of claim 1, wherein the client comprises a display screen for displaying the avatar of the voice assistant, the control method comprising:
and when the voice assistant enters the execution state, controlling the display screen to display card information corresponding to the control instruction and a first preset action or a first preset expression of the avatar.
3. The control method of claim 1, wherein the client comprises a display screen for displaying the avatar of the voice assistant, the control method comprising:
and when the voice assistant enters the listening state, controlling the display screen to display a second preset action or a second preset expression of the avatar.
4. The control method according to claim 1, wherein the control instruction includes opening a preset application, the control method comprising:
and controlling the voice assistant to change the execution state into the listening state when a closing signal of the preset application is detected.
5. The control method according to claim 1, wherein the client further comprises a control key, and the control method comprises:
and controlling the voice assistant to change the execution state into the listening state according to the trigger information of the control key.
6. The control method according to claim 1, characterized by comprising:
controlling the voice assistant to activate to cause the voice assistant to enter the listening state when the voice assistant receives a turn-on voice instruction;
controlling the voice assistant to shut down when the voice assistant receives a shut down voice instruction.
7. The control method according to claim 1, characterized by comprising:
creating a listening state node and pushing the listening state node into a dialog state stack when the voice assistant is activated;
when the voice assistant changes from the listening state to the executing state, creating an executing state node and pushing the executing state node to the conversation state stack;
refreshing the executing state node when the voice assistant maintains the executing state;
popping the executing state node from the dialog state stack when the voice assistant changes from the executing state to the listening state;
and when the voice assistant is closed, popping the listening state node from the dialog state stack.
8. The control method of claim 7, wherein the listening state node comprises state information and session information, and wherein the executing state node comprises state information, session information, and executing information.
9. The control method according to claim 1, wherein the client communicates with a server, the control method comprising:
and sending the voice signal to the server, wherein the server is used for determining whether the control instruction exists according to the voice signal and feeding back the result to the client.
10. The control method according to claim 9, characterized by comprising:
and controlling the client and the server to synchronize the conversation state of the voice assistant, wherein the conversation state comprises the listening state and the executing state.
11. A client, comprising a voice assistant and a processor configured to: controlling the voice assistant to activate to cause the voice assistant to enter a listening state in which the voice assistant is capable of acquiring a voice signal to determine whether control instructions are present directly from the voice signal; and when the control instruction exists, controlling the voice assistant to enter an execution state, and in the execution state, the voice assistant can correspondingly control the client according to the control instruction and restore the listening state after the control is finished.
12. A vehicle characterized in that the vehicle comprises a vehicle body and the client of claim 11, the client being provided on the vehicle body.
13. A speech system comprising a server and a client as claimed in claim 11, the server being in communication with the client.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the control method according to any one of claims 1 to 10.
CN202011562171.5A, priority date 2020-12-25, filing date 2020-12-25: Control method, client, vehicle, voice system, and storage medium (pending as CN112735411A)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011562171.5A CN112735411A (en) 2020-12-25 2020-12-25 Control method, client, vehicle, voice system, and storage medium
PCT/CN2021/140569 WO2022135492A1 (en) 2020-12-25 2021-12-22 Control method, client, vehicle, voice system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011562171.5A CN112735411A (en) 2020-12-25 2020-12-25 Control method, client, vehicle, voice system, and storage medium

Publications (1)

Publication Number Publication Date
CN112735411A (published 2021-04-30)

Family

ID=75616215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011562171.5A Pending CN112735411A (en) 2020-12-25 2020-12-25 Control method, client, vehicle, voice system, and storage medium

Country Status (2)

Country Link
CN (1) CN112735411A (en)
WO (1) WO2022135492A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223527A (en) * 2021-05-08 2021-08-06 雅迪科技集团有限公司 Voice control method for intelligent instrument of electric vehicle and electric vehicle
WO2022135492A1 (en) * 2020-12-25 2022-06-30 广州橙行智动汽车科技有限公司 Control method, client, vehicle, voice system, and storage medium
WO2024055566A1 (en) * 2022-09-15 2024-03-21 比亚迪股份有限公司 In-vehicle interaction method and system, controller, and vehicle

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109346076A (en) * 2018-10-25 2019-02-15 三星电子(中国)研发中心 Interactive voice, method of speech processing, device and system
CN110060678A (en) * 2019-04-16 2019-07-26 深圳欧博思智能科技有限公司 A kind of virtual role control method and smart machine based on smart machine
CN110096191A (en) * 2019-04-24 2019-08-06 北京百度网讯科技有限公司 A kind of interactive method, device and electronic equipment
CN110767220A (en) * 2019-10-16 2020-02-07 腾讯科技(深圳)有限公司 Interaction method, device, equipment and storage medium of intelligent voice assistant
CN111291151A (en) * 2018-12-06 2020-06-16 阿里巴巴集团控股有限公司 Interaction method and device and computer equipment
CN111612482A (en) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Conversation management method, device and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10129720B1 (en) * 2011-12-30 2018-11-13 Genesys Telecommunications Laboratories, Inc. Conversation assistant
US9570090B2 (en) * 2015-05-26 2017-02-14 Google Inc. Dialog system with automatic reactivation of speech acquiring mode
CN112735411A (en) * 2020-12-25 2021-04-30 广州橙行智动汽车科技有限公司 Control method, client, vehicle, voice system, and storage medium


Also Published As

Publication number Publication date
WO2022135492A1 (en) 2022-06-30

Similar Documents

Publication Publication Date Title
CN112735411A (en) Control method, client, vehicle, voice system, and storage medium
US11676601B2 (en) Voice assistant tracking and activation
CN105329187B (en) The intelligent vehicle-mounted system and control method of safety operation are realized in Bluetooth key triggering
JP7065740B2 (en) Application function information display method, device, and terminal device
CN109243444B (en) Voice interaction method, device and computer-readable storage medium
US20150254061A1 (en) Method for user training of information dialogue system
CN104978015B (en) Navigation system and its control method with languages self application function
CN112231021B (en) Method and device for guiding new functions of software
CN107329843A (en) Application program sound control method, device, equipment and storage medium
DE112014000709T5 (en) Voice trigger for a digital assistant
KR20100076998A (en) Multimode user interface of a driver assistance system for inputting and presentation of information
CN107436680B (en) Method and device for switching application mode of vehicle-mounted device
CN112017650A (en) Voice control method and device of electronic equipment, computer equipment and storage medium
CN104536673A (en) Method and system for reading in-vehicle infotainment WeChat received audio information
CN109036398A (en) Voice interactive method, device, equipment and storage medium
EP3958577A2 (en) Voice interaction method, voice interaction system, server and storage medium
WO2019239656A1 (en) Information processing device and information processing method
JP2006284314A (en) In-vehicle system
CN109521932A (en) Voice control display processing method, device, vehicle, storage medium and equipment
KR102218640B1 (en) Display device and controlling method of display device
CN109358928A (en) In the method, apparatus and mobile unit of the desktop presentation data of mobile unit
CN105427881A (en) Voice recording book system for automobiles
KR102416818B1 (en) Methods and apparatuses for controlling voice of electronic devices, computer device and storage media
CN109658924B (en) Session message processing method and device and intelligent equipment
CN109547632B (en) Auxiliary call response method, user terminal device and server

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2021-04-30)