WO2022135492A1 - Control method, client, vehicle, voice system and storage medium - Google Patents

Control method, client, vehicle, voice system and storage medium

Info

Publication number
WO2022135492A1
WO2022135492A1 (application PCT/CN2021/140569, CN2021140569W)
Authority
WO
WIPO (PCT)
Prior art keywords
voice assistant
state
voice
client
control method
Prior art date
Application number
PCT/CN2021/140569
Other languages
English (en)
French (fr)
Inventor
易晖
杨如栋
鲍鹏丽
赵耀
翁志伟
Original Assignee
广州橙行智动汽车科技有限公司
广州小鹏汽车科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州橙行智动汽车科技有限公司 and 广州小鹏汽车科技有限公司
Publication of WO2022135492A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/28 - Constructional details of speech recognition systems
    • G10L15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 - Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics
    • G10L2015/227 - Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics of the speaker; Human-factor methodology

Definitions

  • the present application relates to the field of voice technology, and in particular, to a control method, a client, a vehicle, a voice system and a storage medium.
  • In the related art, voice interaction requires the voice assistant to be activated with a fixed wake-up word before each dialogue can be completed, and the voice assistant automatically exits once the dialogue ends. Such wake-up interaction is inconvenient to use.
  • Embodiments of the present application provide a control method, a client, a vehicle, a voice system, and a storage medium.
  • The control method of the embodiments of the present application is used to control a client that includes a voice assistant. The control method includes: controlling the voice assistant to be activated so that it enters a listening state, in which the voice assistant can acquire a voice signal and directly determine from that signal whether a control instruction is present; and, when a control instruction is present, controlling the voice assistant to enter an execution state, in which the voice assistant can perform the corresponding control on the client according to the control instruction and return to the listening state after the control ends.
  • the client includes a display screen
  • the display screen is used to display the image of the voice assistant
  • the control method includes: when the voice assistant enters the execution state, controlling the display screen to display the card information corresponding to the control instruction and a first preset action or first preset expression of the avatar.
  • the client includes a display screen
  • the display screen is used to display the image of the voice assistant
  • the control method includes: when the voice assistant enters the listening state, controlling the display screen to display a second preset action or a second preset expression of the avatar.
  • control instruction includes opening a preset application
  • control method includes: when detecting a closing signal of the preset application, controlling the voice assistant to change from the execution state to the listening state.
  • the client further includes a control button
  • the control method includes: controlling the voice assistant from the execution state to the listening state according to trigger information of the control button.
  • the control method includes: controlling the voice assistant to be activated when it receives an opening voice command, so that the voice assistant enters the listening state; and controlling the voice assistant to be turned off when it receives a closing voice command.
  • the control method includes: creating a listening state node when the voice assistant is activated and pushing the listening state node onto a dialogue state stack; when the voice assistant changes from the listening state to the execution state, creating an execution state node and pushing it onto the dialogue state stack; when the voice assistant remains in the execution state, refreshing the execution state node; when the voice assistant changes from the execution state back to the listening state, popping the execution state node off the dialogue state stack; and when the voice assistant is closed, popping the listening state node off the dialogue state stack.
  • the listening state node includes state information and dialog information
  • the executing state node includes state information, dialog information, and execution information.
  • the client communicates with a server
  • the control method includes: sending the voice signal to the server, where the server is configured to determine, according to the voice signal, whether a control instruction is present and to feed the result back to the client.
  • control method includes: controlling the client and the server to synchronize a dialog state of the voice assistant, where the dialog state includes the listening state and the executing state.
  • The client in the embodiments of the present application includes a voice assistant and a processor. The processor is configured to: control the voice assistant to be activated so that it enters a listening state, in which the voice assistant can acquire a voice signal and directly determine from that signal whether a control instruction is present; and, when a control instruction is present, control the voice assistant to enter an execution state, in which the voice assistant can perform the corresponding control on the client according to the control instruction and return to the listening state after the control ends.
  • a vehicle according to an embodiment of the present application includes a vehicle body and a client terminal of any of the above-mentioned embodiments, and the client terminal is provided on the vehicle body.
  • the voice system of the embodiments of the present application includes a server and a client of any of the above embodiments, and the server communicates with the client.
  • The computer-readable storage medium of the embodiments of the present application stores a computer program thereon, and the computer program, when executed by a processor, implements the control method of any of the above-mentioned embodiments.
  • With the control method, client, vehicle, voice system, and storage medium of the embodiments of the present application, the voice assistant enters the listening state after being activated.
  • In the listening state, the voice assistant can acquire a voice signal and directly determine from that signal whether a control instruction is present. It does not need to be reactivated, which realizes continuous dialogue from a single activation and makes the voice assistant more convenient to use.
  • In the execution state, the voice assistant can control the client according to the control instructions. In this way, the voice assistant is managed through the two-level dialogue state (i.e., the listening state and the execution state), which makes it convenient for the voice assistant to perform different tasks.
  • FIG. 1 is a schematic flowchart of a control method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a module of a client according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a control method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a dialog state according to an embodiment of the present application.
  • FIGS. 5 to 8 are schematic flowcharts of a control method according to embodiments of the present application.
  • FIGS. 9 to 13 are schematic diagrams of a dialogue state according to embodiments of the present application.
  • FIG. 14 is a schematic diagram of a vehicle according to an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a speech system according to an embodiment of the present application.
  • FIG. 16 is a schematic diagram of connection between a processor and a computer-readable storage medium according to an embodiment of the present application.
  • The terms "first" and "second" are used for description purposes only and cannot be understood as indicating or implying relative importance or implying the number of indicated technical features. Thus, features defined as "first" or "second" may expressly or implicitly include one or more of said features.
  • “plurality” means two or more, unless otherwise expressly and specifically defined.
  • control method of the embodiment of the present application is used to control the client 100, the client 100 includes the voice assistant 10, and the control method includes:
  • Step 01: control the voice assistant 10 to be activated so that the voice assistant 10 enters a listening state, in which the voice assistant 10 can acquire a voice signal and directly determine from that signal whether a control instruction is present;
  • Step 02: when a control instruction is present, control the voice assistant 10 to enter an execution state, in which the voice assistant 10 can perform the corresponding control on the client 100 according to the control instruction and return to the listening state after the control ends.
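The two-level dialogue state of steps 01 and 02 can be sketched as a small state machine. This is only an illustrative sketch, not the patent's implementation; the class and method names (`VoiceAssistant`, `has_control_instruction`) and the keyword-based instruction check are assumptions.

```python
from enum import Enum, auto

class DialogState(Enum):
    IDLE = auto()       # not yet activated
    LISTENING = auto()  # step 01: continuously acquiring voice signals
    EXECUTING = auto()  # step 02: carrying out a control instruction

class VoiceAssistant:
    def __init__(self):
        self.state = DialogState.IDLE

    def activate(self):
        # Step 01: activation puts the assistant straight into listening.
        self.state = DialogState.LISTENING

    def on_voice_signal(self, signal: str) -> bool:
        """Return True if the signal carried a control instruction."""
        if self.state is not DialogState.LISTENING:
            return False
        if self.has_control_instruction(signal):
            self.state = DialogState.EXECUTING
            self.execute(signal)
            # Return to listening once the control ends -- no re-wake needed.
            self.state = DialogState.LISTENING
            return True
        return False  # meaningless dialogue: reject and stay listening

    def has_control_instruction(self, signal: str) -> bool:
        # Placeholder check; the patent delegates this decision (NLU) to the server.
        return signal.startswith("navigate") or signal.startswith("play")

    def execute(self, signal: str) -> None:
        pass  # perform the corresponding control on the client
```

Note that after a control instruction is handled the assistant lands back in `LISTENING`, which is what allows continuous dialogue from a single activation.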
  • the present application also discloses a client 100 .
  • the control method of the embodiment of the present application may be implemented by the client 100 of the embodiment of the present application, and the client 100 includes a voice assistant 10 and a processor 20 .
  • Step 01 and step 02 can be implemented by the processor 20; that is to say, the processor 20 is configured to: control the voice assistant 10 to be activated so that it enters the listening state, in which the voice assistant 10 can acquire a voice signal and directly determine from that signal whether a control instruction is present; and, when a control instruction is present, control the voice assistant 10 to enter the execution state, in which the voice assistant 10 can perform the corresponding control on the client 100 according to the control instruction and return to the listening state after the control ends.
  • With the above control method and client 100, the voice assistant 10 enters the listening state after being activated.
  • In the listening state, the voice assistant 10 can acquire a voice signal and directly determine from that signal whether a control instruction is present. It does not need to be reactivated, which realizes continuous dialogue from a single activation and makes the voice assistant 10 more convenient to use.
  • In the execution state, the voice assistant 10 can control the client 100 according to the control instructions. In this way, the voice assistant 10 is managed through the two-level dialogue state (i.e., the listening state and the execution state), which makes it convenient for the voice assistant 10 to perform different tasks.
  • In contrast, voice interaction in the related art requires the voice assistant 10 to be activated with a fixed wake-up word before each dialogue, and the voice assistant 10 automatically exits once the dialogue ends; such wake-up interaction is inconvenient to use.
  • the control method and the client 100 of the present embodiment can realize the function of activating the continuous dialogue at one time without reactivation, which makes the voice assistant 10 more convenient to use.
  • the control method and the client 100 of the present embodiment manage the voice assistant 10 through two-level dialogue states (ie, listening state and execution state), so as to facilitate the voice assistant 10 to perform different tasks.
  • the embodiment of the present application includes two levels of dialogue states (ie, listening state and execution state), the listening state includes dialogue information, and the execution state includes execution information.
  • The listening state offers convenience, while the execution state offers strong perceptibility.
  • After the voice assistant 10 is controlled to be activated, it enters the listening state, in which it can acquire a voice signal and directly determine from that signal whether a control instruction is present.
  • the control command corresponding to the voice signal can be set before leaving the factory, or can be set by the user, which is not limited here.
  • the voice assistant 10 is controlled to enter the execution state, and in the execution state, the voice assistant 10 can control the client 100 correspondingly according to the control command and resume the listening state after the control is over.
  • The voice assistant 10 is managed through the two-level dialogue state, so that it performs the corresponding control in the execution state and returns to the listening state after the control ends. In this way, the voice assistant 10 exits the execution state but remains in the listening state, which balances the strong perceptibility and convenience of voice interaction and also makes it convenient for the voice assistant 10 to perform different tasks.
  • the client 100 includes a display screen 30, and the display screen 30 is used to display the image of the voice assistant 10, and the control method includes:
  • Step 03: when the voice assistant 10 enters the execution state, control the display screen 30 to display the card information corresponding to the control instruction and a first preset action or first preset expression of the avatar.
  • Step 03 can be implemented by the processor 20; that is to say, the processor 20 is configured to control the display screen 30 to display the card information corresponding to the control instruction and the first preset action or first preset expression of the avatar when the voice assistant 10 enters the execution state.
  • The client 100 includes a display screen 30 used to display the avatar of the voice assistant 10; when the voice assistant 10 enters the execution state, the avatar shows the first preset action or first preset expression.
  • The avatar of the voice assistant 10 may be a virtual robot, and the first preset action may be the virtual robot's avatar becoming larger with its eyes opening wide; the virtual robot may be located in the middle of the display screen 30 at this time. The first preset expression of the avatar may be a large blinking expression.
  • The display screen 30 is controlled to display the card information corresponding to the control instruction, and the card information may be the content corresponding to the control instruction.
  • the control instruction displayed on the display screen 30 may be "navigate to Zhongguancun", and the card information corresponding to the control instruction displayed on the display screen 30 may be several routes to Zhongguancun.
  • the virtual robot may be displayed above the card information.
  • the client 100 includes a display screen 30, and the display screen 30 is used to display the image of the voice assistant 10, and the control method includes:
  • Step 04: when the voice assistant 10 enters the listening state, control the display screen 30 to display the second preset action or second preset expression of the avatar.
  • Step 04 can be implemented by the processor 20; that is, the processor 20 is configured to control the display screen 30 to display the second preset action or second preset expression of the avatar when the voice assistant 10 enters the listening state.
  • The client 100 includes a display screen 30 used to display the avatar of the voice assistant 10; when the voice assistant 10 enters the listening state, the avatar shows the second preset action or second preset expression.
  • The avatar of the voice assistant 10 may be a virtual robot, and the second preset action may be the virtual robot's avatar becoming larger with ripples showing in its eyes; the virtual robot may be located in the middle of the display screen 30 at this time. The second preset expression of the avatar may be a slight blinking expression.
  • the display screen 30 can display a small text box, and the content of the text box can be voice information, for example: "what are you doing" and other voice information.
  • When the voice assistant 10 is not awakened, its avatar may be displayed in the upper left corner of the display screen 30 at a small size. After the voice assistant 10 is activated, its avatar may be displayed in the center of the display screen 30 at a larger size. When the voice assistant 10 receives voice, it may enter the listening state; when it executes a corresponding control instruction, it may enter the execution state. After the voice assistant 10 exits, it returns to the unawakened state.
  • When the server 200 is in the listening state, it can receive the voice signal sent by the client 100 to realize dialogue pickup, and then perform Natural Language Understanding (NLU) on the received voice signal. At this point the server can recognize whether the dialogue represented by the voice signal is meaningless or meaningful: the voice signal is rejected when the dialogue is meaningless, and the control instruction corresponding to the voice signal is prepared for execution when the dialogue is meaningful.
  • the client 100 and the server 200 enter the execution state from the listening state.
  • The server 200 determines whether the dialogue is a multi-round dialogue or a single-round dialogue.
  • Single-round dialogues are functions that can be achieved with the information of a single round (such as adjusting the screen brightness to 100%).
  • For a multi-round dialogue, the server enters script mode (for example, in navigation the user speaks a location and the next step is to confirm that location); for a single-round dialogue, the instruction is sent directly to the client 100.
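The server-side decision flow above (reject meaningless input, enter script mode for multi-round dialogues, dispatch single-round instructions directly) can be sketched as follows. The keyword checks stand in for the NLU and round classification, and the function name and return format are assumptions for illustration only.

```python
def handle_voice_signal(text: str) -> dict:
    """Return the server's feedback to the client for one voice signal."""
    # Stand-in for NLU: decide whether the dialogue is meaningful at all.
    meaningful = any(k in text for k in ("navigate", "play", "brightness"))
    if not meaningful:
        return {"action": "reject"}              # meaningless dialogue
    # Stand-in for round classification: navigation needs a follow-up
    # round to confirm the location, so treat it as multi-round.
    multi_round = "navigate" in text
    if multi_round:
        # Enter script mode; the next round confirms the location.
        return {"action": "script", "script": "POI selection", "round": 1}
    # Single-round dialogue: send the instruction to the client directly.
    return {"action": "execute", "instruction": text}
```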
  • control instruction includes opening a preset application
  • control method includes:
  • Step 05: when the closing signal of the preset application is detected, control the voice assistant 10 to change from the execution state to the listening state.
  • step 05 may be implemented by the processor 20, that is to say, the processor 20 is configured to: control the voice assistant 10 to change from the execution state to the listening state when the closing signal of the preset application is detected.
  • the preset application may be a navigation, a music player, a search engine, etc.
  • For example, the voice assistant 10 is controlled to be activated so that it enters the listening state; when the voice assistant 10 obtains the voice signal "I want to listen to Jay Chou's Sunny Day", it determines that a control instruction is present, so that the music player can be opened to play Jay Chou's "Sunny Day".
  • When the closing signal of the preset application (i.e., the music player) is detected, the voice assistant 10 can be controlled to change from the execution state to the listening state.
  • In the listening state, the voice assistant 10 can again acquire a voice signal to directly determine whether a control instruction is present. No reactivation is required, which realizes continuous dialogue from a single activation and makes the voice assistant 10 more convenient to use.
  • the client 100 further includes control buttons, and the control method includes:
  • Step 06: control the voice assistant 10 to change from the execution state to the listening state according to the trigger information of the control button.
  • step 06 may be implemented by the processor 20, that is to say, the processor 20 is configured to: control the voice assistant 10 from the execution state to the listening state according to the trigger information of the control button.
  • control keys may be virtual keys provided on the display screen 30 (for example, the display screen 30 is a touch screen, and the virtual keys are icons displayed on the touch screen), or may be separate physical keys.
  • The user can change the voice assistant 10 from the execution state to the listening state by touching the virtual key, or by pressing the physical key, so that the current execution state can be exited quickly.
  • the control buttons can facilitate the management of the voice assistant and quickly switch the two-level dialogue state of the voice assistant.
  • control method includes:
  • Step 071: control the voice assistant 10 to be activated when it receives the opening voice command, so that the voice assistant 10 enters the listening state;
  • Step 072: control the voice assistant 10 to be closed when it receives the closing voice command.
  • the control method of the embodiment of the present application may be implemented by the client 100 of the embodiment of the present application, and the client 100 includes a voice assistant 10 and a processor 20 .
  • Both step 071 and step 072 can be implemented by the processor 20; that is to say, the processor 20 is configured to: control the voice assistant 10 to be activated when it receives the opening voice command, so that the voice assistant 10 enters the listening state; and control the voice assistant 10 to be closed when it receives the closing voice command.
  • When the voice assistant 10 receives the opening voice command, it is controlled to be activated so that it enters the listening state.
  • The opening voice command can be set before leaving the factory or set by the user, which is not limited here.
  • the starting voice command may be "hello, little P"
  • the voice assistant 10 when the voice assistant 10 receives "hello, little P", the voice assistant 10 is controlled to be activated to make the voice assistant 10 enter the listening state.
  • The closing voice command can be set before leaving the factory or set by the user, which is not limited here.
  • the closing voice command may be "exit", and when the voice assistant 10 receives "exit", the voice assistant 10 is controlled to close.
  • control method includes:
  • Step 081: create a listening state node when the voice assistant 10 is activated and push the listening state node onto the dialogue state stack;
  • Step 082: when the voice assistant 10 changes from the listening state to the execution state, create an execution state node and push the execution state node onto the dialogue state stack;
  • Step 083: when the voice assistant 10 remains in the execution state, refresh the execution state node;
  • Step 084: when the voice assistant 10 changes from the execution state to the listening state, pop the execution state node off the dialogue state stack;
  • Step 085: when the voice assistant 10 is closed, pop the listening state node off the dialogue state stack.
  • Step 081 through step 085 can all be implemented by the processor 20; that is to say, the processor 20 is configured to: create a listening state node when the voice assistant 10 is activated and push the listening state node onto the dialogue state stack; when the voice assistant 10 changes from the listening state to the execution state, create an execution state node and push the execution state node onto the dialogue state stack; when the voice assistant 10 remains in the execution state, refresh the execution state node; when the voice assistant 10 changes from the execution state to the listening state, pop the execution state node off the dialogue state stack; and when the voice assistant 10 is closed, pop the listening state node off the dialogue state stack.
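Steps 081 through 085 can be sketched as operations on a plain stack. The dict layout of the nodes follows the description above, but the function names and keys are illustrative assumptions, not the patent's implementation.

```python
dialog_stack: list[dict] = []

def on_activated():
    # Step 081: push a listening state node on activation.
    dialog_stack.append({"state": "listening", "dialog": None})

def on_enter_execution(dialog: dict, execution: dict):
    # Step 082: push an execution state node on listening -> executing.
    dialog_stack.append({"state": "executing",
                         "dialog": dialog, "execution": execution})

def on_refresh(dialog: dict, execution: dict):
    # Step 083: refresh the top node while the execution state is maintained.
    dialog_stack[-1].update(dialog=dialog, execution=execution)

def on_exit_execution():
    # Step 084: pop the execution state node on executing -> listening.
    dialog_stack.pop()

def on_closed():
    # Step 085: pop the listening state node when the assistant is closed.
    dialog_stack.pop()
```

After activation the stack holds a single listening node; entering execution pushes a second node on top; exiting execution pops back to the listening node, matching the "exit execution but remain listening" behavior described earlier.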
  • the voice assistant 10 can switch from the listening state to the executing state.
  • an executing state node can be created and the executing state node can be pushed into the dialogue state stack.
  • the execution state node can be updated, and the execution state node in the dialogue state stack is refreshed at this time.
  • the voice assistant 10 can change from the execution state to the listening state, and the execution state node can be popped from the dialog state stack.
  • the listening state node includes state information and dialog information
  • the executing state node includes state information, dialog information and execution information.
  • the state information of the listening node may be a waiting state.
  • the dialogue information of the listening node includes text information, response information and category information.
  • The text information can be a voice signal "Hahaha"; since "Hahaha" contains no control instruction, the response information is empty and the category information is "rejected".
  • the state information of the execution state node may be the execution state.
  • the dialog information of the executing state node includes text information, response information and category information.
  • the text information can be a voice signal "navigate to Peking University", the response information can be a list of points of interest (POI), and the category information is a selection state.
  • the execution information of the execution state node includes the dialogue round, the script name and the execution interface. Dialogue rounds can be the first round, the second round, the third round, etc.
  • For example, the script name may be point of interest (POI) selection; script names can be common dialogue scripts set before leaving the factory.
  • the execution interface can be card information. It is worth mentioning that the card information can be the content of the point of interest selection, and the selection of the point of interest can include: the south gate of Peking University, the east gate of Peking University, the parking lot of Peking University, the bus station of Peking University, etc.
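The two node types described above (a listening state node with state and dialogue information, and an execution state node that additionally carries execution information) could be laid out as follows. The class and field names are illustrative assumptions following the fields listed in the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DialogInfo:
    text: str        # e.g. "navigate to Peking University"
    response: str    # e.g. a POI list; empty when the signal is rejected
    category: str    # e.g. "rejected" or "selection"

@dataclass
class ExecutionInfo:
    round: int       # dialogue round: first, second, third, ...
    script: str      # script name, e.g. "POI selection"
    interface: str   # execution interface, e.g. card information

@dataclass
class ListeningNode:
    state: str = "waiting"               # state information
    dialog: Optional[DialogInfo] = None  # dialogue information

@dataclass
class ExecutingNode:
    state: str               # state information, e.g. "executing"
    dialog: DialogInfo       # dialogue information
    execution: ExecutionInfo # execution information
```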
  • the client 100 communicates with the server 200, and the control method includes:
  • the voice signal is sent to the server 200 , and the server 200 is used to determine whether there is a control command according to the voice signal and feed back the result to the client 100 .
  • the control method of the embodiment of the present application may be implemented by the client terminal 100 of the embodiment of the present application.
  • the client terminal 100 includes a voice assistant 10 and a processor 20 , and the client terminal 100 communicates with the server terminal 200 .
  • The above steps can be implemented by the processor 20; that is to say, the processor 20 is configured to send the voice signal to the server 200, and the server 200 is configured to determine whether a control instruction is present according to the voice signal and feed the result back to the client 100.
  • the client 100 communicates with the server 200 , and can send a voice signal to the server 200 .
  • the server 200 determines whether there is a control command according to the voice signal and feeds back the result to the client 100 .
  • the client 100 is configured to receive the voice signal and send the voice signal to the server 200 .
  • the server 200 can be used to determine whether there is a control command, and the result can be fed back to the client 100 .
  • control method includes:
  • The client 100 and the server 200 are controlled to synchronize the dialogue state of the voice assistant 10, where the dialogue state includes the listening state and the execution state.
  • the control method of the embodiment of the present application may be implemented by the client terminal 100 of the embodiment of the present application.
  • the client terminal 100 includes a voice assistant 10 and a processor 20 , and the client terminal 100 communicates with the server terminal 200 .
  • the above steps can be implemented by the processor 20, that is, the processor 20 is used to: control the client 100 and the server 200 to synchronize the dialogue state of the voice assistant 10, and the dialogue state includes a listening state and an executing state.
  • the client 100 and the server 200 synchronize the dialogue state of the voice assistant 10, and the client 100 and the server 200 always maintain a high degree of state consistency.
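One possible way to keep the client's and server's view of the dialogue state consistent is to mirror every local transition to the other side. The patent only requires that both sides stay synchronized; this message-passing scheme and the class names are assumptions for illustration.

```python
class ServerSide:
    def __init__(self):
        self.dialog_state = "closed"

    def sync(self, state: str) -> str:
        self.dialog_state = state    # mirror the client's transition
        return self.dialog_state     # acknowledge with the state now held

class ClientSide:
    def __init__(self, server: ServerSide):
        self.server = server
        self.dialog_state = "closed"

    def transition(self, state: str) -> None:
        self.dialog_state = state
        acked = self.server.sync(state)
        # Both sides now hold the same dialogue state.
        assert acked == self.dialog_state
```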
  • the two-level dialogue state can be managed in an all-voice manner.
  • the user can use the voice command "Hello, Little P" to wake up the voice assistant 10, and control the voice assistant 10 to activate when the voice assistant 10 receives "Hello, Little P” to make the voice assistant 10 enter the listening state.
  • the user can say a meaningful voice of "navigate to Zhongguancun".
  • the voice assistant 10 enters the execution state, and the voice assistant 10 can list the points of interest to select and ask the user, "The following results have been found for you, which one do you want to go to? ?”
  • the user can voice answer "first", and the user can also use the close voice command "exit" to exit the voice assistant 10.
  • the two-level dialogue state can be managed by a combination of voice and keys.
  • the user can use the voice command "Hello, Little P" to wake up the voice assistant 10; when the voice assistant 10 receives "Hello, Little P", it is controlled to activate and enter the listening state.
  • the user can then utter the meaningful speech "navigate to Zhongguancun".
  • the voice assistant 10 enters the execution state, and can list the points of interest for selection while asking the user, "The following results have been found for you; which one do you want to go to?"; the user can manually close the navigation application as required.
  • the user can also manually turn off the voice assistant 10 by pressing a button.
  • the present application discloses a vehicle 300 .
  • the vehicle 300 includes a vehicle body 301 and a client 100 according to any one of the above embodiments.
  • the client 100 is disposed on the vehicle body 301 .
  • the client 100 of the vehicle 300 can control the voice assistant 10 to enter the listening state after activation.
  • in the listening state, the voice assistant 10 can obtain a voice signal and determine directly from it whether a control command is present; no re-activation is needed, so a single activation enables continuous dialogue, which makes the voice assistant 10 more convenient to use.
  • in the execution state, the voice assistant 10 can control the client 100 accordingly based on the control commands; managing the voice assistant 10 through the two-level dialogue states (i.e., the listening state and the execution state) makes it easy for the voice assistant 10 to perform different kinds of work.
  • the vehicle 300 can be connected to the client 100 through wireless communication (eg, WIFI, mobile communication network, etc.).
  • the vehicle 300 includes, but is not limited to, a pure electric vehicle, a hybrid electric vehicle, an extended-range electric vehicle, a fuel vehicle, and the like.
  • the voice system 500 includes a server 200 and a client 100 according to any one of the above embodiments, and the server 200 communicates with the client 100 .
  • the voice system 500 of the embodiment of the present application can control the voice assistant 10 to enter the listening state after activation.
  • in the listening state, the voice assistant 10 can obtain a voice signal and determine directly from it whether a control command is present; no re-activation is needed, so a single activation enables continuous dialogue, which makes the voice assistant 10 more convenient to use.
  • in the execution state, the voice assistant 10 can control the client 100 accordingly based on the control commands; managing the voice assistant 10 through the two-level dialogue states (i.e., the listening state and the execution state) makes it easy for the voice assistant 10 to perform different kinds of work.
  • an embodiment of the present application further provides a computer-readable storage medium 1000 on which a computer program is stored.
  • when the computer program is executed by the processor 20, the processor 20 performs the steps of the control method of any of the above-mentioned embodiments.
  • Step 01 control the activation of the voice assistant 10 so that the voice assistant 10 enters a listening state, and in the listening state, the voice assistant 10 can obtain a voice signal to directly determine whether there is a control command according to the voice signal;
  • Step 02 When there is a control command, control the voice assistant 10 to enter the execution state, and in the execution state, the voice assistant 10 can control the client 100 correspondingly according to the control command and resume the listening state after the control ends.
  • the computer-readable storage medium 1000 of the embodiment of the present application can control the voice assistant 10 to enter the listening state after activation.
  • in the listening state, the voice assistant 10 can obtain a voice signal and determine directly from it whether a control command is present; no re-activation is needed, so a single activation enables continuous dialogue, which makes the voice assistant 10 more convenient to use.
  • in the execution state, the voice assistant 10 can control the client 100 accordingly based on the control commands; managing the voice assistant 10 through the two-level dialogue states (i.e., the listening state and the execution state) makes it easy for the voice assistant 10 to perform different kinds of work.
  • a computer program includes computer program code.
  • the computer program code may be in source code form, object code form, an executable file, some intermediate form, or the like.
  • computer-readable storage media may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a software distribution medium, and the like.
  • a processor may refer to a processor contained in a controller.
  • the processor can be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • CPU Central Processing Unit
  • DSP Digital Signal Processor
  • ASIC Application Specific Integrated Circuit
  • FPGA Field-Programmable Gate Array
  • any description of a process or method in the flowcharts or otherwise described herein may be understood to represent a module, segment or portion of code comprising one or more executable instructions for implementing a specified logical function or step of the process, and the scope of the preferred embodiments of the present application includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending upon the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
  • a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device.
  • examples of computer-readable media include the following: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. If the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, and the like.

Abstract

A control method, a client (100), a vehicle (300), a voice system (500), and a storage medium (1000). The control method is used to control the client (100), which includes a voice assistant (10). The control method comprises: controlling the voice assistant (10) to activate so that it enters a listening state, in which the voice assistant (10) acquires a voice signal and determines whether a control command is present (01); and, when a control command is present, controlling the voice assistant (10) to enter an executing state, in which it controls the client (100) according to the control command and resumes the listening state once the control is finished (02).

Description

Control method, client, vehicle, voice system and storage medium
Priority Information
This application claims priority to and the benefit of Chinese patent application No. 202011562171.5, filed with the China National Intellectual Property Administration on December 25, 2020, the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of voice technology, and in particular to a control method, a client, a vehicle, a voice system, and a storage medium.
Background
In the related art, voice interaction requires the voice assistant to be activated with a fixed wake word before every dialog, and the assistant exits automatically once the dialog ends. This wake-word-per-dialog interaction is highly inconvenient to use.
Summary
Embodiments of the present application provide a control method, a client, a vehicle, a voice system, and a storage medium.
The control method of the embodiments of the present application is used to control a client that includes a voice assistant. The control method comprises: controlling the voice assistant to activate so that it enters a listening state, in which the voice assistant can acquire a voice signal and determine directly from the voice signal whether a control command is present; and, when the control command is present, controlling the voice assistant to enter an executing state, in which it controls the client accordingly based on the control command and resumes the listening state once the control is finished.
In some embodiments, the client includes a display screen for displaying an avatar of the voice assistant, and the control method comprises: when the voice assistant enters the executing state, controlling the display screen to display card information corresponding to the control command and a first preset action or first preset expression of the avatar.
In some embodiments, the client includes a display screen for displaying an avatar of the voice assistant, and the control method comprises: when the voice assistant enters the listening state, controlling the display screen to display a second preset action or second preset expression of the avatar.
In some embodiments, the control command includes opening a preset application, and the control method comprises: upon detecting a close signal of the preset application, controlling the voice assistant to change from the executing state to the listening state.
In some embodiments, the client further includes a control button, and the control method comprises: controlling the voice assistant to change from the executing state to the listening state according to trigger information of the control button.
In some embodiments, the control method comprises: controlling the voice assistant to activate and enter the listening state when the voice assistant receives an open voice command; and controlling the voice assistant to shut down when the voice assistant receives a close voice command.
In some embodiments, the control method comprises: creating a listening-state node and pushing it onto a dialog state stack when the voice assistant is activated; creating an executing-state node and pushing it onto the dialog state stack when the voice assistant changes from the listening state to the executing state; refreshing the executing-state node while the voice assistant remains in the executing state; popping the executing-state node off the dialog state stack when the voice assistant changes from the executing state to the listening state; and popping the listening-state node off the dialog state stack when the voice assistant is shut down.
In some embodiments, the listening-state node includes state information and dialog information, and the executing-state node includes state information, dialog information, and execution information.
In some embodiments, the client communicates with a server, and the control method comprises: sending the voice signal to the server, the server being configured to determine from the voice signal whether the control command is present and to feed the result back to the client.
In some embodiments, the control method comprises: controlling the client and the server to synchronize the dialog state of the voice assistant, the dialog state including the listening state and the executing state.
The client of the embodiments of the present application includes a voice assistant and a processor, the processor being configured to: control the voice assistant to activate so that it enters a listening state, in which the voice assistant can acquire a voice signal and determine directly from the voice signal whether a control command is present; and, when the control command is present, control the voice assistant to enter an executing state, in which it controls the client accordingly based on the control command and resumes the listening state once the control is finished.
The vehicle of the embodiments of the present application includes a vehicle body and the client of any of the above embodiments, the client being disposed on the vehicle body.
The voice system of the embodiments of the present application includes a server and the client of any of the above embodiments, the server communicating with the client.
The computer-readable storage medium of the embodiments of the present application stores a computer program which, when executed by a processor, implements the control method of any of the above embodiments.
With the control method, client, vehicle, voice system, and storage medium of the embodiments of the present application, the voice assistant enters the listening state after activation, in which it can acquire a voice signal and determine directly from it whether a control command is present. No re-activation is needed at this point, so a single activation enables continuous dialog, making the voice assistant more convenient to use. In addition, in the executing state the voice assistant can control the client accordingly based on the control command; managing the voice assistant through two-level dialog states (i.e., the listening state and the executing state) makes it easy for the voice assistant to perform different kinds of work.
Additional aspects and advantages of the present application will be set forth in part in the following description, will in part become apparent from it, or will be learned through practice of the present application.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a control method according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a client according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a control method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of dialog states according to an embodiment of the present application;
FIGS. 5 to 8 are schematic flowcharts of control methods according to embodiments of the present application;
FIGS. 9 to 13 are schematic diagrams of dialog states according to embodiments of the present application;
FIG. 14 is a schematic diagram of a vehicle according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a voice system according to an embodiment of the present application;
FIG. 16 is a schematic diagram of the connection between a processor and a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, where identical or similar reference numerals denote identical or similar elements or elements having identical or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present application, and are not to be construed as limiting it.
In the description of the embodiments of the present application, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Accordingly, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, "a plurality of" means two or more, unless expressly and specifically defined otherwise.
Referring to FIG. 1, the control method of the embodiments of the present application is used to control a client 100 that includes a voice assistant 10. The control method comprises:
Step 01: controlling the voice assistant 10 to activate so that the voice assistant 10 enters a listening state, in which the voice assistant 10 can acquire a voice signal and determine directly from the voice signal whether a control command is present;
Step 02: when the control command is present, controlling the voice assistant 10 to enter an executing state, in which the voice assistant 10 can control the client 100 accordingly based on the control command and resume the listening state once the control is finished.
Referring to FIG. 2, the present application further discloses a client 100. Specifically, the control method of the embodiments of the present application may be implemented by the client 100 of the embodiments of the present application, where the client 100 includes a voice assistant 10 and a processor 20. Steps 01 and 02 may be implemented by the processor 20; that is, the processor 20 is configured to: control the voice assistant 10 to activate so that it enters the listening state, in which it can acquire a voice signal and determine directly from the voice signal whether a control command is present; and, when the control command is present, control the voice assistant 10 to enter the executing state, in which it controls the client 100 accordingly based on the control command and resumes the listening state once the control is finished.
With the above control method and client 100, the voice assistant 10 enters the listening state after activation, in which it can acquire a voice signal and determine directly from it whether a control command is present. No re-activation is needed at this point, so a single activation enables continuous dialog, making the voice assistant 10 more convenient to use. In addition, in the executing state the voice assistant 10 can control the client 100 accordingly based on the control command; managing the voice assistant 10 through two-level dialog states (i.e., the listening state and the executing state) makes it easy for the voice assistant 10 to perform different kinds of work.
In the related art, voice interaction requires the voice assistant 10 to be activated with a fixed wake word before every dialog, and the assistant exits automatically once the dialog ends, which makes such wake-word-per-dialog interaction highly inconvenient. By contrast, the control method and client 100 of this embodiment achieve continuous dialog from a single activation without re-activation, making the voice assistant 10 more convenient to use, and manage the voice assistant 10 through two-level dialog states (i.e., the listening state and the executing state), making it easy for the voice assistant 10 to perform different kinds of work.
Specifically, the embodiments of the present application involve two-level dialog states (i.e., the listening state and the executing state): the listening state carries dialog information and offers convenience, while the executing state carries execution information and offers strong perceptibility. After the voice assistant 10 is activated it enters the listening state, in which it can acquire a voice signal and determine directly from it whether a control command is present. The control command corresponding to a voice signal may be preset at the factory or customized by the user, which is not limited here. When the control command is present, the voice assistant 10 is controlled to enter the executing state, in which it controls the client 100 accordingly based on the control command and resumes the listening state once the control is finished. Managing the voice assistant 10 through two-level dialog states in this way lets it perform the corresponding control in the executing state and then resume the listening state; the voice assistant 10 thus exits the executing state while staying in the listening state, striking a good balance between the strong perceptibility and the convenience of voice interaction, and also making it easy for the voice assistant 10 to perform different kinds of work.
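The two-level dialog state loop described above (activate, then listen, execute a recognized command, and resume listening without re-activation) can be sketched as a toy Python state machine. This is an illustrative sketch only, not part of the patent; the command table and action names are assumptions standing in for the real recognizer and client controls:

```python
from enum import Enum, auto

class DialogState(Enum):
    LISTENING = auto()   # assistant is active and picks up speech directly
    EXECUTING = auto()   # assistant is carrying out a recognized command

class VoiceAssistant:
    """Toy model of the two-level dialog states (listening / executing)."""

    def __init__(self, commands):
        # `commands` maps recognized utterances to actions (hypothetical recognizer).
        self.commands = commands
        self.state = None            # not yet activated
        self.log = []

    def activate(self):
        # Activation (e.g. on the wake word) puts the assistant into listening state.
        self.state = DialogState.LISTENING

    def hear(self, utterance):
        # In the listening state, speech is checked directly for a control command;
        # no re-activation is required between utterances.
        if self.state is not DialogState.LISTENING:
            return
        action = self.commands.get(utterance)
        if action is None:
            return                               # meaningless speech: stay listening
        self.state = DialogState.EXECUTING       # command found: enter executing state
        self.log.append(action())                # control the client accordingly
        self.state = DialogState.LISTENING       # resume listening after the control ends

assistant = VoiceAssistant({"navigate to Zhongguancun": lambda: "navigation started"})
assistant.activate()
assistant.hear("hahaha")                         # rejected, assistant keeps listening
assistant.hear("navigate to Zhongguancun")       # executed, then back to listening
```

Note how the executing state is entered and left inside a single `hear` call, mirroring the "execute, then resume listening" behavior of Step 02.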
Referring to FIG. 3, in some embodiments the client 100 includes a display screen 30 for displaying an avatar of the voice assistant 10, and the control method comprises:
Step 03: when the voice assistant 10 enters the executing state, controlling the display screen 30 to display card information corresponding to the control command and a first preset action or first preset expression of the avatar.
The control method of the embodiments of the present application may be implemented by the client 100 of the embodiments of the present application, where the client 100 further includes a display screen 30. Step 03 may be implemented by the processor 20; that is, the processor 20 is configured to: when the voice assistant 10 enters the executing state, control the display screen 30 to display the card information corresponding to the control command and the first preset action or first preset expression of the avatar.
Specifically, referring to FIGS. 2 and 4 together, the client 100 includes a display screen 30 for displaying the avatar of the voice assistant 10; when the voice assistant 10 enters the executing state, the avatar performs the first preset action or shows the first preset expression. In one example, when the voice assistant 10 enters the executing state, its avatar may be that of a virtual robot, and the first preset action may be the robot's head enlarging and its eyes opening, with the robot positioned in the middle of the display screen 30. The first preset expression may be a pronounced blinking expression. When the voice assistant 10 enters the executing state, the display screen 30 is controlled to display the card information corresponding to the control command, which may be the content corresponding to the command. In one example, the control command may be "navigate to Zhongguancun", and the corresponding card information may be several routes to Zhongguancun. In some embodiments, when the voice assistant 10 enters the executing state, the virtual robot may be displayed above the card information.
Referring again to FIG. 3, in some embodiments the client 100 includes a display screen 30 for displaying the avatar of the voice assistant 10, and the control method comprises:
Step 04: when the voice assistant 10 enters the listening state, controlling the display screen 30 to display a second preset action or second preset expression of the avatar.
The control method of the embodiments of the present application may be implemented by the client 100, which further includes the display screen 30. Step 04 may be implemented by the processor 20; that is, the processor 20 is configured to: when the voice assistant 10 enters the listening state, control the display screen 30 to display the second preset action or second preset expression of the avatar.
Specifically, referring to FIG. 4, the client 100 includes a display screen 30 for displaying the avatar of the voice assistant 10; when the voice assistant 10 enters the listening state, the avatar performs the second preset action or shows the second preset expression. In one example, when the voice assistant 10 enters the listening state, its avatar may be that of a virtual robot, and the second preset action may be the robot's head enlarging and its eyes showing ripples, with the robot positioned in the middle of the display screen 30. The second preset expression may be a slight blinking expression. Notably, in the listening state the display screen 30 shows no card information; it may show a small text box whose content is voice information, for example "What are you doing".
When the voice assistant 10 has not been awakened, its avatar may be displayed small in the upper-left corner of the display screen 30. After the voice assistant 10 is activated, it may be displayed in the center of the display screen 30 with a larger avatar. When the voice assistant 10 receives speech, it may enter the listening state; when it executes a corresponding control command, it may enter the executing state; and after the voice assistant 10 exits, it may return to the un-awakened state.
When in the listening state, the server 200 can receive voice signals sent by the client 100 for dialog pickup, and then perform natural language understanding (NLU) on the received voice signals. At this point it can identify whether the dialog represented by a voice signal is meaningless or meaningful, rejecting the signal when the dialog is meaningless and preparing to execute the corresponding control command when it is meaningful. The client 100 and the server 200 then move from the listening state into the executing state. In the executing state, the server 200 determines whether the dialog is a multi-turn or single-turn dialog: a multi-turn dialog is a function that requires several turns of dialog information (for example, navigation), while a single-turn dialog is a function that a single turn of dialog information can accomplish (for example, setting the screen brightness to 100%). For a multi-turn dialog the server enters script mode (for example, during navigation the user names a place, and the next step confirms it); for a single-turn dialog the command is dispatched directly to the client 100.
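The server-side decision flow just described (NLU rejection of meaningless speech, then routing a meaningful command into multi-turn script mode or single-turn direct dispatch) might be sketched as follows. The function names, intent table, and keyword matching are illustrative assumptions, not the actual server API:

```python
# Intents that need several dialog turns (e.g. navigation with a POI follow-up).
MULTI_TURN_INTENTS = {"navigate"}

def understand(utterance):
    """Stand-in for natural language understanding (NLU); returns an intent or None."""
    if "navigate" in utterance:
        return "navigate"
    if "brightness" in utterance:
        return "set_brightness"
    return None      # meaningless speech

def route(utterance):
    """Decide what the server does with one utterance heard in the listening state."""
    intent = understand(utterance)
    if intent is None:
        return ("reject", None)        # meaningless dialog: reject, stay listening
    if intent in MULTI_TURN_INTENTS:
        return ("script", intent)      # multi-turn dialog: enter script mode
    return ("dispatch", intent)        # single-turn dialog: send command to the client
```

For example, `route("navigate to Zhongguancun")` would enter script mode, while a brightness command is dispatched directly.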
Referring to FIG. 5, in some embodiments the control command includes opening a preset application, and the control method comprises:
Step 05: upon detecting a close signal of the preset application, controlling the voice assistant 10 to change from the executing state to the listening state.
The control method of the embodiments of the present application may be implemented by the client 100 of the embodiments of the present application, where the client 100 includes the voice assistant 10 and the processor 20. Step 05 may be implemented by the processor 20; that is, the processor 20 is configured to: upon detecting the close signal of the preset application, control the voice assistant 10 to change from the executing state to the listening state.
Specifically, the preset application may be navigation, a music player, a search engine, and the like. In one example, the voice assistant 10 is controlled to activate and enter the listening state; in the listening state it acquires the voice signal "I want to listen to Jay Chou's Sunny Day", which is determined to contain a control command, so the music player can be opened to play Jay Chou's "Sunny Day". When the user closes the preset application, i.e., the music player, its close signal can be detected and the voice assistant 10 is controlled to change from the executing state to the listening state. In the listening state the voice assistant 10 can again acquire voice signals and determine directly from them whether a control command is present; no re-activation is needed, so a single activation enables continuous dialog, making the voice assistant 10 more convenient to use.
Referring to FIG. 6, in some embodiments the client 100 further includes a control button, and the control method comprises:
Step 06: controlling the voice assistant 10 to change from the executing state to the listening state according to trigger information of the control button.
The control method of the embodiments of the present application may be implemented by the client 100, which further includes the control button. Step 06 may be implemented by the processor 20; that is, the processor 20 is configured to: control the voice assistant 10 to change from the executing state to the listening state according to the trigger information of the control button.
Specifically, the control button may be a virtual button provided on the display screen 30 (for example, the display screen 30 is a touch screen and the virtual button is an icon shown on it), or a separately provided physical button. The client 100 can switch the voice assistant 10 from the executing state to the listening state by tapping the virtual button, or by pressing the physical button, thereby quickly exiting the current executing state. The control button makes it easier to manage the voice assistant and to switch quickly between its two-level dialog states.
Referring to FIG. 7, in some embodiments the control method comprises:
Step 071: controlling the voice assistant 10 to activate and enter the listening state when the voice assistant 10 receives an open voice command;
Step 072: controlling the voice assistant 10 to shut down when the voice assistant 10 receives a close voice command.
The control method of the embodiments of the present application may be implemented by the client 100 of the embodiments of the present application, where the client 100 includes the voice assistant 10 and the processor 20. Steps 071 and 072 may both be implemented by the processor 20; that is, the processor 20 is configured to: control the voice assistant 10 to activate and enter the listening state when it receives the open voice command, and control the voice assistant 10 to shut down when it receives the close voice command.
Specifically, the voice assistant 10 is controlled to activate and enter the listening state when it receives the open voice command. The open voice command may be preset at the factory or customized by the user, which is not limited here. In one example, the open voice command may be "Hello, Little P": when the voice assistant 10 receives "Hello, Little P", it is controlled to activate and enter the listening state. The close voice command may likewise be preset at the factory or customized by the user, which is not limited here. In one example, the close voice command may be "Exit": when the voice assistant 10 receives "Exit", it is controlled to shut down.
Referring to FIG. 8, in some embodiments the control method comprises:
Step 081: creating a listening-state node and pushing it onto a dialog state stack when the voice assistant 10 is activated;
Step 082: creating an executing-state node and pushing it onto the dialog state stack when the voice assistant 10 changes from the listening state to the executing state;
Step 083: refreshing the executing-state node while the voice assistant 10 remains in the executing state;
Step 084: popping the executing-state node off the dialog state stack when the voice assistant 10 changes from the executing state to the listening state;
Step 085: popping the listening-state node off the dialog state stack when the voice assistant 10 is shut down.
The control method of the embodiments of the present application may be implemented by the client 100, which includes the voice assistant 10 and the processor 20. Steps 081, 082, 083, 084, and 085 may all be implemented by the processor 20; that is, the processor 20 is configured to: create a listening-state node and push it onto the dialog state stack when the voice assistant 10 is activated; create an executing-state node and push it onto the dialog state stack when the voice assistant 10 changes from the listening state to the executing state; refresh the executing-state node while the voice assistant 10 remains in the executing state; pop the executing-state node off the dialog state stack when the voice assistant 10 changes from the executing state to the listening state; and pop the listening-state node off the dialog state stack when the voice assistant 10 is shut down.
Referring to FIG. 9, using a state stack in this way facilitates managing the two-level dialog states. Specifically, when meaningful speech is executed, the voice assistant 10 can switch from the listening state to the executing state; an executing-state node is then created and pushed onto the dialog state stack. When a button on a card is clicked to trigger a new execution event, or when a multi-turn dialog script is executed, the executing-state node can be updated, refreshing the node on the dialog state stack. When the card is closed, the current application is closed, the current dialog finishes executing, or the current dialog class is exited, the voice assistant 10 can change from the executing state to the listening state, and the executing-state node can be popped off the dialog state stack.
Referring to FIG. 10, in some embodiments the listening-state node includes state information and dialog information, and the executing-state node includes state information, dialog information, and execution information. In one example, the state information of the listening-state node may be a waiting status. The dialog information of the listening-state node includes text information, response information, and category information: the text information may be the voice signal "hahaha"; since "hahaha" contains no control command, the response information is empty and the category information is "rejected". The state information of the executing-state node may be an executing status. Its dialog information likewise includes text information, response information, and category information: the text information may be the voice signal "Navigate to Peking University", the response information may be a Point of Interest (POI) list, and the category information is a selecting status. The execution information of the executing-state node includes the dialog turn, the script name, and the execution interface. The dialog turn may be the first turn, second turn, third turn, and so on. The script name is a Point of Interest (POI) selection; script names may be common dialog information set before leaving the factory. The execution interface may be the card information. Notably, the card information may be the content of the POI selection, which may include: Peking University South Gate, Peking University East Gate, the Peking University parking lot, the Peking University bus stop, and so on.
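The dialog state stack and the node contents described above (Steps 081 to 085 and FIG. 10) could be modeled roughly as follows. This is an illustrative sketch under assumed field names, not the patented implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ListeningNode:
    # Listening-state node: state information plus dialog information.
    status: str = "waiting"
    dialog: dict = field(default_factory=dict)   # text / response / category

@dataclass
class ExecutingNode:
    # Executing-state node additionally carries execution information.
    status: str = "executing"
    dialog: dict = field(default_factory=dict)
    execution: dict = field(default_factory=dict)  # turn, script name, UI card

class DialogStack:
    """Rough model of the stack management in Steps 081-085."""
    def __init__(self):
        self.stack = []

    def on_activate(self):                       # Step 081: push listening node
        self.stack.append(ListeningNode())

    def on_execute(self, dialog, execution):     # Step 082: push executing node
        self.stack.append(ExecutingNode(dialog=dialog, execution=execution))

    def on_refresh(self, execution):             # Step 083: refresh executing node
        assert isinstance(self.stack[-1], ExecutingNode)
        self.stack[-1].execution.update(execution)

    def on_execution_done(self):                 # Step 084: pop executing node
        self.stack.pop()

    def on_close(self):                          # Step 085: pop listening node
        self.stack.pop()

stack = DialogStack()
stack.on_activate()
stack.on_execute({"text": "Navigate to Peking University"},
                 {"turn": 1, "script": "POI selection"})
stack.on_refresh({"turn": 2})    # a second dialog turn refreshes the node
stack.on_execution_done()        # execution ends: back to the listening node
```

After the execution finishes, only the listening-state node remains on the stack, matching the "resume listening" behavior.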
Referring to FIGS. 4 and 11 together, in some embodiments the client 100 communicates with the server 200, and the control method comprises:
sending the voice signal to the server 200, the server 200 being configured to determine from the voice signal whether a control command is present and to feed the result back to the client 100.
The control method of the embodiments of the present application may be implemented by the client 100, which includes the voice assistant 10 and the processor 20 and communicates with the server 200. The above step may be implemented by the processor 20; that is, the processor 20 is configured to: send the voice signal to the server 200, the server 200 being configured to determine from the voice signal whether a control command is present and to feed the result back to the client 100.
Referring again to FIG. 4, the client 100 communicates with the server 200 and can send the voice signal to it; the server 200 determines from the voice signal whether a control command is present and feeds the result back to the client 100. In this way the client 100 receives the voice signal and sends it to the server 200, which can be used to determine whether a control command is present and to feed the result back to the client 100.
Referring to FIGS. 4, 11, 12 and 13 together, in some embodiments the control method comprises:
controlling the client 100 and the server 200 to synchronize the dialog state of the voice assistant 10, the dialog state including the listening state and the executing state.
The control method of the embodiments of the present application may be implemented by the client 100, which includes the voice assistant 10 and the processor 20 and communicates with the server 200. The above step may be implemented by the processor 20; that is, the processor 20 is configured to: control the client 100 and the server 200 to synchronize the dialog state of the voice assistant 10, the dialog state including the listening state and the executing state.
Specifically, the client 100 and the server 200 synchronize the dialog state of the voice assistant 10, and the two always maintain a high degree of state consistency.
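A minimal sketch of keeping the client-side and server-side dialog states consistent, assuming a simple mirror-on-transition scheme in which every local state change is propagated to the peer (the direct peer call stands in for a network round-trip; all names are illustrative, not the patented protocol):

```python
class Endpoint:
    """One side of the dialog-state synchronization (client or server)."""

    def __init__(self, name):
        self.name = name
        self.state = "inactive"
        self.peer = None          # set after both endpoints exist

    def set_state(self, new_state, _from_peer=False):
        # Apply the transition locally, then mirror it to the peer.
        self.state = new_state
        if not _from_peer:                       # avoid an infinite echo loop
            self.peer.set_state(new_state, _from_peer=True)

client, server = Endpoint("client"), Endpoint("server")
client.peer, server.peer = server, client

client.set_state("listening")    # a client-side transition propagates to the server
server.set_state("executing")    # a server-side transition propagates back
```

Either side can initiate a transition, and both always hold the same state afterwards, which is the consistency property the paragraph above describes.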
Referring to FIG. 11, in some embodiments the two-level dialog states can be managed entirely by voice. For example, the user can use the open voice command "Hello, Little P" to wake up the voice assistant 10; when the voice assistant 10 receives "Hello, Little P", it is controlled to activate and enter the listening state. The user can then utter the meaningful speech "Navigate to Zhongguancun", whereupon the voice assistant 10 enters the executing state and can list the POI choices while asking the user, "The following results have been found for you; which one do you want to go to?" The user can answer "the first one" by voice, and can also use the close voice command "Exit" to make the voice assistant 10 exit.
Referring to FIG. 12, in some embodiments the two-level dialog states can be managed by a combination of voice and buttons. For example, the user can use the open voice command "Hello, Little P" to wake up the voice assistant 10; when the voice assistant 10 receives "Hello, Little P", it is controlled to activate and enter the listening state. The user can then utter the meaningful speech "Navigate to Zhongguancun", whereupon the voice assistant 10 enters the executing state and can list the POI choices while asking the user, "The following results have been found for you; which one do you want to go to?" The user can manually close the navigation application as required, and can also shut down the voice assistant 10 manually with a button.
Referring to FIG. 14, the present application discloses a vehicle 300. The vehicle 300 includes a vehicle body 301 and the client 100 of any of the above embodiments, the client 100 being disposed on the vehicle body 301.
In this way, the client 100 of the vehicle 300 of the embodiments of the present application can control the voice assistant 10 to enter the listening state after activation, in which the voice assistant 10 can acquire a voice signal and determine directly from it whether a control command is present. No re-activation is needed at this point, so a single activation enables continuous dialog, making the voice assistant 10 more convenient to use. In addition, in the executing state the voice assistant 10 can control the client 100 accordingly based on the control command; managing the voice assistant 10 through two-level dialog states (i.e., the listening state and the executing state) makes it easy for the voice assistant 10 to perform different kinds of work. Notably, the vehicle 300 can connect to the client 100 through wireless communication (e.g., WIFI, a mobile communication network, etc.). The vehicle 300 includes, but is not limited to, a battery electric vehicle, a hybrid electric vehicle, an extended-range electric vehicle, a fuel vehicle, and the like.
Referring to FIG. 15, the present application discloses a voice system 500. The voice system 500 includes a server 200 and the client 100 of any of the above embodiments, the server 200 communicating with the client 100.
In this way, the voice system 500 of the embodiments of the present application can control the voice assistant 10 to enter the listening state after activation, in which the voice assistant 10 can acquire a voice signal and determine directly from it whether a control command is present. No re-activation is needed, so a single activation enables continuous dialog, making the voice assistant 10 more convenient to use. In addition, in the executing state the voice assistant 10 can control the client 100 accordingly based on the control command; managing the voice assistant 10 through two-level dialog states (i.e., the listening state and the executing state) makes it easy for the voice assistant 10 to perform different kinds of work.
Referring to FIG. 16, an embodiment of the present application further provides a computer-readable storage medium 1000 on which a computer program is stored; when the computer program is executed by the processor 20, the processor 20 performs the steps of the control method of any of the above embodiments.
For example, when the program is executed by the processor 20, the steps of the following control method are implemented:
Step 01: controlling the voice assistant 10 to activate so that the voice assistant 10 enters the listening state, in which the voice assistant 10 can acquire a voice signal and determine directly from the voice signal whether a control command is present;
Step 02: when the control command is present, controlling the voice assistant 10 to enter the executing state, in which the voice assistant 10 can control the client 100 accordingly based on the control command and resume the listening state once the control is finished.
In this way, the computer-readable storage medium 1000 of the embodiments of the present application can control the voice assistant 10 to enter the listening state after activation, in which the voice assistant 10 can acquire a voice signal and determine directly from it whether a control command is present. No re-activation is needed, so a single activation enables continuous dialog, making the voice assistant 10 more convenient to use. In addition, in the executing state the voice assistant 10 can control the client 100 accordingly based on the control command; managing the voice assistant 10 through two-level dialog states (i.e., the listening state and the executing state) makes it easy for the voice assistant 10 to perform different kinds of work.
It will be appreciated that the computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a software distribution medium, and the like.
The processor may refer to a processor included in a controller. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an illustrative embodiment", "an example", "a specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic references to these terms do not necessarily refer to the same embodiment or example, and the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Any description of a process or method in the flowcharts or otherwise described herein may be understood to represent a module, segment, or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered a sequenced list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processing module, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with such an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM). The computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that portions of the embodiments of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gate circuits for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium and which, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module; if implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Although embodiments of the present application have been shown and described above, it will be appreciated that the above embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (14)

  1. A control method for controlling a client, wherein the client includes a voice assistant, and the control method comprises:
    controlling the voice assistant to activate so that the voice assistant enters a listening state, in which the voice assistant can acquire a voice signal and determine directly from the voice signal whether a control command is present;
    when the control command is present, controlling the voice assistant to enter an executing state, in which the voice assistant can control the client accordingly based on the control command and resume the listening state once the control is finished.
  2. The control method according to claim 1, wherein the client includes a display screen for displaying an avatar of the voice assistant, and the control method comprises:
    when the voice assistant enters the executing state, controlling the display screen to display card information corresponding to the control command and a first preset action or first preset expression of the avatar.
  3. The control method according to claim 1, wherein the client includes a display screen for displaying an avatar of the voice assistant, and the control method comprises:
    when the voice assistant enters the listening state, controlling the display screen to display a second preset action or second preset expression of the avatar.
  4. The control method according to claim 1, wherein the control command includes opening a preset application, and the control method comprises:
    upon detecting a close signal of the preset application, controlling the voice assistant to change from the executing state to the listening state.
  5. The control method according to claim 1, wherein the client further includes a control button, and the control method comprises:
    controlling the voice assistant to change from the executing state to the listening state according to trigger information of the control button.
  6. The control method according to claim 1, wherein the control method comprises:
    controlling the voice assistant to activate and enter the listening state when the voice assistant receives an open voice command;
    controlling the voice assistant to shut down when the voice assistant receives a close voice command.
  7. The control method according to claim 1, wherein the control method comprises:
    creating a listening-state node and pushing the listening-state node onto a dialog state stack when the voice assistant is activated;
    creating an executing-state node and pushing the executing-state node onto the dialog state stack when the voice assistant changes from the listening state to the executing state;
    refreshing the executing-state node while the voice assistant remains in the executing state;
    popping the executing-state node off the dialog state stack when the voice assistant changes from the executing state to the listening state;
    popping the listening-state node off the dialog state stack when the voice assistant is shut down.
  8. The control method according to claim 7, wherein the listening-state node includes state information and dialog information, and the executing-state node includes state information, dialog information, and execution information.
  9. The control method according to claim 1, wherein the client communicates with a server, and the control method comprises:
    sending the voice signal to the server, the server being configured to determine from the voice signal whether the control command is present and to feed the result back to the client.
  10. The control method according to claim 9, wherein the control method comprises:
    controlling the client and the server to synchronize a dialog state of the voice assistant, the dialog state including the listening state and the executing state.
  11. A client, comprising a voice assistant and a processor, the processor being configured to: control the voice assistant to activate so that the voice assistant enters a listening state, in which the voice assistant can acquire a voice signal and determine directly from the voice signal whether a control command is present; and, when the control command is present, control the voice assistant to enter an executing state, in which the voice assistant can control the client accordingly based on the control command and resume the listening state once the control is finished.
  12. A vehicle, comprising a vehicle body and the client according to claim 11, the client being disposed on the vehicle body.
  13. A voice system, comprising a server and the client according to claim 11, the server communicating with the client.
  14. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the control method according to any one of claims 1 to 10.
PCT/CN2021/140569 2020-12-25 2021-12-22 Control method, client, vehicle, voice system and storage medium WO2022135492A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011562171.5 2020-12-25
CN202011562171.5A CN112735411A (zh) 2020-12-25 2021-04-30 Control method, client, vehicle, voice system and storage medium

Publications (1)

Publication Number Publication Date
WO2022135492A1 true WO2022135492A1 (zh) 2022-06-30

Family

ID=75616215

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140569 WO2022135492A1 (zh) 2020-12-25 2021-12-22 Control method, client, vehicle, voice system and storage medium

Country Status (2)

Country Link
CN (1) CN112735411A (zh)
WO (1) WO2022135492A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735411A (zh) * 2020-12-25 2021-04-30 广州橙行智动汽车科技有限公司 控制方法、客户端、车辆、语音系统和存储介质
CN113223527A (zh) * 2021-05-08 2021-08-06 雅迪科技集团有限公司 一种用于电动车智能仪表的语音控制方法及电动车
CN117746851A (zh) * 2022-09-15 2024-03-22 比亚迪股份有限公司 一种车载交互方法、系统、控制器和汽车

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160351206A1 (en) * 2015-05-26 2016-12-01 Speaktoit, Inc. Dialog system with automatic reactivation of speech acquiring mode
US10129720B1 (en) * 2011-12-30 2018-11-13 Genesys Telecommunications Laboratories, Inc. Conversation assistant
CN109346076A (zh) * 2018-10-25 2019-02-15 三星电子(中国)研发中心 Voice interaction and voice processing method, apparatus and system
CN111291151A (zh) * 2018-12-06 2020-06-16 阿里巴巴集团控股有限公司 Interaction method, apparatus and computer device
CN111612482A (zh) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Dialog management method, apparatus and device
CN112735411A (zh) * 2020-12-25 2021-04-30 广州橙行智动汽车科技有限公司 Control method, client, vehicle, voice system and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060678B (zh) * 2019-04-16 2021-09-14 深圳欧博思智能科技有限公司 Virtual character control method based on a smart device, and smart device
CN110096191B (zh) * 2019-04-24 2021-06-29 北京百度网讯科技有限公司 Human-machine dialog method, apparatus and electronic device
CN110767220A (zh) * 2019-10-16 2020-02-07 腾讯科技(深圳)有限公司 Interaction method, apparatus, device and storage medium for an intelligent voice assistant

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10129720B1 (en) * 2011-12-30 2018-11-13 Genesys Telecommunications Laboratories, Inc. Conversation assistant
US20160351206A1 (en) * 2015-05-26 2016-12-01 Speaktoit, Inc. Dialog system with automatic reactivation of speech acquiring mode
CN109346076A (zh) * 2018-10-25 2019-02-15 三星电子(中国)研发中心 Voice interaction and voice processing method, apparatus and system
CN111291151A (zh) * 2018-12-06 2020-06-16 阿里巴巴集团控股有限公司 Interaction method, apparatus and computer device
CN111612482A (zh) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Dialog management method, apparatus and device
CN112735411A (zh) * 2020-12-25 2021-04-30 广州橙行智动汽车科技有限公司 Control method, client, vehicle, voice system and storage medium

Also Published As

Publication number Publication date
CN112735411A (zh) 2021-04-30

Similar Documents

Publication Publication Date Title
WO2022135492A1 (zh) 控制方法、客户端、车辆、语音系统和存储介质
US11676601B2 (en) Voice assistant tracking and activation
US20220383852A1 (en) Method and user device for providing context awareness service using speech recognition
US11145302B2 (en) System for processing user utterance and controlling method thereof
CN107430855B (zh) 在支持语音的电子设备中对语音转文本模型的场境敏感动态更新
US20210241775A1 (en) Hybrid speech interface device
CN107329843A (zh) 应用程序语音控制方法、装置、设备以及存储介质
CN112231021B (zh) 软件新功能的引导方法和装置
CN111177453B (zh) 控制音频播放的方法、装置、设备及计算机可读存储介质
CN109192208A (zh) 一种电器设备的控制方法、系统、装置、设备及介质
WO2022252946A1 (zh) 语音控制方法、语音控制装置、服务器和存储介质
CN110450714A (zh) 一种信息显示方法、装置、设备及存储介质
CN112017650A (zh) 电子设备的语音控制方法、装置、计算机设备和存储介质
JP2020038709A (ja) 人工知能機器における連続会話機能
WO2017166602A1 (zh) 车载终端与移动终端协同输入的控制方法及移动终端
CN110010127A (zh) 场景切换方法、装置、设备和存储介质
EP3745252B1 (en) Voice control method and apparatus of electronic device, computer device and storage medium
JP2019091006A (ja) 音声対話方法、装置、端末、サーバ及び可読記憶媒体
CN109658924B (zh) 会话消息处理方法、装置及智能设备
CN110718221A (zh) 语音技能控制方法、语音设备、客户端以及服务器
US20190163331A1 (en) Multi-Modal Dialog Broker
US20210233527A1 (en) Agent system, terminal device, and computer readable recording medium
JP6851491B2 (ja) 音声対話制御装置および音声対話制御方法
JP6944920B2 (ja) スマートインタラクティブの処理方法、装置、設備及びコンピュータ記憶媒体
CN109697980A (zh) 一种唤醒词的响应方法、装置、存储介质及智能音箱

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21909487

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21909487

Country of ref document: EP

Kind code of ref document: A1