CN115938361A - Voice interaction method, device and equipment for vehicle cabin and storage medium

Info

Publication number: CN115938361A
Application number: CN202211512730.0A
Authority: CN (China)
Prior art keywords: voice interaction instruction, user, session, identity information
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 徐培来
Current Assignee: Beijing Binli Information Technology Co Ltd
Original Assignee: Beijing Binli Information Technology Co Ltd
Application filed by Beijing Binli Information Technology Co Ltd
Priority to CN202211512730.0A
Publication of CN115938361A

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

A voice interaction method, apparatus, computer device and storage medium for a vehicle cabin are provided. The method comprises: detecting a first voice interaction instruction initiated by a user in a vehicle cabin; in response to detecting the first voice interaction instruction, causing the vehicle cabin to run a first session corresponding to the first voice interaction instruction; determining, based on the first voice interaction instruction, identity information of the user initiating the first voice interaction instruction and a topic area corresponding to the first voice interaction instruction; detecting, during operation of the first session, a second voice interaction instruction initiated by a user within the vehicle cabin; in response to detecting the second voice interaction instruction, determining, based on the second voice interaction instruction, identity information of the user initiating the second voice interaction instruction and a topic area corresponding to the second voice interaction instruction; and, based on a comparison of the topic areas and a comparison of the users' identity information, causing the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction.

Description

Voice interaction method, device and equipment for vehicle cabin and storage medium
Technical Field
The present disclosure relates to the field of vehicles, and in particular, to a voice interaction method, apparatus, computer device, computer readable storage medium, computer program product for a vehicle cabin, and a vehicle including the apparatus or the computer device.
Background
With the continuous development of intelligent cabins, voice interaction is receiving more and more attention as a convenient control mode. To improve the user experience of voice interaction in the intelligent cabin, a new execution strategy for voice interaction instructions needs to be designed for both single-user and multi-user scenarios, so that consecutive voice interaction instructions are handled in a more user-friendly way.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
It would be advantageous to provide a mechanism that alleviates, mitigates or even eliminates one or more of the above-mentioned problems.
According to an aspect of the present disclosure, there is provided a voice interaction method for a vehicle cabin, the method comprising: detecting a first voice interaction instruction initiated by a user in the vehicle cabin; in response to detecting the first voice interaction instruction, causing the vehicle cabin to run a first session corresponding to the first voice interaction instruction; determining identity information of a user who initiates the first voice interaction instruction and a topic field corresponding to the first voice interaction instruction based on the first voice interaction instruction; detecting a second voice interaction instruction initiated by a user in the vehicle cabin during operation of the first session; in response to detecting the second voice interaction instruction, determining, based on the second voice interaction instruction, identity information of a user who initiates the second voice interaction instruction and a topic field corresponding to the second voice interaction instruction; and determining to enable the vehicle cabin to continue running the first session or run a second session corresponding to the second voice interaction instruction based on the comparison of the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and the comparison of the identity information of the user who initiated the first voice interaction instruction and the identity information of the user who initiated the second voice interaction instruction, wherein the second session is different from the first session.
According to another aspect of the present disclosure, there is provided an apparatus for a vehicle cabin, the vehicle cabin including at least one of: a plurality of sound zones respectively corresponding to a plurality of vehicle seats, a voiceprint identification system, and a multi-modal identity recognition system. The apparatus comprises: a first module configured to detect a first voice interaction instruction initiated by a user within the vehicle cabin; a second module configured to cause the vehicle cabin to run a first session corresponding to the first voice interaction instruction in response to detecting the first voice interaction instruction; a third module configured to determine, based on the first voice interaction instruction, identity information of the user who initiated the first voice interaction instruction and a topic area corresponding to the first voice interaction instruction; a fourth module configured to detect, during operation of the first session, a second voice interaction instruction initiated by a user within the vehicle cabin; a fifth module configured to, in response to detecting the second voice interaction instruction, determine, based on the second voice interaction instruction, identity information of the user who initiated the second voice interaction instruction and a topic area corresponding to the second voice interaction instruction; and a sixth module configured to determine to cause the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction based on a comparison of the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and a comparison of the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the second session is different from the first session.
According to yet another aspect of the present disclosure, there is provided a computer apparatus including: at least one processor; and at least one memory having a computer program stored thereon, which, when executed by the at least one processor, causes the at least one processor to perform a method according to the present disclosure.
According to yet another aspect of the present disclosure, a vehicle is provided, the vehicle comprising an apparatus or a computer device according to the present disclosure.
According to yet another aspect of the present disclosure, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out a method according to the present disclosure.
According to yet another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, causes the processor to carry out the method according to the present disclosure.
According to one or more embodiments of the disclosure, for the first voice interaction instruction and the second voice interaction instruction, the corresponding topic areas and the identity information of the initiating users are determined and compared, and based on the comparison results it is determined whether to continue running the first session (corresponding to the first voice interaction instruction) or to run a second session (corresponding to the second voice interaction instruction) that is different from the first session. The execution of voice interaction instructions thereby becomes more flexible and user-friendly, improving the user's voice interaction experience.
These and other aspects of the disclosure will be apparent from and elucidated with reference to the embodiments described hereinafter.
Drawings
Further details, features and advantages of the disclosure are disclosed in the following description of exemplary embodiments with reference to the drawings. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and, together with the description, serve to explain the exemplary implementations of these embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements. In the drawings:
FIG. 1 is a schematic diagram illustrating an example system in which various methods described herein may be implemented, according to an example embodiment;
FIG. 2 is a flowchart illustrating a voice interaction method for a vehicle cabin according to an exemplary embodiment;
FIG. 3 is a diagram illustrating preset interruption identifiers and their setting values in accordance with an exemplary embodiment;
FIG. 4 is a diagram illustrating preset interruption identifiers and their setting values in accordance with other exemplary embodiments;
FIG. 5 is a diagram illustrating preset interruption identifiers and their setting values in accordance with still other exemplary embodiments;
FIG. 6 is a schematic block diagram illustrating an apparatus for a vehicle cabin according to an exemplary embodiment;
FIG. 7 is a block diagram illustrating an exemplary computer device that can be applied to the exemplary embodiments.
Detailed Description
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in some cases they may refer to different instances based on the context of the description.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. As used herein, the term "plurality" means two or more, and the term "based on" should be interpreted as "based, at least in part, on". Further, the terms "and/or" and "at least one of ..." encompass any and all possible combinations of the listed items.
Before describing exemplary embodiments of the present disclosure, a number of terms used herein are first explained.
As used herein, the term "voice interaction command" refers to a voice command issued by a user within the vehicle cabin intended to evoke an in-vehicle system and perform a corresponding feedback.
As used herein, the term "running a session" includes an in-vehicle system performing an associated voice announcement or controlling a designated component of the vehicle to perform an associated action in response to a voice interaction command issued by a user.
As used herein, the term "semantic content" refers to processed signals that result from the processing of an original speech signal by a natural language understanding NLU, which contain content such as contextual information (e.g., whether there is continuation with other speech), conversational state (e.g., whether it is the beginning or end of a conversation), and intent (e.g., whether it includes a query for particular information).
As used herein, the term "topic area" refers to the subject matter to which the semantic content contained in the voice interaction instruction issued by the user relates. Exemplary topical areas may include "weather," "music," "navigation," and so forth.
Exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram illustrating an example system 100 in which various methods described herein may be implemented, according to an example embodiment.
Referring to FIG. 1, the system 100 includes an in-vehicle system 110, a server 120, and a network 130 communicatively coupling the in-vehicle system 110 and the server 120.
In-vehicle system 110 includes a display 114 and an application (APP) 112 that may be displayed via the display 114. The application 112 may be an application installed by default on, or downloaded and installed by the user 102 for, the in-vehicle system 110, or an applet, i.e., a lightweight application. In the case where the application 112 is an applet, the user 102 may run the application 112 directly on the in-vehicle system 110 without installing it, for example by searching for the application 112 in a host application (e.g., by the name of the application 112) or by scanning a graphic code (e.g., a barcode, a two-dimensional code, etc.) of the application 112. In some embodiments, the in-vehicle system 110 may include one or more processors and one or more memories (not shown), and the in-vehicle system 110 may be implemented as an in-vehicle computer. In some embodiments, the in-vehicle system 110 may include more or fewer display screens 114 (e.g., no display screen 114 at all), and/or one or more speakers or other human-machine interaction devices. In some embodiments, the in-vehicle system 110 may not need to communicate with the server 120.
Server 120 may represent a single server, a cluster of multiple servers, a distributed system, or a cloud server providing an underlying cloud service (such as cloud database, cloud computing, cloud storage, cloud communications). It will be understood that although the server 120 is shown in FIG. 1 as communicating with only one in-vehicle system 110, the server 120 may provide background services for multiple in-vehicle systems simultaneously.
The network 130 allows wireless communication and information exchange between a vehicle and X ("X" denotes a vehicle, road, pedestrian, the internet, etc.) according to agreed communication protocols and data interaction standards. Examples of the network 130 include a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), and/or a combination of communication networks such as the Internet. The network 130 may be a wired or wireless network. In one example, the network 130 may be an intra-vehicle network, an inter-vehicle network, and/or an in-vehicle mobile internet network.
For purposes of the disclosed embodiments, in the example of fig. 1, the application 112 may be an electronic map application that may provide various electronic map-based functions, such as navigation, route queries, location finding, and the like. Accordingly, the server 120 may be a server used with an electronic map application. The server 120 may provide online mapping services, such as online navigation, online route query, and online location finding, to the application 112 running in the in-vehicle system 110 based on the road network data. Alternatively, the server 120 may provide the road network data to the vehicle-mounted system 110, and the application 112 running in the vehicle-mounted system 110 provides the local map service according to the road network data.
Fig. 2 is a flowchart illustrating a voice interaction method 200 for a vehicle cabin according to an exemplary embodiment. The method 200 may be performed at an on-board system (e.g., the on-board system 110 shown in fig. 1), i.e., the subject of execution of the various steps of the method 200 may be the on-board system 110 shown in fig. 1. In some embodiments, method 200 may be performed at a server (e.g., server 120 shown in fig. 1). In some embodiments, method 200 may be performed by an in-vehicle system (e.g., in-vehicle system 110) in combination with a server (e.g., server 120). Hereinafter, the respective steps of the method 200 will be described by taking the execution subject as the in-vehicle system 110 as an example.
As shown in fig. 2, the method 200 includes:
step 210, detecting a first voice interaction instruction initiated by a user in a vehicle cabin;
step 220, in response to the detection of the first voice interaction instruction, enabling the vehicle cabin to run a first session corresponding to the first voice interaction instruction;
step 230, determining identity information of a user initiating the first voice interaction instruction and a topic field corresponding to the first voice interaction instruction based on the first voice interaction instruction;
step 240, during the operation of the first session, detecting a second voice interaction instruction initiated by the user in the vehicle cabin;
step 250, in response to detecting the second voice interaction instruction, determining the identity information of the user who initiates the second voice interaction instruction and the topic field corresponding to the second voice interaction instruction based on the second voice interaction instruction; and
step 260, determining to cause the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction based on the comparison between the topic field corresponding to the first voice interaction instruction and the topic field corresponding to the second voice interaction instruction and the comparison between the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the second session is different from the first session.
The various steps of method 200 are described in detail below.
At step 210, a first voice interaction instruction initiated by a user within the vehicle cabin is detected. In some embodiments of the present disclosure, a Voice Activity Detection (VAD) module may be used to detect the first voice interaction instruction initiated by a user within the vehicle cabin, and the speech uttered by the user may be pre-processed to filter out audio data that does not contain an interaction instruction (e.g., noise, laughter, sighs, etc.).
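For illustration only, this pre-processing could be sketched as a simple frame-energy check; the threshold value, array layout and function name below are assumptions, and a production VAD module would apply far more sophisticated filtering.

```python
import numpy as np

def contains_voice_activity(frames: np.ndarray, energy_threshold: float = 0.01) -> bool:
    """Return True if any audio frame's mean energy exceeds the threshold.

    `frames` has shape (n_frames, samples_per_frame) with samples normalized
    to [-1.0, 1.0]. This sketch only checks frame energy; a real VAD module
    would also filter out noise, laughter, sighs and similar audio data.
    """
    frame_energy = np.mean(frames.astype(np.float64) ** 2, axis=1)
    return bool(np.any(frame_energy > energy_threshold))
```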
At step 220, in response to detecting the first voice interaction instruction, the vehicle cabin is caused to run a first session corresponding to the first voice interaction instruction. In some embodiments of the present disclosure, the detected first voice interaction instruction may be analyzed using a natural language processing method, thereby causing the vehicle cabin to run the first session corresponding to the first voice interaction instruction. In some embodiments of the present disclosure, in response to the detected first voice interaction instruction being "what is the weather today", the in-vehicle system may voice-broadcast the following exemplary content: "It may rain today, the temperature is 16-25 degrees, the humidity is 80% ...". In some embodiments of the present disclosure, in response to the detected first voice interaction instruction being "close the window", the in-vehicle system may close the window and, optionally, voice-broadcast the exemplary content "closing the window for you".
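As a hedged sketch of how such a session might be run once the instruction is understood, the following illustrative dispatcher either voice-broadcasts information or triggers a vehicle control action; the topic names, announcement strings and the close_window callback are hypothetical and not part of the disclosure.

```python
from typing import Callable, Dict

def run_session(topic: str,
                announce: Callable[[str], None],
                vehicle_actions: Dict[str, Callable[[], None]]) -> None:
    """Run a session for a recognized instruction: either voice-broadcast
    information or trigger a vehicle control action plus an optional prompt."""
    if topic == "weather":
        announce("It may rain today, 16-25 degrees, humidity 80% ...")
    elif topic == "window":
        vehicle_actions["close_window"]()   # control the designated component
        announce("Closing the window for you.")
    else:
        announce("Sorry, I did not understand that request.")

# Example wiring with print/no-op stand-ins for the real cabin interfaces.
run_session("window", announce=print, vehicle_actions={"close_window": lambda: None})
```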
At step 230, based on the first voice interaction instruction, identity information of the user who initiated the first voice interaction instruction and a topic area corresponding to the first voice interaction instruction are determined.
In some embodiments of the present disclosure, determining the topic area corresponding to a voice interaction instruction (including the first voice interaction instruction and the second voice interaction instruction) may include: determining semantic content of the voice interaction instruction using natural language processing, and determining the topic area corresponding to the voice interaction instruction based on the semantic content.
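A minimal sketch of this determination is shown below; it assumes the NLU output is plain text and substitutes an illustrative keyword table for a trained topic classifier.

```python
from typing import Dict, List

# Hypothetical keyword-to-topic table; a production system would use an NLU
# classifier over the semantic content rather than keyword matching.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "weather": ["weather", "rain", "temperature", "humidity"],
    "music": ["song", "play", "music"],
    "navigation": ["navigate", "route", "gas station"],
    "window": ["window"],
}

def determine_topic_area(semantic_content: str) -> str:
    """Map the NLU output text to a topic area (illustrative only)."""
    text = semantic_content.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return "unknown"
```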
In some embodiments of the present disclosure, the vehicle cabin may include a plurality of sound zones respectively corresponding to a plurality of vehicle seats. Determining the identity information of the user initiating a voice interaction instruction may include: determining, using the plurality of sound zones, the seat position within the vehicle cabin of the user initiating the voice interaction instruction, and determining the identity information of the user based on the seat position of the user. For example, if the voice interaction instruction is detected, or the signal strength of the detected voice interaction instruction is the largest, in the sound zone corresponding to the driver's seat, it may be determined that the user initiating the voice interaction instruction is the driver; if the voice interaction instruction is detected, or the signal strength of the detected voice interaction instruction is the largest, in the sound zone corresponding to the front passenger seat, it may be determined that the user initiating the voice interaction instruction is the front passenger.
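The sound-zone based identification described above might look like the following sketch; the zone identifiers, seat labels and signal-strength inputs are invented for illustration.

```python
from typing import Dict

# Hypothetical mapping from sound-zone identifiers to seat positions.
ZONE_TO_SEAT: Dict[str, str] = {
    "zone_front_left": "driver",
    "zone_front_right": "front_passenger",
    "zone_rear_left": "rear_left_passenger",
    "zone_rear_right": "rear_right_passenger",
}

def identify_user_by_zone(zone_signal_strength: Dict[str, float]) -> str:
    """Pick the sound zone in which the instruction signal is strongest and
    return the corresponding seat position as the user's identity information."""
    strongest_zone = max(zone_signal_strength, key=zone_signal_strength.get)
    return ZONE_TO_SEAT.get(strongest_zone, "unknown")
```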
In some embodiments of the present disclosure, the vehicle cabin may include a voiceprint identification system. Determining the identity information of the user initiating the voice interaction instruction may further include: determining, using the voiceprint identification system, voiceprint information of the user initiating the voice interaction instruction, and determining the identity information of the user based on the voiceprint information of the user.
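Purely as an illustration, voiceprint matching could be sketched as a nearest-neighbor search over enrolled speaker embeddings; the embedding values, similarity threshold and user labels below are assumptions rather than details of the disclosed system.

```python
import numpy as np

# Hypothetical enrolled voiceprint embeddings; a real voiceprint identification
# system would produce these with a speaker-embedding model.
ENROLLED_VOICEPRINTS = {
    "user_A": np.array([0.12, 0.85, 0.33]),
    "user_B": np.array([0.91, 0.10, 0.40]),
}

def identify_by_voiceprint(embedding: np.ndarray, threshold: float = 0.8) -> str:
    """Return the enrolled user whose voiceprint is most similar to the
    instruction's embedding (cosine similarity), or 'unknown' if no enrolled
    voiceprint passes the threshold."""
    best_user, best_score = "unknown", threshold
    for user, enrolled in ENROLLED_VOICEPRINTS.items():
        score = float(np.dot(embedding, enrolled)
                      / (np.linalg.norm(embedding) * np.linalg.norm(enrolled)))
        if score > best_score:
            best_user, best_score = user, score
    return best_user
```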
In some embodiments of the present disclosure, the vehicle cabin may include a multi-modal identity recognition system capable of at least face recognition and voice recognition of users. Determining the identity information of the user initiating the voice interaction instruction may further include: determining, using the multi-modal identity recognition system, the identity information of the user initiating the voice interaction instruction. For example, the face recognition function of the multi-modal identity recognition system may be used to acquire facial images of the user(s) in the vehicle cabin in order to determine which user is initiating the voice interaction instruction and that user's identity, and the face recognition function may be combined with the voice recognition function to improve the accuracy of user identity recognition.
At step 240, during operation of the first session, a second voice interaction instruction initiated by the user within the vehicle cabin is detected. In some embodiments of the present disclosure, the process of detecting the second voice interaction instruction may refer to the process of detecting the first voice interaction instruction at step 210.
At step 250, in response to detecting the second voice interaction instruction, based on the second voice interaction instruction, identity information of the user who initiated the second voice interaction instruction and a topic area corresponding to the second voice interaction instruction are determined. In some embodiments of the present disclosure, the process of step 250 may refer to the process of step 230.
At step 260, it is determined whether to cause the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction, based on the comparison of the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and the comparison of the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the second session is different from the first session.
In some embodiments of the present disclosure, the method 200 may further comprise: causing the vehicle cabin to continue running the first session in response to the determination in step 260, and running the second session after the first session ends. According to these embodiments, the second session can be buffered during the execution of the first session instead of being directly ignored, so that the in-vehicle system provides continuous interaction with buffering capability.
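One possible realization of this buffered continuous-interaction behavior is sketched below, assuming a simple FIFO queue of deferred sessions; the class and field names are illustrative only.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

@dataclass
class Session:
    user: str           # identity information of the initiating user
    topic: str          # topic area of the instruction
    instruction: str    # raw instruction text

class SessionManager:
    """Buffers a second session instead of discarding it and runs it once the
    currently running session finishes."""

    def __init__(self) -> None:
        self.current: Optional[Session] = None
        self.pending: Deque[Session] = deque()

    def defer(self, session: Session) -> None:
        """Buffer a session to be run after the current one ends."""
        self.pending.append(session)

    def on_session_finished(self) -> None:
        """Promote the next buffered session, if any, to the running slot."""
        self.current = self.pending.popleft() if self.pending else None
```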
According to some embodiments of the disclosure, for the first voice interaction instruction and the second voice interaction instruction, the corresponding topic areas and the identity information of the initiating users are determined and compared, and based on the comparison results it is determined whether to continue running the first session or to run the second session different from the first session, so that the execution of voice interaction instructions is more flexible and user-friendly, improving the user's voice interaction experience.
In some embodiments of the present disclosure, determining in step 260 to cause the vehicle cabin to continue running the first session or to run the second session corresponding to the second voice interaction instruction comprises: selecting a corresponding preset interruption identifier from a plurality of preset interruption identifiers of the topic area corresponding to the first voice interaction instruction, based on the comparison between the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and the comparison between the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the preset interruption identifier has a setting value indicating whether to interrupt the first session; and determining to cause the vehicle cabin to continue running the first session or to run the second session corresponding to the second voice interaction instruction based on the setting value of the selected preset interruption identifier.
It should be noted that in some embodiments of the present disclosure, the setting values of the preset interruption identifiers may be configured by the vehicle manufacturer, or may be customized by the user.
According to some embodiments of the present disclosure, based on the topic areas of the first and second voice interaction instructions, the identity information of the initiating users, and the setting value of the selected preset interruption identifier, it can be determined whether to cause the vehicle cabin to continue running the first session or to run the second session. The setting values of the preset interruption identifiers can be configured separately for different topic areas, so that the execution of voice interaction instructions is more flexible and user-friendly, improving the user's voice interaction experience.
In some embodiments of the present disclosure, selecting the corresponding preset interruption identifier from the plurality of preset interruption identifiers of the topic area corresponding to the first voice interaction instruction may include:
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is the same as the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is the same as the identity information of the user initiating the second voice interaction instruction, selecting a first preset interruption identifier from the plurality of preset interruption identifiers as the corresponding preset interruption identifier, wherein the first preset interruption identifier indicates that the second voice interaction instruction has the same topic area and the same initiating user as the first voice interaction instruction;
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is different from the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is the same as the identity information of the user initiating the second voice interaction instruction, selecting a second preset interruption identifier from the plurality of preset interruption identifiers as the corresponding preset interruption identifier, wherein the second preset interruption identifier indicates that the second voice interaction instruction has a different topic area and the same initiating user as the first voice interaction instruction;
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is the same as the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is different from the identity information of the user initiating the second voice interaction instruction, selecting a third preset interruption identifier from the plurality of preset interruption identifiers as the corresponding preset interruption identifier, wherein the third preset interruption identifier indicates that the second voice interaction instruction has the same topic area and a different initiating user from the first voice interaction instruction; and
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is different from the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is different from the identity information of the user initiating the second voice interaction instruction, selecting a fourth preset interruption identifier from the plurality of preset interruption identifiers as the corresponding preset interruption identifier, wherein the fourth preset interruption identifier indicates that the second voice interaction instruction has a different topic area and a different initiating user from the first voice interaction instruction. An illustrative sketch of this selection follows below.
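This sketch is illustrative only: it assumes the preset interruption identifiers of a topic area are stored in a table keyed by the four same/different combinations, and it returns the selected identifier's setting value, which then decides whether the first session is interrupted; all names are invented.

```python
from typing import Dict

def select_interruption_flag(first_user: str, second_user: str,
                             first_topic: str, second_topic: str,
                             flag_table: Dict[str, bool]) -> bool:
    """Pick the preset interruption identifier matching the comparison of
    initiating users and topic areas, and return its setting value."""
    user_key = "speakerA" if first_user == second_user else "speakerB"
    topic_key = "domainA" if first_topic == second_topic else "domainB"
    return flag_table[f"{user_key}_{topic_key}"]

def handle_second_instruction(flag_value: bool) -> str:
    """True -> interrupt the first session and run the second session now;
    False -> continue the first session and run the second one afterwards."""
    return "interrupt_and_run_second" if flag_value else "continue_first_then_second"
```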
Next, the configuration of the preset interruption identifier associated with the topic area corresponding to the voice interaction instruction and the setting value thereof will be described with reference to fig. 3 to 5.
Fig. 3 is a diagram illustrating preset interruption identifiers and their setting values according to an exemplary embodiment.
As shown in fig. 3, four different preset interruption identifiers may be set for the topic area "weather", including: a first preset interruption identifier (labeled flag1_speakerA_domainA), whose setting value is True; a second preset interruption identifier (labeled flag1_speakerA_domainB), whose setting value is True; a third preset interruption identifier (labeled flag1_speakerB_domainA), whose setting value is True; and a fourth preset interruption identifier (labeled flag1_speakerB_domainB), whose setting value is False. In these preset interruption identifiers, "flag1" represents the category of the topic area corresponding to the voice interaction instruction; "speakerA" indicates that the initiating user of the second voice interaction instruction is the same as the initiating user of the first voice interaction instruction, and "speakerB" indicates that the initiating user of the second voice interaction instruction is different from the initiating user of the first voice interaction instruction; "domainA" indicates that the topic area corresponding to the second voice interaction instruction is the same as the topic area corresponding to the first voice interaction instruction, and "domainB" indicates that the topic area corresponding to the second voice interaction instruction is different from the topic area corresponding to the first voice interaction instruction. A preset interruption identifier whose setting value is True indicates that the first session is interrupted and the second session corresponding to the second voice interaction instruction starts running; a preset interruption identifier whose setting value is False indicates that the first session continues and the second session corresponding to the second voice interaction instruction is either not run or is run after the first session ends. According to the preset interruption identifiers and their setting values shown in fig. 3, when a second voice interaction instruction intended to initiate a second session is detected during the first session, the in-vehicle system is configured to tend to interrupt the first session and directly execute the second session. Such a set of preset interruption identifiers and setting values can therefore be applied to, for example, a topic area with a lower priority.
Fig. 4 is a diagram illustrating preset interruption identifiers and their setting values according to other exemplary embodiments.
As shown in fig. 4, four different preset interruption identifiers may be set for the topic area "navigation", including: a first preset interruption identifier (labeled flag2_speakerA_domainA), whose setting value is True; a second preset interruption identifier (labeled flag2_speakerA_domainB), whose setting value is False; a third preset interruption identifier (labeled flag2_speakerB_domainA), whose setting value is False; and a fourth preset interruption identifier (labeled flag2_speakerB_domainB), whose setting value is False. For the meanings of "flag2", "speakerA", "speakerB", "domainA", "domainB" and the setting values True/False in these preset interruption identifiers, reference may be made to the description for the topic area "weather". According to the preset interruption identifiers and their setting values shown in fig. 4, when a second voice interaction instruction intended to initiate a second session is detected during the first session, the in-vehicle system is configured to tend to continue the first session and to execute the second session after the first session ends. Such a set of preset interruption identifiers and setting values can therefore be applied to, for example, a topic area with a higher priority.
Fig. 5 is a diagram illustrating preset interruption identifiers and their setting values according to still other exemplary embodiments.
As shown in fig. 5, four different preset interruption identifiers may be set for the topic area "music", including: a first preset interruption identifier (labeled flag3_speakerA_domainA), whose setting value is True; a second preset interruption identifier (labeled flag3_speakerA_domainB), whose setting value is True; a third preset interruption identifier (labeled flag3_speakerB_domainA), whose setting value is False; and a fourth preset interruption identifier (labeled flag3_speakerB_domainB), whose setting value is False. For the meanings of "flag3", "speakerA", "speakerB", "domainA", "domainB" and the setting values True/False in these preset interruption identifiers, reference may likewise be made to the description for the topic area "weather". Compared with the sets of preset interruption identifiers and setting values described above with respect to fig. 3 and fig. 4, the set shown in fig. 5 is more suitable for topic areas of medium priority.
It should be understood that the preset interruption identifiers associated with the topic areas and their setting values described with reference to fig. 3 to 5 are only exemplary, and the specific configuration may be chosen according to the topic area, the user's preferences, and the like.
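Purely as one way such a configuration could be expressed, the setting values described for fig. 3 to fig. 5 may be collected into a per-topic-area table like the one below; the dictionary keys are shorthand for the flag names used above, and the structure itself is an assumption rather than the claimed data format.

```python
# True: interrupt the first session and run the second session immediately.
# False: continue the first session and run the second session after it ends.
PRESET_INTERRUPTION_FLAGS = {
    "weather": {        # lower-priority topic area (FIG. 3)
        "speakerA_domainA": True,    # flag1_speakerA_domainA
        "speakerA_domainB": True,    # flag1_speakerA_domainB
        "speakerB_domainA": True,    # flag1_speakerB_domainA
        "speakerB_domainB": False,   # flag1_speakerB_domainB
    },
    "navigation": {     # higher-priority topic area (FIG. 4)
        "speakerA_domainA": True,    # flag2_speakerA_domainA
        "speakerA_domainB": False,   # flag2_speakerA_domainB
        "speakerB_domainA": False,   # flag2_speakerB_domainA
        "speakerB_domainB": False,   # flag2_speakerB_domainB
    },
    "music": {          # medium-priority topic area (FIG. 5)
        "speakerA_domainA": True,    # flag3_speakerA_domainA
        "speakerA_domainB": True,    # flag3_speakerA_domainB
        "speakerB_domainA": False,   # flag3_speakerB_domainA
        "speakerB_domainB": False,   # flag3_speakerB_domainB
    },
}
```

For instance, PRESET_INTERRUPTION_FLAGS["navigation"]["speakerB_domainA"] evaluates to False, matching the scenario described below in which user B's navigation request is deferred until user A's navigation session ends.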
In some usage scenarios of the present disclosure associated with fig. 3, user A first initiates a voice interaction instruction (i.e., a first voice interaction instruction) belonging to the topic area "weather", for example "what is the weather today". After the in-vehicle system detects this voice interaction instruction, it starts a first session to voice-broadcast information content related to today's weather, for example "rainy in the morning, turning sunny in the afternoon, today's lowest temperature 18 degrees, highest temperature 28 degrees ...". If, while the in-vehicle system is broadcasting the information content related to today's weather, user A initiates another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "weather", for example "what about tomorrow", the in-vehicle system determines from the first voice interaction instruction and the second voice interaction instruction that the two instructions were initiated by the same user and that their corresponding topic areas are also the same. The in-vehicle system then selects the first preset interruption identifier (flag1_speakerA_domainA) from the plurality of preset interruption identifiers of the topic area "weather", and, in accordance with a determination that the setting value of flag1_speakerA_domainA is True, interrupts the running of the first session and starts running the second session corresponding to the second voice interaction instruction, i.e., directly starts broadcasting information content related to tomorrow's weather, such as "rainy all day tomorrow, lowest temperature 15 degrees, highest temperature 24 degrees ...". In other usage scenarios, if, while the in-vehicle system is broadcasting the information content related to today's weather, user A initiates another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "window", for example "close the window", the in-vehicle system determines from the two voice interaction instructions that they were initiated by the same user but that their corresponding topic areas are different. The in-vehicle system then selects the second preset interruption identifier (flag1_speakerA_domainB) from the plurality of preset interruption identifiers of the topic area "weather", and, in accordance with a determination that the setting value of flag1_speakerA_domainB is True, interrupts the running of the first session and starts running the second session corresponding to the second voice interaction instruction, i.e., directly closes the window and optionally broadcasts a voice prompt such as "closing the window for you". In still other usage scenarios, while the in-vehicle system is broadcasting the information content related to today's weather, a user B other than user A may initiate another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "weather", for example "what about tomorrow".
In such a case, the in-vehicle system determines from the first voice interaction instruction and the second voice interaction instruction that the two instructions correspond to the same topic area but were initiated by different users. The in-vehicle system then selects the third preset interruption identifier (flag1_speakerB_domainA) from the plurality of preset interruption identifiers of the topic area "weather", and, upon determining that the setting value of flag1_speakerB_domainA is True, interrupts the running of the first session and starts running the second session corresponding to the second voice interaction instruction, i.e., directly starts broadcasting information content related to tomorrow's weather, such as "rainy all day tomorrow, lowest temperature 15 degrees, highest temperature 24 degrees ...". In still other usage scenarios, while the in-vehicle system is broadcasting the information content related to today's weather, a user B different from user A may initiate another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "window", for example "help me close the window". In such a case, the in-vehicle system determines from the two voice interaction instructions that their corresponding topic areas are different and that they were initiated by different users. The in-vehicle system then selects the fourth preset interruption identifier (flag1_speakerB_domainB) from the plurality of preset interruption identifiers of the topic area "weather", and, in accordance with a determination that the setting value of flag1_speakerB_domainB is False, continues running the first session and closes the window immediately after the information content related to today's weather has been broadcast, optionally prompting the user with "closing the window for you right away".
According to some embodiments of the present disclosure, a topic area such as "weather" is configured to have a lower priority, and thus a first session associated with such a topic area is more easily interrupted considering the identity of the user initiating the voice interaction instruction and the topic area to which the voice interaction instruction corresponds, so that a second session can be started in response to a second interaction instruction more quickly.
In some usage scenarios of the present disclosure associated with fig. 4, user A first initiates a voice interaction instruction (i.e., a first voice interaction instruction) belonging to the topic area "navigation", for example "please navigate to the nearest school". After detecting this voice interaction instruction, the in-vehicle system plans a route and continuously provides navigation information to user A (for example, by voice broadcast, on the display screen, and the like). If, while the in-vehicle system is continuously providing navigation information to the user, user A initiates another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "navigation", for example "go to the nearest hospital", the in-vehicle system determines from the first voice interaction instruction and the second voice interaction instruction that the two instructions were initiated by the same user and that their corresponding topic areas are also the same. The in-vehicle system then selects the first preset interruption identifier (flag2_speakerA_domainA) from the plurality of preset interruption identifiers of the topic area "navigation", and, in accordance with a determination that the setting value of flag2_speakerA_domainA is True, stops running the first session and starts running the second session corresponding to the second voice interaction instruction, i.e., re-plans the route and continuously provides navigation information to the hospital for user A. In other usage scenarios, if, while the in-vehicle system continues to provide navigation information to the user, user A initiates another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "weather", for example "how hot will it be tomorrow", the in-vehicle system determines from the two voice interaction instructions that they were initiated by the same user but that their corresponding topic areas are different. The in-vehicle system then selects the second preset interruption identifier (flag2_speakerA_domainB) from the plurality of preset interruption identifiers of the topic area "navigation", and, in accordance with a determination that the setting value of flag2_speakerA_domainB is False, continues the first session (i.e., continues to provide navigation information to the school) and executes the second session after the first session ends, i.e., broadcasts information content related to tomorrow's weather after the navigation ends. In still other usage scenarios, while the in-vehicle system continues to provide navigation information to the user, a user B other than user A may initiate another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "navigation", such as "display the nearest gas station nearby".
In such a case, the in-vehicle system determines from the first voice interaction instruction and the second voice interaction instruction that the two instructions correspond to the same topic area but were initiated by different users. The in-vehicle system then selects the third preset interruption identifier (flag2_speakerB_domainA) from the plurality of preset interruption identifiers of the topic area "navigation", and, upon determining that the setting value of flag2_speakerB_domainA is False, continues the first session (i.e., continues to provide navigation information to the school) and executes the second session after the first session ends, i.e., voice-prompts "the nearest gas station will be displayed for you as soon as the navigation ends" and then displays the nearest gas station to user B. In still other usage scenarios, while the in-vehicle system continues to provide navigation information to the user, a user B other than user A may initiate another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "weather", for example "will it rain today". In such a case, the in-vehicle system determines from the two voice interaction instructions that their corresponding topic areas are different and that they were initiated by different users. The in-vehicle system then selects the fourth preset interruption identifier (flag2_speakerB_domainB) from the plurality of preset interruption identifiers of the topic area "navigation", and, upon determining that the setting value of flag2_speakerB_domainB is False, continues the first session (i.e., continues to provide navigation information to the school) and executes the second session after the first session ends, for example voice-broadcasting "it will most likely not rain tomorrow".
According to some embodiments of the present disclosure, a topic area such as "navigation" is configured to have a higher priority (because interfering with or delaying navigation information may affect driving safety). Taking into account the identity of the user initiating the voice interaction instruction and the topic area corresponding to the instruction, the first session associated with such a topic area is therefore more likely to continue to be executed, while the second session corresponding to the second interaction instruction is buffered rather than ignored and is executed after the first session ends.
In some usage scenarios of the present disclosure associated with fig. 5, user A first initiates a voice interaction instruction (i.e., a first voice interaction instruction) belonging to the topic area "music", for example "play my favorite Chinese songs". After detecting this voice interaction instruction, the in-vehicle system plays user A's favorite Chinese songs. If, while the in-vehicle system is playing user A's favorite songs, user A initiates another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "music", for example "switch to my favorite English songs", the in-vehicle system determines from the first voice interaction instruction and the second voice interaction instruction that the two instructions were initiated by the same user and that their corresponding topic areas are also the same. The in-vehicle system then selects the first preset interruption identifier (flag3_speakerA_domainA) from the plurality of preset interruption identifiers of the topic area "music", and, in accordance with a determination that the setting value of flag3_speakerA_domainA is True, interrupts the running of the first session and starts running the second session corresponding to the second voice interaction instruction, i.e., interrupts the Chinese song being played and plays user A's favorite English songs. In other usage scenarios, if, while the in-vehicle system is playing user A's favorite songs, user A initiates another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "weather", for example "broadcast tomorrow's weather", the in-vehicle system determines from the two voice interaction instructions that they were initiated by the same user but that their corresponding topic areas are different. The in-vehicle system then selects the second preset interruption identifier (flag3_speakerA_domainB) from the plurality of preset interruption identifiers of the topic area "music", and, in accordance with a determination that the setting value of flag3_speakerA_domainB is True, interrupts the running of the first session and starts running the second session corresponding to the second voice interaction instruction, i.e., interrupts the Chinese song being played and broadcasts information content related to tomorrow's weather for user A. In still other usage scenarios, while the in-vehicle system is playing user A's favorite songs, a user B different from user A may initiate another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "music", for example "play this week's most popular songs". In such a case, the in-vehicle system determines from the first voice interaction instruction and the second voice interaction instruction that the two instructions correspond to the same topic area but were initiated by different users.
The in-vehicle system then selects the third preset interruption identifier (flag3_speakerB_domainA) from the plurality of preset interruption identifiers of the topic area "music", whose setting value is False (one reason for configuring it as False may be to avoid interrupting the previous user's enjoyment of the music), and therefore continues the first session and executes the second session after the first session ends, i.e., continues to play the favorite Chinese songs for user A and plays the most popular songs for user B after the Chinese songs end. In still other usage scenarios, while the in-vehicle system is playing user A's favorite songs, a user B different from user A may initiate another voice interaction instruction (i.e., a second voice interaction instruction) belonging to the topic area "weather", for example "will it rain today". In such a case, the in-vehicle system determines from the two voice interaction instructions that their corresponding topic areas are different and that they were initiated by different users. The in-vehicle system then selects the fourth preset interruption identifier (flag3_speakerB_domainB) from the plurality of preset interruption identifiers of the topic area "music", and, upon determining that the setting value of flag3_speakerB_domainB is False, continues the first session (i.e., continues playing the favorite Chinese songs for user A) and executes the second session after the first session ends, for example voice-broadcasting "it will most likely not rain tomorrow".
According to some embodiments of the present disclosure, a topic area such as "music" is configured to have a medium priority. Taking into account the identity of the user initiating the voice interaction instruction and the topic area corresponding to the instruction, the first session associated with such a topic area and the second session corresponding to the second interaction instruction may therefore be configured to have similar likelihoods of being executed first.
Although the operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, nor that all illustrated operations be performed, to achieve desirable results. For example, step 230 may be performed prior to step 220, or concurrently with step 220.
Fig. 6 is a schematic block diagram illustrating an apparatus 600 for a vehicle cabin according to an exemplary embodiment. As shown in fig. 6, the apparatus 600 may include a first module 610, a second module 620, a third module 630, a fourth module 640, a fifth module 650, and a sixth module 660. The first module 610 is configured to detect a first voice interaction instruction initiated by a user within the vehicle cabin. The second module 620 is configured to, in response to detecting the first voice interaction instruction, cause the vehicle cabin to run a first session corresponding to the first voice interaction instruction. The third module 630 is configured to determine, based on the first voice interaction instruction, identity information of the user who initiated the first voice interaction instruction and a topic area corresponding to the first voice interaction instruction. The fourth module 640 is configured to detect, during operation of the first session, a second voice interaction instruction initiated by a user within the vehicle cabin. The fifth module 650 is configured to, in response to detecting the second voice interaction instruction, determine, based on the second voice interaction instruction, identity information of the user who initiated the second voice interaction instruction and a topic area corresponding to the second voice interaction instruction. The sixth module 660 is configured to determine to cause the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction based on the comparison of the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and the comparison of the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the second session is different from the first session.
According to some embodiments of the present disclosure, with the apparatus 600, for the first voice interaction instruction and the second voice interaction instruction, the corresponding topic areas and the identity information of the initiating users are determined and compared, and based on the comparison results it is determined whether to continue running the first session or to run the second session different from the first session. The execution of voice interaction instructions thereby becomes more flexible and humanized, improving the user's voice interaction experience.
It should be understood that the various modules of the apparatus 600 shown in fig. 6 may correspond to the various steps in the method 200 described with reference to fig. 2. Thus, the operations, features and advantages described above with respect to the method 200 are equally applicable to the apparatus 600 and the modules included therein. Certain operations, features and advantages may not be described in detail herein for the sake of brevity.
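As a rough, non-authoritative sketch, the division of labor among the six modules of apparatus 600 might be organized in code roughly as follows. The class, its methods, and the queueing behavior are assumptions introduced for illustration, and the sketch reuses the hypothetical TOPIC_FLAGS table shown earlier.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    text: str
    user_id: str  # identity of the initiating user (e.g., from sound zone or voiceprint)
    topic: str    # topic area derived from the semantic content

class CabinVoiceApparatus:
    """Rough stand-in for apparatus 600; module boundaries follow Fig. 6."""

    def __init__(self, topic_flags):
        self.topic_flags = topic_flags  # per-topic preset interrupt identifiers
        self.active = None              # session currently running (the first session)
        self.queue = []                 # sessions deferred until the first session ends

    def on_first_instruction(self, instr: Instruction):
        # roughly modules 610-630: detect the instruction, start the first session, classify it
        self.active = instr

    def on_second_instruction(self, instr: Instruction):
        # roughly modules 640-660: detect, classify, then compare topic areas and users
        key = (instr.topic == self.active.topic, instr.user_id == self.active.user_id)
        _, interrupt = self.topic_flags[self.active.topic][key]
        if interrupt:
            self.active = instr       # run the second session immediately
        else:
            self.queue.append(instr)  # continue the first session; run this one afterwards
```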
Although specific functionality is discussed above with reference to particular modules, it should be noted that the functionality of an individual module discussed herein may be divided into multiple modules, and/or at least some of the functionality of multiple modules may be combined into a single module. A particular module discussed herein performing an action includes the particular module itself performing the action, or the particular module invoking or otherwise accessing another component or module that performs the action (or that performs the action in conjunction with the particular module). Thus, a particular module performing an action can include the particular module itself performing the action and/or another module, invoked or otherwise accessed by the particular module, performing the action. For example, the third module 630 and the fifth module 650 described above may be combined into a single module in some embodiments. As another example, the first module 610 may, in some embodiments, include the second module 620 and the third module 630.
As used herein, the phrase "entity A initiates action B" may refer to entity A issuing instructions to perform action B, while entity A itself does not necessarily perform that action B.
As used herein, the phrase "perform action Z based on A, B, and C" may refer to performing action Z based on A alone, B alone, C alone, A and B, A and C, B and C, or A, B, and C.
It should also be appreciated that various techniques may be described herein in the general context of software, hardware elements, or program modules. The various modules described above with respect to fig. 6 may be implemented in hardware, or in hardware combined with software and/or firmware. For example, the modules may be implemented as computer program code/instructions configured to be executed by one or more processors and stored in a computer-readable storage medium. Alternatively, the modules may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the first module 610, the second module 620, the third module 630, the fourth module 640, the fifth module 650, and the sixth module 660 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip (which includes one or more components of a processor (e.g., a Central Processing Unit (CPU), microcontroller, microprocessor, Digital Signal Processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry), and may optionally execute received program code and/or include embedded firmware to perform functions.
According to an aspect of the disclosure, a computer device is provided that includes at least one memory, at least one processor, and a computer program stored on the at least one memory. The at least one processor is configured to execute the computer program to implement the steps of any of the method embodiments described above.
According to an aspect of the present disclosure, there is provided a vehicle comprising an apparatus or a computer device as described above.
According to an aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
According to an aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of any of the method embodiments described above.
Illustrative examples of such computer devices, non-transitory computer-readable storage media, and computer program products are described below in connection with FIG. 7.
Fig. 7 illustrates an example configuration of a computer device 700 that may be used to implement the methods described herein. For example, server 120 and/or in-vehicle system 110 shown in fig. 1 may include an architecture similar to computer device 700. The computer device/apparatus described above may also be implemented in whole or at least in part by a computer device 700 or similar device or system.
The computer device 700 may include at least one processor 702, memory 704, communication interface(s) 706, presentation device 708, other input/output (I/O) devices 710, and one or more mass storage devices 712, capable of communication with each other, such as through a system bus 714 or other suitable connection.
The processor 702 may be a single processing unit or multiple processing units, all of which may include single or multiple computing units or multiple cores. The processor 702 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitry, and/or any devices that manipulate signals based on operational instructions. The processor 702 may be configured to retrieve and execute computer-readable instructions, such as program code for an operating system 716, program code for an application 718, program code for other programs 720, and the like, stored in the memory 704, mass storage device 712, or other computer-readable medium, among other capabilities.
Memory 704 and mass storage device 712 are examples of computer-readable storage media for storing instructions that are executed by processor 702 to implement the various functions described above. By way of example, memory 704 may generally include both volatile and nonvolatile memory (e.g., RAM, ROM, and the like). In addition, mass storage device 712 may generally include a hard disk drive, a solid state drive, removable media, including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CDs, DVDs), storage arrays, network attached storage, storage area networks, and the like. The memory 704 and mass storage device 712 may both be referred to herein collectively as memory or computer-readable storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that may be executed by the processor 702 as a particular machine configured to implement the operations and functions described in the examples herein.
A number of programs may be stored on the mass storage device 712. These programs include an operating system 716, one or more application programs 718, other programs 720, and program data 722, which can be loaded into memory 704 for execution. Examples of such applications or program modules may include, for instance, computer program logic (e.g., computer program code or instructions) to implement the following components/functions: method 200 and optional additional steps thereof, apparatus 600, and/or further embodiments described herein.
Although illustrated in fig. 7 as being stored in memory 704 of computer device 700, modules 716, 718, 720, and 722, or portions thereof, may be implemented using any form of computer-readable media that is accessible by computer device 700. As used herein, "computer-readable media" includes at least two types of computer-readable media, namely computer-readable storage media and communication media.
Computer-readable storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computer device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism. Computer-readable storage media, as defined herein, does not include communication media.
One or more communication interfaces 706 are used to exchange data with other devices, such as over a network or a direct connection. Such communication interfaces may be one or more of the following: any type of network interface (e.g., a Network Interface Card (NIC)), a wired or wireless interface (such as an IEEE 802.11 Wireless LAN (WLAN) interface), a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth™ interface, a Near Field Communication (NFC) interface, etc. The communication interface 706 may facilitate communication within a variety of network and protocol types, including wired networks (e.g., LAN, cable, etc.), wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, and so forth. The communication interface 706 may also provide for communication with external storage devices (not shown), such as storage arrays, network attached storage, storage area networks, and so forth.
In some examples, a display device 708, such as a monitor, may be included for displaying information and images to a user. Other I/O devices 710 may be devices that receive various inputs from a user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, and so forth.
The techniques described herein may be supported by these various configurations of the computer device 700 and are not limited to specific examples of the techniques described herein. For example, the functionality may also be implemented in whole or in part on a "cloud" using a distributed system. The cloud includes and/or represents a platform for resources. The platform abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud. The resources may include applications and/or data that may be used when performing computing processes on servers remote from the computer device 700. Resources may also include services provided over the internet and/or over a subscriber network such as a cellular or Wi-Fi network. The platform may abstract resources and functionality to connect the computer device 700 with other computer devices. Thus, implementations of the functionality described herein may be distributed throughout the cloud. For example, the functionality may be implemented in part on the computer device 700 and in part by a platform that abstracts the functionality of the cloud.
While the disclosure has been illustrated and described in detail in the drawings and the foregoing description, such illustration and description are to be considered illustrative and exemplary rather than restrictive; the present disclosure is not limited to the disclosed embodiments. Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude elements or steps other than those listed; the indefinite article "a" or "an" does not exclude a plurality; the term "plurality" refers to two or more; and the term "based on" should be construed as "based at least in part on". The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Some exemplary aspects of the disclosure will be described below.
In aspect 1, a method of voice interaction for a vehicle cabin, the method comprising:
detecting a first voice interaction instruction initiated by a user in the vehicle cabin;
in response to detecting the first voice interaction instruction, causing the vehicle cabin to run a first session corresponding to the first voice interaction instruction;
determining identity information of a user initiating the first voice interaction instruction and a topic field corresponding to the first voice interaction instruction based on the first voice interaction instruction;
detecting a second voice interaction instruction initiated by a user in the vehicle cabin during operation of the first session;
in response to detecting the second voice interaction instruction, determining, based on the second voice interaction instruction, identity information of a user who initiates the second voice interaction instruction and a topic field corresponding to the second voice interaction instruction; and
determining to cause the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction based on a comparison of the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and a comparison of the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the second session is different from the first session.
Aspect 2, the method of aspect 1, wherein the determining to cause the vehicle cabin to continue to run the first session or to run a second session corresponding to the second voice interaction instruction comprises:
selecting a corresponding preset interruption identifier from a plurality of preset interruption identifiers in the topic field corresponding to the first voice interaction instruction based on the comparison between the topic field corresponding to the first voice interaction instruction and the topic field corresponding to the second voice interaction instruction, and the comparison between the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the preset interruption identifier has a set value indicating whether to interrupt the first session; and
determining to cause the vehicle cabin to continue running the first session or running a second session corresponding to the second voice interaction instruction based on the set value of the selected preset interrupt identifier.
Aspect 3, the method according to aspect 2, wherein selecting a corresponding preset interruption identifier from a plurality of preset interruption identifiers of a topic area corresponding to the first voice interaction instruction includes:
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is the same as the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is the same as the identity information of the user initiating the second voice interaction instruction, selecting a first preset interrupt identifier from the plurality of preset interrupt identifiers as the corresponding preset interrupt identifier, wherein the first preset interrupt identifier indicates that the second voice interaction instruction and the first voice interaction instruction have the same topic area and the same initiating user;
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is different from the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is the same as the identity information of the user initiating the second voice interaction instruction, selecting a second preset interrupt identifier from the plurality of preset interrupt identifiers as the corresponding preset interrupt identifier, wherein the second preset interrupt identifier indicates that the second voice interaction instruction and the first voice interaction instruction have different topic areas and the same initiating user;
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is the same as the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is different from the identity information of the user initiating the second voice interaction instruction, selecting a third preset interrupt identifier from the plurality of preset interrupt identifiers as the corresponding preset interrupt identifier, wherein the third preset interrupt identifier indicates that the second voice interaction instruction and the first voice interaction instruction have the same topic area and different initiating users; and
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is different from the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is different from the identity information of the user initiating the second voice interaction instruction, selecting a fourth preset interrupt identifier from the plurality of preset interrupt identifiers as the corresponding preset interrupt identifier, wherein the fourth preset interrupt identifier indicates that the second voice interaction instruction and the first voice interaction instruction have different topic areas and different initiating users.
Aspect 4, the method of aspect 1, further comprising: in response to determining to cause the vehicle cabin to continue operating the first session, after the first session ends, operating the second session.
Aspect 5 is the method of aspect 1, wherein determining the topic area corresponding to the voice interaction instruction comprises:
determining semantic content of the voice interaction instruction using natural language processing; and
determining the topic area corresponding to the voice interaction instruction based on the semantic content.
Aspect 6 the method of aspect 1, wherein the vehicle cabin includes a plurality of sound zones corresponding to a plurality of vehicle seats, respectively, and wherein determining identity information of a user initiating the voice interaction instruction comprises:
determining, using the plurality of sound zones, a seat position within the vehicle cabin of the user initiating the voice interaction instruction, and determining the identity information of the user based on the seat position of the user.
Aspect 7, the method of aspect 1, wherein the vehicle cabin includes a voiceprint identification system, and wherein determining identity information of a user initiating a voice interaction instruction further comprises:
determining voiceprint information of the user initiating the voice interaction instruction using the voiceprint identification system, and determining the identity information of the user based on the voiceprint information of the user.
Aspect 8, the method of aspect 1, wherein the vehicle cabin includes a multi-modal identification system, and wherein determining identity information of a user initiating the voice interaction instruction further comprises:
determining the identity information of the user initiating the voice interaction instruction using the multi-modal identification system, wherein the multi-modal identification system is capable of performing at least face recognition and voice recognition on the user.
In aspect 9, an apparatus for a vehicle cabin, the vehicle cabin including at least one of: a plurality of sound zones respectively corresponding to a plurality of vehicle seats, a voiceprint identification system, and a multi-modal identification system, the apparatus comprising:
a first module configured to detect a first voice interaction instruction initiated by a user within the vehicle cabin;
a second module configured to cause the vehicle cabin to run a first session corresponding to the first voice interaction instruction in response to detecting the first voice interaction instruction;
a third module configured to determine, based on the first voice interaction instruction, identity information of a user who initiated the first voice interaction instruction and a topic field corresponding to the first voice interaction instruction;
a fourth module configured to detect a second voice interaction instruction initiated by a user within the vehicle cabin during operation of the first session;
a fifth module configured to, in response to detecting the second voice interaction instruction, determine, based on the second voice interaction instruction, identity information of a user who initiated the second voice interaction instruction and a topic area corresponding to the second voice interaction instruction; and
a sixth module configured to determine to cause the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction based on a comparison of the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and a comparison of the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the second session is different from the first session.
Aspect 10, a computer device, comprising:
at least one processor; and
at least one memory having a computer program stored thereon,
wherein the computer program, when executed by the at least one processor, causes the at least one processor to perform the method of any of aspects 1 to 8.
Aspect 11, a vehicle comprising an apparatus according to aspect 9 or a computer device according to aspect 10.
Aspect 12, a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the method of any of aspects 1 to 8.
Aspect 13, a computer program product comprising a computer program which, when executed by a processor, causes the processor to perform the method of any of aspects 1 to 8.

Claims (10)

1. A method of voice interaction for a vehicle cabin, the method comprising:
detecting a first voice interaction instruction initiated by a user in the vehicle cabin;
in response to detecting the first voice interaction instruction, causing the vehicle cabin to operate a first session corresponding to the first voice interaction instruction;
determining identity information of a user who initiates the first voice interaction instruction and a topic field corresponding to the first voice interaction instruction based on the first voice interaction instruction;
detecting a second voice interaction instruction initiated by a user in the vehicle cabin during operation of the first session;
in response to detecting the second voice interaction instruction, determining, based on the second voice interaction instruction, identity information of a user who initiated the second voice interaction instruction and a topic field corresponding to the second voice interaction instruction; and
determining to cause the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction based on a comparison of the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and a comparison of the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the second session is different from the first session.
2. The method of claim 1, wherein the determining to cause the vehicle cabin to continue operating the first session or operating a second session corresponding to the second voice interaction instruction comprises:
selecting a corresponding preset interruption identifier from a plurality of preset interruption identifiers in the topic field corresponding to the first voice interaction instruction based on the comparison between the topic field corresponding to the first voice interaction instruction and the topic field corresponding to the second voice interaction instruction, and the comparison between the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the preset interruption identifier has a set value indicating whether to interrupt the first session; and
determining to cause the vehicle cabin to continue running the first session or running a second session corresponding to the second voice interaction instruction based on the set value of the selected preset interrupt identifier.
3. The method of claim 2, wherein selecting a corresponding preset interruption identifier from a plurality of preset interruption identifiers of a topic area corresponding to the first voice interaction directive comprises:
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is the same as the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is the same as the identity information of the user initiating the second voice interaction instruction, selecting a first preset interrupt identifier from the plurality of preset interrupt identifiers as the corresponding preset interrupt identifier, wherein the first preset interrupt identifier indicates that the second voice interaction instruction and the first voice interaction instruction have the same topic area and the same initiating user;
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is different from the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is the same as the identity information of the user initiating the second voice interaction instruction, selecting a second preset interrupt identifier from the plurality of preset interrupt identifiers as the corresponding preset interrupt identifier, wherein the second preset interrupt identifier indicates that the second voice interaction instruction and the first voice interaction instruction have different topic areas and the same initiating user;
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is the same as the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is different from the identity information of the user initiating the second voice interaction instruction, selecting a third preset interrupt identifier from the plurality of preset interrupt identifiers as the corresponding preset interrupt identifier, wherein the third preset interrupt identifier indicates that the second voice interaction instruction and the first voice interaction instruction have the same topic area and different initiating users; and
in accordance with a determination that the topic area corresponding to the first voice interaction instruction is different from the topic area corresponding to the second voice interaction instruction and that the identity information of the user initiating the first voice interaction instruction is different from the identity information of the user initiating the second voice interaction instruction, selecting a fourth preset interrupt identifier from the plurality of preset interrupt identifiers as the corresponding preset interrupt identifier, wherein the fourth preset interrupt identifier indicates that the second voice interaction instruction and the first voice interaction instruction have different topic areas and different initiating users.
4. The method of claim 1, further comprising: in response to determining to cause the vehicle cabin to continue operating the first session, after the first session ends, operating the second session.
5. The method of claim 1, wherein determining a topic area corresponding to the voice interaction instruction comprises:
determining semantic content of the voice interaction instruction using natural language processing; and
determining the topic area corresponding to the voice interaction instruction based on the semantic content.
6. The method of claim 1, wherein the vehicle cabin includes a plurality of sound zones corresponding to a plurality of vehicle seats, respectively, and wherein determining identity information of a user initiating a voice interaction instruction comprises:
determining, using the plurality of sound zones, a seat position within the vehicle cabin of the user initiating the voice interaction instruction, and determining the identity information of the user based on the seat position of the user.
7. The method of claim 1, wherein the vehicle cabin includes a voiceprint identification system, and wherein determining identity information of a user initiating a voice interaction instruction further comprises:
determining voiceprint information of the user initiating the voice interaction instruction using the voiceprint identification system, and determining the identity information of the user based on the voiceprint information of the user.
8. The method of claim 1, wherein the vehicle cabin includes a multi-modal identification system, and wherein determining identity information of a user initiating a voice interaction instruction further comprises:
determining the identity information of the user initiating the voice interaction instruction using the multi-modal identification system, wherein the multi-modal identification system is capable of performing at least face recognition and voice recognition on the user.
9. An apparatus for a vehicle cabin, the vehicle cabin including at least one of: a plurality of sound zones respectively corresponding to a plurality of vehicle seats, a voiceprint identification system, and a multi-modal identification system, the apparatus comprising:
a first module configured to detect a first voice interaction instruction initiated by a user within the vehicle cabin;
a second module configured to cause the vehicle cabin to run a first session corresponding to the first voice interaction instruction in response to detecting the first voice interaction instruction;
a third module configured to determine, based on the first voice interaction instruction, identity information of a user who initiated the first voice interaction instruction and a topic area corresponding to the first voice interaction instruction;
a fourth module configured to detect a second voice interaction instruction initiated by a user within the vehicle cabin during operation of the first session;
a fifth module configured to determine, in response to detecting the second voice interaction instruction, based on the second voice interaction instruction, identity information of a user who initiated the second voice interaction instruction and a topic area corresponding to the second voice interaction instruction; and
a sixth module configured to determine to cause the vehicle cabin to continue running the first session or to run a second session corresponding to the second voice interaction instruction based on a comparison of the topic area corresponding to the first voice interaction instruction and the topic area corresponding to the second voice interaction instruction and a comparison of the identity information of the user initiating the first voice interaction instruction and the identity information of the user initiating the second voice interaction instruction, wherein the second session is different from the first session.
10. A computer device, comprising:
at least one processor; and
at least one memory having a computer program stored thereon,
wherein the computer program, when executed by the at least one processor, causes the at least one processor to perform the method of any one of claims 1 to 8.