CN111816189A - Multi-sound-zone voice interaction method for a vehicle and electronic device - Google Patents

Info

Publication number
CN111816189A
Authority
CN
China
Prior art keywords
voice
passenger
voice interaction
link
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010630094.6A
Other languages
Chinese (zh)
Other versions
CN111816189B (en)
Inventor
杨扬
袁志俊
吴晓敏
王恺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zebra Network Technology Co Ltd
Original Assignee
Zebra Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zebra Network Technology Co Ltd
Priority to CN202010630094.6A
Publication of CN111816189A
Application granted
Publication of CN111816189B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/22 Interactive procedures; Man-machine interfaces
    • G10L 15/00 Speech recognition
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech
    • G10L 15/26 Speech to text systems
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/78 Detection of presence or absence of voice signals
    • G10L 25/87 Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a vehicle multi-sound-zone voice interaction method and an electronic device. The method includes the following steps: the in-vehicle terminal creates a voice interaction link for each of one or more sound zones according to the zone positions; the in-vehicle terminal sets each voice interaction link to a listening state, in which the link monitors its sound zone for wake-up voice signals; when one or more voice interaction links detect a voice signal in their sound zone, the in-vehicle terminal switches those links to a voice processing state, in which each link processes the voice signals input by the passenger in the corresponding sound zone; the in-vehicle terminal then obtains from the voice interaction link a processing result determined from the voice signal and broadcasts it by voice. According to the embodiments of the application, the in-vehicle terminal can switch voice interaction objects while talking with passengers in the cabin and adjust its wording and emotional tone to the characteristics of different passengers, making in-cabin voice interaction more real, natural, and personalized.

Description

Multi-sound-zone voice interaction method for a vehicle and electronic device
Technical Field
The application relates to the technical field of automotive electronics, and in particular to a multi-sound-zone voice interaction method for a vehicle.
Background
During driving, a question (or a round of conversation) often needs to be handed over to another person to answer and complete. In a multi-sound-zone environment (for example, four-zone or six-zone configurations), the in-vehicle terminal can know which user sits in which seat and which user in which seat is currently in a voice interaction. However, existing multi-sound-zone voice interaction solutions for the automotive scenario do not support switching the interaction object by voice.
Disclosure of Invention
In view of this, the application provides a multi-sound-zone voice interaction method for a vehicle, which can switch voice interaction objects while the in-vehicle terminal talks with passengers in the cabin, and dynamically adjust the terminal's wording and emotional tone to the characteristics of different passengers, so that in-cabin voice interaction is more real, natural, and personalized.
In order to solve this technical problem, the application adopts the following technical scheme:
In a first aspect, the present application provides a vehicle multi-sound-zone voice interaction method, including:
the in-vehicle terminal creates a voice interaction link for each of one or more sound zones according to the positions of the sound zones;
the in-vehicle terminal sets each voice interaction link to a listening state, in which the link monitors its sound zone for wake-up voice signals;
when one or more voice interaction links detect a voice signal in their sound zone, the in-vehicle terminal switches those links to a voice processing state, in which each link processes the voice signals input by the passenger in the corresponding sound zone;
the in-vehicle terminal obtains from the voice interaction link a processing result determined from the voice signal, and broadcasts it by voice.
As an embodiment of the first aspect of the present application, the in-vehicle terminal creating voice interaction links for the positions of one or more sound zones includes:
when the in-vehicle terminal recognizes that a passenger is present in a sound zone, creating a voice interaction link for the sound zone where the passenger is located.
As an embodiment of the first aspect of the present application, the in-vehicle terminal recognizing that a sound zone contains a passenger includes:
the in-vehicle terminal acquires an ID identifying the passenger;
it determines a confidence that the passenger is a registered user based on the ID identifying the passenger;
when the confidence is greater than a preset value, the passenger is determined to be a registered user, and the passenger's user information is acquired;
and when the confidence is smaller than the preset value, the passenger is judged to be a new user, and user information is registered.
As an embodiment of the first aspect of the present application, registering user information includes:
the in-vehicle terminal acquiring an ID identifying the user, together with one or more of the user's name, nickname, age, and preferences.
As an embodiment of the first aspect of the present application, the ID for identifying the user includes: one or more of a face ID, a voiceprint ID and an iris ID.
As an embodiment of the first aspect of the present application, the voice processing state of the voice interaction link includes:
front-end signal processing, in which the voice interaction link acquires the voice signal of the corresponding sound zone and preprocesses it to obtain a high-quality voice signal;
and voice interaction, in which the voice interaction link holds a voice conversation with the passenger based on the high-quality voice signal.
As an embodiment of the first aspect of the present application, the front-end signal processing further includes:
voice endpoint detection, for detecting the start and end positions of the voice signal and separating effective voice signals containing speech information from invalid signals containing none;
noise reduction, for reducing noise interference in the effective voice signal and improving the signal-to-noise ratio;
echo cancellation, for cancelling echo in the effective voice signal;
sound source localization, for determining the position of the speaking passenger based on the voice signals collected by the microphone array;
and beamforming, for combining the multiple voice signals collected by the microphone array into a single signal, further sharpening the localization of the sound source.
As an embodiment of the first aspect of the present application, the voice interaction includes:
speech recognition, for converting an effective voice signal containing speech information into first text information;
semantic understanding, for understanding the meaning of the first text information;
dialog management, for judging, based on the semantic understanding, whether the passenger's current voice dialog has ended, and for generating a decision;
response generation, for generating second text from the decision using preset phrasing;
and speech synthesis, for generating speech from the second text and feeding it back to the in-vehicle terminal for playback.
As an embodiment of the first aspect of the application, the response generation (NLG) includes one or more of the following methods:
the voice interaction link selects a default phrasing style;
the voice interaction link selects a template and configures a phrasing style;
the voice interaction link selects a model to generate a phrasing style.
As an embodiment of the first aspect of the present application, the voice interaction further includes:
when semantic understanding recognizes a target vocabulary item spoken by a first passenger in a first sound zone and the target vocabulary points to a second passenger, the first voice interaction link switches to the listening state, and the second voice interaction link, corresponding to the second sound zone, switches from the listening state to the voice processing state to process the voice signal of the second sound zone where the second passenger is located; and,
when the second voice interaction link obtains the processing result of the voice signal of the second sound zone, it feeds the result back to the in-vehicle terminal, which broadcasts it by voice, whereupon the second voice interaction link switches from the voice processing state back to the listening state, and the first voice interaction link switches from the listening state back to the voice processing state.
As an embodiment of the first aspect of the present application, the voice interaction further includes:
when the voice interaction link acquires no voice signal within a preset time range while in the voice processing state, the voice interaction link switches from the voice processing state to the listening state.
As an embodiment of the first aspect of the present application, the target vocabulary comprises:
one or more of a user title, a user nickname, and a user name.
As an embodiment of the first aspect of the present application, when a voice interaction link switches state, the in-vehicle terminal sends a voice message to notify the passenger of the corresponding sound zone, so that the passenger knows the state of the voice interaction link.
As an embodiment of the first aspect of the present application, a plurality of voice interaction links can process, in parallel, the voice signals input by passengers in a plurality of sound zones.
As an example of the first aspect of the present application, the sound zones include the driver seat area and/or the other individual passenger seat areas.
As an embodiment of the first aspect of the present application, the in-vehicle terminal broadcasts processing results by voice in the order in which they are obtained.
In a second aspect, an embodiment of the present application provides an electronic device including a processor and a memory,
the memory storing instructions,
and the processor being configured to read the instructions stored in the memory to execute the above multi-sound-zone voice interaction method for a vehicle.
The technical scheme of the application has at least one of the following beneficial effects:
according to the vehicle multi-sound-zone voice interaction method and the electronic device, the vehicle terminal can identify passengers in different sound zones of the carriage, voice objects of conversation are switched according to voice instructions of the passengers, and according to the characteristics of the different passengers, the speech and emotion of the vehicle terminal are dynamically adjusted, so that the voice interaction process in the carriage is more real, more natural and more personalized.
Drawings
FIG. 1 is a scene diagram of the multi-sound-zone voice interaction method for a vehicle according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for a voice interaction link to converse with a passenger according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for registering passenger user information according to an embodiment of the present application;
FIG. 4 is a flow chart of a method of detecting and identifying passengers in an embodiment of the present application;
FIG. 5 is a flowchart of the multi-sound-zone voice interaction method for a vehicle according to an embodiment of the present application;
FIG. 6 is a diagram illustrating a method of speech signal processing according to an embodiment of the present application;
fig. 7 is a flowchart of a method for switching a voice interactive link according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The following describes embodiments of the present application with reference to specific scenarios.
Fig. 1 is a scene diagram of voice interaction between passengers in a car and the in-vehicle terminal according to an embodiment of the present application. As shown in fig. 1, several passengers sit at different positions in the car, for example the driver seat, the front passenger seat, and the rear seats, and each position corresponds to a sound zone. The in-vehicle terminal first determines whether there is a passenger in each sound zone and, for each occupied zone, creates a corresponding voice interaction link; as many links are created as there are passengers in the car, each handling the voice conversation between the terminal and the passenger of its zone, and the links run in parallel so that passengers at different positions can interact with the terminal by voice. For example, Dad sits in the driver seat and Xiaoming sits in the front passenger seat. When the in-vehicle terminal starts, it determines that the driver-seat zone and the front-passenger zone are occupied and creates two voice interaction links: one manages the terminal's conversation with Dad in the driver-seat zone, the other manages its conversation with Xiaoming in the front-passenger zone, and both newly created links are in the listening state. When Dad says: "Help me navigate to a western restaurant.", the driver-seat link switches from the listening state to the voice processing state and answers: "OK, I found 5 western restaurants.", which is broadcast through the terminal. At this moment Dad has to take a phone call and says: "I have something to do, let Xiaoming choose!" The driver-seat link then switches from the voice processing state back to the listening state and its conversation pauses, while the front-passenger link switches from the listening state to the voice processing state and waits for Xiaoming's voice input: "Go to the third one, the steak restaurant!" The front-passenger link answers: "Xiaoming has picked the steak restaurant; starting navigation for you.", which is broadcast through the terminal. In this way, switching between voice interaction objects across the car's multiple sound zones is achieved, making in-cabin interaction more convenient.
In some embodiments of the present application, the passengers' user information is confirmed first: after the passengers enter the car, the in-vehicle terminal is started and begins to detect and identify users, and the terminal's camera recognizes the facial images of the passengers at the various positions in the car together with those positions.
After the user information and position information of the passengers are confirmed, voice interaction links are created to manage the conversations between the passengers in the cabin and the in-vehicle terminal.
The multi-sound-zone voice interaction method for a vehicle according to the present application is described below with reference to the accompanying drawings. Fig. 2 shows a flowchart of the method, which is applied to an in-vehicle terminal so that the terminal's interaction object can be switched by voice across multiple sound zones. As shown in fig. 2, the method includes:
step S210, the car terminal respectively creates voice interaction links corresponding to the positions of one or more sound zones according to the positions of the sound zones, wherein one voice interaction link manages the voice conversations of one sound zone, and the specific method for creating the voice interaction links is that when the car terminal identifies that a passenger is in the sound zone, a voice interaction link is created for the sound zone where the passenger is located, so that the voice interaction link can manage the conversations between the passenger in the sound zone and the car terminal, and it needs to be noted that when a plurality of passengers are in a carriage, a plurality of voice interaction links are created simultaneously, wherein the plurality of voice interaction links are parallel, and the voice conversations of the plurality of sound zones are processed simultaneously.
Step S220: the in-vehicle terminal sets each voice interaction link to the listening state, which is used to monitor the sound zone for wake-up voice signals. That is, once created, a voice interaction link is in the listening state and continuously monitors the voice signals of its sound zone. The wake-up voice signals number in the hundreds and cover most everyday in-car scenarios such as navigation, entertainment, vehicle control, and services; to respond to wake-up signals more accurately, the voice interaction link is trained on a corpus of nearly a million utterances, so that it can respond to passenger speech at any time and process it accurately.
Step S230: when one or more voice interaction links detect a voice signal in their sound zone, the in-vehicle terminal switches those links to the voice processing state, which is used to process the voice signals input by the passenger in the corresponding sound zone. For example, passenger A says: "Help me check today's weather", where "help me check" is a wake-up voice signal; the voice interaction link of passenger A's sound zone switches from the listening state to the voice processing state to converse with passenger A, and the dialog utterances are broadcast through the in-vehicle terminal. Passenger B says: "I want to listen to rock music"; the voice interaction link of passenger B's sound zone likewise switches from the listening state to the voice processing state to converse with passenger B. In this way, voice interaction with passengers in multiple sound zones is achieved.
Step S240: the in-vehicle terminal obtains from each voice interaction link a processing result determined from the voice signal and broadcasts it by voice. For example, following the previous step, passenger A's voice interaction link determines from the voice signal that the processing result is a weather query and feeds it back to the in-vehicle terminal, which displays the weather forecast; passenger B's voice interaction link determines that the processing result is to play rock music and feeds it back to the terminal, which plays rock music. Thus the voice interaction link of each sound zone can interact with its passenger in natural spoken language, and users in every zone can ultimately control the in-vehicle terminal by voice to obtain services such as navigation and entertainment.
In this way, the in-vehicle terminal converses with passengers in different sound zones of the car and can respond quickly to the various operations of different passengers on vehicle settings, navigation, music, video, and so on; the terminal's interaction object is switched by voice across the car's multiple sound zones, making in-cabin interaction more convenient.
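To make the flow of steps S210 to S240 concrete, the following Python sketch models one voice interaction link as a two-state machine. It is only an illustrative sketch under assumptions, not the patented implementation: the VoiceLink class, the WAKE_PHRASES tuple, and the substring wake-up test are hypothetical stand-ins for the per-zone link, the wake-up corpus, and the recognizer that the description presupposes.

```python
from enum import Enum, auto
from typing import Optional

class LinkState(Enum):
    LISTENING = auto()   # step S220: monitor the sound zone for wake-up speech
    PROCESSING = auto()  # step S230: handle the dialog of the zone's passenger

WAKE_PHRASES = ("help me", "navigate", "play")  # toy stand-in for the wake-up corpus

class VoiceLink:
    """One link per occupied sound zone (step S210); links run independently."""

    def __init__(self, zone_id: str) -> None:
        self.zone_id = zone_id
        self.state = LinkState.LISTENING

    def on_utterance(self, text: str) -> Optional[str]:
        if self.state is LinkState.LISTENING:
            if not any(phrase in text for phrase in WAKE_PHRASES):
                return None  # not a wake-up signal: keep listening
            self.state = LinkState.PROCESSING
        # In the processing state, produce a result for broadcast (step S240).
        result = f"[{self.zone_id}] processed: {text}"
        self.state = LinkState.LISTENING
        return result

# One link per occupied zone; the terminal broadcasts results as they arrive.
links = {zone: VoiceLink(zone) for zone in ("driver", "front_passenger")}
print(links["driver"].on_utterance("help me check today's weather"))
print(links["front_passenger"].on_utterance("play some rock music"))
```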
In some embodiments of the present application, the in-vehicle terminal recognizing that a sound zone contains a passenger includes: the in-vehicle terminal acquires an ID identifying the passenger, determines a confidence that the passenger is a registered user based on that ID, determines the passenger to be a registered user and retrieves the passenger's user information when the confidence is greater than a preset value, and judges the passenger to be a new user and registers user information when the confidence is smaller than the preset value. That is, before a voice interaction link is created, user information has to be detected and identified, as shown in fig. 3: the user is identified by face ID or voiceprint ID, and information such as the passenger's nickname, age, and preferences is acquired. For example, when passengers enter the car, the in-vehicle terminal is activated and starts to detect and identify users. The terminal's camera recognizes the facial image of the passenger at each position in the car and compares it with the user face IDs stored in the database to determine whether the passenger has registered user information. When the confidence of the face ID match is greater than the preset value, the passenger is determined to be a registered user and the passenger's user information, such as nickname, age, and preferences, is retrieved. When the confidence of the face ID match is below the preset value, the passenger's voiceprint ID can further be acquired and compared with the user voiceprint IDs stored in the database: when the confidence of the voiceprint ID match is greater than the preset value, the passenger is determined to be a registered user and the user information is retrieved; when it is smaller than the preset value, the user information cannot be confirmed, the flow returns to the initial identification step, and the user's face ID is re-acquired for detection and identification.
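A rough sketch of this face-then-voiceprint cascade is shown below. Everything in it is an assumption made for illustration: the 0.8 threshold, the dictionary profile store, and the exact-match scorer are hypothetical stand-ins for a real biometric matcher and the preset confidence value of the description.

```python
from typing import Optional, Tuple

CONFIDENCE_THRESHOLD = 0.8  # hypothetical preset value

def match_confidence(probe_id: str, database: dict) -> Tuple[Optional[dict], float]:
    """Return the best-matching registered profile and a match confidence."""
    best, best_conf = None, 0.0
    for profile in database.values():
        # Stand-in scorer: a real system would run a biometric model here.
        conf = 1.0 if probe_id in profile["ids"] else 0.0
        if conf > best_conf:
            best, best_conf = profile, conf
    return best, best_conf

def identify_passenger(face_id: str, voiceprint_id: str, database: dict) -> dict:
    profile, conf = match_confidence(face_id, database)        # face ID first
    if conf > CONFIDENCE_THRESHOLD:
        return profile                                         # registered user
    profile, conf = match_confidence(voiceprint_id, database)  # fall back to voiceprint
    if conf > CONFIDENCE_THRESHOLD:
        return profile
    # Neither ID matched with enough confidence: register a new user profile.
    new_profile = {"ids": {face_id, voiceprint_id}, "nickname": None,
                   "age": None, "preferences": []}
    database[face_id] = new_profile
    return new_profile
```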
In some embodiments of the present application, as shown in fig. 4, registering user information includes: the in-vehicle terminal acquires an ID identifying the user, together with one or more of the user's name, nickname, age, and preferences. That is, the precondition for identifying a passenger is that the database already holds the passenger's registered user information. The registered user information requires an ID that uniquely identifies the user, for example one or more of a face ID, a voiceprint ID, and an iris ID. After the identifying ID is determined, associated user information, including the user's name, nickname, age, and preferences, is added to the user attribute table, preparing for the phrasing used later when the voice interaction link converses with the passenger.
In some embodiments of the present application, as shown in fig. 5, the voice processing state of the voice interaction link includes front-end signal processing and voice interaction: in front-end signal processing, the voice interaction link acquires the voice signal of its sound zone and preprocesses it to obtain a high-quality voice signal; in voice interaction, the voice interaction link holds a voice conversation with the passenger based on that high-quality signal.
In some embodiments of the present application, the front-end signal processing further includes voice endpoint detection (VAD), noise reduction, echo cancellation (AEC), sound source localization (DOA), and beamforming (BF). Voice endpoint detection detects the start and end positions of the voice signal and separates effective voice signals containing speech information from invalid signals containing none; noise reduction reduces noise interference in the effective voice signal and improves the signal-to-noise ratio; echo cancellation cancels echo in the effective voice signal; sound source localization determines the position of the speaking passenger based on the voice signals collected by the microphone array; and beamforming combines the multiple voice signals collected by the microphone array into a single signal, further sharpening the localization of the sound source.
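As a toy illustration of how these five stages could be chained, the sketch below replaces each stage with a deliberately simple stand-in: an energy gate for VAD, a moving average for noise reduction, reference subtraction for AEC, loudest-channel selection for DOA, and channel averaging for beamforming. Real systems use proper DSP algorithms, and the stage order can differ in practice (AEC, for instance, is often applied per channel before beamforming).

```python
import numpy as np

def vad(frames):
    # Voice endpoint detection: keep frames whose mean energy exceeds a gate.
    return [f for f in frames if np.abs(f).mean() > 0.01]

def denoise(x):
    # Noise reduction: a crude 3-tap moving average as a placeholder filter.
    return np.convolve(x, np.ones(3) / 3, mode="same")

def cancel_echo(x, reference):
    # Echo cancellation: subtract a scaled copy of the playback reference.
    return x - 0.5 * reference[: len(x)]

def locate_source(mic_frames):
    # Sound source localization: pick the loudest channel as the speaker's zone.
    return int(np.argmax([np.abs(m).mean() for m in mic_frames]))

def beamform(mic_frames):
    # Beamforming: combine the array channels into a single signal.
    return np.mean(np.stack(mic_frames), axis=0)

def front_end(mic_frames, echo_ref):
    zone = locate_source(mic_frames)         # DOA
    mono = beamform(mic_frames)              # BF
    mono = cancel_echo(mono, echo_ref)       # AEC
    mono = denoise(mono)                     # noise reduction
    speech = vad(np.array_split(mono, 10))   # VAD over short frames
    return zone, speech

rng = np.random.default_rng(0)
mics = [rng.normal(0, 0.1, 1600), rng.normal(0, 0.3, 1600)]  # toy 2-mic frame
zone, speech_frames = front_end(mics, echo_ref=np.zeros(1600))
print(zone, len(speech_frames))
```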
In some embodiments of the present application, the voice interaction includes speech recognition (ASR), semantic understanding (NLU), dialog management (DM), response generation (NLG), and speech synthesis (TTS). Speech recognition converts an effective voice signal containing speech information into first text information; semantic understanding interprets the meaning of the first text information; dialog management judges, based on the semantic understanding, whether the passenger's current voice dialog has ended and generates a decision; response generation produces second text from the decision using preset phrasing; and speech synthesis generates speech from the second text and feeds it back to the in-vehicle terminal for playback.
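The ASR-NLU-DM-NLG-TTS chain can be pictured as five composed functions, as in the minimal sketch below. Each stage is a toy stand-in introduced only for illustration (keyword rules for NLU, a reply table for NLG, encoded text for TTS), not the models the description presupposes.

```python
def asr(audio: str) -> str:
    # Stand-in recognizer: the "audio" already arrives as text in this sketch.
    return audio

def nlu(text: str) -> dict:
    # Semantic understanding via toy keyword rules.
    if "weather" in text:
        return {"intent": "query_weather", "final": True}
    return {"intent": "chitchat", "final": False}

def dialog_manager(semantics: dict) -> dict:
    # Decide whether the current voice dialog has ended and what to do next.
    return {"action": semantics["intent"], "end_of_dialog": semantics["final"]}

def nlg(decision: dict) -> str:
    # Response generation from a preset reply table.
    replies = {"query_weather": "It is sunny today.",
               "chitchat": "Tell me more."}
    return replies.get(decision["action"], "Sorry, I did not catch that.")

def tts(text: str) -> bytes:
    # Stand-in synthesis: a real TTS stage would return audio samples.
    return text.encode("utf-8")

audio_out = tts(nlg(dialog_manager(nlu(asr("what is the weather today")))))
print(audio_out)
```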
In some embodiments of the present application, response generation (NLG) includes one or more of the following methods: the voice interaction link selects a default phrasing style, selects a template to configure a phrasing style, or selects a model to generate a phrasing style.
In some embodiments of the present application, as shown in fig. 6, a voice interaction link is created for each sound zone that contains a passenger, and the link manages the conversation with that passenger. To make the conversation more convenient and natural, the voice interaction link selects a dialog template that adjusts the speaking style to the conversation partner. For example, Dad sits in the driver seat and Xiaoming sits in the front passenger seat. Dad says: "Help me check today's meeting schedule." The driver-seat link selects a dialog template that uses the persona of a private secretary with a rigorous, professional tone, and broadcasts key information such as meeting arrangements, time and place, attendees, and meeting topics through the in-vehicle terminal. Then Xiaoming in the front passenger seat says: "I want to watch Peppa Pig." The driver-seat voice link switches to the listening state and its conversation pauses, while the front-passenger voice interaction link selects a dialog template that uses the persona of a preschool teacher with a sweet, friendly tone and broadcasts through the terminal: "Playing the lovely Peppa Pig!" That is, different voice interaction styles can be adopted for different interaction partners.
In this way, the voice interaction link can dynamically adjust the in-vehicle terminal's wording and emotional tone to the characteristics of different passengers, making in-cabin voice interaction more real, natural, and personalized.
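A minimal sketch of the three NLG options, assuming a hypothetical profile-keyed template table, might look like this; the age cutoff and persona names merely echo the secretary and preschool-teacher example above, and a model-generated style would replace the table lookup in the third option.

```python
DEFAULT_STYLE = {"persona": "neutral", "tone": "plain"}

STYLE_TEMPLATES = {  # hypothetical template table keyed by user profile
    "adult_driver": {"persona": "private secretary", "tone": "rigorous"},
    "child":        {"persona": "preschool teacher", "tone": "sweet"},
}

def pick_style(user: dict) -> dict:
    """Default style, template-configured style, or a model-generated one."""
    if user.get("age") is not None and user["age"] < 10:
        return STYLE_TEMPLATES["child"]          # template configured by profile
    if user.get("role") == "driver":
        return STYLE_TEMPLATES["adult_driver"]
    return DEFAULT_STYLE                         # fall back to the default style

print(pick_style({"nickname": "Xiaoming", "age": 6}))
```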
The above processing of speech signals consists of routine steps for those skilled in the art, which can be understood and readily implemented, and is therefore not described in detail.
In some embodiments of the present application, the voice interaction further comprises:
when natural language understanding (NLU) recognizes a target vocabulary item spoken by a first passenger in the first sound zone, and the target vocabulary points to a second passenger, the first voice interaction link switches to the listening state, and the second voice interaction link, corresponding to the second sound zone, switches from the listening state to the voice processing state to process the voice signal of the second sound zone where the second passenger is located; and,
when the second voice interaction link obtains the processing result of the voice signal of the second sound zone, it feeds the result back to the in-vehicle terminal, which broadcasts it by voice; the second voice interaction link then switches from the voice processing state back to the listening state, and the first voice interaction link switches from the listening state back to the voice processing state.
In some embodiments of the present application, the voice interaction further comprises: when a voice interaction link acquires no voice signal within a preset time range while in the voice processing state, it switches from the voice processing state to the listening state.
In some embodiments of the present application, the target vocabulary includes one or more of a user title, a user nickname, and a user name.
As shown in fig. 7, a voice interaction link is created for each occupied sound zone and monitors the voice signals of that zone. For example, in the listening state the first zone's voice interaction link hears its passenger say: "I'm hungry", where "I'm hungry" is a wake-up voice signal. The link switches from the listening state to the voice processing state to converse with the passenger, asks through the in-vehicle terminal whether to navigate to a nearby restaurant, and, remaining in the voice processing state, waits for the passenger to continue. The passenger answers: "Mom, you pick a restaurant", where "Mom" is a target vocabulary item. The first zone's link recognizes the target word "Mom" in the passenger's voice instruction and determines that Mom is in the second sound zone; the first zone's link switches from the voice processing state to the listening state, and the second zone's link, where Mom sits, switches from the listening state to the voice processing state. Mom answers: "Let's have hot pot", and the second zone's link converses with her, asking through the terminal which hot pot restaurant to go to. Mom answers: "The first one", and the link asks through the terminal whether to navigate to that nearby hot pot restaurant. When Mom says: "Navigate", the second zone's link judges from her voice signal that the dialog has ended and feeds the processing result, navigating to the first hot pot restaurant, back to the in-vehicle terminal, which displays the navigation route to guide the passengers there. When the second zone's link captures no voice signal within a certain time range, it judges that the voice dialog is over and switches from the voice processing state back to the listening state. In this way the voice interaction links of the individual sound zones can interact with their passengers in natural spoken language, and users in every zone can ultimately control the in-vehicle terminal by voice to obtain services such as navigation and entertainment.
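The hand-over in this example reduces to a small rule: when a recognized target word maps to another occupied sound zone, the currently processing link steps back to listening and the addressed zone's link takes over. The sketch below assumes a hypothetical title-to-zone table and plain string states for brevity.

```python
TARGET_VOCABULARY = {"mom": "zone_2", "dad": "zone_1"}  # title -> occupied zone

def handle_utterance(active_zone: str, text: str, links: dict) -> str:
    """Swap the processing role between links when a title names another passenger."""
    for title, target_zone in TARGET_VOCABULARY.items():
        if title in text.lower() and target_zone != active_zone:
            links[active_zone] = "LISTENING"    # first link steps back
            links[target_zone] = "PROCESSING"   # addressed passenger's link takes over
            return target_zone
    return active_zone

links = {"zone_1": "PROCESSING", "zone_2": "LISTENING"}
active = handle_utterance("zone_1", "Mom, you pick a restaurant", links)
print(active, links)  # zone_2 {'zone_1': 'LISTENING', 'zone_2': 'PROCESSING'}
```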
In some embodiments of the application, when a voice interaction link switches state, the in-vehicle terminal sends a voice message to notify the passenger of the corresponding sound zone so that the passenger knows the state of the link; this reminds the passenger whether the terminal has entered the voice-operation state, allowing more accurate voice interaction with the terminal.
In some embodiments of the present application, the sound zones include the driver seat area and/or the other individual passenger seat areas; depending on the vehicle model, the cabin interior is generally divided into four or eight sound zones, each corresponding to a passenger seat position.
In some embodiments of the application, the in-vehicle terminal broadcasts processing results by voice in the order in which they are obtained, avoiding the confusion of broadcasting the results of several voice interaction links at the same time.
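One way to realize this ordering, sketched under the assumption that each link timestamps its result on completion, is a simple priority queue drained one item at a time so that broadcasts never overlap:

```python
import heapq
import itertools
import time

broadcast_queue = []          # min-heap ordered by completion timestamp
_counter = itertools.count()  # tie-breaker for results finishing at the same instant

def enqueue_result(zone: str, text: str) -> None:
    heapq.heappush(broadcast_queue, (time.monotonic(), next(_counter), zone, text))

def broadcast_next() -> None:
    """Play exactly one result, oldest first, so replies are never mixed."""
    if broadcast_queue:
        _, _, zone, text = heapq.heappop(broadcast_queue)
        print(f"[{zone}] {text}")

enqueue_result("zone_1", "Found 5 western restaurants.")
enqueue_result("zone_2", "Navigating to the hot pot restaurant.")
broadcast_next()
broadcast_next()
```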
The present application further provides an electronic device including a processor and a memory.
The memory stores instructions, and the processor is configured to read the instructions stored in the memory to execute any of the steps of the multi-sound-zone voice interaction method for a vehicle described above.
In this way, according to the vehicle multi-sound-zone voice interaction method and the electronic device of the present application, the in-vehicle terminal can identify passengers in different sound zones of the cabin, switch the conversation partner according to a passenger's voice instruction, and dynamically adjust its wording and emotional tone to the characteristics of different passengers, so that in-cabin voice interaction is more real, natural, and personalized; at the same time the terminal can accurately distinguish, recognize, and process the voice instructions of different passengers, and can ultimately respond quickly to their various operations on vehicle settings, navigation, music, video, and so on, making in-cabin interaction more convenient.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Likewise, the terms "comprises", "comprising", and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to it. Without further limitation, defining an element with "comprises a" does not exclude the presence of further identical elements in the process, method, article, or apparatus that comprises it.
The foregoing is a preferred embodiment of the present application. It should be noted that those skilled in the art can make several modifications and refinements without departing from the principle described herein, and such modifications and refinements shall also fall within the protection scope of the present application.

Claims (17)

1. A multi-sound-zone voice interaction method for a vehicle, the method comprising:
the in-vehicle terminal creating a voice interaction link for each of one or more sound zones according to the positions of the sound zones;
the in-vehicle terminal setting the voice interaction link to a listening state, the listening state being used to monitor the sound zone for wake-up voice signals;
when one or more voice interaction links detect the voice signal of the sound zone, the in-vehicle terminal switching the one or more voice interaction links to a voice processing state, the voice processing state being used to process the voice signals input by the passenger in the corresponding sound zone; and
the in-vehicle terminal obtaining from the voice interaction link a processing result determined based on the voice signal, and broadcasting it by voice.
2. The method according to claim 1, wherein the in-vehicle terminal creating voice interaction links for the positions of one or more sound zones comprises:
when the in-vehicle terminal recognizes that a passenger is present in a sound zone, creating the voice interaction link for the sound zone where the passenger is located.
3. The method of claim 2, wherein the in-vehicle terminal recognizing that the sound zone contains a passenger comprises:
the in-vehicle terminal acquiring an ID identifying the passenger;
determining a confidence that the passenger is a registered user based on the ID identifying the passenger;
when the confidence is greater than a preset value, determining that the passenger is a registered user, and acquiring the passenger's user information; and
when the confidence is smaller than the preset value, judging the passenger to be a new user, and registering user information.
4. The method of claim 3, wherein registering user information comprises:
the in-vehicle terminal acquiring an ID identifying the user, together with one or more of a user name, a user nickname, a user age, and user preferences.
5. The method of claim 4, wherein the ID for identifying the user comprises: one or more of a face ID, a voiceprint ID and an iris ID.
6. The method of claim 1, wherein the voice processing state of the voice interaction link comprises:
front-end signal processing, comprising the voice interaction link acquiring the voice signal of the corresponding sound zone and preprocessing it to obtain a high-quality voice signal; and
voice interaction, comprising the voice interaction link holding a voice conversation with the passenger based on the high-quality voice signal.
7. The method of claim 6, wherein the front-end signal processing further comprises:
voice endpoint detection, for detecting the start and end positions of the voice signal and separating effective voice signals containing speech information from invalid signals containing none;
noise reduction, for reducing noise interference in the effective voice signal and improving the signal-to-noise ratio;
echo cancellation, for cancelling echo in the effective voice signal;
sound source localization, for determining the position of the speaking passenger based on the voice signals collected by the microphone array; and
beamforming, for combining the multiple voice signals collected by the microphone array into a single signal to further sharpen the localization of the sound source.
8. The method of claim 6 or 7, wherein the voice interaction comprises:
speech recognition, for converting an effective voice signal containing speech information into first text information;
semantic understanding, for understanding the meaning of the first text information;
dialog management, for judging, based on the semantic understanding, whether the passenger's current voice dialog has ended, and for generating a decision;
response generation, for generating second text from the decision using preset phrasing; and
speech synthesis, for generating speech from the second text and feeding it back to the in-vehicle terminal for playback.
9. The method according to claim 8, wherein the response generation (NLG) comprises one or more of the following methods:
the voice interaction link selecting a default phrasing style;
the voice interaction link selecting a template to configure a phrasing style; and
the voice interaction link selecting a model to generate a phrasing style.
10. The method of claim 1, wherein the voice interaction further comprises:
a first voice interaction link holding a voice conversation with a first passenger in a first sound zone; when semantic understanding recognizes a target vocabulary item of the first passenger in the first sound zone and the target vocabulary points to a second passenger, the first voice interaction link switching to the listening state, and a second voice interaction link, corresponding to the second sound zone of the second passenger, switching from the listening state to the voice processing state to process the voice signal of the second sound zone where the second passenger is located; and
when the second voice interaction link obtains the processing result of the voice signal of the second sound zone, feeding the processing result back to the in-vehicle terminal, which broadcasts it by voice, whereupon the second voice interaction link switches from the voice processing state to the listening state and the first voice interaction link switches from the listening state to the voice processing state.
11. The method of claim 1, wherein the voice interaction further comprises:
when the voice interaction link acquires no voice signal within a preset time range while in the voice processing state, the voice interaction link switching from the voice processing state to the listening state.
12. The method of claim 10, wherein the target vocabulary comprises:
one or more of a user title, a user nickname, and a user name.
13. The method according to claim 10, wherein when the voice interaction link switches state, the in-vehicle terminal sends a voice message to notify the passenger of the corresponding sound zone, so that the passenger knows the state of the voice interaction link.
14. The method of claim 1, wherein a plurality of said voice interaction links can process, in parallel, the voice signals input by passengers in a plurality of sound zones.
15. The method of claim 1, wherein the sound zones comprise the driver seat area and/or other individual passenger seat areas.
16. The method according to claim 1, wherein the in-vehicle terminal broadcasts the processing results by voice in the order in which they are obtained.
17. An electronic device comprising a processor and a memory,
the memory storing instructions,
and the processor being configured to read the instructions stored in the memory to perform the method of any of claims 1-16.
CN202010630094.6A 2020-07-03 2020-07-03 Multi-voice-zone voice interaction method for vehicle and electronic equipment Active CN111816189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010630094.6A CN111816189B (en) 2020-07-03 2020-07-03 Multi-voice-zone voice interaction method for vehicle and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010630094.6A CN111816189B (en) 2020-07-03 2020-07-03 Multi-voice-zone voice interaction method for vehicle and electronic equipment

Publications (2)

Publication Number Publication Date
CN111816189A (en) 2020-10-23
CN111816189B (en) 2023-12-26

Family

ID=72856710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010630094.6A Active CN111816189B (en) 2020-07-03 2020-07-03 Multi-voice-zone voice interaction method for vehicle and electronic equipment

Country Status (1)

Country Link
CN (1) CN111816189B (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004172781A (en) * 2002-11-19 2004-06-17 Hitachi Ltd Information processing apparatus for voice interaction, voice interaction processing system, and car navigation terminal
JP2006189394A (en) * 2005-01-07 2006-07-20 Toyota Motor Corp Vehicle agent device
JP2009073428A (en) * 2007-09-24 2009-04-09 Clarion Co Ltd In-vehicle apparatus and system
DE102008051757A1 (en) * 2007-11-12 2009-05-14 Volkswagen Ag Multimodal user interface of a driver assistance system for entering and presenting information
JP2010102163A (en) * 2008-10-24 2010-05-06 Xanavi Informatics Corp Vehicle interior voice interaction device
US20150006167A1 (en) * 2012-06-25 2015-01-01 Mitsubishi Electric Corporation Onboard information device
DE102017109734A1 (en) * 2016-05-06 2017-11-09 GM Global Technology Operations LLC SYSTEM FOR PROVIDING PASSENGER-SPECIFIC ACOUSTICS FUNCTIONS IN A TRANSPORT VEHICLE
CN107230476A (en) * 2017-05-05 2017-10-03 众安信息技术服务有限公司 A kind of natural man machine language's exchange method and system
CN107340991A (en) * 2017-07-18 2017-11-10 百度在线网络技术(北京)有限公司 Switching method, device, equipment and the storage medium of speech roles
US20200075006A1 (en) * 2018-08-29 2020-03-05 Alibaba Group Holding Limited Method, system, and device for interfacing with a terminal with a plurality of response modes
WO2020070878A1 (en) * 2018-10-05 2020-04-09 本田技研工業株式会社 Agent device, agent control method, and program
JP2020060623A (en) * 2018-10-05 2020-04-16 本田技研工業株式会社 Agent system, agent method, and program
WO2020079733A1 (en) * 2018-10-15 2020-04-23 三菱電機株式会社 Speech recognition device, speech recognition system, and speech recognition method
CN109754803A (en) * 2019-01-23 2019-05-14 上海华镇电子科技有限公司 Vehicle multi-sound area voice interactive system and method
CN110070868A (en) * 2019-04-28 2019-07-30 广州小鹏汽车科技有限公司 Voice interactive method, device, automobile and the machine readable media of onboard system
CN110033775A (en) * 2019-05-07 2019-07-19 百度在线网络技术(北京)有限公司 Multitone area wakes up exchange method, device and storage medium
CN110211585A (en) * 2019-06-05 2019-09-06 广州小鹏汽车科技有限公司 In-car entertainment interactive approach, device, vehicle and machine readable media

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112599133A (en) * 2020-12-15 2021-04-02 北京百度网讯科技有限公司 Vehicle-based voice processing method, voice processor and vehicle-mounted processor
CN114179613A (en) * 2021-12-10 2022-03-15 常州星宇车灯股份有限公司 Audio-video touch interactive control method for copilot control panel
CN114179613B (en) * 2021-12-10 2024-03-05 常州星宇车灯股份有限公司 Audio-video touch interactive control method for co-driver control panel
CN114678026A (en) * 2022-05-27 2022-06-28 广州小鹏汽车科技有限公司 Voice interaction method, vehicle terminal, vehicle and storage medium
CN114678026B (en) * 2022-05-27 2022-10-14 广州小鹏汽车科技有限公司 Voice interaction method, vehicle terminal, vehicle and storage medium
WO2023227129A1 (en) * 2022-05-27 2023-11-30 广州小鹏汽车科技有限公司 Voice interaction method, head unit terminal, vehicle and storage medium
WO2024078435A1 (en) * 2022-10-09 2024-04-18 蔚来汽车科技(安徽)有限公司 Method for dynamically switching speech zones, speech interaction method, device, medium, and vehicle

Also Published As

Publication number Publication date
CN111816189B (en) 2023-12-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant