CN111816189B - Multi-voice-zone voice interaction method for vehicle and electronic equipment - Google Patents

Multi-voice-zone voice interaction method for vehicle and electronic equipment Download PDF

Info

Publication number
CN111816189B
CN111816189B CN202010630094.6A CN202010630094A CN111816189B CN 111816189 B CN111816189 B CN 111816189B CN 202010630094 A CN202010630094 A CN 202010630094A CN 111816189 B CN111816189 B CN 111816189B
Authority
CN
China
Prior art keywords
voice
passenger
zone
link
voice interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010630094.6A
Other languages
Chinese (zh)
Other versions
CN111816189A (en
Inventor
杨扬
袁志俊
吴晓敏
王恺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zebred Network Technology Co Ltd
Original Assignee
Zebred Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zebred Network Technology Co Ltd filed Critical Zebred Network Technology Co Ltd
Priority to CN202010630094.6A priority Critical patent/CN111816189B/en
Publication of CN111816189A publication Critical patent/CN111816189A/en
Application granted granted Critical
Publication of CN111816189B publication Critical patent/CN111816189B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech to text systems
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/22: Interactive procedures; Man-machine interfaces
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/78: Detection of presence or absence of voice signals
    • G10L 25/87: Detection of discrete points within a voice signal
    • G10L 2015/225: Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a multi-voice-zone voice interaction method for a vehicle and an electronic device, comprising the following steps: the vehicle terminal creates, according to the positions of one or more voice zones, a voice interaction link corresponding to each voice zone position; the vehicle terminal sets the voice interaction links to a listening state, in which each link listens for wake-up voice signals from its voice zone; when one or more voice interaction links detect a voice signal in their voice zones, the vehicle terminal switches those links to a voice processing state, in which each link processes the voice signals input by the passenger in its corresponding voice zone; and the vehicle terminal obtains from a voice interaction link a processing result determined on the basis of the voice signal and performs a voice broadcast. According to the embodiments of the application, the voice interaction object can be switched while the vehicle terminal is in dialogue with passengers in the cabin, and the terminal's phrasing and emotional tone are adjusted to the characteristics of different passengers, making in-cabin voice interaction more natural, realistic and personalized.

Description

Multi-voice-zone voice interaction method for vehicle and electronic equipment
Technical Field
The application relates to the technical field of automobile electronics, in particular to a multi-voice-zone voice interaction method for a vehicle.
Background
During driving, a question (or a round of dialogue) sometimes needs to be handed over to another person to answer and complete. In a multi-voice-zone environment (for example, with four-zone or six-zone technology), the vehicle terminal knows which user sits in which seat of the cabin and which seat's user is currently in voice interaction. However, existing multi-voice-zone voice interaction schemes for the automobile scenario do not support switching the interaction object by voice.
Disclosure of Invention
In view of this, the application provides a multi-voice-zone voice interaction method for a vehicle, which enables the voice interaction object to be switched while the vehicle terminal is in dialogue with passengers in the cabin, and dynamically adjusts the terminal's phrasing and emotional tone according to the characteristics of different passengers, making in-cabin voice interaction more natural, realistic and personalized.
In order to solve the technical problems, the application adopts the following technical scheme:
In a first aspect, the present application provides a multi-voice-zone voice interaction method for a vehicle, the method comprising:
the vehicle terminal creates, according to the positions of one or more voice zones, a voice interaction link corresponding to each voice zone position;
the vehicle terminal sets the voice interaction links to a listening state, in which each link listens for wake-up voice signals from its voice zone;
when one or more voice interaction links detect the voice signal of the voice zone, the vehicle terminal switches the one or more voice interaction links to a voice processing state, in which each link processes the voice signals input by the passenger in its corresponding voice zone;
and the vehicle terminal obtains from the voice interaction link a processing result determined on the basis of the voice signal and performs a voice broadcast.
As an embodiment of the first aspect of the present application, the vehicle terminal creates voice interaction links corresponding to the positions of one or more voice zones according to the positions of the voice zones, respectively, including:
when the vehicle terminal recognizes that passengers exist in the voice zone, a voice interaction link is established for the voice zone where the passengers are located.
As an embodiment of the first aspect of the present application, the vehicle terminal recognizing that a voice zone contains a passenger includes:
the vehicle terminal acquires an ID for identifying the passenger;
a confidence that the passenger is a registered user is determined on the basis of the ID for identifying the passenger;
when the confidence exceeds a preset value, the passenger is determined to be a registered user and the user information of the passenger is acquired;
and when the confidence is below the preset value, the passenger is determined to be a new user and user information is registered.
As an embodiment of the first aspect of the present application, registering the user information includes:
the vehicle terminal obtaining the ID for identifying the user, together with one or more of the user's name, nickname, age, and preferences.
As an embodiment of the first aspect of the present application, the ID for identifying the user includes: one or more of face ID, voiceprint ID, and iris ID.
As an embodiment of the first aspect of the present application, the voice processing state of the voice interaction link includes:
front-end signal processing, in which the voice interaction link acquires the voice signal of its corresponding voice zone and preprocesses it to obtain a high-quality voice signal;
and voice interaction, in which the voice interaction link conducts a voice dialogue with the passenger based on the high-quality voice signal.
As an embodiment of the first aspect of the present application, the front-end signal processing further includes:
voice endpoint detection, for detecting the start position of the voice signal and separating effective voice signals containing voice information from ineffective signals containing none;
noise reduction, for reducing noise interference in the effective voice signal and improving the signal-to-noise ratio;
echo cancellation, for cancelling echo in the effective voice signal;
sound source localization, for determining the position of the speaking passenger based on the voice signals collected by the microphone array;
and beamforming, for combining the multiple voice signals collected by the microphone array into a single voice signal and further refining the sound source localization.
As an embodiment of the first aspect of the present application, the voice interaction includes:
speech recognition, for converting the effective voice signal containing voice information into first text information;
semantic understanding, for interpreting the meaning of the first text information;
dialogue management, for determining, on the basis of the semantic understanding, whether the passenger's current voice dialogue has ended, and generating a decision;
speech-style processing, for generating a second text by applying a preset speech style to the decision;
and speech synthesis, for generating speech from the second text and feeding it back to the vehicle terminal for playback.
As an embodiment of the first aspect of the present application, the speech-style processing (NLG) comprises one or more of the following methods:
the voice interaction link selects a default speech style;
the voice interaction link selects a template-configured speech style;
the voice interaction link selects a model-generated speech style.
As an embodiment of the first aspect of the present application, the voice interaction further comprises:
the first voice interaction link is in voice conversation with a first passenger in the first voice zone, when the semantic understanding identifies the target vocabulary of the first passenger in the first voice zone and the target vocabulary points to a second passenger, the first voice interaction link is switched into a monitoring state, the second voice interaction link state of the second passenger corresponding to the second voice zone is switched from the monitoring state to a voice processing state for processing the voice signal of the second voice zone where the second passenger is located, and,
and when the second voice interaction link obtains the processing result of the voice signal of the second voice region, feeding back the processing result to the vehicle terminal, and performing voice broadcasting by the vehicle terminal, wherein the state of the second voice interaction link is switched from the voice processing state to the monitoring state, and the state of the first voice interaction link is switched from the monitoring state to the voice processing state.
As an embodiment of the first aspect of the present application, the voice interaction further comprises:
when a voice interaction link acquires no voice signal within a preset time while in the voice processing state, the voice interaction link is switched from the voice processing state back to the listening state.
As an embodiment of the first aspect of the present application, the target vocabulary includes:
one or more of a user's name, a user's nickname, and another form of address for the user.
As an embodiment of the first aspect of the present application, when a voice interaction link switches state, the vehicle terminal sends a voice message to notify the passenger in the corresponding voice zone, so that the passenger knows the current state of the voice interaction link.
As an embodiment of the first aspect of the present application, the plurality of voice interaction links may process the voice signals input by passengers of the plurality of voice zones in parallel.
As an embodiment of the first aspect of the present application, the voice zones include a driver seat area and/or other individual passenger seat areas.
As an embodiment of the first aspect of the present application, the vehicle terminal performs voice broadcasts in the time order in which the processing results are obtained.
In a second aspect, embodiments of the present application provide an electronic device, comprising a processor and a memory,
the memory has stored therein instructions that,
and the processor is used for reading the instructions stored in the memory to execute the multi-voice-zone voice interaction method for the vehicle.
The technical scheme of the application has at least one of the following beneficial effects:
With the multi-voice-zone voice interaction method for a vehicle and the electronic device of the application, the vehicle terminal can identify the passengers in the different voice zones of the cabin, switch the dialogue partner according to the passengers' voice instructions, and dynamically adjust its phrasing and emotional tone according to the characteristics of different passengers, making in-cabin voice interaction more natural, realistic and personalized. At the same time, the vehicle terminal can accurately distinguish, recognize, and process the voice instructions of different passengers, so that it responds quickly to their various operations on vehicle settings, navigation, music, video, and the like, making in-cabin interaction more convenient and efficient.
Drawings
Fig. 1 is a scene diagram of a multi-voice-zone voice interaction method for a vehicle according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for a voice interactive link to passenger dialogue in accordance with an embodiment of the present application;
FIG. 3 is a flow chart of a method of registering occupant user information in an embodiment of the present application;
FIG. 4 is a flow chart of a method of detecting and identifying passengers according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for multi-voice zone voice interaction for a vehicle according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a method for processing a speech signal according to an embodiment of the present application;
fig. 7 is a flowchart of a method for switching voice interaction links according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Embodiments of the present application are described below in conjunction with specific scenarios.
Fig. 1 is a scene diagram of voice interaction between passengers in a vehicle cabin and the vehicle terminal according to one embodiment of the present application. As shown in fig. 1, passengers sit at different positions in the cabin, for example the driver's seat, the front passenger seat, and the rear seats, and each position corresponds to a voice zone. The vehicle terminal first determines whether a passenger is present in each voice zone and, when one is, creates a voice interaction link for that zone; as many links are created as there are passengers, each managing the voice dialogue between its zone's passenger and the vehicle terminal, and the links run in parallel so that passengers at different positions can interact with the terminal independently. For example, Xiaoming's father sits in the driver's seat and Xiaoming sits in the front passenger seat. When the vehicle terminal starts, it determines that both the driver-seat voice zone and the front-passenger voice zone are occupied and creates two voice interaction links, one managing the dialogue with the father in the driver-seat zone and the other managing the dialogue with Xiaoming in the front-passenger zone; both links begin in the listening state. When the father says, "Help me navigate to a Western restaurant," the driver-seat link switches from the listening state to the voice processing state and replies: "OK, I found 5 Western restaurants," broadcast through the vehicle terminal. The father then receives a phone call and says, "I'm busy, let Xiaoming choose!" The driver-seat link switches from the voice processing state back to the listening state and its dialogue pauses, while the front-passenger link switches from the listening state to the voice processing state and waits for Xiaoming's voice input. Xiaoming says, "Go to the third one, the steak restaurant!" and the front-passenger link replies: "Xiaoming has chosen Xidi Steak (Wangjing branch); starting navigation for you," broadcast through the vehicle terminal. In this way the voice interaction object can be switched among the multiple voice zones in the vehicle, making in-cabin interaction more convenient and efficient.
In some embodiments of the present application, the user information of a passenger is confirmed first. For example, when a passenger enters the cabin, the vehicle terminal starts up and begins detecting and identifying users: its camera captures the face images of the passengers at each position in the cabin and the seats they occupy.
After the user information and the seating positions of the passengers are confirmed, voice interaction links are created to manage the dialogues between the passengers and the vehicle terminal.
The multi-voice-zone voice interaction method for a vehicle is described below with reference to the accompanying drawings. Fig. 2 shows a flowchart of the method, which is applied to the vehicle terminal and enables the interaction object of the vehicle terminal to be switched by voice across multiple voice zones. As shown in fig. 2, the method comprises:
In step S210, the vehicle terminal creates, according to the positions of one or more voice zones, a voice interaction link corresponding to each voice zone position, where each voice interaction link manages the voice dialogue of one voice zone.
In step S220, the vehicle terminal sets each voice interaction link to the listening state, in which the link listens for wake-up voice signals from its voice zone. That is, a voice interaction link is in the listening state as soon as it is created and listens to its corresponding voice zone at all times. Several hundred wake-up voice signals are supported, covering most everyday in-vehicle scenarios such as navigation, entertainment, vehicle control, and services; and to respond to wake-up signals more accurately, the voice interaction links are trained on a corpus of nearly a million utterances, so that they can respond to a passenger's voice signal at any time and process it accurately.
In step S230, when one or more voice interaction links detect a voice signal in their voice zones, the vehicle terminal switches those links to the voice processing state, in which each link processes the voice signals input by the passenger in its corresponding voice zone. For example, passenger A says, "Help me check today's weather." Since this is a wake-up voice signal, the voice interaction link of passenger A's voice zone switches from the listening state to the voice processing state to converse with passenger A, and the dialogue utterances are broadcast through the vehicle terminal. Meanwhile passenger B says, "Play some rock music," and the voice interaction link of passenger B's voice zone likewise switches from the listening state to the voice processing state to converse with passenger B. In this way, voice interaction with passengers in multiple voice zones is achieved.
In step S240, the vehicle terminal obtains from a voice interaction link a processing result determined on the basis of the voice signal and performs a voice broadcast. For example, following the previous step, passenger A's voice interaction link determines from the voice signal that the processing result is a weather query and feeds it back to the vehicle terminal, which displays the weather forecast; passenger B's voice interaction link determines that the processing result is to play rock music and feeds it back, and the vehicle terminal plays rock music. Thus the voice interaction link of each voice zone can interact with its passenger in natural spoken language, and the user in each zone can ultimately control the vehicle terminal by voice to obtain navigation, entertainment, and other services.
In this way, the vehicle terminal can respond quickly to the various operations of passengers in different voice zones, such as vehicle settings, navigation, music, and video; the interaction object of the vehicle terminal can be switched by voice across the multiple voice zones; and in-cabin interaction becomes more convenient and efficient.
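To make the link life cycle of steps S210-S240 concrete, the following minimal Python sketch models the two states and the wake-up switch. It is an illustration only, not the patented implementation; the class, zone names, and wake phrases are assumptions made for the example.

```python
from enum import Enum, auto

class LinkState(Enum):
    LISTENING = auto()    # waiting for a wake-up voice signal
    PROCESSING = auto()   # in a dialogue with the zone's passenger

class VoiceInteractionLink:
    def __init__(self, zone_id, wake_phrases):
        self.zone_id = zone_id            # e.g. "driver", "front-passenger"
        self.wake_phrases = wake_phrases  # phrases that wake this zone's link
        self.state = LinkState.LISTENING  # every link starts out listening

    def on_audio(self, utterance):
        if self.state is LinkState.LISTENING:
            if any(p in utterance for p in self.wake_phrases):
                self.state = LinkState.PROCESSING  # wake-up detected
                return self.handle(utterance)
            return None                            # ignore non-wake speech
        return self.handle(utterance)              # already mid-dialogue

    def handle(self, utterance):
        # placeholder for the per-zone ASR/NLU/DM/NLG/TTS processing
        return f"[zone {self.zone_id}] processed: {utterance}"

# One link per occupied zone; links run independently, so zones work in parallel.
links = {zone: VoiceInteractionLink(zone, ["navigate", "weather", "play"])
         for zone in ("driver", "front-passenger")}
print(links["driver"].on_audio("help me navigate to a western restaurant"))
```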
In some embodiments of the present application, the vehicle terminal recognizing that a voice zone contains a passenger includes: the vehicle terminal acquires an ID for identifying the passenger; a confidence that the passenger is a registered user is determined on the basis of the ID; when the confidence exceeds a preset value, the passenger is determined to be a registered user and the passenger's user information is acquired; and when the confidence is below the preset value, the passenger is determined to be a new user and user information is registered. That is, before a voice interaction link is created, the user information must be detected and recognized. As shown in fig. 3, the user is identified by face ID or voiceprint ID, and information such as the passenger's nickname, age, or preferences is acquired. For example, after a passenger enters the cabin, the vehicle terminal starts detecting and identifying the user: its camera captures the face image of the passenger at each position in the cabin and compares it with the user face IDs stored in the database to determine whether the passenger has registered user information. When the confidence of the face ID exceeds the preset value, the passenger is determined to be a registered user and user information such as nickname, age, and preferences is acquired; when the confidence of the face ID is below the preset value, the vehicle terminal further acquires the passenger's voiceprint ID to determine whether the passenger has registered user information; and when the confidence of the voiceprint ID is also below the preset value, the process returns to the initial face-acquisition step and the user is identified anew.
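A hedged sketch of this identification flow: compare the acquired face ID against the database at a preset confidence threshold, fall back to the voiceprint ID when the face match is weak, and treat everything else as a new-user registration. The threshold value and the sample data are illustrative assumptions, not values from the patent.

```python
REGISTERED = {
    "face:0x3f": {"name": "Xiaoming", "age": 8, "preference": "cartoons"},
}
PRESET = 0.8  # illustrative preset confidence value

def identify(face_id, face_conf, voiceprint_id=None, voice_conf=0.0):
    """Return stored user info for a registered user, or None for a new user."""
    if face_conf > PRESET:
        return REGISTERED.get(face_id)        # registered: fetch nickname/age/preferences
    if voiceprint_id is not None and voice_conf > PRESET:
        return REGISTERED.get(voiceprint_id)  # voiceprint fallback when face match is weak
    return None                               # new user: register ID plus user attributes

print(identify("face:0x3f", 0.93))  # -> Xiaoming's stored profile
print(identify("face:0x99", 0.40))  # -> None, so the registration flow starts
```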
In some embodiments of the present application, as shown in fig. 4, registering user information includes: the vehicle terminal obtaining an ID that identifies the user, together with one or more of the user's name, nickname, age, and preferences. That is, the precondition for recognizing a passenger is that the database already holds the passenger's registered user information. Registering user information requires an ID that uniquely identifies the user, for example one or more of a face ID, a voiceprint ID, and an iris ID; once the identifying ID is determined, the associated user information, including user name, nickname, age, and preferences, is also added to the user attribute table, preparing for the subsequent dialogue between the voice interaction link and the passenger.
In some embodiments of the present application, as shown in fig. 5, the voice processing state of the voice interaction link includes front-end signal processing and voice interaction: in front-end signal processing, the voice interaction link acquires the voice signal of its corresponding voice zone and preprocesses it to obtain a high-quality voice signal; in voice interaction, the voice interaction link conducts a voice dialogue with the passenger based on the high-quality voice signal.
In some embodiments of the present application, the front-end signal processing further includes voice endpoint detection (voice activity detection, VAD), noise reduction, echo cancellation (acoustic echo cancellation, AEC), sound source localization (direction of arrival, DOA), and beamforming (BF). Voice endpoint detection detects the start position of a voice signal and separates effective voice signals containing voice information from ineffective signals containing none; noise reduction reduces noise interference in the effective voice signal and improves the signal-to-noise ratio; echo cancellation removes echo from the effective voice signal; sound source localization determines the position of the speaking passenger from the voice signals collected by the microphone array; and beamforming combines the multiple voice signals collected by the microphone array into a single signal and further refines the sound source localization.
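The front-end chain can be pictured with stubbed stages, as in the sketch below; each function stands in for a real VAD, noise-reduction, AEC, or beamforming component, and the energy threshold and frame count are placeholder values, not parameters from the patent.

```python
import numpy as np

def vad(frames):
    # keep frames whose energy suggests speech; drop silence (toy threshold)
    return [f for f in frames if np.abs(f).mean() > 0.01]

def denoise(frames):
    return [f - f.mean() for f in frames]  # toy DC/noise-floor removal

def aec(frames, playback=None):
    return frames  # a real AEC would subtract the known playback echo here

def beamform(mic_channels):
    # real systems beamform the array first; combining channels also sharpens
    # the DOA estimate of where the speaking passenger sits
    return np.mean(mic_channels, axis=0)

def front_end(mic_channels, playback=None):
    mono = beamform(np.asarray(mic_channels))   # multi-channel -> one enhanced signal
    frames = np.array_split(mono, 10)
    return aec(denoise(vad(frames)), playback)  # VAD -> noise reduction -> AEC

enhanced = front_end(np.random.randn(4, 1600) * 0.1)  # 4-mic array, fake audio
```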
In some embodiments of the present application, the voice interaction includes speech recognition (automatic speech recognition, ASR), semantic understanding (natural language understanding, NLU), dialogue management (DM), speech-style processing (natural language generation, NLG), and speech synthesis (text-to-speech, TTS). Speech recognition converts the effective voice signal containing voice information into first text information; semantic understanding interprets the meaning of the first text information; dialogue management determines, on the basis of the semantic understanding, whether the passenger's current voice dialogue has ended and generates a decision; speech-style processing generates a second text by applying a preset speech style to the decision; and speech synthesis generates speech from the second text and feeds it back to the vehicle terminal for playback.
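The ASR-NLU-DM-NLG-TTS chain composes naturally as a function pipeline. The sketch below is a toy illustration in which every stage returns canned output; only the stage order and the first-text/second-text data flow follow the description above.

```python
def asr(audio: bytes) -> str:
    return "help me check today's weather"           # speech -> first text

def nlu(text: str) -> dict:
    return {"intent": "weather_query", "slots": {}}  # meaning of the first text

def dm(semantics: dict) -> dict:
    # decide what to do and whether the passenger's current dialogue has ended
    return {"action": "report_weather", "dialogue_done": True}

def nlg(decision: dict, style: str = "default") -> str:
    return "Sunny today, 25 degrees."                # decision + speech style -> second text

def tts(text: str) -> bytes:
    return text.encode("utf-8")                      # second text -> audio for broadcast

reply_audio = tts(nlg(dm(nlu(asr(b"\x00\x01")))))    # one full dialogue turn
```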
In some embodiments of the present application, the speech-style processing (natural language generation, NLG) includes one or more of the following: the voice interaction link selects a default speech style, selects a template-configured speech style, or selects a model-generated speech style.
In some embodiments of the present application, as shown in fig. 6, a voice interaction link is created for each voice zone occupied by a passenger, and the link manages the dialogue with that passenger. To make the dialogue more convenient and natural, the voice interaction link selects a dialogue template, which adjusts the speech style to the dialogue partner. For example, the father sits in the driver's seat and Xiaoming sits in the front passenger seat. The father says, "Help me check today's meeting schedule." The driver-seat voice interaction link selects a dialogue template in the persona of a private secretary, with a rigorous, professional tone, and broadcasts key information such as the meeting arrangement, time and place, attendees, and topics through the vehicle terminal. Xiaoming in the front passenger seat then says, "I want to watch Peppa Pig." The driver-seat link switches to the listening state and its dialogue pauses, while the front-passenger link selects a dialogue template in the persona of a kindergarten teacher, with a sweet, warm voice, and broadcasts through the vehicle terminal: "Here comes the adorable Peppa Pig!" In this way, different speech styles can be adopted for different voice interaction objects.
Therefore, the voice interaction link can dynamically adjust the phrasing and emotional tone of the vehicle terminal according to the characteristics of different passengers, making in-cabin voice interaction more natural, realistic and personalized.
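One plausible way to realize this per-passenger style choice is to key the default/template/model decision on the registered user profile, as in this sketch; the personas, template strings, and selection rules are invented for illustration.

```python
TEMPLATES = {
    "secretary": "Your schedule: {content}. Shall I set a reminder?",
    "kindergarten": "Yay! Here comes {content}!",
}

def pick_style(user):
    """Return (mode, template) for NLG: default, template-configured, or model-generated."""
    if user is None:
        return "default", None                        # unregistered passenger
    if user.get("age", 99) < 10:
        return "template", TEMPLATES["kindergarten"]  # sweet, child-friendly persona
    if user.get("preference") == "work":
        return "template", TEMPLATES["secretary"]     # rigorous, professional persona
    return "model", None                              # free-form style from a generative model

mode, template = pick_style({"name": "Xiaoming", "age": 8})
print(mode, template.format(content="Peppa Pig"))
```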
The above speech signal processing steps are routine for a person skilled in the art and are readily understood and implemented, so they are not described in detail here.
In some embodiments of the present application, the voice interaction further comprises:
a first voice interaction link is in a voice dialogue with a first passenger in a first voice zone; when semantic understanding (natural language understanding, NLU) recognizes target vocabulary of the first passenger in the first voice zone and the target vocabulary points to a second passenger, the first voice interaction link is switched to the listening state, and the second voice interaction link, corresponding to the second voice zone of the second passenger, is switched from the listening state to the voice processing state to process the voice signal of the second voice zone where the second passenger is located; and,
when the second voice interaction link obtains the processing result of the voice signal of the second voice zone, the processing result is fed back to the vehicle terminal and broadcast by the vehicle terminal, the state of the second voice interaction link is switched from the voice processing state back to the listening state, and the state of the first voice interaction link is switched from the listening state back to the voice processing state.
In some embodiments of the present application, the voice interaction further comprises: when a voice interaction link acquires no voice signal within a preset time while in the voice processing state, it is switched from the voice processing state back to the listening state.
In some embodiments of the present application, the target vocabulary includes one or more of a user's name, a user's nickname, and another form of address for the user.
As shown in fig. 7, a voice interaction link is created for each occupied voice zone and listens to that zone's voice signals. For example, the voice interaction link of a first voice zone, while in the listening state, hears the passenger in that zone say, "I'm hungry." This is a wake-up voice signal, so the link switches from the listening state to the voice processing state to converse with the passenger, asks through the vehicle terminal whether to navigate to a nearby restaurant, and waits in the voice processing state for further voice input. The passenger answers, "Mom, you pick a restaurant," in which "Mom" is target vocabulary. The first zone's voice interaction link recognizes the target vocabulary "Mom" in the passenger's voice command and determines that Mom is in a second voice zone; the first zone's link switches from the voice processing state to the listening state, and the second zone's link, where Mom sits, switches from the listening state to the voice processing state. Mom answers, "Let's have hot pot," and the second zone's link converses with her, asking through the vehicle terminal which hot pot restaurant to go to. Mom answers, "The first one," and the link asks through the vehicle terminal whether to navigate to that nearby hot pot restaurant. When Mom says "Yes," the second zone's link determines from her voice signal that the dialogue has ended and feeds back to the vehicle terminal the processing result that Mom wants to navigate to the first hot pot restaurant; the vehicle terminal then displays the navigation route to guide the passengers there. If the second zone's link captures no voice signal for a period of time, it determines that the dialogue has ended and switches from the voice processing state back to the listening state. In this way, the dialogue can be switched among the voice zones, each zone's link interacting with its passenger in natural spoken language, and the user in each zone can ultimately control the vehicle terminal by voice to obtain navigation, entertainment, and other services.
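The hand-over logic in this example reduces to a small routing function: when the recognized utterance contains target vocabulary that maps to a different occupied zone, the current link pauses and the named passenger's link takes over. The zone names and vocabulary table below are assumptions for the sketch.

```python
TARGET_WORDS = {"mom": "rear-left", "dad": "driver", "xiaoming": "front-passenger"}

def route(active_zone, utterance, link_states):
    """link_states maps zone -> 'listening' | 'processing'; returns the new active zone."""
    for word, zone in TARGET_WORDS.items():
        if word in utterance.lower() and zone != active_zone:
            link_states[active_zone] = "listening"   # pause the current dialogue
            link_states[zone] = "processing"         # the named passenger takes the turn
            return zone
    return active_zone                               # no target word: same passenger continues

states = {"driver": "listening", "front-passenger": "processing", "rear-left": "listening"}
print(route("front-passenger", "Mom, you pick a restaurant", states))  # -> "rear-left"
```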
In some embodiments of the present application, when a voice interaction link switches state, the vehicle terminal sends a voice message to notify the passenger in the corresponding voice zone, so that the passenger knows the state of the link, is reminded whether voice operation of the vehicle is active, and can interact with the vehicle terminal more accurately.
In some embodiments of the present application, the voice zones include a driver seat area and/or other passenger seat areas. Depending on the vehicle type, a typical automobile interior is divided into, for example, four or eight voice zones, each corresponding to one passenger seat position.
In some embodiments of the present application, the vehicle terminal performs voice broadcasts in the time order in which the processing results are obtained, avoiding the confusion that would arise if it broadcast the processing results of several voice interaction links at the same time.
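A simple way to sketch this ordering is a priority queue keyed on the time each link finished processing, so broadcasts play strictly in completion order; the timestamps and texts below are illustrative.

```python
import heapq

broadcasts = []  # (completion_time_s, zone, text): results from parallel links
heapq.heappush(broadcasts, (2.1, "front-passenger", "Playing rock music."))
heapq.heappush(broadcasts, (1.4, "driver", "Here is today's weather forecast."))

while broadcasts:
    _, zone, text = heapq.heappop(broadcasts)
    print(f"[broadcast -> {zone}] {text}")  # earliest-finished result is voiced first
```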
The present application also provides an electronic device comprising a processor and a memory,
the memory stores instructions, and the processor is configured to read the instructions stored in the memory to execute any of the steps of the multi-voice-zone voice interaction method for a vehicle described above.
With the multi-voice-zone voice interaction method for a vehicle and the electronic device described above, the vehicle terminal can identify the passengers in the different voice zones of the cabin, switch the dialogue partner according to the passengers' voice instructions, and dynamically adjust its phrasing and emotional tone according to the characteristics of different passengers, making in-cabin voice interaction more natural, realistic and personalized. At the same time, the vehicle terminal can accurately distinguish, recognize, and process the voice instructions of different passengers, ultimately responding quickly to their various operations on vehicle settings, navigation, music, video, and the like, making in-cabin interaction more convenient and efficient.
It should be noted that in the examples and descriptions of this patent, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
While the foregoing is directed to the preferred embodiments of the present application, it should be noted that modifications and adaptations to those embodiments may occur to one skilled in the art and that such modifications and adaptations are intended to be comprehended within the scope of the present application without departing from the principles set forth herein.

Claims (16)

1. A method of multi-voice zone voice interaction for a vehicle, the method comprising:
the vehicle terminal creates, according to the positions of one or more voice zones, a voice interaction link corresponding to each voice zone position;
the vehicle terminal sets the voice interaction links to a listening state, in which each link listens for wake-up voice signals from its voice zone;
when one or more voice interaction links detect the voice signal of the voice zone, the vehicle terminal switches the one or more voice interaction links to a voice processing state, in which each link processes the voice signals input by the passenger in its corresponding voice zone;
the vehicle terminal obtains from the voice interaction link a processing result determined on the basis of the voice signal and performs a voice broadcast;
the voice interaction further includes: a first voice interaction link is in a voice dialogue with a first passenger in a first voice zone; when semantic understanding recognizes target vocabulary of the first passenger in the first voice zone and the target vocabulary points to a second passenger, the first voice interaction link is switched to the listening state, and a second voice interaction link, corresponding to the second voice zone of the second passenger, is switched from the listening state to the voice processing state to process the voice signal of the second voice zone where the second passenger is located; and,
when the second voice interaction link obtains the processing result of the voice signal of the second voice zone, the processing result is fed back to the vehicle terminal, the vehicle terminal performs a voice broadcast, the second voice interaction link is switched from the voice processing state back to the listening state, and the first voice interaction link is switched from the listening state back to the voice processing state.
2. The method according to claim 1, wherein the vehicle terminal creates voice interaction links corresponding to the locations of one or more of the voice zones according to the locations of the voice zones, respectively, comprising:
and when the vehicle terminal recognizes that a passenger is present in the voice zone, creating the voice interaction link for the voice zone where the passenger is located.
3. The method of claim 2, wherein the vehicle terminal identifying that the voice zone contains a passenger comprises:
the vehicle terminal acquires an ID for identifying the passenger;
a confidence that the passenger is a registered user is determined on the basis of the ID for identifying the passenger;
when the confidence exceeds a preset value, the passenger is determined to be a registered user and the user information of the passenger is acquired;
and when the confidence is below the preset value, the passenger is determined to be a new user and user information is registered.
4. The method according to claim 3, wherein registering user information comprises:
the vehicle terminal obtaining the ID for identifying the user, together with one or more of the user's name, nickname, age, and preferences.
5. The method of claim 4, wherein the ID for identifying the user comprises: one or more of face ID, voiceprint ID, and iris ID.
6. The method of claim 1, wherein the speech processing state of the speech interaction link comprises:
front-end signal processing, in which the voice interaction link acquires the voice signal of the corresponding voice zone and preprocesses it to obtain a high-quality voice signal;
and voice interaction, in which the voice interaction link conducts a voice dialogue with the passenger based on the high-quality voice signal.
7. The method of claim 6, wherein the front-end signal processing further comprises:
voice endpoint detection, for detecting the start position of the voice signal and separating effective voice signals containing voice information from ineffective signals containing none;
noise reduction, for reducing noise interference in the effective voice signal and improving the signal-to-noise ratio;
echo cancellation, for cancelling echo in the effective voice signal;
sound source localization, for determining the position of the speaking passenger based on the voice signals collected by the microphone array;
and beamforming, for combining the multiple voice signals collected by the microphone array into a single voice signal and further refining the sound source localization.
8. The method according to claim 6 or 7, wherein the voice interaction comprises:
speech recognition, for converting the effective voice signal containing voice information into first text information;
semantic understanding, for interpreting the meaning of the first text information;
dialogue management, for determining, on the basis of the semantic understanding, whether the passenger's current voice dialogue has ended, and generating a decision;
speech-style processing, for generating a second text by applying a preset speech style to the decision;
and speech synthesis, for generating speech from the second text and feeding it back to the vehicle terminal for playback.
9. The method of claim 8, wherein the speech-style processing (NLG) comprises one or more of the following:
the voice interaction link selects a default speech style;
the voice interaction link selects a template-configured speech style;
the voice interaction link selects a model-generated speech style.
10. The method of claim 1, wherein the voice interaction further comprises:
and when the voice interaction link acquires no voice signal within a preset time while in the voice processing state, the voice interaction link is switched from the voice processing state back to the listening state.
11. The method of claim 1, wherein the target vocabulary comprises:
one or more of a user's name, a user's nickname, and another form of address for the user.
12. The method of claim 1, wherein, when the voice interaction link switches state, the vehicle terminal sends a voice message to notify the passenger in the corresponding voice zone, so that the passenger knows the state of the voice interaction link.
13. The method of claim 1, wherein a plurality of said voice interaction links can process, in parallel, said voice signals input by the passengers of a plurality of voice zones.
14. The method of claim 1, wherein the voice zones comprise a driver seat area and/or other passenger seat areas.
15. The method of claim 1, wherein the vehicle terminal performs voice broadcasts in the time order in which the processing results are obtained.
16. An electronic device comprising a processor and a memory,
the memory has stored therein instructions which,
the processor being configured to read the instructions stored in the memory to perform the method of any one of claims 1-15.
CN202010630094.6A 2020-07-03 2020-07-03 Multi-voice-zone voice interaction method for vehicle and electronic equipment Active CN111816189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010630094.6A CN111816189B (en) 2020-07-03 2020-07-03 Multi-voice-zone voice interaction method for vehicle and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010630094.6A CN111816189B (en) 2020-07-03 2020-07-03 Multi-voice-zone voice interaction method for vehicle and electronic equipment

Publications (2)

Publication Number Publication Date
CN111816189A CN111816189A (en) 2020-10-23
CN111816189B true CN111816189B (en) 2023-12-26

Family

ID=72856710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010630094.6A Active CN111816189B (en) 2020-07-03 2020-07-03 Multi-voice-zone voice interaction method for vehicle and electronic equipment

Country Status (1)

Country Link
CN (1) CN111816189B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112599133A (en) * 2020-12-15 2021-04-02 北京百度网讯科技有限公司 Vehicle-based voice processing method, voice processor and vehicle-mounted processor
CN114179613B (en) * 2021-12-10 2024-03-05 常州星宇车灯股份有限公司 Audio-video touch interactive control method for co-driver control panel
CN114678026B (en) * 2022-05-27 2022-10-14 广州小鹏汽车科技有限公司 Voice interaction method, vehicle terminal, vehicle and storage medium
CN115691490A (en) * 2022-10-09 2023-02-03 蔚来汽车科技(安徽)有限公司 Method for dynamically switching sound zone, voice interaction method, equipment, medium and vehicle

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004172781A (en) * 2002-11-19 2004-06-17 Hitachi Ltd Information processing apparatus for voice interaction, voice interaction processing system, and car navigation terminal
JP2006189394A (en) * 2005-01-07 2006-07-20 Toyota Motor Corp Vehicle agent device
JP2009073428A (en) * 2007-09-24 2009-04-09 Clarion Co Ltd In-vehicle apparatus and system
DE102008051757A1 (en) * 2007-11-12 2009-05-14 Volkswagen Ag Multimodal user interface of a driver assistance system for entering and presenting information
JP2010102163A (en) * 2008-10-24 2010-05-06 Xanavi Informatics Corp Vehicle interior voice interaction device
CN107230476A (en) * 2017-05-05 2017-10-03 众安信息技术服务有限公司 A kind of natural man machine language's exchange method and system
DE102017109734A1 (en) * 2016-05-06 2017-11-09 GM Global Technology Operations LLC SYSTEM FOR PROVIDING PASSENGER-SPECIFIC ACOUSTICS FUNCTIONS IN A TRANSPORT VEHICLE
CN107340991A (en) * 2017-07-18 2017-11-10 百度在线网络技术(北京)有限公司 Switching method, device, equipment and the storage medium of speech roles
CN109754803A (en) * 2019-01-23 2019-05-14 上海华镇电子科技有限公司 Vehicle multi-sound area voice interactive system and method
CN110033775A (en) * 2019-05-07 2019-07-19 百度在线网络技术(北京)有限公司 Multitone area wakes up exchange method, device and storage medium
CN110070868A (en) * 2019-04-28 2019-07-30 广州小鹏汽车科技有限公司 Voice interactive method, device, automobile and the machine readable media of onboard system
CN110211585A (en) * 2019-06-05 2019-09-06 广州小鹏汽车科技有限公司 In-car entertainment interactive approach, device, vehicle and machine readable media
WO2020070878A1 (en) * 2018-10-05 2020-04-09 本田技研工業株式会社 Agent device, agent control method, and program
JP2020060623A (en) * 2018-10-05 2020-04-16 本田技研工業株式会社 Agent system, agent method, and program
WO2020079733A1 (en) * 2018-10-15 2020-04-23 三菱電機株式会社 Speech recognition device, speech recognition system, and speech recognition method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE112012006617B4 (en) * 2012-06-25 2023-09-28 Hyundai Motor Company On-board information device
CN110874202B (en) * 2018-08-29 2024-04-19 斑马智行网络(香港)有限公司 Interaction method, device, medium and operating system


Also Published As

Publication number Publication date
CN111816189A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN111816189B (en) Multi-voice-zone voice interaction method for vehicle and electronic equipment
US9769296B2 (en) Techniques for voice controlling bluetooth headset
CN105390136B (en) Vehicle arrangement control device and method for user's adaptive type service
CN109192203A (en) Multitone area audio recognition method, device and storage medium
CN111294471B (en) Intelligent telephone answering method and system
CN105210355B (en) Equipment and correlation technique for the answer calls when recipient's judgement of call is not suitable for speaking
US20040013252A1 (en) Method and apparatus for improving listener differentiation of talkers during a conference call
KR20170088997A (en) Method and apparatus for processing voice information
CN109712615A (en) System and method for detecting the prompt in dialogic voice
CN110475180A (en) Vehicle multi-sound area audio processing system and method
CN108093653B (en) Voice prompt method, recording medium and voice prompt system
WO2022253003A1 (en) Speech enhancement method and related device
WO2018055898A1 (en) Information processing device and information processing method
CN103685783A (en) Information processing system and storage medium
CN111833875B (en) Embedded voice interaction system
CN110520323A (en) For controlling method, apparatus, mobile subscriber equipment and the computer program of vehicle audio frequency system
CN110696756A (en) Vehicle volume control method and device, automobile and storage medium
CN116417003A (en) Voice interaction system, method, electronic device and storage medium
CN111741394A (en) Data processing method and device and readable medium
CN109616122A (en) A kind of visualization hearing aid
JP6201279B2 (en) Server, server control method and control program, information processing system, information processing method, portable terminal, portable terminal control method and control program
CN115482830A (en) Speech enhancement method and related equipment
CN105957528A (en) Audio processing method and apparatus
Nishimuta et al. Toward a quizmaster robot for speech-based multiparty interaction
CN114005447A (en) Voice conversation interaction method, device, vehicle and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant