EP4030424B1 - Method and apparatus of processing voice for vehicle, electronic device and medium - Google Patents
Method and apparatus of processing voice for vehicle, electronic device and medium
- Publication number
- EP4030424B1 EP22176533.2A
- Authority
- EP
- European Patent Office
- Prior art keywords
- voice
- sub
- data
- vehicle
- working mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60R—VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
- B60R16/00—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
- B60R16/02—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
- B60R16/037—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
- B60R16/0373—Voice control
-
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F02—COMBUSTION ENGINES; HOT-GAS OR COMBUSTION-PRODUCT ENGINE PLANTS
- F02D—CONTROLLING COMBUSTION ENGINES
- F02D41/00—Electrical control of supply of combustible mixture or its constituents
- F02D41/02—Circuit arrangements for generating control signals
- F02D41/021—Introducing corrections for particular conditions exterior to the engine
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F02—COMBUSTION ENGINES; HOT-GAS OR COMBUSTION-PRODUCT ENGINE PLANTS
- F02D—CONTROLLING COMBUSTION ENGINES
- F02D2200/00—Input parameters for engine control
- F02D2200/60—Input parameters for engine control said parameters being related to the driver demands or status
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- the present disclosure relates to a field of computer technology, in particular to a field of voice recognition, and specifically to a method and an apparatus of processing a voice for a vehicle, an electronic device, a medium and a program product.
- a vehicle has a voice recognition capability, and a voice receiver and a voice processor are usually arranged in the vehicle.
- the voice receiver is used for receiving voice data
- the voice processor is used for recognizing the received voice data.
- the cost of configuring the voice receiver for the vehicle is relatively high.
- CN 112 017 659 A discloses a multi-sound-area voice signal processing method.
- the method includes the following steps: a voice signal of at least one to-be-identified sound area among a plurality of sound areas is received; if the at least one to-be-identified sound area is a non-main-driving sound area, the state of the vehicle is obtained, the state being either that an unfinished main-driving sound area task exists or that no unfinished main-driving sound area task exists; a main-driving sound area comprises a main driving position, and the main-driving sound area task is a task related to a voice signal of the main-driving sound area; and the voice signal of the at least one to-be-identified sound area is processed according to the state of the vehicle.
- US 2019/237067 A1 discloses a method for providing voice command operation in a passenger vehicle cabin having multiple occupants.
- the method operates to monitor microphone data relating to voice commands within a vehicle cabin and determine whether the microphone data includes wake-up-word data.
- when the wake-up-word data relates to more than one of a plurality of vehicle cabin zones and more than one wake-up word is coincident, the method and device operate to monitor respective microphone data for voice command data from each of those vehicle cabin zones.
- the voice command data may be processed to produce respective vehicle device commands and the vehicle device command(s) can be transmitted to effect the voice command data.
- CN 110 648 663 A discloses a vehicle-mounted audio management method. According to the method, a speech signal is collected and processed, a control intention, a target vehicle-mounted audio source and a target output region are determined, and output control is performed on the target vehicle-mounted audio source in the target output region according to the control intention.
- the present disclosure provides a method of processing a voice for a vehicle according to claim 1, an apparatus of processing a voice for a vehicle according to claim 6, an electronic device according to claim 9, a medium according to claim 10, and a program product according to claim 11.
- for example, "a system having at least one of A, B and C" should include, but not be limited to, a system having only A, a system having only B, a system having only C, a system having A and B, a system having A and C, a system having B and C, and/or a system having A, B and C.
- FIG. 1 is a schematic diagram of an exemplary system architecture for a method and an apparatus of processing a voice for a vehicle according to an embodiment of the present disclosure. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied, so as to help those skilled in the art to understand the technical content of the present disclosure, but it does not mean that the embodiments of the present disclosure may not be used for other devices, systems, environments or scenes.
- the application scene may include a vehicle 100.
- the inside of the vehicle 100 includes, for example, a plurality of regions, and the plurality of regions include, for example, a main driving region 111 and a sub driving region 112.
- the plurality of regions may also include a rear seat region and the like.
- a plurality of voice receivers may be provided inside the vehicle 100 to receive voice data.
- the voice receiver 121 is used, for example, to receive voice data from the main driving region 111, and the voice receiver 122 is used, for example, to receive voice data from the sub driving region 112.
- the vehicle 100 may perform different operations on the voice data from different regions.
- for the voice data from the main driving region 111, operations such as opening windows, turning on the air conditioner, and navigating are performed based on the voice data.
- for the voice data from the sub driving region 112, operations such as playing music and checking the weather forecast are performed based on the voice data.
- the embodiments of the present disclosure provide a method of processing a voice for a vehicle.
- the method of processing a voice for a vehicle includes the following operations.
- An initial voice data is separated in response to receiving the initial voice data from a plurality of regions inside the vehicle, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data.
- the plurality of voice sub-data correspond to the plurality of regions respectively, and the description information for each voice sub-data indicates the region corresponding to the each voice sub-data in the plurality of regions.
- a voice working mode of the vehicle is determined based on the plurality of voice sub-data.
- the following describes a method of processing a voice for a vehicle according to an exemplary embodiment of the present disclosure with reference to FIGS. 2 to 4 and in conjunction with the application scene of FIG. 1.
- FIG. 2 schematically shows a flowchart of a method of processing a voice for a vehicle according to an embodiment of the present disclosure.
- the method of processing a voice for a vehicle may include, for example, operations S210 to S220.
- in operation S210, an initial voice data is separated in response to receiving the initial voice data from a plurality of regions inside the vehicle, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data.
- in operation S220, a voice working mode of the vehicle is determined based on the plurality of voice sub-data.
- the vehicle is, for example, provided with a voice receiver and a voice processor, and the voice receiver may include a microphone.
- the vehicle may receive the initial voice data from the plurality of regions through the voice receiver.
- the voice processor separates the initial voice data into a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data.
- the plurality of voice sub-data correspond to the plurality of regions respectively, and the description information for each voice sub-data indicates the region corresponding to the each voice sub-data in the plurality of regions.
- the vehicle may determine the voice working mode of the vehicle based on the plurality of voice sub-data respectively.
- the voice working mode indicates how to process a related voice data subsequently received by the vehicle and whether to perform a related operation based on the voice.
- the vehicle may receive the initial voice data from the plurality of regions through one voice receiver, and the initial voice data is separated to obtain the plurality of voice sub-data corresponding to the plurality of regions respectively. It is not required for the vehicle to configure a voice receiver for each of the plurality of regions, which may reduce the cost of the vehicle by reducing the number of voice receivers.
- the voice data is received through one voice receiver in the embodiments of the present disclosure, thereby reducing the data amount of the received voice data. In this way, the calculation amount for the vehicle to process the voice is reduced, and the voice processing performance of the vehicle is improved.
- FIG. 3 schematically shows a flowchart of a method of processing a voice for a vehicle according to another embodiment of the present disclosure.
- the method 300 of processing a voice for a vehicle may include, for example, operations S310 to S390.
- the initial voice data is separated, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data.
- the initial voice data is separated, by using a blind source separation algorithm, into a plurality of voice sub-data corresponding to the plurality of regions respectively.
- the plurality of regions include, for example, a main driving region and a sub driving region.
- the plurality of voice sub-data include a first voice sub-data and a second voice sub-data.
- a description information for the first voice sub-data indicates that the first voice sub-data is from the main driving region
- a description information for the second voice sub-data indicates that the second voice sub-data is from the sub driving region.
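The source names blind source separation but does not say which algorithm is used. As a rough illustration only, the sketch below applies a toy two-source independent component analysis: the two mixed channels are whitened, then a rotation angle is searched that maximises the squared excess kurtosis of the outputs. It assumes two mixture channels are available (for example, a receiver with two microphone elements), which the source does not state, and the function names `whiten`, `separate` and `abs_correlation` are hypothetical. The recovered signals come back in arbitrary order and sign, so a further step such as direction-of-arrival estimation would still be needed to attach a region to each one.

```python
import math


def _center(x):
    """Subtract the mean so the sequence is zero-mean."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def _kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance sequence."""
    return sum(v ** 4 for v in y) / len(y) - 3.0


def whiten(x1, x2):
    """Decorrelate two channels and scale each to unit variance."""
    c1, c2 = _center(x1), _center(x2)
    n = len(c1)
    a = sum(u * u for u in c1) / n
    d = sum(v * v for v in c2) / n
    b = sum(u * v for u, v in zip(c1, c2)) / n
    # Rotation angle that diagonalises the 2x2 covariance [[a, b], [b, d]].
    phi = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(phi), math.sin(phi)
    p1 = [c * u + s * v for u, v in zip(c1, c2)]
    p2 = [-s * u + c * v for u, v in zip(c1, c2)]
    s1 = math.sqrt(sum(u * u for u in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [u / s1 for u in p1], [v / s2 for v in p2]


def separate(x1, x2, steps=180):
    """Toy ICA: rotate whitened data to maximise squared excess kurtosis."""
    z1, z2 = whiten(x1, x2)
    best_score, best = -1.0, (z1, z2)
    for k in range(steps):
        theta = (math.pi / 2.0) * k / steps
        c, s = math.cos(theta), math.sin(theta)
        y1 = [c * u + s * v for u, v in zip(z1, z2)]
        y2 = [-s * u + c * v for u, v in zip(z1, z2)]
        score = _kurtosis(y1) ** 2 + _kurtosis(y2) ** 2
        if score > best_score:
            best_score, best = score, (y1, y2)
    return best


def abs_correlation(a, b):
    """Absolute Pearson correlation, used to compare outputs with sources."""
    ca, cb = _center(a), _center(b)
    num = sum(u * v for u, v in zip(ca, cb))
    den = math.sqrt(sum(u * u for u in ca) * sum(v * v for v in cb))
    return abs(num / den)
```

A production system would more likely use a frequency-domain or learned separation method on real microphone signals; the sketch only shows the principle of recovering independent sources from their mixtures.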
- a voice recognition is performed on the plurality of voice sub-data respectively, so as to obtain a plurality of voice recognition results, and the plurality of voice recognition results correspond to the plurality of voice sub-data respectively.
- a voice working mode of the vehicle is determined based on the plurality of voice recognition results. For example, it may be determined whether the voice recognition result corresponding to the first voice sub-data contains a first wake-up content or not and it may be determined whether the voice recognition result corresponding to the second voice sub-data contains a second wake-up content or not.
- that is, both checks may be made on the plurality of voice recognition results: whether the voice recognition result corresponding to the first voice sub-data contains the first wake-up content, and whether the voice recognition result corresponding to the second voice sub-data contains the second wake-up content.
- the specific process is described in operations S340 to S390.
- in operation S340, it is determined whether the voice recognition result corresponding to the first voice sub-data contains the first wake-up content or not. If the voice recognition result corresponding to the first voice sub-data contains the first wake-up content, operation S350 is performed; otherwise, operation S370 is performed.
- the first wake-up content includes, for example, a specific wake-up word.
- in operation S350, it is determined that the voice working mode of the vehicle is a first voice working mode, in response to the voice recognition result corresponding to the first voice sub-data containing the first wake-up content.
- the vehicle is controlled to operate based on the first voice working mode.
- Controlling the vehicle to operate based on the first voice working mode includes the following operations.
- a third voice sub-data from the main driving region is extracted from a received first target voice data, a voice recognition is performed on the third voice sub-data to obtain a first operation instruction, and an operation is performed based on the first operation instruction.
- the voice receiver of the vehicle may continue to receive the first target voice data.
- the first target voice data is from, for example, the main driving region and the sub driving region. It should be noted that even if a user only speaks in the main driving region, the sound of the main driving region may be transmitted to the sub driving region due to the divergence and reflection of the sound. Alternatively, there are other noises in the sub driving region, so that the first target voice data usually includes sounds from the main driving region and the sub driving region.
- the third voice sub-data from the main driving region may be extracted from the received first target voice data by the vehicle.
- the first target voice data is separated into a plurality of voice sub-data by using the blind source separation algorithm, and the plurality of voice sub-data includes the voice sub-data corresponding to the main driving region and the voice sub-data corresponding to the sub driving region.
- the third voice sub-data from the main driving region is extracted from the plurality of voice sub-data.
- the vehicle performs the voice recognition on the third voice sub-data, so as to obtain the first operation instruction, and the first operation instruction is associated with the main driving region, and the operation is performed based on the first operation instruction.
- the first operation instruction obtained by performing the voice recognition on the third voice sub-data includes, for example, important instructions such as "open the window", "turn on the air conditioner", and "navigate".
- in operation S370, it is determined whether the voice recognition result corresponding to the second voice sub-data contains the second wake-up content or not. If the voice recognition result corresponding to the second voice sub-data contains the second wake-up content, operation S380 is performed; otherwise, the operations terminate.
- the second wake-up content includes, for example, a specific wake-up word.
- the vehicle is controlled to operate based on the second voice working mode.
- Controlling the vehicle to operate based on the second voice working mode includes the following operations.
- a fourth voice sub-data from the sub driving region is extracted from a received second target voice data, a voice recognition is performed on the fourth voice sub-data to obtain a second operation instruction, and an operation is performed based on the second operation instruction.
- the voice receiver of the vehicle may continue to receive the second target voice data.
- the second target voice data is from, for example, the main driving region and the sub driving region. It should be noted that even if a user only speaks in the sub driving region, the sound of the sub driving region may be transmitted to the main driving region due to the divergence and reflection of the sound. Alternatively, there are other noises in the main driving region, so that the second target voice data usually includes sounds from the main driving region and the sub driving region.
- the fourth voice sub-data from the sub driving region may be extracted from the received second target voice data by the vehicle.
- the second target voice data is separated into a plurality of voice sub-data by using the blind source separation algorithm, and the plurality of voice sub-data includes the voice sub-data corresponding to the main driving region and the voice sub-data corresponding to the sub driving region. Then the fourth voice sub-data from the sub driving region is extracted from the plurality of voice sub-data.
- the vehicle performs the voice recognition on the fourth voice sub-data to obtain the second operation instruction
- the second operation instruction is associated with the sub driving region, and the operation is performed based on the second operation instruction.
- the second operation instruction obtained by performing the voice recognition on the fourth voice sub-data includes, for example, unimportant instructions such as "play music" and "check weather forecast".
- only one of the first voice working mode and the second voice working mode is in an awake state at any given time.
- when the initial voice data includes both the first wake-up content and the second wake-up content, the first voice working mode corresponding to the main driving region is preferentially woken up.
- when the initial voice data includes only the second wake-up content, the second voice working mode is woken up.
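The decision flow of operations S340 to S380, including the priority given to the main driving region, can be sketched as follows. This is a minimal illustration under stated assumptions: the recognition results are plain text, the wake-up contents are matched as substrings, and the names `VoiceWorkingMode` and `determine_voice_working_mode` as well as the example wake-up words are hypothetical, not taken from the source.

```python
from enum import Enum
from typing import Optional


class VoiceWorkingMode(Enum):
    FIRST = "first"    # mode associated with the main driving region
    SECOND = "second"  # mode associated with the sub driving region


def determine_voice_working_mode(
    first_result: str,
    second_result: str,
    first_wake_up: str,
    second_wake_up: str,
) -> Optional[VoiceWorkingMode]:
    """Return the mode to wake up, or None if no wake-up content is found.

    The first check comes before the second, so the main driving region
    wins when both recognition results contain their wake-up content.
    """
    if first_wake_up in first_result:      # S340 -> S350
        return VoiceWorkingMode.FIRST
    if second_wake_up in second_result:    # S370 -> S380
        return VoiceWorkingMode.SECOND
    return None                            # otherwise the operations terminate
```

For instance, with the hypothetical wake-up words "hello driver" and "hello passenger", a call in which both results contain their wake-up word returns `VoiceWorkingMode.FIRST`, which matches the preferential wake-up of the main driving region.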
- the vehicle may receive the initial voice data for the plurality of regions through one voice receiver, and the initial voice data is separated to obtain the plurality of voice sub-data corresponding to the plurality of regions respectively, then the plurality of voice sub-data are recognized respectively to obtain the voice recognition results, and the voice working mode is determined based on the voice recognition results.
- the first voice working mode for the main driving region is different from the second voice working mode for the sub driving region, so that the vehicle implements a plurality of modes for voice recognition.
- FIG. 4 schematically shows a schematic diagram of a method of processing a voice for a vehicle according to an embodiment of the present disclosure.
- the vehicle 400 of the embodiment of the present disclosure may include a voice receiver 410, a voice processor 420 and an actuator 430.
- the voice processor 420 includes, for example, a blind source separating module 421, a main wake-up engine 422, a sub wake-up engine 423, a voice recognizing engine 424 and a semantic understanding module 425.
- the voice receiver 410 includes, for example, a microphone, which is used, for example, to receive voice data from a main driving region and a sub driving region.
- the initial voice data A is sent to the blind source separating module 421 for separation processing, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data.
- the plurality of voice sub-data include, for example, a first voice sub-data a1 and a second voice sub-data a2, the description information for the first voice sub-data a1, for example, indicates that the first voice sub-data a1 is from the main driving region, and the description information for the second voice sub-data a2, for example, indicates that the second voice sub-data a2 is from the sub driving region.
- the blind source separating module 421 uses a blind source separation algorithm to separate voice, and a separation result includes the voice sub-data and the description information for describing a source of the voice sub-data.
- the description information may include an angle information, and the angle information includes, for example, a first angle interval and a second angle interval.
- the first angle interval is, for example, [0°, 90°), and the second angle interval is, for example, [90°, 180°].
- an angle in the description information for the first voice sub-data a1 from the main driving region is, for example, within [0°, 90°), and an angle in the description information for the second voice sub-data a2 from the sub driving region is, for example, within [90°, 180°].
- the source of each voice sub-data may be determined by calculating direction of arrival (DOA) of the voice.
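The mapping from an estimated angle to a region can be sketched as follows, using the example intervals given in the text ([0°, 90°) for the main driving region and [90°, 180°] for the sub driving region). The function name `region_from_angle` and the region labels are hypothetical, and the sketch assumes the direction-of-arrival estimate is already available as an angle in degrees.

```python
from typing import Optional

# Hypothetical labels for the two regions named in the text.
MAIN_DRIVING_REGION = "main_driving_region"
SUB_DRIVING_REGION = "sub_driving_region"


def region_from_angle(angle_deg: float) -> Optional[str]:
    """Map a direction-of-arrival angle to a region.

    Uses the example intervals from the text: [0, 90) is the main
    driving region and [90, 180] is the sub driving region. Angles
    outside both intervals are left unassigned.
    """
    if 0.0 <= angle_deg < 90.0:
        return MAIN_DRIVING_REGION
    if 90.0 <= angle_deg <= 180.0:
        return SUB_DRIVING_REGION
    return None
```

Note that the boundary value 90° falls in the second interval, because the first interval is half-open.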
- the first voice sub-data a1 is sent to the main wake-up engine 422 for recognition, so as to obtain a voice recognition result for the first voice sub-data a1.
- if the voice recognition result includes a first wake-up content, it is determined that the voice working mode of the vehicle is a first voice working mode.
- the second voice sub-data a2 is sent to the sub wake-up engine 423, so as to obtain a voice recognition result for the second voice sub-data a2.
- if the voice recognition result includes a second wake-up content, it is determined that the voice working mode of the vehicle is a second voice working mode.
- the voice receiver 410 of the vehicle may continue to receive a first target voice data B.
- the first target voice data B includes, for example, a voice of a user from the main driving region.
- the blind source separating module 421 may separate the first target voice data B and extract a third voice sub-data b from the main driving region.
- the blind source separating module 421 sends the extracted third voice sub-data b to the voice recognizing engine 424 for voice recognition, so as to obtain a voice recognition result b1.
- the voice recognition result b1 includes, for example, the text "open the window", "turn on the air conditioner", "navigate" and so on.
- the voice recognizing engine 424 sends the voice recognition result b1 to the semantic understanding module 425 for semantic understanding, so as to determine a first operation instruction b2 corresponding to the text.
- the first operation instruction b2 corresponding to the text "open the window" is a window opening instruction.
- the first operation instruction b2 is sent to the actuator 430, and the actuator 430 performs related operations based on the first operation instruction b2. For example, the actuator 430 opens a window based on the window opening instruction.
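The path from the voice recognition result b1 to the actuator 430 can be sketched as a lookup followed by a dispatch. This is only an illustration: the source does not describe how the semantic understanding module 425 works, so a fixed text-to-instruction table stands in for it here, and the instruction names, the `Actuator` class and the function names are hypothetical.

```python
from typing import List, Optional

# Hypothetical table standing in for the semantic understanding module 425.
INSTRUCTION_TABLE = {
    "open the window": "WINDOW_OPEN",
    "turn on the air conditioner": "AIR_CONDITIONER_ON",
    "navigate": "NAVIGATION_START",
}


def understand(text: str) -> Optional[str]:
    """Map recognised text (b1) to an operation instruction (b2), if known."""
    return INSTRUCTION_TABLE.get(text.strip().lower())


class Actuator:
    """Stand-in for the actuator 430; it only records what it is asked to do."""

    def __init__(self) -> None:
        self.performed: List[str] = []

    def perform(self, instruction: str) -> None:
        self.performed.append(instruction)


def handle_recognition_result(b1: str, actuator: Actuator) -> Optional[str]:
    """Turn a recognition result into an instruction and pass it to the actuator."""
    b2 = understand(b1)
    if b2 is not None:
        actuator.perform(b2)
    return b2
```

Text that is not in the table is ignored rather than forwarded, so the actuator only ever receives instructions it is expected to know.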
- the vehicle may receive the initial voice data from the plurality of regions through one voice receiver, and the initial voice data is separated to obtain the plurality of voice sub-data corresponding to the plurality of regions respectively, which reduces the cost of the vehicle.
- the voice data is received through one voice receiver, thereby reducing the data amount of the received voice data. In this way, the calculation amount for the vehicle to process the voice is reduced, and the voice processing performance of the vehicle is improved.
- FIG. 5 schematically shows a block diagram of an apparatus of processing a voice for a vehicle according to an embodiment of the present disclosure.
- the apparatus of processing a voice for a vehicle in the embodiments of the present disclosure includes, for example, a processing module 510 and a determining module 520.
- the processing module 510 is used to separate an initial voice data in response to receiving the initial voice data from a plurality of regions inside the vehicle, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data, wherein the plurality of voice sub-data correspond to the plurality of regions respectively, and the description information for each voice sub-data indicates the region corresponding to the each voice sub-data in the plurality of regions.
- the processing module 510 may, for example, perform the operation S210 described above with reference to FIG. 2, which will not be repeated here.
- the determining module 520 is used to determine a voice working mode of the vehicle based on the plurality of voice sub-data. According to the embodiments of the present disclosure, the determining module 520 may, for example, perform the operation S220 described above with reference to FIG. 2, which will not be repeated here.
- the determining module 520 includes, for example, a first recognizing sub-module and a determining sub-module.
- the first recognizing sub-module is used to perform a voice recognition on the plurality of voice sub-data respectively, so as to obtain a plurality of voice recognition results, wherein the plurality of voice recognition results correspond to the plurality of voice sub-data respectively.
- the determining sub-module is used to determine the voice working mode of the vehicle based on the plurality of voice recognition results.
- the plurality of regions includes a main driving region and a sub driving region;
- the plurality of voice sub-data includes a first voice sub-data and a second voice sub-data, a description information for the first voice sub-data indicates that the first voice sub-data is from the main driving region, and a description information for the second voice sub-data indicates that the second voice sub-data is from the sub driving region.
- the determining sub-module includes at least one of a first determining unit and a second determining unit.
- the first determining unit is used to determine that the voice working mode of the vehicle is a first voice working mode, in response to the voice recognition result corresponding to the first voice sub-data containing a first wake-up content.
- the second determining unit is used to determine that the voice working mode of the vehicle is a second voice working mode, in response to the voice recognition result corresponding to the second voice sub-data containing a second wake-up content.
- the apparatus 500 may further include a first controlling module used to control the vehicle to operate based on the first voice working mode.
- the first controlling module includes a first extracting sub-module, a second recognizing sub-module and a first operating sub-module.
- the first extracting sub-module is used to extract, from a received first target voice data, a third voice sub-data from the main driving region.
- the second recognizing sub-module is used to perform a voice recognition on the third voice sub-data, so as to obtain a first operation instruction, wherein the first operation instruction is associated with the main driving region.
- the first operating sub-module is used to operate based on the first operation instruction.
- the apparatus 500 may further include a second controlling module used to control the vehicle to operate based on the second voice working mode.
- the second controlling module includes a second extracting sub-module, a third recognizing sub-module and a second operating sub-module.
- the second extracting sub-module is used to extract, from a received second target voice data, a fourth voice sub-data from the sub driving region.
- the third recognizing sub-module is used to perform a voice recognition on the fourth voice sub-data, so as to obtain a second operation instruction, where the second operation instruction is associated with the sub driving region.
- the second operating sub-module is used to operate based on the second operation instruction.
- the vehicle includes a main wake-up engine and a sub wake-up engine; and the first recognizing sub-module includes a first recognizing unit and a second recognizing unit.
- the first recognizing unit is used to recognize the first voice sub-data by using the main wake-up engine, so as to obtain the voice recognition result for the first voice sub-data.
- the second recognizing unit is used to recognize the second voice sub-data by using the sub wake-up engine, so as to obtain the voice recognition result for the second voice sub-data.
- the processing module 510 is further used to separate the initial voice data by using a blind source separation algorithm.
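As a rough illustration of the first and second recognizing units described above, the sketch below routes each voice sub-data to a wake-up engine chosen by its region. It is a minimal sketch under stated assumptions, not the patented implementation: the names `VoiceSubData`, `make_keyword_engine` and `recognize_all`, the text stand-in for audio, and the example wake-up phrases are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MAIN = "main_driving_region"
SUB = "sub_driving_region"


@dataclass
class VoiceSubData:
    region: str  # description information: the region the sub-data comes from
    text: str    # hypothetical stand-in for audio; real sub-data would be a waveform


# Assumption: a wake-up engine is modelled as a callable that reports
# whether the sub-data contains its wake-up content.
WakeUpEngine = Callable[[VoiceSubData], bool]


def make_keyword_engine(wake_word: str) -> WakeUpEngine:
    """Build a toy engine that looks for one wake-up phrase (case-insensitive)."""
    def engine(sub: VoiceSubData) -> bool:
        return wake_word.lower() in sub.text.lower()
    return engine


def recognize_all(subs: List[VoiceSubData],
                  engines: Dict[str, WakeUpEngine]) -> Dict[str, bool]:
    """Send each sub-data to the engine registered for its region."""
    results: Dict[str, bool] = {}
    for sub in subs:
        engine = engines.get(sub.region)
        if engine is not None:  # regions without an engine are skipped
            results[sub.region] = engine(sub)
    return results


# Hypothetical wake-up phrases, chosen only for the demonstration.
engines = {MAIN: make_keyword_engine("hello driver"),
           SUB: make_keyword_engine("hello passenger")}
subs = [VoiceSubData(MAIN, "Hello driver, are you there"),
        VoiceSubData(SUB, "what a nice day")]
results = recognize_all(subs, engines)
```

Keeping the engines in a mapping keyed by region means a further region (for example a rear seat region) could be supported by registering one more engine, without changing the routing function.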
- the collection, storage, use, processing, transmission, provision, disclosure, and application of the user's personal information involved are all in compliance with relevant laws and regulations, take essential confidentiality measures, and do not violate public order and good customs.
- authorization or consent is obtained from the user before the user's personal information is obtained or collected.
- the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
- FIG. 6 is a block diagram of an electronic device used to implement voice processing in the embodiments of the present disclosure.
- FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure.
- the electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers and other suitable computers.
- the electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices and other similar computing devices.
- the components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of the present disclosure described and/or required herein.
- the device 600 includes a computing unit 601, which may execute various appropriate actions and processing according to a computer program stored in a read only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the device 600 may also be stored in the RAM 603.
- the computing unit 601, the ROM 602 and the RAM 603 are connected to each other through a bus 604.
- An input/output (I/O) interface 605 is also connected to the bus 604.
- the I/O interface 605 is connected to a plurality of components of the device 600, including: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a magnetic disk, an optical disk, etc.; and a communication unit 609, such as a network card, a modem, a wireless communication transceiver, etc.
- the communication unit 609 allows the device 600 to exchange information/data with other devices through the computer network such as the Internet and/or various telecommunication networks.
- the computing unit 601 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
- the computing unit 601 executes the various methods and processes described above, such as the method of processing a voice for a vehicle.
- the method of processing a voice for a vehicle may be implemented as computer software programs, which are tangibly contained in the machine-readable medium, such as the storage unit 608.
- part or all of the computer program may be loaded and/or installed on the device 600 via the ROM 602 and/or the communication unit 609.
- When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method of processing a voice for a vehicle described above may be executed.
- the computing unit 601 may be configured to execute the method of processing a voice for a vehicle in any other suitable manner (for example, by means of firmware).
- Various implementations of the systems and technologies described in the present disclosure may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application-specific standard products (ASSP), system-on-chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software and/or their combination.
- the various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor.
- the programmable processor may receive data and instructions from a storage system, at least one input device and at least one output device, and may transmit data and instructions to the storage system, the at least one input device and the at least one output device.
- the program code used to implement the method of the present disclosure may be written in any combination of one or more programming languages.
- the program codes may be provided to the processors or controllers of general-purpose computers, special-purpose computers or other programmable data processing devices, so that, when the program codes are executed by a processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented.
- the program code may be executed entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as an independent software package, or entirely on the remote machine or server.
- the machine-readable medium may be a tangible medium, which may contain or store a program for use by the instruction execution system, apparatus, or device or in combination with the instruction execution system, apparatus, or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- the machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the above-mentioned content.
- machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device or any suitable combination of the above-mentioned content.
- the systems and techniques described here may be implemented on a computer that includes: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or trackball).
- the user may provide input to the computer through the keyboard and the pointing device.
- Other types of devices may also be used to provide interaction with users.
- the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback or tactile feedback); and any form (including sound input, voice input, or tactile input) may be used to receive input from the user.
- the systems and technologies described herein may be implemented in a computing system including back-end components (for example, as a data server), or a computing system including middleware components (for example, an application server), or a computing system including front-end components (for example, a user computer with a graphical user interface or a web browser through which the user may interact with the implementation of the system and technology described herein), or in a computing system including any combination of such back-end components, middleware components or front-end components.
- the components of the system may be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: local area network (LAN), wide area network (WAN) and the Internet.
- the computer system may include a client and a server.
- the client and the server are generally far away from each other and usually interact through the communication network.
- the relationship between the client and the server is generated by computer programs that run on the respective computers and have a client-server relationship with each other.
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Mechanical Engineering (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Chemical & Material Sciences (AREA)
- Combustion & Propulsion (AREA)
- General Engineering & Computer Science (AREA)
- Traffic Control Systems (AREA)
- Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
- Navigation (AREA)
Description
- The present disclosure relates to a field of computer technology, in particular to a field of voice recognition, and specifically to a method and an apparatus of processing a voice for a vehicle, an electronic device, a medium and a program product.
- In the related art, a vehicle has a voice recognition capability, and a voice receiver and a voice processor are usually arranged in the vehicle. The voice receiver is used for receiving voice data, and the voice processor is used for recognizing the received voice data. However, in the related art, the cost of configuring the voice receiver for the vehicle is relatively high.
- CN 112 017 659 A discloses a multi-sound-area voice signal processing method. The method includes the steps that a voice signal of at least one to-be-identified sound area in a plurality of sound areas is received; if the at least one to-be-identified sound area is a non-main-driving sound area, the state of the vehicle is obtained, and the state of the vehicle is that an unfinished main-driving sound area task exists or an unfinished main-driving sound area task does not exist; a main driving sound area comprises a main driving position, and the main-driving sound area task is a task related to a voice signal of the main driving sound area; and the voice signal of the at least one to-be-identified sound area is processed according to the state of the vehicle.
- US 2019/237067 A1 discloses a method for providing voice command operation in a passenger vehicle cabin having multiple occupants. The method operates to monitor microphone data relating to voice commands within a vehicle cabin and determine whether the microphone data includes wake-up-word data. When the wake-up-word data relates to more than one of a plurality of vehicle cabin zones and more than one wake-up-words are coincident, the method and device operate to monitor respective microphone data for voice command data from each of the more than one of the respective ones of the plurality of vehicle cabin zones. Upon detection, the voice command data may be processed to produce respective vehicle device commands and the vehicle device command(s) can be transmitted to effect the voice command data.
- CN 110 648 663 A discloses a vehicle-mounted audio management method. According to the method, a speech signal is collected and processed, a control intention, a target vehicle-mounted audio source and a target output region are determined, and output control is performed on the target vehicle-mounted audio source in the target output region according to the control intention.
- The present disclosure provides a method of processing a voice for a vehicle according to claim 1, an apparatus of processing a voice for a vehicle according to claim 6, an electronic device according to claim 9, a medium according to claim 10, and a program product according to claim 11.
- It should be understood that the content described in this section is not intended to identify critical or important features of embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become readily understood by the following description.
- The drawings are used for better understanding of the present solution, and do not constitute a limitation to the present disclosure, in which:
- FIG. 1 schematically shows an application scene for a method and an apparatus of processing a voice for a vehicle according to an embodiment of the present disclosure;
- FIG. 2 schematically shows a flowchart of a method of processing a voice for a vehicle according to an embodiment of the present disclosure;
- FIG. 3 schematically shows a flowchart of a method of processing a voice for a vehicle according to another embodiment of the present disclosure;
- FIG. 4 schematically shows a schematic diagram of a method of processing a voice for a vehicle according to an embodiment of the present disclosure;
- FIG. 5 schematically shows a block diagram of an apparatus of processing a voice for a vehicle according to an embodiment of the present disclosure; and
- FIG. 6 is a block diagram of an electronic device used to implement voice processing in the embodiments of the present disclosure.
- The following describes exemplary embodiments of the present disclosure with reference to the drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be regarded as merely exemplary. Therefore, those skilled in the art should recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope of the appended claims. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following description.
- The terms used herein are for the purpose of describing specific embodiments only and are not intended to limit the present disclosure. The terms "comprising", "including", etc. used herein indicate the presence of the feature, step, operation and/or part, but do not exclude the presence or addition of one or more other features, steps, operations or parts.
- All terms used herein (including technical and scientific terms) have the meanings generally understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein shall be interpreted to have meanings consistent with the context of this specification, and shall not be interpreted in an idealized or too rigid way.
- In the case of using the expression similar to "at least one of A, B and C", it should be explained according to the meaning of the expression generally understood by those skilled in the art (for example, "a system having at least one of A, B and C" should include but not be limited to a system having only A, a system having only B, a system having only C, a system having A and B, a system having A and C, a system having B and C, and/or a system having A, B and C).
- FIG. 1 is a schematic diagram of an exemplary system architecture for a method and an apparatus of processing a voice for a vehicle according to an embodiment of the present disclosure. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied, so as to help those skilled in the art to understand the technical content of the present disclosure, but it does not mean that the embodiments of the present disclosure may not be used for other devices, systems, environments or scenes.
- As shown in FIG. 1, the application scene according to this embodiment may include a vehicle 100. The inside of the vehicle 100 includes, for example, a plurality of regions, and the plurality of regions include, for example, a main driving region 111 and a sub driving region 112. The plurality of regions may also include a rear seat region and the like.
- For example, a plurality of voice receivers may be provided inside the vehicle 100 to receive a voice data. The voice receiver 121 is used, for example, to receive a voice data from the main driving region 111, and the voice receiver 122 is used, for example, to receive a voice data from the sub driving region 112. The vehicle 100 may perform different operations on the voice data from different regions.
- For example, after the voice data from the main driving region 111 is received, operations such as opening windows, turning on the air conditioner, and navigating are performed based on the voice data. After the voice data from the sub driving region 112 is received, operations such as playing music and checking the weather forecast are performed based on the voice data.
- However, there is a problem of high cost when the vehicle 100 is provided with the plurality of voice receivers.
- In view of this, the embodiments of the present disclosure provide a method of processing a voice for a vehicle. The method of processing a voice for a vehicle includes the following operations. An initial voice data is separated in response to receiving the initial voice data from a plurality of regions inside the vehicle, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data. The plurality of voice sub-data correspond to the plurality of regions respectively, and the description information for each voice sub-data indicates the region corresponding to that voice sub-data in the plurality of regions. Next, a voice working mode of the vehicle is determined based on the plurality of voice sub-data.
- The following describes a method of processing a voice for a vehicle according to an exemplary embodiment of the present disclosure with reference to FIGS. 2 to 4 and in conjunction with the application scene of FIG. 1.
- FIG. 2 schematically shows a flowchart of a method of processing a voice for a vehicle according to an embodiment of the present disclosure.
- As shown in FIG. 2, the method of processing a voice for a vehicle according to the embodiment of the present disclosure may include, for example, operations S210 to S220.
- In operation S210, an initial voice data is separated in response to receiving the initial voice data from a plurality of regions inside the vehicle, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data.
- In operation S220, a voice working mode of the vehicle is determined based on the plurality of voice sub-data.
- Exemplarily, the vehicle is provided with a voice receiver and a voice processor, and the voice receiver may include a microphone. The vehicle may receive the initial voice data from the plurality of regions through the voice receiver. After the initial voice data is received, the voice processor separates it into a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data. The plurality of voice sub-data correspond to the plurality of regions respectively, and the description information for each voice sub-data indicates the region corresponding to that voice sub-data in the plurality of regions.
- After the plurality of voice sub-data are obtained through the separation processing, the vehicle may determine the voice working mode of the vehicle based on the plurality of voice sub-data respectively. The voice working mode, for example, indicates how to process a related voice data subsequently received by the vehicle and whether to perform a related operation based on the voice.
- According to the embodiments of the present disclosure, the vehicle may receive the initial voice data from the plurality of regions through one voice receiver, and the initial voice data is separated to obtain the plurality of voice sub-data corresponding to the plurality of regions respectively. It is not required for the vehicle to configure a voice receiver for each of the plurality of regions, which may reduce the cost of the vehicle by reducing the number of voice receivers. In addition, compared to receiving a voice data from a plurality of regions through a plurality of voice receivers respectively, the voice data is received through one voice receiver in the embodiments of the present disclosure, thereby reducing the data amount of the received voice data. In this way, the calculation amount for the vehicle to process the voice is reduced, and the voice processing performance of the vehicle is improved.
- FIG. 3 schematically shows a flowchart of a method of processing a voice for a vehicle according to another embodiment of the present disclosure.
- As shown in FIG. 3, the method 300 of processing a voice for a vehicle according to the embodiment of the present disclosure may include, for example, operations S310 to S390.
- In operation S310, an initial voice data from a plurality of regions inside the vehicle is received.
- In operation S320, the initial voice data is separated, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data.
- For example, the initial voice data is separated by using a blind source separation algorithm, and the initial voice data is separated into a plurality of voice sub-data corresponding to the plurality of regions respectively. The plurality of regions include, for example, a main driving region and a sub driving region. The plurality of voice sub-data include a first voice sub-data and a second voice sub-data. A description information for the first voice sub-data indicates that the first voice sub-data is from the main driving region, and a description information for the second voice sub-data indicates that the second voice sub-data is from the sub driving region.
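To make the role of the description information concrete, the sketch below maps an angle reported for a separated voice sub-data to a region. It assumes the angle intervals this description gives later for the FIG. 4 example ([0°, 90°) for the main driving region and [90°, 180°] for the sub driving region). The function name `region_for_angle` and the region labels are hypothetical, and the separation itself is not shown.

```python
from typing import Optional

MAIN = "main_driving_region"
SUB = "sub_driving_region"


def region_for_angle(angle_deg: float) -> Optional[str]:
    """Map an angle (in degrees) from the description information to a region.

    Assumed intervals, taken from the FIG. 4 example of this description:
      [0, 90)   -> main driving region
      [90, 180] -> sub driving region
    Angles outside [0, 180] are treated as unknown and return None.
    """
    if 0.0 <= angle_deg < 90.0:
        return MAIN
    if 90.0 <= angle_deg <= 180.0:
        return SUB
    return None


# Label two separated sub-data by their (hypothetical) angles.
labels = [region_for_angle(a) for a in (35.0, 120.0)]
```

The half-open first interval matters at the boundary: an angle of exactly 90° is assigned to the sub driving region, so no angle is ever claimed by both regions.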
- In operation S330, a voice recognition is performed on the plurality of voice sub-data respectively, so as to obtain a plurality of voice recognition results, and the plurality of voice recognition results correspond to the plurality of voice sub-data respectively.
- Exemplarily, a voice working mode of the vehicle is determined based on the plurality of voice recognition results. For example, it may be determined whether the voice recognition result corresponding to the first voice sub-data contains a first wake-up content or not and it may be determined whether the voice recognition result corresponding to the second voice sub-data contains a second wake-up content or not.
- In an example, after the plurality of voice recognition results corresponding to the plurality of voice sub-data are obtained, it may be determined at the same time both whether the voice recognition result corresponding to the first voice sub-data contains a first wake-up content and whether the voice recognition result corresponding to the second voice sub-data contains a second wake-up content.
- In another example, after the plurality of voice recognition results corresponding to the plurality of voice sub-data are obtained, it is possible to first determine whether the voice recognition result corresponding to the first voice sub-data contains a first wake-up content and then determine whether the voice recognition result corresponding to the second voice sub-data contains a second wake-up content. The specific process is described in operations S340 to S390.
- In operation S340, it is determined whether the voice recognition result corresponding to the first voice sub-data contains the first wake-up content or not. If the voice recognition result corresponding to the first voice sub-data contains the first wake-up content, then operation S350 is performed, otherwise, operation S370 is performed. The first wake-up content includes, for example, a specific wake-up word.
- In operation S350, it is determined that the voice working mode of the vehicle is a first voice working mode, in response to the voice recognition result corresponding to the first voice sub-data containing the first wake-up content.
- In operation S360, the vehicle is controlled to operate based on the first voice working mode.
- Controlling the vehicle to operate based on the first voice working mode includes the following operations. A third voice sub-data from the main driving region is extracted from a received first target voice data, a voice recognition is performed on the third voice sub-data to obtain a first operation instruction, and an operation is performed based on the first operation instruction.
- For example, after the first wake-up content is recognized, the voice receiver of the vehicle may continue to receive the first target voice data. The first target voice data is from, for example, the main driving region and the sub driving region. It should be noted that even if a user only speaks in the main driving region, the sound of the main driving region may be transmitted to the sub driving region due to the divergence and reflection of the sound. Alternatively, there are other noises in the sub driving region, so that the first target voice data usually includes sounds from the main driving region and the sub driving region.
- The third voice sub-data from the main driving region may be extracted from the received first target voice data by the vehicle. For example, the first target voice data is separated into a plurality of voice sub-data by using the blind source separation algorithm, and the plurality of voice sub-data includes the voice sub-data corresponding to the main driving region and the voice sub-data corresponding to the sub driving region. Then the third voice sub-data from the main driving region is extracted from the plurality of voice sub-data.
- Next, the vehicle performs the voice recognition on the third voice sub-data to obtain the first operation instruction, which is associated with the main driving region, and the operation is performed based on the first operation instruction. The first operation instruction obtained by performing the voice recognition on the third voice sub-data includes, for example, important instructions such as "open the window", "turn on the air conditioner", and "navigate".
- In operation S370, it is determined whether the voice recognition result corresponding to the second voice sub-data contains the second wake-up content or not. If the voice recognition result corresponding to the second voice sub-data contains the second wake-up content, operation S380 is performed, otherwise, the operations terminate. The second wake-up content includes, for example, a specific wake-up word.
- In operation S380, it is determined that the voice working mode of the vehicle is a second voice working mode, in response to the voice recognition result corresponding to the second voice sub-data containing the second wake-up content.
- In operation S390, the vehicle is controlled to operate based on the second voice working mode.
- Controlling the vehicle to operate based on the second voice working mode includes the following operations. A fourth voice sub-data from the sub driving region is extracted from a received second target voice data, a voice recognition is performed on the fourth voice sub-data to obtain a second operation instruction, and an operation is performed based on the second operation instruction.
- For example, after the second wake-up content is recognized, the voice receiver of the vehicle may continue to receive the second target voice data. The second target voice data is from, for example, the main driving region and the sub driving region. It should be noted that even if a user only speaks in the sub driving region, the sound of the sub driving region may be transmitted to the main driving region due to the divergence and reflection of the sound. Alternatively, there are other noises in the main driving region, so that the second target voice data usually includes sounds from the main driving region and the sub driving region.
- The fourth voice sub-data from the sub driving region may be extracted from the received second target voice data by the vehicle. For example, the second target voice data is separated into a plurality of voice sub-data by using the blind source separation algorithm, and the plurality of voice sub-data includes the voice sub-data corresponding to the main driving region and the voice sub-data corresponding to the sub driving region. Then the fourth voice sub-data from the sub driving region is extracted from the plurality of voice sub-data.
- Next, the vehicle performs the voice recognition on the fourth voice sub-data to obtain the second operation instruction, the second operation instruction is associated with the sub driving region, and the operation is performed based on the second operation instruction. The second operation instruction obtained by performing the voice recognition on the fourth voice sub-data includes, for example, unimportant instructions such as "play music" and "check weather forecast".
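Operations S360 and S390 share one shape: separate the target voice data, keep only the sub-data from the region tied to the active working mode, recognize it, and act on the result. The sketch below illustrates that shape under stated assumptions. The separator, recognizer and actuator are passed in as callables because the description does not fix their interfaces, and every name here (`handle_target_voice`, `MODE_REGION`, the demo helpers) is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

FIRST_MODE = "first_voice_working_mode"
SECOND_MODE = "second_voice_working_mode"

# Region whose voice sub-data is acted on in each working mode.
MODE_REGION: Dict[str, str] = {
    FIRST_MODE: "main_driving_region",
    SECOND_MODE: "sub_driving_region",
}

# Assumption: a separated sub-data is a (region, payload) pair.
SubData = Tuple[str, str]


def handle_target_voice(
    mode: str,
    target_voice: object,
    separate: Callable[[object], List[SubData]],
    recognize: Callable[[str], Optional[str]],
    operate: Callable[[str], None],
) -> Optional[str]:
    """Extract the sub-data for the mode's region, recognize it, and operate."""
    region = MODE_REGION[mode]
    for sub_region, payload in separate(target_voice):
        if sub_region != region:
            continue  # sub-data from the other region is ignored in this mode
        instruction = recognize(payload)
        if instruction is not None:
            operate(instruction)
        return instruction
    return None  # no sub-data from the mode's region was found


# Demo stand-ins: "separation" just unpacks a dict, "recognition" normalizes text.
def demo_separate(target_voice):
    return list(target_voice.items())


def demo_recognize(payload):
    text = payload.strip().lower()
    return text or None


performed: List[str] = []
mixed = {"main_driving_region": "Open the window",
         "sub_driving_region": "Play music"}
first = handle_target_voice(FIRST_MODE, mixed, demo_separate,
                            demo_recognize, performed.append)
second = handle_target_voice(SECOND_MODE, mixed, demo_separate,
                             demo_recognize, performed.append)
```

In the demo, the same mixed input yields a different instruction depending on the active mode, which mirrors the point made above that target voice data usually contains sound from both regions.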
- In the embodiments of the present disclosure, usually only one of the first voice working mode and the second voice working mode is in an awake state at any given time. When the initial voice data includes both the first wake-up content and the second wake-up content, the first voice working mode corresponding to the main driving region is preferentially woken up. When the initial voice data does not include the first wake-up content but includes the second wake-up content, the second voice working mode is woken up.
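The priority rule above, together with the sequential checks of operations S340 to S380, can be written as a small decision function. This is a minimal sketch: the enum `VoiceWorkingMode` and the function `determine_mode` are hypothetical names, and the two boolean inputs stand for whether each voice recognition result contains its wake-up content.

```python
from enum import Enum
from typing import Optional


class VoiceWorkingMode(Enum):
    FIRST = "first"    # associated with the main driving region
    SECOND = "second"  # associated with the sub driving region


def determine_mode(first_has_wakeup: bool,
                   second_has_wakeup: bool) -> Optional[VoiceWorkingMode]:
    """Pick the working mode, giving priority to the main driving region."""
    if first_has_wakeup:       # S340 -> S350
        return VoiceWorkingMode.FIRST
    if second_has_wakeup:      # S370 -> S380
        return VoiceWorkingMode.SECOND
    return None                # neither wake-up content found: the operations terminate


both = determine_mode(True, True)
only_second = determine_mode(False, True)
```

Checking the first wake-up content before the second is what yields the preference for the main driving region when both are present.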
- According to the embodiments of the present disclosure, the vehicle may receive the initial voice data for the plurality of regions through one voice receiver, and the initial voice data is separated to obtain the plurality of voice sub-data corresponding to the plurality of regions respectively. The plurality of voice sub-data are then recognized respectively to obtain the voice recognition results, and the voice working mode is determined based on the voice recognition results. The first voice working mode for the main driving region is different from the second voice working mode for the sub driving region, so that the vehicle implements a plurality of modes for voice recognition.
- FIG. 4 schematically shows a diagram of a method of processing a voice for a vehicle according to an embodiment of the present disclosure.
- As shown in FIG. 4, the vehicle 400 of the embodiment of the present disclosure may include a voice receiver 410, a voice processor 420 and an actuator 430. The voice processor 420 includes, for example, a blind source separating module 421, a main wake-up engine 422, a sub wake-up engine 423, a voice recognizing engine 424 and a semantic understanding module 425.
- The voice receiver 410 includes, for example, a microphone, which is used to receive, for example, a voice data from a main driving region and a sub driving region.
- After the voice receiver 410 receives an initial voice data A, the initial voice data A is sent to the blind source separating module 421 for separation processing, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data. The plurality of voice sub-data include, for example, a first voice sub-data a1 and a second voice sub-data a2. The description information for the first voice sub-data a1 indicates, for example, that the first voice sub-data a1 is from the main driving region, and the description information for the second voice sub-data a2 indicates, for example, that the second voice sub-data a2 is from the sub driving region.
- In an example, the blind source separating module 421 uses a blind source separation algorithm to separate the voice, and a separation result includes the voice sub-data and the description information describing a source of the voice sub-data. The description information may include an angle information, and the angle information includes, for example, a first angle interval and a second angle interval. The first angle interval is, for example, [0°, 90°), and the second angle interval is, for example, [90°, 180°]. An angle in the description information for the first voice sub-data a1 from the main driving region is, for example, within [0°, 90°), and an angle in the description information for the second voice sub-data a2 from the sub driving region is, for example, within [90°, 180°]. When the blind source separation algorithm is used to separate the voice data, the source of each voice sub-data may be determined, for example, by calculating a direction of arrival (DOA) of the voice.
- Next, the first voice sub-data a1 is sent to the main wake-up engine 422 for recognition, so as to obtain a voice recognition result for the first voice sub-data a1. When the voice recognition result includes a first wake-up content, it is determined that the voice working mode of the vehicle is a first voice working mode.
- The second voice sub-data a2 is sent to the sub wake-up engine 423, so as to obtain a voice recognition result for the second voice sub-data a2. When the voice recognition result includes a second wake-up content, it is determined that the voice working mode of the vehicle is a second voice working mode.
- Take the case where the voice working mode of the vehicle is the first voice working mode as an example. In the first voice working mode, the voice receiver 410 of the vehicle may continue to receive a first target voice data B. The first target voice data B includes, for example, a voice of a user from the main driving region. The blind source separating module 421 may separate the first target voice data B and extract a third voice sub-data b from the main driving region.
- Then, the blind source separating module 421 sends the extracted third voice sub-data b to the voice recognizing engine 424 for voice recognition, so as to obtain a voice recognition result b1. The voice recognition result b1 includes, for example, the text "open the window", "turn on the air conditioner", "navigate" and so on. The voice recognizing engine 424 sends the voice recognition result b1 to the semantic understanding module 425 for semantic understanding, so as to determine a first operation instruction b2 corresponding to the text. For example, the first operation instruction b2 corresponding to the text "open the window" is a window opening instruction.
- Next, the first operation instruction b2 is sent to the actuator 430, and the actuator 430 performs related operations based on the first operation instruction b2. For example, the actuator 430 opens a window based on the window opening instruction.
- It should be understood that, in the embodiments of the present disclosure, the vehicle may receive the initial voice data from the plurality of regions through one voice receiver, and the initial voice data is separated to obtain the plurality of voice sub-data corresponding to the plurality of regions respectively, which reduces the cost of the vehicle. In addition, because the voice data is received through one voice receiver, the data amount of the received voice data is reduced. In this way, the calculation amount for the vehicle to process the voice is reduced, and the voice processing performance of the vehicle is improved.
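- The flow of FIG. 4 after wake-up can be sketched as follows, under stated assumptions. The blind source separation and the voice recognition themselves are not implemented here: the separated sub-data are taken as given, and `recognize` and `actuate` are caller-supplied stand-ins for the voice recognizing engine 424 and the actuator 430. All identifiers, the `INSTRUCTIONS` table and the instruction codes are hypothetical; only the angle intervals [0°, 90°) and [90°, 180°] and the example phrases come from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

MAIN_REGION = "main driving region"
SUB_REGION = "sub driving region"


@dataclass
class VoiceSubData:
    audio: bytes      # separated audio of one source (placeholder type)
    angle_deg: float  # angle carried in the description information


def region_of(sub: VoiceSubData) -> str:
    """Map the angle to a region using the example intervals from the text."""
    if 0.0 <= sub.angle_deg < 90.0:
        return MAIN_REGION
    if 90.0 <= sub.angle_deg <= 180.0:
        return SUB_REGION
    raise ValueError(f"angle outside [0, 180]: {sub.angle_deg}")


def extract_from_region(subs: List[VoiceSubData], region: str) -> Optional[VoiceSubData]:
    """Pick the first separated sub-data whose description points at the region."""
    for sub in subs:
        if region_of(sub) == region:
            return sub
    return None


# Hypothetical text-to-instruction table standing in for semantic understanding.
INSTRUCTIONS: Dict[str, str] = {
    "open the window": "WINDOW_OPEN",
    "turn on the air conditioner": "AC_ON",
}


def handle_target_voice(subs: List[VoiceSubData],
                        region: str,
                        recognize: Callable[[bytes], str],
                        actuate: Callable[[str], None]) -> Optional[str]:
    """Extract the sub-data for the awake region, recognize it, and act on it."""
    sub = extract_from_region(subs, region)
    if sub is None:
        return None
    text = recognize(sub.audio)
    instruction = INSTRUCTIONS.get(text)
    if instruction is not None:
        actuate(instruction)
    return instruction
```

- In the first voice working mode, `region` would be the main driving region, so speech separated from the sub driving region is ignored even when it is present in the same target voice data.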
- FIG. 5 schematically shows a block diagram of an apparatus of processing a voice for a vehicle according to an embodiment of the present disclosure.
- As shown in FIG. 5, the apparatus 500 of processing a voice for a vehicle in the embodiments of the present disclosure includes, for example, a processing module 510 and a determining module 520.
- The processing module 510 is used to separate an initial voice data in response to receiving the initial voice data from a plurality of regions inside the vehicle, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data. The plurality of voice sub-data correspond to the plurality of regions respectively, and the description information for each voice sub-data indicates the region, among the plurality of regions, corresponding to that voice sub-data. According to the embodiments of the present disclosure, the processing module 510 may, for example, perform the operation S210 described above with reference to FIG. 2, which will not be repeated here.
- The determining module 520 is used to determine a voice working mode of the vehicle based on the plurality of voice sub-data. According to the embodiments of the present disclosure, the determining module 520 may, for example, perform the operation S220 described above with reference to FIG. 2, which will not be repeated here.
- According to the embodiments of the present disclosure, the determining module 520 includes, for example, a first recognizing sub-module and a determining sub-module. The first recognizing sub-module is used to perform a voice recognition on the plurality of voice sub-data respectively, so as to obtain a plurality of voice recognition results, where the plurality of voice recognition results correspond to the plurality of voice sub-data respectively. The determining sub-module is used to determine the voice working mode of the vehicle based on the plurality of voice recognition results.
- According to the embodiments of the present disclosure, the plurality of regions includes a main driving region and a sub driving region; the plurality of voice sub-data includes a first voice sub-data and a second voice sub-data, a description information for the first voice sub-data indicates that the first voice sub-data is from the main driving region, and a description information for the second voice sub-data indicates that the second voice sub-data is from the sub driving region. The determining sub-module includes at least one of a first determining unit and a second determining unit. The first determining unit is used to determine that the voice working mode of the vehicle is a first voice working mode, in response to the voice recognition result corresponding to the first voice sub-data containing a first wake-up content. The second determining unit is used to determine that the voice working mode of the vehicle is a second voice working mode, in response to the voice recognition result corresponding to the second voice sub-data containing a second wake-up content.
- According to the embodiments of the present disclosure, the apparatus 500 may further include a first controlling module used to control the vehicle to operate based on the first voice working mode. The first controlling module includes a first extracting sub-module, a second recognizing sub-module and a first operating sub-module. The first extracting sub-module is used to extract, from a received first target voice data, a third voice sub-data from the main driving region. The second recognizing sub-module is used to perform a voice recognition on the third voice sub-data, so as to obtain a first operation instruction, where the first operation instruction is associated with the main driving region. The first operating sub-module is used to operate based on the first operation instruction.
- According to the embodiments of the present disclosure, the apparatus 500 may further include a second controlling module used to control the vehicle to operate based on the second voice working mode. The second controlling module includes a second extracting sub-module, a third recognizing sub-module and a second operating sub-module. The second extracting sub-module is used to extract, from a received second target voice data, a fourth voice sub-data from the sub driving region. The third recognizing sub-module is used to perform a voice recognition on the fourth voice sub-data, so as to obtain a second operation instruction, where the second operation instruction is associated with the sub driving region. The second operating sub-module is used to operate based on the second operation instruction.
- According to the embodiments of the present disclosure, the vehicle includes a main wake-up engine and a sub wake-up engine; and the first recognizing sub-module includes a first recognizing unit and a second recognizing unit. The first recognizing unit is used to recognize the first voice sub-data by using the main wake-up engine, so as to obtain the voice recognition result for the first voice sub-data. The second recognizing unit is used to recognize the second voice sub-data by using the sub wake-up engine, so as to obtain the voice recognition result for the second voice sub-data.
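- As an illustration only, the composition of the determining module 520 described above (a first recognizing sub-module with two recognizing units, feeding a determining sub-module) could be modelled as plain objects. The class names, the `WakeEngine` signature and the wake-up phrases used in the usage note are assumptions made for this sketch and are not taken from the disclosure; the order of the two checks follows the priority given to the main driving region earlier in the description.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumed stub signature: a wake-up engine maps audio to recognized text.
WakeEngine = Callable[[bytes], str]


@dataclass
class FirstRecognizingSubModule:
    main_wake_up_engine: WakeEngine  # used by the first recognizing unit
    sub_wake_up_engine: WakeEngine   # used by the second recognizing unit

    def recognize(self, first_sub_data: bytes, second_sub_data: bytes) -> Tuple[str, str]:
        """Recognize each sub-data with the engine dedicated to its region."""
        return (self.main_wake_up_engine(first_sub_data),
                self.sub_wake_up_engine(second_sub_data))


@dataclass
class DeterminingSubModule:
    first_wake_up_content: str   # hypothetical wake phrase, main driving region
    second_wake_up_content: str  # hypothetical wake phrase, sub driving region

    def determine(self, first_result: str, second_result: str) -> Optional[str]:
        """Return the working mode implied by the two recognition results."""
        if self.first_wake_up_content in first_result:
            return "first voice working mode"
        if self.second_wake_up_content in second_result:
            return "second voice working mode"
        return None


@dataclass
class DeterminingModule:
    recognizer: FirstRecognizingSubModule
    decider: DeterminingSubModule

    def determine_mode(self, first_sub_data: bytes, second_sub_data: bytes) -> Optional[str]:
        results = self.recognizer.recognize(first_sub_data, second_sub_data)
        return self.decider.determine(*results)
```

- For instance, a `DeterminingModule` built with two engines that simply decode the bytes as text, and with the made-up phrases "hello driver" and "hello passenger", would report the first voice working mode whenever the first sub-data contains "hello driver".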
- According to the embodiments of the present disclosure, the processing module 510 is further used to separate the initial voice data by using a blind source separation algorithm.
- In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the user's personal information involved are all in compliance with relevant laws and regulations, take essential confidentiality measures, and do not violate public order and good customs.
- In the technical solution of the present disclosure, authorization or consent is obtained from the user before the user's personal information is obtained or collected.
- According to the embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
- FIG. 6 is a block diagram of an electronic device used to implement voice processing in the embodiments of the present disclosure.
- FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of the present disclosure described and/or required herein.
- As shown in FIG. 6, the device 600 includes a computing unit 601, which may execute various appropriate actions and processing according to a computer program stored in a read only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the device 600 may also be stored in the RAM 603. The computing unit 601, the ROM 602 and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
- The I/O interface 605 is connected to a plurality of components of the device 600, including: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a magnetic disk, an optical disk, etc.; and a communication unit 609, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
- The computing unit 601 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP) and any appropriate processor, controller, microcontroller, etc. The computing unit 601 executes the various methods and processes described above, such as the method of processing a voice for a vehicle. For example, in some embodiments, the method of processing a voice for a vehicle may be implemented as computer software programs, which are tangibly contained in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method of processing a voice for a vehicle described above may be executed. Alternatively, in other embodiments, the computing unit 601 may be configured to execute the method of processing a voice for a vehicle in any other suitable manner (for example, by means of firmware).
- Various implementations of the systems and technologies described in the present disclosure may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application-specific standard products (ASSP), system-on-chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software and/or their combination. The various implementations may include being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, and the programmable processor may be a dedicated or general programmable processor. The programmable processor may receive data and instructions from a storage system, at least one input device and at least one output device, and may transmit data and instructions to the storage system, the at least one input device and the at least one output device.
- The program code used to implement the method of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to the processors or controllers of general-purpose computers, special-purpose computers or other programmable data processing devices, so that, when the program code is executed by a processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented. The program code may be executed entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as an independent software package, or entirely on the remote machine or server.
- In the context of the present disclosure, the machine-readable medium may be a tangible medium, which may contain or store a program for use by the instruction execution system, apparatus, or device or in combination with the instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the above-mentioned content. More specific examples of the machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device or any suitable combination of the above-mentioned content.
- In order to provide interaction with users, the systems and techniques described here may be implemented on a computer, the computer includes: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or trackball). The user may provide input to the computer through the keyboard and the pointing device. Other types of devices may also be used to provide interaction with users. For example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback or tactile feedback); and any form (including sound input, voice input, or tactile input) may be used to receive input from the user.
- The systems and technologies described herein may be implemented in a computing system including back-end components (for example, as a data server), or a computing system including middleware components (for example, an application server), or a computing system including front-end components (for example, a user computer with a graphical user interface or a web browser through which the user may interact with the implementation of the system and technology described herein), or in a computing system including any combination of such back-end components, middleware components or front-end components. The components of the system may be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: local area network (LAN), wide area network (WAN) and the Internet.
- The computer system may include a client and a server. The client and the server are generally far away from each other and usually interact through the communication network. The relationship between the client and the server is generated by computer programs that run on the respective computers and have a client-server relationship with each other.
- It should be understood that steps may be reordered, added or deleted using the various forms of processes shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved, which is not limited herein.
- The above-mentioned implementations do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modification and improvement made within the principle of the present disclosure shall be included in the protection scope of the appended claims.
Claims (11)
- A method (200) of processing a voice for a vehicle (400), comprising: separating (S210, S320) an initial voice data (A) in response to receiving (S310) the initial voice data (A) from a plurality of regions inside the vehicle (400), so as to obtain a plurality of voice sub-data (a1, a2) and a description information for each voice sub-data of the plurality of voice sub-data (a1, a2), wherein the plurality of voice sub-data (a1, a2) correspond to the plurality of regions respectively, and the description information for each voice sub-data (a1, a2) indicates the region corresponding to the each voice sub-data (a1, a2) in the plurality of regions; and determining (S220) a voice working mode of the vehicle (400) based on the plurality of voice sub-data (a1, a2), wherein the determining (S220) a voice working mode of the vehicle (400) based on the plurality of voice sub-data comprises: performing (S330) a voice recognition on the plurality of voice sub-data respectively, so as to obtain a plurality of voice recognition results, wherein the plurality of voice recognition results correspond to the plurality of voice sub-data respectively; and determining the voice working mode of the vehicle (400) based on the plurality of voice recognition results, wherein the plurality of regions comprises a main driving region and a sub driving region; the plurality of voice sub-data comprises a first voice sub-data (a1) and a second voice sub-data (a2), a description information for the first voice sub-data (a1) indicates that the first voice sub-data (a1) is from the main driving region, and a description information for the second voice sub-data (a2) indicates that the second voice sub-data (a2) is from the sub driving region; and wherein the determining the voice working mode of the vehicle (400) based on the determining (S350) that the voice working mode of the vehicle (400) is a first voice working mode, in response to the voice recognition result corresponding to the first voice sub-data (a1) containing a first wake-up content; and determining (S380) that the voice working mode of the vehicle (400) is a second voice working mode, in response to the voice recognition result corresponding to the second voice sub-data (a2) containing a second wake-up content.
- The method according to claim 1, further comprising: controlling (S360) the vehicle (400) to operate based on the first voice working mode; wherein the controlling (S360) the vehicle (400) to operate based on the first voice working mode comprises: extracting, from a received first target voice data (B), a third voice sub-data (b) from the main driving region; performing a voice recognition on the third voice sub-data (b), so as to obtain a first operation instruction (b2), wherein the first operation instruction (b2) is associated with the main driving region; and operating based on the first operation instruction (b2).
- The method according to claim 1, further comprising: controlling (S390) the vehicle (400) to operate based on the second voice working mode; wherein controlling (S390) the vehicle (400) to operate based on the second voice working mode comprises: extracting, from a received second target voice data, a fourth voice sub-data from the sub driving region; performing a voice recognition on the fourth voice sub-data, so as to obtain a second operation instruction, wherein the second operation instruction is associated with the sub driving region; and operating based on the second operation instruction.
- The method according to claim 1, wherein the vehicle (400) comprises a main wake-up engine (422) and a sub wake-up engine (423); and wherein the performing a voice recognition on the plurality of voice sub-data respectively, so as to obtain a plurality of voice recognition results comprises: recognizing the first voice sub-data (a1) by using the main wake-up engine (422), so as to obtain the voice recognition result for the first voice sub-data (a1); and recognizing the second voice sub-data (a2) by using the sub wake-up engine (423), so as to obtain the voice recognition result for the second voice sub-data (a2).
- The method according to any one of claims 1 to 4, wherein the separating an initial voice data comprises: separating the initial voice data (A) by using a blind source separation algorithm.
- An apparatus (500) of processing a voice for a vehicle, comprising: a processing module (510) configured to separate an initial voice data in response to receiving the initial voice data from a plurality of regions inside the vehicle, so as to obtain a plurality of voice sub-data and a description information for each voice sub-data of the plurality of voice sub-data, wherein the plurality of voice sub-data correspond to the plurality of regions respectively, and the description information for each voice sub-data indicates the region corresponding to the each voice sub-data in the plurality of regions; and a determining module (520) configured to determine a voice working mode of the vehicle based on the plurality of voice sub-data, wherein the determining module (520) comprises: a first recognizing sub-module configured to perform a voice recognition on the plurality of voice sub-data respectively, so as to obtain a plurality of voice recognition results, wherein the plurality of voice recognition results correspond to the plurality of voice sub-data respectively; and a determining sub-module configured to determine the voice working mode of the vehicle based on the plurality of voice recognition results, wherein the plurality of regions comprises a main driving region and a sub driving region; the plurality of voice sub-data comprises a first voice sub-data and a second voice sub-data, a description information for the first voice sub-data indicates that the first voice sub-data is from the main driving region, and a description information for the second voice sub-data indicates that the second voice sub-data is from the sub driving region; and wherein the determining sub-module comprises at least one of: a first determining unit configured to determine that the voice working mode of the vehicle is a first voice working mode, in response to the voice recognition result corresponding to the first voice sub-data containing a first wake-up content; and a second determining unit configured to determine that the voice working mode of the vehicle is a second voice working mode, in response to the voice recognition result corresponding to the second voice sub-data containing a second wake-up content.
- The apparatus according to claim 6, further comprising: a first controlling module configured to control the vehicle to operate based on the first voice working mode; wherein the first controlling module comprises: a first extracting sub-module configured to extract, from a received first target voice data, a third voice sub-data from the main driving region; a second recognizing sub-module configured to perform a voice recognition on the third voice sub-data, so as to obtain a first operation instruction, wherein the first operation instruction is associated with the main driving region; and a first operating sub-module configured to operate based on the first operation instruction, the apparatus further comprises: a second controlling module configured to control the vehicle to operate based on the second voice working mode; wherein the second controlling module comprises: a second extracting sub-module configured to extract, from a received second target voice data, a fourth voice sub-data from the sub driving region; a third recognizing sub-module configured to perform a voice recognition on the fourth voice sub-data, so as to obtain a second operation instruction, wherein the second operation instruction is associated with the sub driving region; and a second operating sub-module configured to operate based on the second operation instruction.
- The apparatus according to claim 6, wherein the vehicle comprises a main wake-up engine and a sub wake-up engine; and wherein the first recognizing sub-module comprises: a first recognizing unit configured to recognize the first voice sub-data by using the main wake-up engine, so as to obtain the voice recognition result for the first voice sub-data; and a second recognizing unit configured to recognize the second voice sub-data by using the sub wake-up engine, so as to obtain the voice recognition result for the second voice sub-data, wherein the processing module (510) is further configured to: separate the initial voice data by using a blind source separation algorithm.
- An electronic device, comprising: at least one processor; and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to implement the method of any one of claims 1 to 5.
- A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are configured to cause a computer to implement the method of any one of claims 1 to 5.
- A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 5.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110621889.5A CN113327608B (en) | 2021-06-03 | 2021-06-03 | Voice processing method and device for vehicle, electronic equipment and medium |
Publications (3)
Publication Number | Publication Date |
---|---|
EP4030424A2 (en) | 2022-07-20 |
EP4030424A3 (en) | 2022-11-02 |
EP4030424B1 (en) | 2024-02-07 |
Family
ID=77419608
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22176533.2A Active EP4030424B1 (en) | 2021-06-03 | 2022-05-31 | Method and apparatus of processing voice for vehicle, electronic device and medium |
Country Status (5)
Country | Link |
---|---|
US (1) | US20220293103A1 (en) |
EP (1) | EP4030424B1 (en) |
JP (1) | JP7383761B2 (en) |
KR (1) | KR20220082789A (en) |
CN (1) | CN113327608B (en) |
- 2021
  - 2021-06-03: CN application CN202110621889.5A (CN113327608B), status: Active
- 2022
  - 2022-05-31: EP application EP22176533.2A (EP4030424B1), status: Active
  - 2022-05-31: KR application KR1020220067069A (KR20220082789A), status: unknown
  - 2022-06-02: US application US17/831,052 (US20220293103A1), status: Abandoned
  - 2022-06-02: JP application JP2022090504A (JP7383761B2), status: Active
Also Published As
Publication number | Publication date |
---|---|
JP2022116285A (en) | 2022-08-09 |
EP4030424A2 (en) | 2022-07-20 |
JP7383761B2 (en) | 2023-11-20 |
CN113327608B (en) | 2022-12-09 |
EP4030424A3 (en) | 2022-11-02 |
US20220293103A1 (en) | 2022-09-15 |
KR20220082789A (en) | 2022-06-17 |
CN113327608A (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11875799B2 (en) | | Method and device for fusing voiceprint features, voice recognition method and system, and storage medium |
JP7213943B2 (en) | | Audio processing method, device, device and storage medium for in-vehicle equipment |
EP3647914B1 (en) | | Electronic apparatus and controlling method thereof |
US11197094B2 (en) | | Noise reduction method and apparatus based on in-vehicle sound zones, and medium |
US20220301552A1 (en) | | Method of performing voice wake-up in multiple speech zones, method of performing speech recognition in multiple speech zones, device, and storage medium |
US20230066021A1 (en) | | Object detection |
EP4030424B1 (en) | | Method and apparatus of processing voice for vehicle, electronic device and medium |
CN113658586A (en) | | Training method of voice recognition model, voice interaction method and device |
EP4027336B1 (en) | | Context-dependent spoken command processing |
CN112017659A | | Processing method, device and equipment for multi-sound zone voice signals and storage medium |
US20220392436A1 (en) | | Method for voice recognition, electronic device and storage medium |
US20230005490A1 (en) | | Packet loss recovery method for audio data packet, electronic device and storage medium |
JP2024537258A (en) | | Voice wake-up method, device, electronic device, storage medium, and computer program |
CN114399992B (en) | | Voice instruction response method, device and storage medium |
CN114220430A (en) | | Multi-sound-zone voice interaction method, device, equipment and storage medium |
CN115312042A (en) | | Method, apparatus, device and storage medium for processing audio |
CN114119972A (en) | | Model acquisition and object processing method and device, electronic equipment and storage medium |
US20230067861A1 (en) | | Speech control method and apparatus, electronic device and storage medium |
US20220343400A1 (en) | | Method and apparatus for providing state information of taxi service order, and storage medium |
CN114495923A (en) | | Intelligent control system implementation method and device, electronic equipment and storage medium |
CN118366438A (en) | | Voice control method and device, electronic equipment and storage medium |
CN116521113A (en) | | Multi-screen control method and device and vehicle |
CN114842839A (en) | | Vehicle-mounted human-computer interaction method, device, equipment, storage medium and program product |
CN114678023A (en) | | Voice processing method, device, equipment, medium and vehicle for vehicle environment |
CN114201225A (en) | | Method and device for awakening function of vehicle machine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20220531 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
PUAL | Search report despatched |
Free format text: ORIGINAL CODE: 0009013 |
|
AK | Designated contracting states |
Kind code of ref document: A3 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 21/0272 20130101ALN20220926BHEP Ipc: B60R 16/037 20060101ALI20220926BHEP Ipc: G10L 15/22 20060101AFI20220926BHEP |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 21/0272 20130101ALN20230718BHEP Ipc: G10L 15/08 20060101ALI20230718BHEP Ipc: G10L 15/18 20130101ALI20230718BHEP Ipc: B60R 16/037 20060101ALI20230718BHEP Ipc: G10L 15/22 20060101AFI20230718BHEP |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 21/0272 20130101ALN20230721BHEP Ipc: G10L 15/08 20060101ALI20230721BHEP Ipc: G10L 15/18 20130101ALI20230721BHEP Ipc: B60R 16/037 20060101ALI20230721BHEP Ipc: G10L 15/22 20060101AFI20230721BHEP |
|
INTG | Intention to grant announced |
Effective date: 20230816 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Effective date: 20231207 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602022001846 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG9D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240607 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240508 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 1655875 Country of ref document: AT Kind code of ref document: T Effective date: 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country codes: HR, RS, NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective dates: HR 20240207; RS 20240507; NL 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country codes: RS, NO, NL, LT, IS, HR, GR, FI, ES, BG, AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective dates: RS 20240507; NO 20240507; NL 20240207; LT 20240207; IS 20240607; HR 20240207; GR 20240508; FI 20240207; ES 20240207; BG 20240207; AT 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country codes: PL, PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective dates: PL 20240207; PT 20240607 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country codes: SE, PT, PL, LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective dates: SE 20240207; PT 20240607; PL 20240207; LV 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240207 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240207 |