CN115243104A

CN115243104A - Method and system for automatically adjusting vehicle-mounted multimedia volume

Info

Publication number: CN115243104A
Application number: CN202111438420.4A
Authority: CN
Inventors: 庞健宇; 李太华; 刘涵昱; 张亚; 陈俊伊; 于成龙
Original assignee: Guangzhou Automobile Group Co Ltd
Current assignee: Guangzhou Automobile Group Co Ltd
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2022-10-25

Abstract

The invention discloses a method and a system for automatically adjusting the volume of vehicle-mounted multimedia. The method comprises the following steps: S1, when a passenger monitoring system detects that a passenger's mouth is opening, obtaining corresponding characteristic audio according to the real-time mouth shape, and sending the characteristic audio, together with a monitoring signal, to a vehicle-mounted voice recognition system; S2, the vehicle-mounted voice recognition system receives the monitoring signal sent by the passenger monitoring system and compares the degree of overlap between the in-vehicle real-time audio and the characteristic audio; and S3, when the degree of overlap between the in-vehicle real-time audio and the characteristic audio reaches a preset threshold, the vehicle-mounted voice recognition system triggers a reduction in the volume of the audio played by the vehicle-mounted multimedia device. The method and system do not need to recognize the sentences that a particular combination of mouth shapes might produce. They only need to output characteristic audio from the real-time mouth shape and compare it with the in-vehicle real-time audio. They can therefore respond quickly and automatically to the need for a volume adjustment and improve the riding experience.

Description

Method and system for automatically adjusting vehicle-mounted multimedia volume

Technical Field

The invention belongs to the technical field of intelligent connected vehicles, and particularly relates to a method and a system for automatically adjusting vehicle-mounted multimedia volume.

Background

In current smart-cabin technology, the OMS (Occupancy Monitoring System) has gradually become widespread, but it offers relatively few functions. These are mainly opening a window automatically when an occupant smokes, reminding occupants of items left in the vehicle, monitoring passenger emotion, and basic gesture recognition, so its feature set is limited. Many vehicle manufacturers fit an OMS only as a hardware provision, without rich related functions or a complete development plan.

During a ride, if a passenger talks with the driver while music is playing, the first sentence often cannot be heard clearly. The music/multimedia volume must be lowered manually before the conversation can continue, and raised again manually after it ends. The whole process disrupts both the conversation and the listening experience. In-vehicle speech recognition is already in use, but current in-vehicle speech recognition technology cannot correctly determine whether a sound in the cabin comes from passengers talking to each other. It also cannot monitor speech content in real time before it is woken by a wake word. Therefore, in-vehicle speech recognition alone cannot automatically lower and raise the volume of background music or media.

Disclosure of Invention

The technical problem to be solved by the embodiments of the present invention is to provide a method and a system for automatically adjusting the volume of vehicle-mounted multimedia, so as to improve the riding experience.

In order to solve the technical problem, the invention provides a method for automatically adjusting the volume of vehicle-mounted multimedia, which comprises the following steps:

S1, when a passenger monitoring system detects that a passenger's mouth is opening, corresponding characteristic audio is obtained according to the real-time mouth shape, and the characteristic audio is sent to a vehicle-mounted voice recognition system together with a monitoring signal;

S2, the vehicle-mounted voice recognition system receives the monitoring signal sent by the passenger monitoring system and compares the degree of overlap between the in-vehicle real-time audio and the characteristic audio;

S3, when the degree of overlap between the in-vehicle real-time audio and the characteristic audio reaches a preset threshold, the vehicle-mounted voice recognition system triggers a reduction in the volume of the audio played by the vehicle-mounted multimedia device.

Further, in step S1, obtaining the corresponding characteristic audio according to the real-time mouth shape specifically comprises inputting the real-time mouth shape into a trained neural network. The network outputs the characteristic audio corresponding to that mouth shape. The neural network is trained on two inputs: a mouth-shape feature data set obtained by processing standard pronunciation videos, and the speech features extracted from those videos.

Further, in step S2, comparing the degree of overlap between the in-vehicle real-time audio and the characteristic audio specifically means comparing the overlap between their waveforms. This includes stretching and shrinking the waveform of the characteristic audio to match the waveform of the in-vehicle real-time audio. In step S3, when the overlap of the waveform peaks reaches a preset threshold, the vehicle-mounted voice recognition system recognizes that a passenger is speaking. It then triggers a reduction in the volume of the audio played by the vehicle-mounted multimedia device.

Further, step S2 also includes: the vehicle-mounted voice recognition system compares the degree of overlap between the audio played by the vehicle-mounted multimedia device and the characteristic audio. Step S3 also includes: when that degree of overlap reaches a preset threshold, the vehicle-mounted voice recognition system triggers a reduction in the volume of the vocals in the audio played by the vehicle-mounted multimedia device.

Further, step S3 also covers two cases: no opening and closing of a passenger's mouth is detected in step S1, or the degree of overlap between the in-vehicle real-time audio and the characteristic audio compared in step S2 does not reach the preset threshold. In either case, the vehicle-mounted voice recognition system compares the in-vehicle noise energy with the energy of the audio played by the vehicle-mounted multimedia device. When the in-vehicle noise energy is greater than the playback audio energy, it triggers an increase in the playback volume.

Further, suppose that in step S1 several passenger monitoring systems each detect that the corresponding passenger's mouth is opening and closing. Then, in step S2, the vehicle-mounted voice recognition system superimposes the characteristic audio sent by each passenger monitoring system. It then compares the in-vehicle real-time audio with the superimposed characteristic audio.

Further, after step S3 the method also includes: when the passenger monitoring system no longer detects the opening and closing of a passenger's mouth, the vehicle-mounted voice recognition system raises the volume of the audio played by the vehicle-mounted multimedia device back to the initial volume.
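The control flow of steps S1 to S3, together with the fallback and restore variants above, can be sketched as a single decision function. This is a minimal Python sketch under stated assumptions, not the patented implementation. The names `Action` and `decide_action`, the `ducked` flag, and the precedence between branches are hypothetical. The default threshold of 0.7 borrows the 70% example given later in the description. In the multi-passenger case, `overlap` is assumed to be computed against the superimposed characteristic audio.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DUCK = "duck"        # S3: lower the multimedia playback volume
    RESTORE = "restore"  # speech has ended: return to the initial volume
    RAISE = "raise"      # fallback: cabin noise outweighs the playback audio
    HOLD = "hold"        # leave the volume unchanged


def decide_action(
    mouth_moving: bool,
    overlap: Optional[float],
    noise_energy: float,
    media_energy: float,
    threshold: float = 0.7,
    ducked: bool = False,
) -> Action:
    """Hypothetical decision step combining S1-S3 with the fallback and restore variants.

    mouth_moving: S1 result (the OMS detects the mouth opening and closing).
    overlap: S2 result in [0, 1], or None when no comparison was made.
    ducked: whether the volume is currently reduced by an earlier DUCK.
    """
    # S2/S3: mouth movement plus a sufficiently high overlap -> a passenger is speaking.
    if mouth_moving and overlap is not None and overlap >= threshold:
        return Action.DUCK
    # After S3: mouth movement has stopped -> go back to the initial volume.
    if ducked and not mouth_moving:
        return Action.RESTORE
    # Fallback: no mouth movement, or overlap below threshold -> compare energies.
    if noise_energy > media_energy:
        return Action.RAISE
    return Action.HOLD
```

Giving RESTORE precedence over the noise fallback is a design choice of this sketch. The description does not say which rule applies first when both conditions hold.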

The invention also provides a system for automatically adjusting the vehicle-mounted multimedia volume, which comprises a passenger monitoring system and a vehicle-mounted voice recognition system,

the passenger monitoring system is configured to obtain the corresponding characteristic audio according to the real-time mouth shape when it detects that a passenger's mouth is opening. It sends the characteristic audio to the vehicle-mounted voice recognition system together with a monitoring signal;

the vehicle-mounted voice recognition system is configured to receive the monitoring signal sent by the passenger monitoring system and compare the degree of overlap between the in-vehicle real-time audio and the characteristic audio. When the degree of overlap reaches a preset threshold, it triggers a reduction in the volume of the audio played by the vehicle-mounted multimedia device.

Further, the vehicle-mounted voice recognition system is also configured to compare the degree of overlap between the audio played by the vehicle-mounted multimedia device and the characteristic audio. When that degree of overlap reaches a preset threshold, it triggers a reduction in the volume of the vocals in that audio.

Further, the vehicle-mounted voice recognition system also handles two cases: the passenger monitoring system does not detect the opening and closing of a passenger's mouth, or the degree of overlap between the in-vehicle real-time audio and the characteristic audio does not reach the preset threshold. In either case, it is configured to compare the in-vehicle noise energy with the energy of the audio played by the vehicle-mounted multimedia device. When the in-vehicle noise energy is greater than the playback audio energy, it triggers an increase in the playback volume.

The invention has the following beneficial effects. First, it does not need to recognize the sentences that a particular combination of mouth shapes might produce. It only needs to output characteristic audio from the real-time mouth shape and compare its overlap with the in-vehicle real-time audio. This lowers the demand on computing power, allows a fast, automatic response to the need for a volume adjustment, and improves the riding experience. Second, the playback volume can be adapted to the in-vehicle noise level, which reduces the effect of noise on communication between passengers. Third, once the passengers have finished talking, the playback volume is restored, so passengers can keep listening to the audio undisturbed during the ride.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention. Those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a method for automatically adjusting the volume of vehicle-mounted multimedia according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments refers to the accompanying drawings, which are included to illustrate specific embodiments in which the invention may be practiced.

Referring to Fig. 1, an embodiment of the present invention provides a method for automatically adjusting the volume of vehicle-mounted multimedia, comprising:

S1, when a passenger monitoring system detects that a passenger's mouth is opening, corresponding characteristic audio is obtained according to the real-time mouth shape and sent to a vehicle-mounted voice recognition system together with a monitoring signal;

Specifically, in step S1, the passenger monitoring system (OMS) monitors the passenger's mouth shape in real time. When the passenger's mouth opens and closes, the OMS obtains the likely characteristic audio from the real-time mouth shape. Obtaining characteristic audio from a real-time mouth shape requires prior training (machine learning) and belongs to the field of image recognition. One pre-training method is as follows:

- Process the standard pronunciation videos so that they all share the same frame rate, for example 30 frames per second.
- Track the face in each video, extract the mouth region, and resize all mouth regions to the same size.
- Concatenate the mouth regions into samples of 15 frames each to form a mouth-shape feature data set, and input it into a coupled 3D convolutional neural network.
- Meanwhile, extract speech features from the standard pronunciation videos with the FFmpeg framework, align them with the mouth-shape features over the required duration, and input them into the same coupled 3D convolutional neural network.

Training yields the final trained neural network. In operation, when the passenger monitoring system detects the opening and closing of a passenger's mouth, it inputs the real-time mouth shape into the trained neural network. The network outputs the corresponding characteristic audio, which serves as the comparison target for the subsequent steps.
It should be noted that existing lip-reading models usually convert a sequence of lip image frames into the Chinese-character sequence of a sentence. Internally, such a model may first map the lip frames to a pinyin sequence and then translate the pinyin sequence into Chinese characters. The training data of such a lip-reading model can be changed by replacing the Chinese-character sequence with speech features, so that the model can be applied in this embodiment. Because the embodiment does not need to output sentences from mouth shapes, the demand on computing power is reduced.
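The data-set preparation described in step S1 can be sketched in plain Python. It covers three steps: bringing videos to a common frame rate, grouping frames into 15-frame samples, and aligning speech with each sample. Face tracking, mouth cropping and resizing, FFmpeg feature extraction, and the coupled 3D CNN itself are out of scope. The function names, the nearest-frame resampling strategy, and the 16 kHz audio rate are assumptions for illustration. The raw audio slice stands in for whatever speech features the embodiment actually uses.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # e.g. a cropped, resized mouth-region image


def resample_frames(frames: Sequence[Frame], src_fps: float, dst_fps: float = 30.0) -> List[Frame]:
    """Bring a video to a common frame rate by picking the source frame shown at each output time."""
    if not frames:
        return []
    n_out = int(round(len(frames) / src_fps * dst_fps))
    # Output frame i sits at time i / dst_fps; take the source frame current at that time.
    return [frames[min(int(i * src_fps / dst_fps), len(frames) - 1)] for i in range(n_out)]


def make_samples(
    mouth_frames: Sequence[Frame],
    audio: Sequence[float],
    fps: float = 30.0,
    audio_rate: int = 16000,  # assumed sample rate of the extracted audio
    frames_per_sample: int = 15,
) -> List[Tuple[List[Frame], List[float]]]:
    """Pair each 15-frame mouth clip with the audio covering the same time span."""
    # 15 frames at 30 fps = 0.5 s = 8000 audio samples at 16 kHz.
    audio_per_sample = int(round(frames_per_sample / fps * audio_rate))
    samples = []
    for k in range(len(mouth_frames) // frames_per_sample):
        clip = list(mouth_frames[k * frames_per_sample:(k + 1) * frames_per_sample])
        start = k * audio_per_sample
        chunk = list(audio[start:start + audio_per_sample])
        if len(chunk) == audio_per_sample:  # skip clips whose audio is incomplete
            samples.append((clip, chunk))
    return samples
```

For example, a 2-second video recorded at 25 fps resamples to 60 frames. One second of 30 fps video with one second of 16 kHz audio yields two aligned samples.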

In step S2, after receiving the OMS signal, the vehicle-mounted voice recognition system starts monitoring the sound in the cabin to obtain the in-vehicle real-time audio. It then compares the degree of overlap between the in-vehicle real-time audio and the characteristic audio. Specifically, it compares the waveform of the in-vehicle real-time audio with the waveform of the characteristic audio provided by the OMS. The waveform of the characteristic audio is stretched and shrunk to match the waveform of the in-vehicle real-time audio, and the highest overlap is found. If the overlap of the waveform peaks reaches a preset threshold (for example, 70%), the system recognizes that a passenger is speaking. The vehicle-mounted voice recognition system then triggers a reduction in the vehicle-mounted multimedia volume.
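The description does not define how the waveform peaks are compared, so the following is only one plausible reading. The characteristic audio is linearly resampled by a few stretch factors, and local maxima are detected in both signals. The overlap is the fraction of peaks that coincide within a small tolerance. All names, the stretch factors, the tolerance, and the overlap metric are hypothetical. Both signals are assumed to share a sample rate and to start at the moment the mouth opens. The 0.7 threshold is the 70% example from the text.

```python
from typing import List, Sequence


def stretch(signal: Sequence[float], factor: float) -> List[float]:
    """Linearly resample to about len(signal) * factor samples (>1 stretches, <1 shrinks)."""
    n_in = len(signal)
    if n_in < 2:
        raise ValueError("need at least two samples")
    n_out = max(2, round(n_in * factor))
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def peaks(signal: Sequence[float], min_height: float = 0.0) -> List[int]:
    """Indices of local maxima above min_height (rising edge strict, falling edge non-strict)."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > min_height and signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
    ]


def peak_overlap(a: Sequence[float], b: Sequence[float], tolerance: int = 2) -> float:
    """Matched peaks divided by the larger peak count (hypothetical metric in [0, 1])."""
    pa, pb = peaks(a), peaks(b)
    if not pa or not pb:
        return 0.0
    matched = sum(1 for p in pa if any(abs(p - q) <= tolerance for q in pb))
    # Dividing by the larger count keeps a peak-rich live signal from scoring trivially.
    return matched / max(len(pa), len(pb))


def best_overlap(
    feature: Sequence[float],
    live: Sequence[float],
    factors: Sequence[float] = (0.8, 0.9, 1.0, 1.1, 1.25),
    tolerance: int = 2,
) -> float:
    """Try several stretch factors and keep the highest peak overlap."""
    best = 0.0
    for f in factors:
        candidate = stretch(feature, f)
        window = live[: len(candidate)]  # compare over the same duration
        best = max(best, peak_overlap(candidate, window, tolerance))
    return best


def passenger_speaking(feature: Sequence[float], live: Sequence[float], threshold: float = 0.7) -> bool:
    return best_overlap(feature, live) >= threshold
```

A production system would more likely use an established alignment method such as dynamic time warping. The sketch only mirrors the stretch-and-compare wording of the description.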

Further, step S2 also includes: the vehicle-mounted voice recognition system compares the degree of overlap between the audio played by the vehicle-mounted multimedia device and the characteristic audio. Step S3 also includes: when that degree of overlap reaches a preset threshold, the vehicle-mounted voice recognition system triggers a reduction in the volume of the vocals in the played audio. During a ride, the vehicle-mounted multimedia device may be playing music. If a passenger starts singing along, the passenger monitoring system obtains the corresponding characteristic audio from the passenger's real-time mouth shape and sends it to the vehicle-mounted voice recognition system. That system compares the degree of overlap between the audio played by the vehicle-mounted multimedia device and the characteristic audio. When that degree of overlap reaches a preset threshold, it triggers a reduction in the volume of the vocals in the played audio. This creates a sing-along (accompaniment) scenario and gives passengers a sing-along experience.
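The description does not say how the vocals are reduced. One common technique is mid/side processing for stereo tracks, which attenuates content panned to the centre, where lead vocals usually sit. It is swapped in here purely as an assumption. The function name and the `amount` parameter are hypothetical. The technique also attenuates any centre-panned instruments, so it is a rough approximation.

```python
from typing import List, Sequence, Tuple


def attenuate_center(
    left: Sequence[float], right: Sequence[float], amount: float
) -> Tuple[List[float], List[float]]:
    """Scale the mid (L+R)/2 component by (1 - amount), leaving the side component intact.

    amount = 0 returns the input unchanged; amount = 1 removes centre-panned content.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be in [0, 1]")
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    out_l, out_r = [], []
    for lv, rv in zip(left, right):
        mid = (lv + rv) / 2 * (1.0 - amount)  # centre content (often the lead vocal)
        side = (lv - rv) / 2                  # stereo-width content, kept as is
        out_l.append(mid + side)
        out_r.append(mid - side)
    return out_l, out_r
```

If the player has access to a separate vocal stem, scaling that stem directly would be simpler and cleaner.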

Step S2 may end without a match in two ways: no opening and closing of a passenger's mouth was detected in step S1, or the degree of overlap between the in-vehicle real-time audio and the characteristic audio sent by the OMS does not reach the preset threshold. In that case, in step S3, the vehicle-mounted voice recognition system can adjust the playback volume according to the relationship between the in-vehicle noise energy and the playback audio energy. Specifically, it compares the in-vehicle noise energy with the energy of the audio played by the vehicle-mounted multimedia device. When the in-vehicle noise energy is greater than the playback audio energy, it triggers an increase in the playback volume. In other words, in these scenarios the speech recognition system of this embodiment can adjust the volume of the audio being played (e.g., music or songs) according to the in-vehicle noise level, thereby reducing the effect of noise.
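The noise-energy fallback reduces to comparing two energy estimates. The sketch below uses mean squared amplitude as the energy measure and raises the volume by one step, up to a ceiling. Several details are illustrative assumptions rather than taken from the patent: the step size, the 0 to 30 volume scale, and the function names. It also assumes a separate in-vehicle noise estimate is available, for example the microphone signal with the playback removed.

```python
from typing import Sequence


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude; 0.0 for an empty buffer."""
    return sum(x * x for x in samples) / len(samples) if samples else 0.0


def adjust_for_noise(
    volume: int,
    noise: Sequence[float],
    media: Sequence[float],
    step: int = 1,
    max_volume: int = 30,
) -> int:
    """Raise the playback volume one step when cabin noise energy exceeds media energy."""
    if mean_energy(noise) > mean_energy(media):
        return min(volume + step, max_volume)
    return volume
```

Calling this once per analysis window raises the volume gradually rather than in one jump. Capping it at `max_volume` keeps a loud cabin from driving playback to an uncomfortable level.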

In addition, the passenger monitoring system monitors each passenger individually. If the passenger monitoring systems detect that several passengers' mouths are opening and closing, then in step S2 the vehicle-mounted voice recognition system superimposes the characteristic audio sent by each passenger monitoring system. It then compares the in-vehicle real-time audio with the superimposed characteristic audio. The advantage is as follows. If several passengers are chatting quietly, the characteristic audio obtained from any single passenger's mouth shape may not reach the preset overlap threshold with the in-vehicle real-time audio, and the volume reduction cannot be triggered. Superimposing the characteristic audio obtained from each passenger's mouth shape, and comparing the result with the in-vehicle real-time audio, makes it easier to reach the preset threshold. The playback volume is then reduced, lessening the effect on communication between passengers.
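Superimposing the characteristic audio from several passengers can be sketched as sample-wise addition. The sketch assumes every track starts at the same instant and shares one sample rate. Two further details are added assumptions: zero-padding shorter tracks, and rescaling the mix when it exceeds full scale (1.0). The function name `superpose` is hypothetical.

```python
from typing import List, Sequence


def superpose(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Sum time-aligned tracks sample by sample; shorter tracks are treated as zero-padded."""
    n = max((len(t) for t in tracks), default=0)
    mixed = [0.0] * n
    for t in tracks:
        for i, v in enumerate(t):
            mixed[i] += v
    # Rescale only if the sum would clip, so quiet mixes keep their level.
    peak = max((abs(v) for v in mixed), default=0.0)
    if peak > 1.0:
        mixed = [v / peak for v in mixed]
    return mixed
```

The superimposed track can then be fed to the same overlap comparison as a single passenger's characteristic audio.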

It can be understood that the passengers eventually finish talking and the passenger monitoring system no longer detects the opening and closing of a passenger's mouth. The method then further comprises: the vehicle-mounted voice recognition system raises the volume of the audio played by the vehicle-mounted multimedia device back to the initial volume. The volume was reduced when the degree of overlap between the in-vehicle real-time audio and the characteristic audio reached the preset threshold, and it is later restored to its pre-adjustment level. The whole volume adjustment is therefore completed automatically. The playback volume is lowered so that it does not disturb the passengers' conversation. It is restored once the conversation ends, so passengers can continue listening undisturbed.
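The restore-to-initial-volume behaviour needs only one piece of state: the volume in effect before the first reduction. A hypothetical controller might look like the sketch below. The class name, the 0.5 ducking ratio, and the use of integer volume levels are assumptions.

```python
from typing import Optional


class DuckingController:
    """Hypothetical volume state: duck on speech, restore the pre-duck level afterwards."""

    def __init__(self, volume: int, duck_ratio: float = 0.5) -> None:
        self.volume = volume
        self.duck_ratio = duck_ratio
        self._initial: Optional[int] = None  # level saved before the first reduction

    @property
    def ducked(self) -> bool:
        return self._initial is not None

    def duck(self) -> None:
        # Save the initial volume only once, so repeated triggers don't keep lowering it.
        if self._initial is None:
            self._initial = self.volume
            self.volume = round(self.volume * self.duck_ratio)

    def restore(self) -> None:
        # Intended to be called when the OMS no longer detects mouth movement.
        if self._initial is not None:
            self.volume = self._initial
            self._initial = None
```

Saving the level only on the first `duck()` call means a continuing conversation, which re-triggers S3 repeatedly, still restores to the volume the passengers originally chose.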

Corresponding to the method for automatically adjusting the volume of the vehicle-mounted multimedia in the first embodiment of the invention, the second embodiment of the invention provides a system for automatically adjusting the volume of the vehicle-mounted multimedia, which comprises a passenger monitoring system and a vehicle-mounted voice recognition system,

the vehicle-mounted voice recognition system is configured to receive the monitoring signal sent by the passenger monitoring system and compare the degree of overlap between the in-vehicle real-time audio and the characteristic audio. When the degree of overlap reaches a preset threshold, it triggers a reduction in the volume of the audio played by the vehicle-mounted multimedia device.

Further, the vehicle-mounted voice recognition system is also configured to compare the degree of overlap between the audio played by the vehicle-mounted multimedia device and the characteristic audio. When that degree of overlap reaches a preset threshold, it triggers a reduction in the volume of the vocals in that audio.

For the working principle and process of the present embodiment, please refer to the description of the first embodiment of the present invention, which is not repeated herein.

As can be seen from the above description, compared with the prior art, the invention has the following beneficial effects. First, it does not need to recognize the sentences that a particular combination of mouth shapes might produce. It only needs to output characteristic audio from the real-time mouth shape and compare its overlap with the in-vehicle real-time audio. This lowers the demand on computing power, allows a fast, automatic response to the need for a volume adjustment, and improves the riding experience. Second, the playback volume can be adapted to the in-vehicle noise level, reducing the effect of noise on communication between passengers. Third, once the passengers have finished talking, the playback volume is restored, so they can continue listening undisturbed for the rest of the ride.

The above disclosure is only a preferred embodiment of the present invention and cannot be used to limit the scope of the invention. Equivalent changes made according to the appended claims therefore still fall within the scope of the invention.

Claims

1. A method for automatically adjusting the volume of vehicle-mounted multimedia is characterized by comprising the following steps:

2. The method according to claim 1, wherein in step S1, obtaining the corresponding characteristic audio according to the real-time mouth shape comprises inputting the real-time mouth shape into a trained neural network and outputting the characteristic audio corresponding to the real-time mouth shape, the neural network being trained using, as inputs, a mouth-shape feature data set obtained by processing a standard pronunciation video and speech features extracted from the standard pronunciation video.

3. The method according to claim 1, wherein in step S2, the comparison by the vehicle-mounted voice recognition system of the degree of overlap between the in-vehicle real-time audio and the characteristic audio specifically comprises comparing the degree of overlap between the waveform of the in-vehicle real-time audio and the waveform of the characteristic audio, including stretching and shrinking the waveform of the characteristic audio to match the waveform of the in-vehicle real-time audio; and in step S3, when the overlap of the waveform peaks reaches a preset threshold, the vehicle-mounted voice recognition system recognizes that a passenger in the vehicle is speaking and triggers a reduction in the volume of the audio played by the vehicle-mounted multimedia device.

4. The method of claim 3, wherein step S2 further comprises: the vehicle-mounted voice recognition system compares the degree of overlap between the audio played by the vehicle-mounted multimedia device and the characteristic audio; and step S3 further comprises: when that degree of overlap reaches a preset threshold, the vehicle-mounted voice recognition system triggers a reduction in the volume of the vocals in the audio played by the vehicle-mounted multimedia device.

5. The method according to claim 3, wherein if no opening and closing of the occupant's mouth is detected in step S1, or if the degree of overlap between the in-vehicle real-time audio and the characteristic audio compared in step S2 does not reach a preset threshold, step S3 further comprises: the vehicle-mounted voice recognition system compares the in-vehicle noise energy with the energy of the audio played by the vehicle-mounted multimedia device, and triggers an increase in the volume of that audio when the in-vehicle noise energy is greater than the energy of the played audio.

6. The method according to claim 1, wherein if, in step S1, a plurality of passenger monitoring systems detect that the mouths of the corresponding passengers are opening and closing, then in step S2 the vehicle-mounted voice recognition system superimposes the characteristic audio sent by each passenger monitoring system and compares the degree of overlap between the in-vehicle real-time audio and the superimposed characteristic audio.

7. The method according to claim 1, further comprising, after step S3: when the passenger monitoring system no longer detects the opening and closing of the passenger's mouth, the vehicle-mounted voice recognition system raises the volume of the audio played by the vehicle-mounted multimedia device back to the initial volume.

8. A system for automatically adjusting the volume of vehicle-mounted multimedia is characterized by comprising a passenger monitoring system and a vehicle-mounted voice recognition system,

9. The system of claim 8, wherein the vehicle-mounted voice recognition system is further configured to compare the degree of overlap between the audio played by the vehicle-mounted multimedia device and the characteristic audio, and to trigger a reduction in the volume of the vocals in that audio when the degree of overlap reaches a preset threshold.

10. The system of claim 8, wherein if the occupant monitoring system does not detect the opening and closing of the occupant's mouth, or if the degree of overlap between the in-vehicle real-time audio and the characteristic audio does not reach a preset threshold, the vehicle-mounted voice recognition system is further configured to compare the in-vehicle noise energy with the energy of the audio played by the vehicle-mounted multimedia device, and to trigger an increase in the volume of that audio when the in-vehicle noise energy is greater than the energy of the played audio.