CN110696756A

CN110696756A - Vehicle volume control method and device, automobile and storage medium

Info

Publication number: CN110696756A
Application number: CN201910954142.4A
Authority: CN
Inventors: 朱晚贺
Original assignee: Guangzhou Xiaopeng Motors Technology Co Ltd
Current assignee: Guangzhou Xiaopeng Motors Technology Co Ltd
Priority date: 2019-10-09
Filing date: 2019-10-09
Publication date: 2020-01-17

Abstract

The embodiments of the present application provide a volume control method and device for a vehicle, a vehicle, and a storage medium. The method comprises: continuously collecting sound data while a vehicle speaker is working; and, when a human voice signal meeting a preset condition is present in the sound data, performing first volume control processing on the speaker. The volume is thereby adjusted automatically when a user conversation is detected, the volume adjustment mode is chosen in light of the current environment, and the accuracy of automatic volume adjustment is improved.

Description

Vehicle volume control method and device, automobile and storage medium

Technical Field

The present application relates to the field of speech recognition technologies, and in particular to a volume control method and device for a vehicle, a vehicle, and a storage medium.

Background

With the development of science and technology, smart vehicles have gradually become a common means of transport in people's lives. Smart vehicles also carry more and more functions, and audio and video entertainment is one of them.

In the prior art, when a smart vehicle plays audio or video, the in-vehicle system can detect the noise of the surrounding environment and adjust the speaker volume accordingly; for example, when the environmental noise is loud, the speaker volume is increased.

However, when a user talks with a third party, the in-vehicle system also recognizes the sound of the conversation as ambient noise and increases the speaker volume accordingly.

Disclosure of Invention

In view of the above problems, a volume control method and device for a vehicle, a vehicle, and a storage medium are proposed that overcome or at least partially solve the above problems, including:

a method of volume control of a vehicle, the method comprising:

continuously collecting sound data when a vehicle loudspeaker works;

and when a human voice signal meeting a preset condition is present in the sound data, performing first volume control processing on the speaker.

Optionally, the playing duration of the speaker is greater than a playing duration threshold, and the step of performing first volume control processing on the speaker when a human voice signal meeting a preset condition is present in the sound data includes:

when a human voice signal containing semantics is present in the sound data, turning down the volume output of the speaker and/or pausing the volume output of the speaker.

Optionally, the pausing of the volume output of the speaker comprises:

recording a breakpoint of the volume output of the loudspeaker;

the method further comprises the following steps:

and when the human voice signal containing semantics disappears from the sound data, restoring the turned-down volume output of the speaker, taking the breakpoint as the starting point.

Optionally, the vehicle has a plurality of speakers, and the step of performing first volume control processing on the speaker when a human voice signal meeting a preset condition is present in the sound data includes:

when a human voice signal containing semantics exists in the sound data, acquiring the sound source position of the human voice signal;

determining a loudspeaker corresponding to the sound source position as a first loudspeaker;

turning down a volume output of the first speaker, and turning off speakers other than the first speaker in the vehicle.

Optionally, after the step of acquiring the sound source position of the human voice signal when a human voice signal containing semantics exists in the sound data, the method further includes:

acquiring an action image of a vehicle user;

when the action image includes a dialogue action, acquiring the position information of the vehicle user;

and determining the loudspeaker corresponding to the position information as a first loudspeaker.

Optionally, the method further comprises:

and when the dialogue action in the action image disappears, restoring the volume output of the loudspeaker corresponding to the position information.

when an adult human voice signal containing semantics exists in the sound data, acquiring a seat position of a child user in the vehicle and a sound source position of the adult human voice signal;

turning down a volume output of a second speaker corresponding to the sound source position, and maintaining a volume output of a third speaker corresponding to the child user's seat position.

A volume control device of a vehicle, the device comprising:

the acquisition module is used for continuously acquiring sound data when the vehicle loudspeaker works;

and the detection module is used for calling the first volume control module when a human voice signal meeting the preset condition exists in the sound data.

A vehicle comprising a processor, a memory and a computer program stored on the memory and being executable on the processor, the computer program, when executed by the processor, implementing the steps of the method of volume control as described above.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of volume control as set forth above.

The embodiment of the application has the following advantages:

In the embodiments of the present application, sound data is continuously collected while the vehicle speaker is working, and when a human voice signal meeting a preset condition is present in the sound data, first volume control processing is performed on the speaker. The volume is thus adjusted automatically when a user conversation is detected, the volume adjustment mode is chosen in light of the current environment, and the accuracy of automatic volume adjustment is improved.

Drawings

In order to illustrate the technical solutions of the present application more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive labor.

Fig. 1 is a flowchart illustrating steps of a method for controlling the volume of a vehicle according to an embodiment of the present application;

Fig. 2 is a flowchart of volume control provided by an embodiment of the present application;

Fig. 3 is a flowchart illustrating steps of another method for controlling the volume of a vehicle according to an embodiment of the present application;

Fig. 4 is a flowchart illustrating steps of another method for controlling the volume of a vehicle according to an embodiment of the present application;

Fig. 5 is a flowchart illustrating steps of another method for controlling the volume of a vehicle according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a volume control device of a vehicle according to an embodiment of the present application.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to Fig. 1, a flowchart of the steps of a vehicle volume control method according to an embodiment of the present application is shown. The method may be applied to an in-vehicle system and may specifically include the following steps:

Step 101, continuously collecting sound data when a vehicle speaker works;

As an example, the sound data may be sound generated by a sound source inside or outside the vehicle.

In practical application, one or more speakers may be installed in the vehicle. When working, the speakers may play audio for the user, feed voice information back to the user in the vehicle, or give voice prompts.

For example, when a user taps an operation key of the in-vehicle system, such as a music playing key, a video playing key, or a broadcast listening key on the large screen of the in-vehicle system, the in-vehicle system may acquire an audio file or a multimedia file from local storage or a cloud server, and the speaker starts to work and plays the related audio. As another example, when the user inputs the voice "Little P, help me get a navigation route to the Zhongshan Memorial Hall", the in-vehicle system may reply "A navigation route has been acquired for you, please view the screen"; or, when the vehicle is speeding on a speed-limited road section, the in-vehicle system may broadcast the prompt "Please do not speed" through the speaker.

While the speaker is working, the microphone in the vehicle can continuously collect sound data inside and outside the vehicle, such as the sound of users in the vehicle talking or issuing voice commands, sound generated when the vehicle operates, sound on the street, and the like.

Specifically, when collecting the sound data, the microphone may collect only sound that the human ear can hear. Because human hearing covers roughly 20 Hz to 20000 Hz and is most sensitive in the range of 1000 Hz to 3000 Hz, the sound data can be filtered physically during collection, for example by using a band-pass filter with a pass band of 20 Hz to 20000 Hz or 1000 Hz to 3000 Hz to extract that frequency range, or the collected sound data can be processed by system software to filter out sound that the human ear cannot hear.
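As a rough illustration of the software filtering option, the following Python sketch chains a first-order high-pass stage and a first-order low-pass stage to approximate a 1000 Hz to 3000 Hz band-pass. The function names, the 16 kHz sample rate used in the usage note, and the first-order design are assumptions made for illustration; the application does not specify a filter design.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC high-pass filter (simple discrete approximation)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def low_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC low-pass filter (simple discrete approximation)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)
    out, prev_out = [], 0.0
    for x in samples:
        prev_out = prev_out + alpha * (x - prev_out)
        out.append(prev_out)
    return out


def band_pass(samples, low_hz, high_hz, sample_rate_hz):
    """Crude band-pass: attenuate below low_hz, then attenuate above high_hz."""
    return low_pass(high_pass(samples, low_hz, sample_rate_hz), high_hz, sample_rate_hz)


def rms(samples):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

With a 16 kHz sample rate, a 2000 Hz tone passes through `band_pass(tone, 1000, 3000, 16000)` with most of its level intact, while a 50 Hz rumble is strongly attenuated. A production system would more likely use a higher-order filter with steeper band edges.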

Meanwhile, because the speaker is playing audio while the microphone collects sound data, the microphone also picks up the sound of that audio. The in-vehicle system can then identify the audio played by the in-vehicle player and filter it out of the collected sound data.

In an embodiment of the present application, a playing time threshold may be preset, and when the playing time of the speaker is greater than the threshold, the vehicle-mounted system may collect the sound data.

In a specific implementation, a user may talk with another person while audio is playing; for the listener, hearing the talking user's voice matters more than hearing the played audio.

Based on this, the in-vehicle system can give higher priority to the human voice signal. When the speaker plays audio, such as an audio file or a multimedia file, the playing duration threshold may be set to thirty seconds. When the playing duration is greater than the threshold, it can be determined that the audio will be played through the speaker for an extended period. To avoid disturbing the user during this long playback, the in-vehicle system further collects the sound of the surrounding environment, judges whether a human voice signal is present, and determines whether the volume needs to be adjusted. When the playing duration is less than the threshold, for example when a short video of only ten seconds is played, sound data is not collected, which reduces the workload of the in-vehicle system.
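The duration gate described above can be sketched in Python as follows. The function and constant names are hypothetical; only the thirty-second example value comes from the text.

```python
# Example threshold from the text; a real system would make this configurable.
PLAY_DURATION_THRESHOLD_S = 30.0


def should_collect_sound(play_duration_s, threshold_s=PLAY_DURATION_THRESHOLD_S):
    """Return True only when playback lasts longer than the threshold."""
    return play_duration_s > threshold_s
```

For instance, `should_collect_sound(200.0)` enables collection for a typical song, while `should_collect_sound(10.0)` skips it for a ten-second clip.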

Alternatively, when the user starts the voice interaction function of the in-vehicle system with a wake-up word and converses with the voice assistant or AI assistant, the in-vehicle system can judge whether the duration of the speech that the voice assistant feeds back through the speaker is greater than the playing duration threshold. When it is, the in-vehicle system can determine that a long passage of speech needs to be fed back to the user. For example, when a user asks the voice assistant "Hello, Little P, do you have a girlfriend?" and the voice assistant needs to reply "I don't, but lately a drone keeps asking me whether I want to go and see the outside world", the in-vehicle system determines that the playing duration of this speech is greater than the threshold, and sound data can be further collected while the voice feedback is played.

Step 102, when a human voice signal meeting a preset condition is present in the sound data, performing first volume control processing on the speaker.

As an example, the human voice signal may be a signal produced by a person's voice, such as the sound produced when a user converses, sings, or issues a voice command.

After the sound data is acquired, it may be further analyzed. For example, sound characteristic data that represent the characteristics of the sound, such as sound pressure, frequency, amplitude, tone, and phase, may be acquired and comprehensively analyzed to determine whether a human voice signal meeting the preset condition exists in the sound data. When such a signal exists, the in-vehicle system may automatically perform first volume control processing on the speaker.

In an embodiment of the present application, step 102 may include the following sub-step: when a human voice signal containing semantics is present in the sound data, turning down the volume output of the speaker and/or pausing the volume output of the speaker.

As an example, the human voice signal containing semantics may be a voice signal generated by a user at the time of a conversation.

In practical applications, ASR (Automatic Speech Recognition) technology and NLP (Natural Language Processing) technology may be combined to process and analyze the sound data, so as to determine whether a human voice signal containing semantics exists in the sound data.

Specifically, when determining whether a human voice signal containing semantics exists, ASR technology may first be used to perform speech recognition on the sound data and convert the human voice data into text data. ASR technology converts the vocabulary content of human speech into computer-readable input, such as keystrokes, binary codes, or character sequences.

After the sound data is collected, it can be stored as a pure waveform file, such as a file in wav format. The sound is divided into frames, characteristic parameters are extracted, and a characteristic parameter sequence is generated. A hidden Markov model can then be used to perform model training on the characteristic parameter sequence, a matching speech and language model is obtained from the training result, and the characters contained in the sound data are recognized.

After the characters contained in the sound data are determined, NLP technology may be used to perform word segmentation, part-of-speech tagging, syntactic analysis, and other processing on the speech recognition result, and to determine whether the content of the sound data contains voice interaction, for example by determining whether there is a question-and-answer pattern, whether different voices appear alternately, and whether the human voice signals come from sound sources at different positions.

When it is determined that the content of the sound data contains voice interaction, it can be determined that a human voice signal containing semantics exists in the sound data and that users in the vehicle are conversing. At this point, the volume output of the speaker can be turned down or paused, so that the played audio does not disturb the users' conversation.
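The three cues named above (a question-and-answer pattern, alternating voices, and differing sound source positions) can be sketched as a simple check over recognized utterances. This is only an illustrative heuristic: the `Utterance` fields, the use of a trailing question mark as a stand-in for NLP-based question detection, and the rule that any one cue suffices are all assumptions, not the method of the application.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str      # transcript from an upstream ASR step (hypothetical)
    voice_id: str  # speaker label from an upstream step (hypothetical)
    zone: str      # sound source position label (hypothetical)


def is_question(text):
    """Crude placeholder for NLP-based question detection."""
    return text.rstrip().endswith("?")


def has_voice_interaction(utterances):
    """Return True if consecutive utterances show any of the three cues."""
    for prev, cur in zip(utterances, utterances[1:]):
        question_answer = is_question(prev.text) and not is_question(cur.text)
        voices_alternate = prev.voice_id != cur.voice_id
        zones_differ = prev.zone != cur.zone
        if question_answer or voices_alternate or zones_differ:
            return True
    return False
```

A question from the main driver seat followed by an answer from the rear row satisfies all three cues. A single utterance never does, so a one-sided case such as a phone call would need separate handling.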

In practical application, a designated volume can be preset at which audio playback does not interfere with the user's conversation. When the in-vehicle system reduces the playing volume, it can record the current playing volume of the audio and adjust the volume to the designated volume within a preset time.

When the designated volume is set, the same designated volume, such as 0 dB, can be set for all audio, or different designated volumes can be set for different audio types. For example, given that normal user conversation is at 50 dB to 60 dB, the designated volume can be set to 30 dB for relaxing background music or classical music, so as to provide a relaxed conversation environment for the user; for more stimulating music or multimedia video, such as electronic music or action movies, it can be set to 0 dB to reduce interference with the user's conversation.
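A per-type designated volume can be held in a small lookup table, as in the Python sketch below. The category labels and the fallback to 0 dB for unknown types are assumptions for illustration; the 30 dB and 0 dB values are the examples given in the text.

```python
# Designated volumes in dB, keyed by a hypothetical audio-type label.
DESIGNATED_VOLUME_DB = {
    "relaxing": 30.0,  # e.g. background or classical music
    "exciting": 0.0,   # e.g. electronic music or action movies
}
DEFAULT_DESIGNATED_VOLUME_DB = 0.0  # assumed fallback for unlisted types


def designated_volume(audio_type):
    """Return the designated volume for an audio type, or the default."""
    return DESIGNATED_VOLUME_DB.get(audio_type, DEFAULT_DESIGNATED_VOLUME_DB)
```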

In an embodiment of the present application, pausing the volume output of the speaker may include the following sub-step: recording a breakpoint of the volume output of the speaker.

After the volume output of the speaker is paused, the in-vehicle system can record the breakpoint of the volume output. For example, if the volume output is paused when an audio file has played for 3 minutes and 55 seconds, the playing progress position of 3 minutes and 55 seconds can be determined as the breakpoint of the volume output.

In an embodiment of the present application, the method may further include: when the human voice signal containing semantics disappears from the sound data, restoring the turned-down volume output of the speaker, taking the breakpoint as the starting point.

After the volume output of the speaker is turned down or paused, the in-vehicle system can continue to monitor the sound data in the vehicle and judge whether a human voice signal containing semantics still exists. When no such signal is detected within a preset time, for example when no conversation is received or only sound data containing a voice command is detected, it can be determined that the human voice signal containing semantics has disappeared and that the user is no longer communicating with a third party. At this point, playback can resume from the breakpoint of the volume output, the playing volume is increased, and the volume output of the speaker is restored.
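The pause, breakpoint, and restore behaviour can be sketched as a small state holder. The class and method names are hypothetical, and restoring to exactly the previously recorded volume is an assumption; the text only says the volume output is restored.

```python
class Playback:
    """Minimal playback state: volume, position, and a recorded breakpoint."""

    def __init__(self, volume_db):
        self.volume_db = volume_db
        self.position_s = 0.0
        self.paused = False
        self.breakpoint_s = None
        self._saved_volume_db = None

    def on_conversation_detected(self, designated_volume_db, pause=False):
        """Turn the volume down and, if requested, pause at a breakpoint."""
        if self._saved_volume_db is None:
            self._saved_volume_db = self.volume_db  # remember the volume to restore
        self.volume_db = min(self.volume_db, designated_volume_db)
        if pause and not self.paused:
            self.breakpoint_s = self.position_s  # record where output stopped
            self.paused = True

    def on_conversation_ended(self):
        """Resume from the breakpoint and restore the earlier volume."""
        if self.paused:
            self.position_s = self.breakpoint_s
            self.paused = False
        if self._saved_volume_db is not None:
            self.volume_db = self._saved_volume_db
            self._saved_volume_db = None
```

With playback at 3 minutes 55 seconds (235 s), calling `on_conversation_detected(30.0, pause=True)` records 235 s as the breakpoint, and `on_conversation_ended()` resumes from that position at the original volume.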

After the sound data is collected and analyzed, if no human voice signal is detected in it, it can be determined that no user nearby is speaking while the audio is played, and that the sound data collected by the microphone is ambient sound from the surrounding environment.

In practical applications, the ambient sound may be heard by the user, but may not interfere with the listening of the audio played in the car by the user, for example, when the ambient sound is not greater than the current volume of the speaker, the user may still hear the sound of the speaker.

Based on this, when the human voice signal is not detected, the decibel value of the sound can be further read from the sound data and compared with the volume threshold, wherein the volume threshold can be the current volume value of the speaker or a preset decibel value, such as 45 dB.

When the decibel value of the sound data is greater than the volume threshold, it may be determined that the user cannot clearly hear the sound of the speaker, and at this time, the volume output of the speaker may be increased.

In a specific implementation, to increase the volume output of the speaker, the in-vehicle system may store a table in advance that maps decibel ranges of the sound data to target speaker volumes. For example, when the decibel value falls within the range of 40 dB to 50 dB, the target volume may be set to 55 dB, and when it falls within the range of 50 dB to 60 dB, the target volume may be set to 60 dB. After the decibel value of the sound data is determined, the in-vehicle system can query the preset table to obtain the target volume.
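The preset table can be sketched as a list of ranges, as below. The half-open treatment of range boundaries (so that exactly 50 dB falls in the second range) and the `None` result for unlisted values are assumptions; the text gives only the two example ranges.

```python
# (lower bound in dB, upper bound in dB, target volume in dB); example rows from the text.
TARGET_VOLUME_TABLE = [
    (40.0, 50.0, 55.0),
    (50.0, 60.0, 60.0),
]


def lookup_target_volume(noise_db):
    """Return the target volume for a measured decibel value, or None if unlisted."""
    for low, high, target in TARGET_VOLUME_TABLE:
        if low <= noise_db < high:
            return target
    return None
```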

Alternatively, a preset decibel value may be added to the current decibel value of the sound data to determine the target volume, for example, the preset decibel value may be set to 5dB, and when the decibel value is 44dB, the target volume may be 49 dB. When the target volume is generated by the method, a maximum decibel value, such as 65dB, can be set, and when the calculated target volume is greater than the maximum decibel value, the maximum decibel value can be used as the target volume.
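The offset-and-cap rule can be written in one line. The function name is hypothetical; the 5 dB offset and 65 dB maximum are the example values from the text.

```python
def offset_target_volume(noise_db, offset_db=5.0, max_db=65.0):
    """Add a preset offset to the measured decibel value, capped at a maximum."""
    return min(noise_db + offset_db, max_db)
```

For a measured 44 dB this yields 49 dB; for a measured 63 dB the uncapped value of 68 dB exceeds the maximum, so 65 dB is used.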

After determining the target volume, the in-vehicle system may increase the speaker volume uniformly over a preset time, or may increase it by a fixed amount per unit time, for example by one unit every second, until the target volume is reached.
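The per-unit-time increase can be sketched as a function that lists the intermediate volume values, one per time unit. The function name and the list-based interface are assumptions; an actual system would apply each value on a timer.

```python
def ramp_steps(current, target, step=1.0):
    """Return the successive volume values used to rise from current to target."""
    steps = []
    volume = current
    while volume < target:
        volume = min(volume + step, target)  # never overshoot the target
        steps.append(volume)
    return steps
```

Applying one returned value per second gives the "one unit every second" behaviour from the text. If the current volume is already at or above the target, the list is empty and nothing changes.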

In order to enable those skilled in the art to better understand the above steps, the following is an example to illustrate the embodiments of the present invention, but it should be understood that the embodiments of the present invention are not limited thereto. Referring to fig. 2, a flow chart of volume control provided by an embodiment of the present application is shown.

After entering the vehicle, a user can tap a key on the large screen of the in-vehicle system to generate a playing instruction and listen to audio or watch multimedia video. Alternatively, the user can enable an automatic playing function in advance; when the in-vehicle system acquires the user's facial data through a camera, detects through a pressure sensor that the user is in a seat, or collects a preset voice command or voice information through a microphone, it can determine that the user has entered the vehicle and automatically generate the playing instruction. As another example, the user may generate the playing instruction by voice input, such as "Hello, Little P, I want to listen to a song"; in response to the voice input, the in-vehicle system receives the playing instruction and plays the audio.

In response to the playing instruction, the vehicle-mounted system can display the multimedia picture on a large screen of the vehicle-mounted system, play the audio and simultaneously turn on the microphone. In the audio playing process, the microphone continuously receives sound, sound data around the microphone are collected and then sent to the vehicle-mounted system, and the sound data are recognized by adopting an ASR technology.

When users start a conversation, for example the driver asks a rear passenger "What do you think of the weather today?" and the rear passenger replies "Pretty good.", the in-vehicle system obtains the sound data corresponding to "What do you think of the weather today?" and "Pretty good." and performs speech recognition on that data.

After speech recognition, the in-vehicle system can analyze the specific meaning of the sound using NLP technology. By combining "What do you think of the weather today?" and "Pretty good.", it can determine that the two sentences are contextually related and recognize that they come from two different sound sources (i.e. from two persons). It can therefore determine that the users are conversing and turn down the volume output of the speaker accordingly, realizing a volume fade-out.

Alternatively, when a user in the vehicle answers a phone call and talks with the person at the other end, for example saying "I'm going to the auto show today", the in-vehicle system collects the sound data of only one user. After speech recognition and semantic analysis, however, it can still determine that a human voice signal exists in the sound data, and it can likewise turn down the volume output of the speaker.

During the user's conversation, the in-vehicle system can continue to recognize the sound data. When no human voice signal is recognized in the sound data, it determines that the communication between the user and the third party has ended, fades the volume back in, and continues playing the audio. The volume adjustment mode described above follows the state of communication between the user and the third party and automatically raises or lowers the volume, which increases the chance that the user hears the voice information and improves the user experience.

In the embodiments of the present application, sound data is continuously collected while the vehicle speaker is working, and when a human voice signal meeting a preset condition is present in the sound data, first volume control processing is performed on the speaker. The volume is thus adjusted automatically when a user conversation is detected, the volume adjustment mode is chosen in light of the current environment, and the accuracy of automatic volume adjustment is improved.

Referring to Fig. 3, a flowchart of the steps of another vehicle volume control method according to an embodiment of the present application is shown. The method may be applied to an in-vehicle system and may specifically include the following steps:

Step 301, continuously collecting sound data when a vehicle speaker works;

The sound data may be sound generated by sound sources inside or outside the vehicle, such as the voices of users inside the vehicle, sound generated when the vehicle is running, sound on the street, and the like.

After a user taps an operation key of the in-vehicle system, such as a music playing key, a video playing key, or a broadcast listening key on the large screen of the in-vehicle system, the in-vehicle system responds to the user operation and can acquire an audio file or a multimedia file from local storage or a cloud server. The speaker starts to work and plays the related file, and during playback the microphone in the vehicle can continuously collect sound data inside and outside the vehicle.

In an embodiment of the present application, a plurality of speakers are arranged in the vehicle cabin, and when the in-vehicle system plays audio, the audio can be played through the plurality of speakers simultaneously. For example, for a two-seat vehicle, speakers can be arranged at the main driver's seat and the front passenger seat respectively; for a four-seat vehicle, speakers may be provided in the front row and the rear row, or separately at the main driver's seat, the front passenger seat, and each rear passenger seat.

Step 302, when a human voice signal containing semantics exists in the sound data, acquiring a sound source position of the human voice signal;

When audio is played through the plurality of speakers, the in-vehicle system can identify whether the sound data contains a human voice signal containing semantics. When such a signal is determined to exist, the in-vehicle system can determine that users in the vehicle are conversing, further determine the sound source position of the human voice signal, and determine the seats of the users taking part in the conversation.

In a specific implementation, a microphone or a microphone array may be arranged in the vehicle to receive human voice signals from different sound zones in the vehicle. For example, a two-seat vehicle may be divided into a main driver sound zone and a front passenger sound zone, and a four-seat vehicle may be divided into a main driver sound zone, a front passenger sound zone, a rear left sound zone, and a rear right sound zone.

When users in the vehicle talk, the human voice signals received by the microphones differ in arrival time and signal strength because the users sit at different positions; from these differences, the source of each human voice signal and the corresponding user seat can be determined.
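A very simplified view of this localization is to pick the sound zone whose microphone hears the voice first, using signal strength to break ties. The sketch below assumes one microphone per sound zone and a hypothetical `readings` mapping; a real microphone array would typically use cross-correlation or beamforming rather than this rule.

```python
def locate_sound_zone(readings):
    """Pick the zone with the earliest arrival; break ties by the strongest signal.

    readings maps a zone label to (arrival_time_s, signal_level).
    """
    return min(readings, key=lambda zone: (readings[zone][0], -readings[zone][1]))
```

For example, if the front passenger microphone receives the voice slightly earlier than the others, the front passenger sound zone is returned as the sound source position.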

Step 303, determining a speaker corresponding to the sound source position as a first speaker;

After determining the sound source position, the in-vehicle system may further determine the first speaker corresponding to it. In practical applications, a table such as the one below can be preset in the in-vehicle system to store the relationship between positions in the vehicle and speakers. After the sound source position is determined, the corresponding speaker identifier can be looked up in the table, and the speaker with that identifier is determined as the first speaker. For example, after it is determined that the users are in the main driver seat and the rear left seat, speaker 1 and speaker 4 can be determined as first speakers.

Vehicle seat	Speaker identifier
Main driver seat	1
Front passenger seat	2
Rear right seat	3
Rear left seat	4

Table 1: Relationship between vehicle seats and speaker identifiers

Step 304, turning down the volume output of the first speaker, and turning off speakers in the vehicle except the first speaker.

After the first speaker is determined, the plurality of speakers may be adjusted in different ways.

In practical application, because the sound source position may be the user's current position and the user's conversation may be brief, the in-vehicle system can turn down the volume output of the first speaker without turning it off. When the user stops talking, the in-vehicle system can promptly restore the speaker's volume output without having to turn the speaker on again after a short interval, which reduces wear on the speaker components.

For the speakers in the vehicle other than the first speaker, the in-vehicle system has not picked up a human voice signal at their positions, so it can determine that no user is present there. These speakers can simply be turned off during adjustment, which reduces energy consumption.
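Steps 303 and 304 can be sketched together: the seat-to-speaker mapping follows Table 1, first speakers are turned down to a designated volume, and all other speakers are turned off. The seat labels, the function name, and the use of `None` to mark a speaker that is off are assumptions for illustration.

```python
# Mapping taken from Table 1; the seat labels are hypothetical identifiers.
SEAT_TO_SPEAKER = {
    "main_driver": 1,
    "front_passenger": 2,
    "rear_right": 3,
    "rear_left": 4,
}


def adjust_speakers(volumes_db, source_seats, designated_volume_db):
    """Turn down the first speakers and turn off all the others.

    volumes_db maps a speaker identifier to its current volume in dB.
    Returns a new mapping; None marks a speaker that has been turned off.
    """
    first_speakers = {SEAT_TO_SPEAKER[seat] for seat in source_seats}
    adjusted = {}
    for speaker_id, volume in volumes_db.items():
        if speaker_id in first_speakers:
            adjusted[speaker_id] = min(volume, designated_volume_db)
        else:
            adjusted[speaker_id] = None
    return adjusted
```

With users talking in the main driver seat and the rear left seat, speakers 1 and 4 are turned down while speakers 2 and 3 are turned off.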

In the embodiments of the present application, sound data is continuously collected while the vehicle speakers are working, the vehicle having a plurality of speakers. When a human voice signal containing semantics exists in the sound data, the sound source position of the human voice signal is determined, the volume output of the first speaker corresponding to the sound source position is turned down, and the speakers in the vehicle other than the first speaker are turned off. The speakers are thus adjusted in different ways: when a user conversation is detected, the speakers at different seats are adjusted in a targeted manner, which improves the accuracy of automatic volume adjustment.

Referring to Fig. 4, a flowchart of the steps of another vehicle volume control method according to an embodiment of the present application is shown. The method may be applied to an in-vehicle system and may specifically include the following steps:

Step 401, continuously collecting sound data when a vehicle loudspeaker works;

As an example, a plurality of speakers may be installed in the vehicle, and the sound data may be sound generated by a sound source inside or outside the vehicle, such as the voice of a user inside the vehicle, sound generated while the vehicle is operating, sound from the street, and the like.

After a user taps an operation key of the vehicle-mounted system, such as a music playing key, a video playing key or a broadcast listening key on the large screen of the vehicle-mounted system, the vehicle-mounted system responds to the user operation and can acquire an audio file or a multimedia file locally or from a cloud server. The plurality of loudspeakers in the vehicle then start to work and play the related files. During playback, a microphone in the vehicle can continuously collect sound data inside and outside the vehicle.

Step 402, when a human voice signal containing semantics exists in the voice data, acquiring a sound source position of the human voice signal;

When audio is played through the plurality of loudspeakers, the vehicle-mounted system can identify whether a human voice signal containing semantics exists in the sound data. When such a signal exists, it can be determined that a user in the vehicle is talking.

Step 403, determining that the loudspeaker corresponding to the sound source position is a first loudspeaker;

After determining the sound source position, the vehicle-mounted system may further determine the first speaker corresponding to the sound source position. In practical applications, a list storing the relationship between different positions in the vehicle and the loudspeakers can be preset in the vehicle-mounted system. After the sound source position is determined, the corresponding loudspeaker identifier can be looked up in the list, and the loudspeaker so identified is determined as the first loudspeaker.
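
The preset list described above can be pictured as a simple lookup table. The following is a minimal sketch only; the position labels and the speaker identifiers (`SPK_FL` and so on) are hypothetical and are not taken from the embodiment.

```python
# Hypothetical preset list: cabin position -> speaker identifier.
POSITION_TO_SPEAKER = {
    "driver": "SPK_FL",
    "front_passenger": "SPK_FR",
    "rear_left": "SPK_RL",
    "rear_right": "SPK_RR",
}


def find_first_speaker(sound_source_position):
    """Return the speaker identifier for a position, or None if unmapped."""
    return POSITION_TO_SPEAKER.get(sound_source_position)


first_speaker = find_first_speaker("rear_left")
```

Returning `None` for an unmapped position is a design choice of this sketch; it lets the caller decide how to handle a sound source that does not correspond to any seat.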

Step 404, collecting an action image of a vehicle user;

Specifically, user images may be captured by one or more cameras installed in the vehicle; for example, a camera may be installed on the ceiling of the cabin, or a camera may be installed in each seating area. After the speaker at the sound source position is determined as the first speaker, the vehicle-mounted system may collect action images of the users within a preset time, for example within five minutes from the time the human voice signal was acquired, in order to further determine which users in the vehicle are participating in the conversation.

Step 405, when the action image comprises a dialogue action, acquiring the position information of the vehicle user;

As an example, the dialogue action may include a lip vocalization action, a forward leaning conversation action, a forward leaning listening action, an eye attention action, and the like.

In practical applications, when several users are talking, some of them may not speak. For example, when user A in the vehicle talks with user B, user C does not speak but listens to users A and B throughout, while user B seldom offers an opinion and only occasionally responds with an "okay". If participation in the conversation is judged only from the human voice signal, the vehicle-mounted system may determine that user C is not participating because user C does not speak, so the speaker at user C's position is not determined as a first speaker; and the human voice signal produced by user B's occasional "okay" is not necessarily collected if the microphone is not sensitive enough.

Based on the above, in addition to determining the sound source position, the vehicle-mounted system can perform image recognition on the action image and judge whether the image contains a dialogue action. When the action image consistently contains a dialogue action, the system determines that the user is participating in the conversation and acquires the position information of that user.

For example, if user C leans forward without speaking, and the vehicle-mounted system recognizes the action image of user C and determines that it contains a forward leaning conversation action, the system can acquire the position information of user C in the vehicle and determine the seat position.

Step 406, determining that the loudspeaker corresponding to the position information is a first loudspeaker;

After the position information is acquired, the vehicle-mounted system looks up the loudspeaker corresponding to the position information in the preset list and determines it as a first loudspeaker.
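
Steps 402 to 406 can be read as taking the union of the positions at which speech was located and the positions at which a dialogue action was recognized. The sketch below illustrates that reading under stated assumptions: the action labels are hypothetical outputs of an image recognizer, and the zone names are illustrative.

```python
# Hypothetical labels an image recognizer might emit for dialogue actions.
DIALOGUE_ACTIONS = {
    "lip_vocalization",
    "lean_forward_talking",
    "lean_forward_listening",
    "eye_attention",
}


def first_speaker_zones(sound_source_zones, actions_by_zone):
    """Zones that produced speech plus zones showing a dialogue action."""
    zones = set(sound_source_zones)
    for zone, actions in actions_by_zone.items():
        if DIALOGUE_ACTIONS & set(actions):
            zones.add(zone)
    return zones


# User A speaks from the driver seat; B and C are found by image recognition.
zones = first_speaker_zones(
    {"driver"},
    {
        "front_passenger": ["lip_vocalization"],
        "rear_left": ["lean_forward_listening"],
        "rear_right": [],
    },
)
```

With these inputs, the silent listener in the rear left seat is included among the first-speaker zones, while the empty rear right seat is not.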

Step 407, turning down the volume output of the first speaker, and turning off speakers in the vehicle except the first speaker;

After determining the first speaker, the in-vehicle system may adjust the plurality of speakers in different ways.

In a specific implementation, because the user's conversation may be temporary, the vehicle-mounted system can turn down the volume output of the first loudspeaker without turning it off. When the user stops talking, the vehicle-mounted system can promptly restore the volume output of the loudspeaker without having to switch the loudspeaker on again within a short time, which reduces wear on the loudspeaker components. The speakers in the vehicle other than the first speaker can be turned off directly during adjustment, which reduces energy consumption.

Step 408, when the dialogue action in the action image disappears, restoring the volume output of the loudspeaker corresponding to the position information.

When the user's dialogue action disappears from the action image, for example when, for a preset time, the user does not speak, makes no lip vocalization action, or no longer leans forward but returns to a normal sitting posture or leans back in the seat, or when the user's eyes are no longer focused on the user at the sound source position, or the user rests their cheeks on both hands, the vehicle-mounted system can determine that the user is no longer participating in the conversation and can then restore the volume output of the speaker at that user's position.
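
One possible way to decide that a dialogue action has been absent for the preset time is a per-seat timer, sketched below. The class name, the 10-second default and the use of timestamps in seconds are assumptions of this sketch; the embodiment does not fix the length of the preset time.

```python
class DialogueTimeout:
    """Tracks when a dialogue action was last seen at one seat."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s   # assumed preset time, in seconds
        self.last_seen = None        # timestamp of the last dialogue action

    def update(self, now_s, dialogue_action_present):
        """Return True when the speaker volume should be restored."""
        if dialogue_action_present:
            self.last_seen = now_s
            return False
        if self.last_seen is None:
            return False             # no conversation is being tracked
        if now_s - self.last_seen >= self.timeout_s:
            self.last_seen = None    # reset so the restore fires only once
            return True
        return False
```

A caller would feed `update` once per analysed camera frame and restore the speaker at that seat when it returns `True`.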

In this embodiment of the application, sound data are continuously collected while the vehicle loudspeaker is working. When a human voice signal containing semantics exists in the sound data, the sound source position of the human voice signal is acquired and the loudspeaker corresponding to the sound source position is determined as a first loudspeaker. An action image of the vehicle user is collected; when the action image includes a dialogue action, the position information of the vehicle user is acquired and the loudspeaker corresponding to the position information is also determined as a first loudspeaker. The volume output of the first loudspeaker is turned down and the loudspeakers in the vehicle other than the first loudspeaker are turned off; when the dialogue action in the action image disappears, the volume output of the loudspeaker corresponding to the position information is restored. The loudspeakers are thus adjusted differently, the users participating in the conversation can be determined by combining the sound source position with the action image, and the accuracy of the loudspeaker volume adjustment is improved.

Referring to fig. 5, a flowchart of the steps of another vehicle volume control method according to an embodiment of the present application is shown. The method may be applied to an on-vehicle system and may specifically include the following steps:

Step 501, continuously collecting sound data when a vehicle loudspeaker works;

As an example, a plurality of speakers may be installed in the vehicle, and the sound data may be sound generated by a sound source inside or outside the vehicle, such as the voice of a user inside the vehicle, sound generated while the vehicle is operating, sound from the street, and the like.

After a user taps an operation key of the vehicle-mounted system, such as a music playing key, a video playing key or a broadcast listening key on the large screen of the vehicle-mounted system, the vehicle-mounted system responds to the user operation and can acquire an audio file or a multimedia file locally or from a cloud server. The loudspeaker then starts to work and plays the relevant audio data. During playback, a microphone in the vehicle can continuously collect sound data inside and outside the vehicle.

In practical applications, the audio played by the speaker may be audio files or multimedia files of children's themes, such as children's songs, children's movies, animations, etc. Before playing the audio, the vehicle-mounted system can acquire the theme type of the file from the cloud server in advance, or read the theme type of the file from the file identifier.

Step 502, when an adult human voice signal containing semantics exists in the sound data, acquiring a seat position of a child user in the vehicle and a sound source position of the adult human voice signal;

As an example, the adult human voice signal may be a voice signal generated by a conversation between adults.

When the theme type of the audio is a children's theme, the vehicle-mounted system can determine that a child user is present in the vehicle and, when a human voice signal containing semantics is collected, can further judge whether the human voice signal contains only adult voices in conversation.

When an adult human voice signal containing semantics exists in the sound data, it can be determined that the adult users in the vehicle are in conversation and that the child user is not participating in it.

In a specific implementation, whether a user is a child may be determined by providing a pressure sensor on each seat of the vehicle. For example, a child's weight typically falls between 10 kg and 65 kg, so when the weight obtained by the pressure sensor is within this interval, the user on the seat may be determined to be a child. Alternatively, the facial features of the user on the seat can be captured by a camera in the vehicle and used to judge the user's age range, from which the seat position of the child user is determined.
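
The weight-interval check can be expressed in a few lines. In the sketch below the 10 kg to 65 kg interval comes from the description, whereas the seat labels, the `child_age_limit` value and the choice to prefer a camera-based age estimate when one is available are assumptions made for illustration.

```python
CHILD_WEIGHT_RANGE_KG = (10.0, 65.0)  # interval stated in the description


def is_child(weight_kg, estimated_age=None, child_age_limit=12):
    """Classify a seat occupant as a child.

    Uses the camera-based age estimate when one is supplied (an assumed
    preference); otherwise falls back to the seat pressure sensor reading.
    """
    if estimated_age is not None:
        return estimated_age <= child_age_limit
    low, high = CHILD_WEIGHT_RANGE_KG
    return low <= weight_kg <= high


def child_seat_positions(weights_by_seat):
    """Seats whose pressure reading falls inside the child weight interval."""
    return [seat for seat, kg in weights_by_seat.items() if is_child(kg)]


seats = {"driver": 78.0, "front_passenger": 70.0, "rear_left": 24.0, "rear_right": 0.0}
child_seats = child_seat_positions(seats)
```

An empty seat reads 0 kg and therefore falls outside the interval, so it is not mistaken for a child seat.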

To acquire the sound source position of the human voice signal, a microphone or a microphone array can be arranged in the cabin, and the time difference and the signal intensity difference of the received sound data are analyzed to determine the sound source position of the human voice signal.
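
The time-difference analysis can be illustrated with a simplified cross-correlation between microphone signals. This is only a sketch under stated assumptions: it uses one microphone per seating zone, considers the arrival-time difference alone (not the intensity difference), and works on short synthetic signals; the function names and the `max_lag` parameter are hypothetical.

```python
def best_lag(ref, other, max_lag):
    """Lag (in samples) at which `other` best matches `ref`.

    A positive result means the sound reached `other` later than `ref`.
    """
    def score(lag):
        total = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


def nearest_zone(signals_by_zone, max_lag=8):
    """Pick the zone whose microphone heard the sound first."""
    zones = list(signals_by_zone)
    ref = signals_by_zone[zones[0]]
    # Arrival delay of each zone relative to the first zone's microphone.
    delays = {z: best_lag(ref, signals_by_zone[z], max_lag) for z in zones}
    return min(delays, key=delays.get)


def pulse_at(start, length=16):
    """Synthetic test signal: a short pulse starting at sample `start`."""
    signal = [0.0] * length
    signal[start:start + 3] = [1.0, 3.0, 1.0]
    return signal


mics = {
    "driver": pulse_at(5),
    "front_passenger": pulse_at(3),
    "rear_left": pulse_at(7),
}
source_zone = nearest_zone(mics)
```

Here the pulse arrives earliest at the front passenger microphone, so that zone is selected as the sound source position.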

Step 503, turning down the volume output of the second speaker corresponding to the sound source position, and maintaining the volume output of the third speaker corresponding to the child user's seat position.

After the position of the child user and the positions of the adult users participating in the conversation are determined, the volume output of the second speaker at the sound source position may be turned down while the playing volume of the third speaker at the child user's seat position is maintained. In this way the audio data does not disturb the conversation between the adult users, and the child user can continue listening to the audio data.

In order to enable those skilled in the art to better understand the above steps, the following is an example to illustrate the embodiments of the present invention, but it should be understood that the embodiments of the present invention are not limited thereto.

User A (the father), user B (the mother) and user C (the child) ride together in a four-seat vehicle. User A sits in the driver's seat, user B sits in the front passenger seat, and user C is a rear passenger sitting on the left side of the rear row. A microphone and a loudspeaker are installed in each of the driver area, the front passenger area, the rear left area and the rear right area of the vehicle.

When user A taps the vehicle-mounted large screen to play a children's song, the vehicle-mounted system determines that a child user is present in the vehicle and continuously collects sound data. When user A talks with user B and user C does not participate, the vehicle-mounted system can recognize the adult human voice signal containing semantics and further determine the sound source position of the adult human voice signal and the seat of the child user.

After determining that the adult human voice signals come from the driver area and the front passenger area, and that user C sits on the left side of the rear row, the vehicle-mounted system turns down the volume output of the speakers in the driver area and the front passenger area and maintains the volume of the speaker in the rear left area.
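
The example above can be traced with a small sketch of step 503. The zone names, the numeric volume levels and the `ducked_volume` value are assumptions; in addition, the sketch leaves the rear right speaker unchanged, which the example does not specify.

```python
def apply_step_503(volumes, adult_source_zones, child_zones, ducked_volume=20):
    """Return new per-zone volumes: turn down adult talkers, keep child zones."""
    adjusted = dict(volumes)
    for zone in adult_source_zones:
        if zone in child_zones:
            continue  # assumed guard: a child's speaker is never turned down
        adjusted[zone] = min(adjusted[zone], ducked_volume)
    return adjusted


volumes = {"driver": 60, "front_passenger": 60, "rear_left": 60, "rear_right": 60}
result = apply_step_503(
    volumes,
    adult_source_zones={"driver", "front_passenger"},
    child_zones={"rear_left"},
)
```

The function returns a new mapping rather than modifying `volumes`, so the original levels remain available for restoring the speakers once the conversation ends.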

In this embodiment of the application, sound data are continuously collected while the vehicle speakers are working, the vehicle having a plurality of speakers. When an adult human voice signal containing semantics exists in the sound data, the seat position of the child user in the vehicle and the sound source position of the adult human voice signal are acquired, the volume output of the second speaker corresponding to the sound source position is turned down, and the volume output of the third speaker corresponding to the child user's seat position is maintained. The playing volume is thus adjusted in a targeted manner: when a user conversation is detected, the playing volume is adjusted according to the user type, which improves the user experience.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.

Referring to fig. 6, a schematic structural diagram of a volume control device for a vehicle according to an embodiment of the present application is shown. The device may be applied to an on-vehicle system and may specifically include the following modules:

the acquisition module 601 is used for continuously acquiring sound data when a vehicle loudspeaker works;

the detecting module 602 is configured to invoke the first volume control module when a human voice signal meeting a preset condition exists in the sound data.

In an embodiment of the present application, the playing duration of the speaker is greater than the playing duration threshold, and the detecting module 602 includes:

and the first volume output adjusting submodule is used for reducing the volume output of a loudspeaker and/or calling a pause unit when a human voice signal containing semantics exists in the voice data.

In an embodiment of the present application, the pause unit includes:

the breakpoint recording subunit is used for recording the breakpoint of the volume output of the loudspeaker;

the device further comprises:

and the first volume output recovery module is used for restoring the turned-down volume output of the loudspeaker, starting from the breakpoint, when the human voice signal containing semantics disappears from the sound data.
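
The cooperation between the breakpoint recording subunit and the first volume output recovery module might look like the following sketch. The `PausablePlayer` class and its method names are hypothetical, and the playback position is modelled simply as a number of seconds.

```python
class PausablePlayer:
    """Toy playback model that records a breakpoint when paused."""

    def __init__(self):
        self.position_s = 0.0     # current playback position in seconds
        self.playing = True
        self.breakpoint_s = None  # set while playback is paused

    def advance(self, dt_s):
        """Move playback forward; has no effect while paused."""
        if self.playing:
            self.position_s += dt_s

    def pause_for_conversation(self):
        """Record the breakpoint, then pause the output."""
        self.breakpoint_s = self.position_s
        self.playing = False

    def resume_after_conversation(self):
        """Resume playback using the recorded breakpoint as the start point."""
        if self.breakpoint_s is not None:
            self.position_s = self.breakpoint_s
            self.breakpoint_s = None
        self.playing = True
```

Storing the breakpoint separately from the live position means playback can still resume from the right place even if the underlying source keeps moving while paused.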

In an embodiment of the present application, the vehicle has a plurality of speakers, and the detecting module 602 includes:

the first positioning submodule is used for acquiring the sound source position of the human voice signal when the human voice signal containing the semantics exists in the sound data;

the first determining submodule is used for determining the loudspeaker corresponding to the sound source position as a first loudspeaker;

and the second volume output adjusting submodule is used for reducing the volume output of the first loudspeaker and closing the loudspeakers in the vehicle except the first loudspeaker.

In an embodiment of the present application, the detecting module 602 further includes:

the acquisition submodule is used for acquiring an action image of a vehicle user;

the first position acquisition submodule is used for acquiring the position information of the vehicle user when the action image comprises a dialogue action;

and the second determining submodule is used for determining the loudspeaker corresponding to the position information as the first loudspeaker.

In an embodiment of the present application, the apparatus further includes:

and the second volume output recovery module is used for recovering the volume output of the loudspeaker corresponding to the position information when the dialogue action in the action image disappears.

In an embodiment of the present application, the vehicle has a plurality of speakers, and the detecting module 602 includes:

the second positioning submodule is used for acquiring the seat position of a child user in the vehicle and the sound source position of the adult human voice signal when the adult human voice signal containing the semantic meaning exists in the sound data;

and the volume adjusting submodule is used for reducing the volume output of the second loudspeaker corresponding to the sound source position and keeping the volume output of the third loudspeaker corresponding to the seat position of the child user.

Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.

An embodiment of the present application also provides a vehicle, which may include a processor, a memory, and a computer program stored on the memory and capable of running on the processor, wherein the computer program, when executed by the processor, implements the steps of the volume control method of the vehicle as above.

An embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the volume control method of the vehicle as above.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The vehicle volume control method and device, the vehicle, and the storage medium have been described in detail above. Specific examples are used herein to explain the principle and the embodiments of the present application, and the description of the above embodiments is intended only to help in understanding the method and the core idea of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A method of controlling volume of a vehicle, the method comprising:

continuously collecting sound data when a vehicle loudspeaker works;

2. The method according to claim 1, wherein the playing duration of the speaker is greater than the playing duration threshold, and the step of performing the first volume control process on the speaker when the human voice signal meeting the preset condition exists in the sound data includes:

3. The method of claim 2, wherein said pausing the volume output of the speaker comprises:

recording a breakpoint of the volume output of the loudspeaker;

the method further comprises the following steps:

4. The method according to claim 1, wherein the vehicle has a plurality of speakers, and the step of performing a first volume control process on the speakers when there is a human voice signal satisfying a preset condition in the sound data includes:

5. The method according to claim 4, wherein after the step of determining the sound source position of the human voice signal when the human voice signal containing the semantic meaning exists in the sound data, the method further comprises:

acquiring a motion image of a vehicle user;

6. The method of claim 5, further comprising:

7. The method according to claim 1, wherein the vehicle has a plurality of speakers, and the step of performing a first volume control process on the speakers when there is a human voice signal satisfying a preset condition in the sound data includes:

8. A volume control apparatus of a vehicle, characterized in that the apparatus comprises:

9. A vehicle comprising a processor, a memory and a computer program stored on the memory and operable on the processor, the computer program when executed by the processor implementing the steps of the method of volume control of a vehicle as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the volume control method of a vehicle according to any one of claims 1 to 7.