CN112017636A

CN112017636A - Vehicle-based user pronunciation simulation method, system, device and storage medium

Info

Publication number: CN112017636A
Application number: CN202010881113.2A
Authority: CN
Inventors: 张文瑜
Original assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Current assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2020-12-01
Anticipated expiration: 2040-08-27
Also published as: CN112017636B

Abstract

The embodiment of the invention discloses a vehicle-based user pronunciation simulation method, system, device and storage medium. The method comprises the following steps: acquiring target playing data from a pre-stored voice data set; determining playing control parameters matched with a playing device in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data, wherein the position of the playing device in the vehicle matches the position of a real human mouth; and controlling the playing device to play the target playing data according to the playing control parameters. Because the target playing data is played through the playing device instead of having a speaker board the vehicle for voice acquisition as in the prior art, the cost of voice simulation is reduced, the implementation is simple, and the efficiency of voice simulation is high. Because the playing control parameters of the playing device are determined from the vehicle environment sound and/or the semantic information, the playback can reach the same quality as voice acquired from a speaker on board the vehicle.

Description

Vehicle-based user pronunciation simulation method, system, device and storage medium

Technical Field

The embodiment of the invention relates to the technical field of intelligent automobiles, in particular to a user pronunciation simulation method, a system, equipment and a storage medium based on a vehicle.

Background

With the development of vehicle intelligent application, more and more automobile built-in voice recognition technologies are used for collecting user voice and recognizing voice instructions, and corresponding functions are realized according to the voice instructions.

In the prior art, the voice of a plurality of speakers is collected in advance to carry out experiments in different real environments, so that the voice recognition is accurate. In order to ensure the real driving environment of the experiment, a speaker is required to get on the vehicle for voice acquisition, and meanwhile, the vehicle needs to reach the designated speed and the vehicle window and the air conditioner need to be adjusted to the designated state.

However, in the prior art the speaker's time includes both the waiting time for the vehicle to reach the specified state and the voice acquisition time itself, so the speaker is occupied for a long time and the cost of paying the speaker is high. Meanwhile, in some driving states, on-board voice acquisition may cause the speaker discomfort. For example, when the vehicle bumps or the temperature in the vehicle is low, the speaker may be unwilling to participate, which makes voice acquisition difficult and reduces the number of voice samples collected. In addition, voice collected in one vehicle model cannot be reused for another model, so a speaker needs to board vehicles repeatedly for collection, and the voice collection efficiency is low.

Disclosure of Invention

The embodiment of the invention provides a vehicle-based user pronunciation simulation method, a vehicle-based user pronunciation simulation system, a vehicle-based user pronunciation simulation device and a storage medium, which can reduce pronunciation simulation cost, improve pronunciation simulation efficiency and ensure pronunciation simulation quality.

In a first aspect, an embodiment of the present invention provides a vehicle-based user pronunciation simulation method, including:

acquiring target playing data from a prestored voice data set;

determining playing control parameters matched with playing equipment in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data, wherein the position of the playing equipment in the vehicle is matched with the position of the real human mouth;

and controlling the playing equipment to play the target playing data according to the playing control parameters.

In a second aspect, an embodiment of the present invention further provides a vehicle-based user pronunciation simulation apparatus, including:

the target playing data acquisition module is used for acquiring target playing data from a pre-stored voice data set;

the playing control parameter determining module is used for determining playing control parameters matched with playing equipment in the vehicle according to vehicle environment sound and/or semantic information of the target playing data, wherein the position of the playing equipment in the vehicle is matched with the position of a real human mouth;

and the playing control module is used for controlling the playing equipment to play the target playing data according to the playing control parameters.

In a third aspect, an embodiment of the present invention further provides a vehicle-based user pronunciation simulation system, including: the system comprises a processor, a playing device, a sound acquisition assembly, a sound card, a digital signal processor and an audio detection module;

the playing device, the sound collection assembly, the sound card, the digital signal processor and the audio detection module are all electrically connected with the processor; the playing device, the sound collecting assembly, the sound card, the digital signal processor and the audio detection module are electrically connected in sequence; the position of the playing device in the vehicle is matched with the position of the real human mouth;

the processor is used for acquiring target playing data from a pre-stored voice data set; determining playing control parameters matched with playing equipment in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data; sending the playing control parameters and the target playing data to the playing device;

the playing device is used for playing the target playing data according to the received playing control parameter;

the sound acquisition component is used for carrying out audio acquisition on the target playing data played by the playing equipment to obtain an audio signal corresponding to the target playing data and transmitting the audio signal to the sound card;

the sound card is used for converting the received audio signal into a digital signal and transmitting the digital signal to the digital signal processor;

the digital signal processor is used for carrying out noise reduction processing and/or echo cancellation processing on the received digital signal to obtain a digital signal to be detected and transmitting the digital signal to be detected to the audio detection module;

the audio detection module is used for displaying the received digital signal to be detected so as to determine whether the user pronunciation simulation system based on the vehicle is normal or not according to the displayed digital signal to be detected.

In a fourth aspect, an embodiment of the present invention further provides a vehicle-based user pronunciation simulation method, where the method includes:

acquiring target playing data from a pre-stored voice data set through a processor; determining playing control parameters matched with playing equipment in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data; sending the playing control parameters and the target playing data to the playing device;

playing the target playing data through the playing device according to the received playing control parameter;

audio acquisition is carried out on the target playing data played by the playing equipment through a sound acquisition component to obtain an audio signal corresponding to the target playing data, and the audio signal is transmitted to a sound card;

converting the received audio signal into a digital signal through the sound card, and transmitting the digital signal to the digital signal processor;

carrying out noise reduction processing and/or echo cancellation processing on the received digital signal through the digital signal processor to obtain a digital signal to be detected, and transmitting the digital signal to be detected to an audio detection module;

and displaying the received digital signal to be detected through the audio detection module so as to determine whether the vehicle-based user pronunciation simulation system is normal or not according to the displayed digital signal to be detected.
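The capture side of this chain (sound card, digital signal processor, audio detection module) can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the function names are hypothetical, the 16-bit quantisation is an assumed sound-card format, and the moving average is only a placeholder for whatever noise reduction or echo cancellation the digital signal processor actually performs.

```python
import math
from typing import List


def to_digital(analog: List[float], bits: int = 16) -> List[int]:
    """Sound card stand-in (assumed format): quantise samples in [-1, 1] to signed integers."""
    full_scale = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * full_scale) for s in analog]


def reduce_noise(digital: List[int], window: int = 3) -> List[float]:
    """DSP stand-in: a causal moving average used purely as a placeholder filter."""
    smoothed = []
    for i in range(len(digital)):
        chunk = digital[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def rms(signal: List[float]) -> float:
    """Summary value an audio detection module could display alongside the waveform."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


# Hypothetical captured signal: 10 ms of a 440 Hz tone sampled at 48 kHz.
captured = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(480)]
digital = to_digital(captured)        # sound card: audio signal -> digital signal
to_detect = reduce_noise(digital)     # DSP: digital signal -> signal to be detected
level = rms(to_detect)                # value shown to the operator
```

A level close to zero while the playing device is active would suggest that some stage of the chain is not working; the threshold for "normal" is left to the operator, as in the description.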

In a fifth aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:

one or more processors;

the playing device is used for playing the set playing data according to the set playing control parameters;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a vehicle-based user pronunciation simulation method according to any embodiment of the invention.

In a sixth aspect, embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a vehicle-based user pronunciation simulation method according to any of the embodiments of the present invention.

According to the technical scheme of the embodiment of the invention, target playing data is acquired from a pre-stored voice data set; playing control parameters matched with the playing device in the vehicle are determined according to the vehicle environment sound and/or the semantic information of the target playing data; and the playing device is controlled to play the target playing data according to the playing control parameters. This solves the prior-art problems of high cost, low efficiency and acquisition difficulties caused by acquisition conditions when a speaker boards the vehicle for voice acquisition. Because the target playing data is played through the playing device instead of having a speaker board the vehicle, the voice simulation cost is reduced, the implementation is simple, and the voice simulation efficiency is high. Because the playing control parameters of the playing device are determined from the vehicle environment sound and/or the semantic information, the playback can reach the same quality as voice acquired from a speaker on board the vehicle.

Drawings

FIG. 1 is a flowchart of a vehicle-based user pronunciation simulation method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a vehicle-based user pronunciation simulation method according to a second embodiment of the present invention;

FIG. 3 is a flowchart of a vehicle-based user pronunciation simulation method according to a third embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a vehicle-based user pronunciation simulation apparatus according to a fourth embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a vehicle-based user pronunciation simulation system according to a fifth embodiment of the present invention;

FIG. 6 is a flowchart of a vehicle-based user pronunciation simulation method according to a sixth embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a computer device according to a seventh embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a vehicle-based user pronunciation simulation method according to a first embodiment of the present invention. This embodiment is applicable to the case where voice is played in a vehicle and then collected in order to detect whether the voice collection device of the vehicle is working normally. The method may be executed by a vehicle-based user pronunciation simulation apparatus, which may be implemented by software and/or hardware and may be integrated in a controller of the vehicle. As shown in fig. 1, the method specifically includes:

step 110, obtaining target playing data from a pre-stored voice data set.

In order to ensure high quality of each audio in the voice data set and avoid distortion of the audio, in the embodiment of the invention, the voice data set is a set formed by high-fidelity voice data recorded in advance in a recording studio by at least one speaker.

The speaker is the person who reads out the specified voice data to form audio. Recording in a studio ensures that no noise is present while the speaker reads, and therefore ensures high audio quality. High-fidelity voice data refers to voice data produced by a device or carrier capable of faithfully reproducing the original sound. When the speaker's audio is captured to form the voice data set, a sampling frequency of 48 kilohertz may be used so that the voice data is close to distortion-free.

The voice data may be voice data corresponding to functional instructions occurring within the vehicle, such as "turn on the air conditioner", "query the weather" or "start navigation to a location"; alternatively, the voice data may be voice data corresponding to non-functional instructions occurring within the vehicle, such as "who are your parents?", "where is your home?" or "do you have a partner?". Accurate collection of voice data helps the user's functional instructions to be carried out accurately and helps avoid safety incidents for the user in the vehicle. The voice data set made up of voice data may be stored in advance in the memory of the vehicle. The same or different voice data sets may be stored for different vehicle models. When vehicles of various models share the same voice data set, repeated recording of voice data is avoided, which reduces the cost of having speakers record voice data and improves recording efficiency.

The target playing data is any one or more voice data in the voice data set, namely, the voice data pre-recorded by the speaker can be used as the target playing data. The target playing data can be obtained by randomly selecting the voice data in the voice data set as the target playing data; alternatively, the voice data in the voice data set may be selected as the target playing data according to a preset sequence. The preset sequence may be a storage sequence of the voice data, or a sequence specified by a manager, and the like, and the embodiment of the present invention is not particularly limited.
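The two selection strategies described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment; the function names and the clip file names are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence


def select_random(voice_data_set: Sequence[str], count: int = 1,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Randomly pick `count` distinct clips from the voice data set."""
    rng = rng or random.Random()
    return rng.sample(list(voice_data_set), count)


def select_in_order(voice_data_set: Sequence[str],
                    order: Optional[Sequence[int]] = None) -> Iterator[str]:
    """Yield clips in storage order, or in an index order specified by a manager."""
    indices = order if order is not None else range(len(voice_data_set))
    for i in indices:
        yield voice_data_set[i]


# Hypothetical voice data set, listed in storage order.
voice_data_set = [
    "turn_on_air_conditioner.wav",
    "query_weather.wav",
    "navigate_to_location.wav",
]
storage_order = list(select_in_order(voice_data_set))
custom_order = list(select_in_order(voice_data_set, order=[2, 0, 1]))
random_pick = select_random(voice_data_set, count=2, rng=random.Random(0))
```

Passing a seeded `random.Random` makes a random test run repeatable, which may be useful when the same playback sequence has to be reproduced on a different vehicle model.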

And step 120, determining playing control parameters matched with playing equipment in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data.

Here, the position of the playing device in the vehicle matches the position of a real human mouth. The playing device is an audio playback device that plays voice data in imitation of a human mouth, such as an artificial mouth, a hi-fi speaker or a loudspeaker. The real human mouth refers to the mouth of a speaker who boards the vehicle to read out the voice data.

In the embodiment of the invention, the playing device is optionally an artificial mouth. An artificial mouth is a special loudspeaker that produces sound closer to that of a real person, and therefore gives a better playing effect than a hi-fi speaker. In order to ensure that the playing device comes close to the effect of a real person, the position of the playing device in the vehicle matches the position of a real human mouth, that is, the playing device is placed close to where a real person's mouth would be. For example, the position of the mouth when a speaker boards the vehicle to read out the voice data can be determined, and the playing device can be placed at that same position.

When playing the voice data, the playing device may adopt different playing control parameters, such as playing volume, playing sound source direction, playing speed, and the like. In order to ensure that the playing device has the same effect as the speaker, the playing speed may be the speed of the voice data itself in this embodiment, and no speed change processing is performed. The playing volume and the playing sound source direction can be determined according to the vehicle environment sound and the semantic information of the target playing data respectively.

The vehicle environment sound may be a noise volume inside the vehicle, for example, a decibel value of noise inside the vehicle. The vehicle environment sound can be obtained by a controller or a processor of the vehicle receiving the noise volume collected by the microphone. The controller or processor of the vehicle may determine the playback control parameter of the playback device according to the vehicle environment sound, for example, may determine the playback volume of the playback device according to the vehicle environment sound. The determined playing volume of the playing device is larger when the vehicle environment sound is larger; the smaller the vehicle environment sound is, the smaller the determined playing volume of the playing device is.

Further, in order to accurately control the playback volume of the playback device, a mapping table between the vehicle environment sound and the playback volume of the playback device may be stored in the memory of the vehicle in advance. Illustratively, table 1 is a mapping table between vehicle environment sounds and the playback volume of the playback device. As can be seen from table 1, a different playback volume of the playback device can be determined for each vehicle environment sound. The mapping table shown in table 1 may be determined according to how loudly speakers actually speak under different vehicle environment sounds. This keeps the playback volume of the playback device consistent with the volume a speaker would use under the same conditions, ensures that the sound emitted by the playback device is close to a real person's pronunciation, and avoids distortion in voice collection.

TABLE 1
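A lookup of this kind can be sketched as below. The thresholds and gear numbers are illustrative placeholders, not the values of table 1; only the pairing of noise up to 58 decibels with the 3rd gear echoes the example given in the second embodiment. The function name is hypothetical.

```python
from bisect import bisect_left

# Placeholder mapping (assumed, not the patent's table): upper bounds of the
# noise intervals in decibels, and the volume gear used for each interval.
NOISE_UPPER_BOUNDS_DB = [58.0, 65.0, 72.0]
VOLUME_GEARS = [3, 4, 5, 6]  # one more entry than bounds: last gear covers louder noise


def volume_for_noise(noise_db: float) -> int:
    """Return the volume gear for a measured in-cabin noise level.

    Louder vehicle environment sound maps to a higher playback volume.
    """
    return VOLUME_GEARS[bisect_left(NOISE_UPPER_BOUNDS_DB, noise_db)]


parked_gear = volume_for_noise(55.0)   # falls in the first interval
loud_gear = volume_for_noise(80.0)     # above every bound
```

Keeping the table as data rather than as branching code means that a different vehicle model can be supported by swapping in another pair of lists.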

The semantic information of the target playing data may include content classification of the target playing data, for example, "turn on air conditioner" or "turn up temperature" and the like belong to the air conditioner class; "inquire weather" or "dressing index" etc. belong to the weather category; and "navigate to B restaurant" belongs to the navigation category, etc. Target playback data classified for different contents may have different playback sound source directions when played by a real person. In order to ensure that the playing device is close to the effect of a real person, the playing sound source direction of the playing device can be determined according to the semantic information of the target playing data.

Illustratively, table 2 is a mapping table of semantic information and playback sound source direction. The semantic information of the voice data in the pre-stored voice data set can be manually determined, that is, the voice data set can include at least one piece of voice data and semantic information corresponding to each piece of voice data. The mapping table of semantic information and playback sound source direction may be stored in advance in the memory of the vehicle. The processor or controller of the vehicle can determine the playing sound source direction when the playing device plays the target playing data according to the mapping relation between the semantic information and the playing sound source direction in the table 2, so that the effect of the playing device is close to that of a real person.

It should be noted that, in table 2, the same semantic information may correspond to multiple playback sound source directions. For multiple pieces of voice data belonging to the same semantic information, the playback sound source direction of each piece may be determined according to the proportions of the corresponding directions. For example, if there are 1000 pieces of voice data in the air conditioner class, any 500 of them can be played in the front-down direction according to the proportion, and the remaining 500 can be played toward the front. The specific manner of dividing the voice data and the specific playing order are not limited in the embodiments of the present invention.

TABLE 2
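The proportional assignment of sound source directions can be sketched as follows. The air conditioner entry (half front-down, half front) follows the example in the description; the navigation entry, the function name and the clip names are hypothetical, and the shuffle is just one possible way of choosing "any 500" clips.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Direction shares per semantic category.
DIRECTION_SHARES: Dict[str, List[Tuple[str, float]]] = {
    "air_conditioner": [("front-down", 0.5), ("front", 0.5)],  # from the example in the text
    "navigation": [("front", 1.0)],                            # assumed placeholder
}


def assign_directions(clips: Sequence[str], category: str,
                      rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Split the clips of one category across directions according to the shares."""
    rng = rng or random.Random()
    shuffled = list(clips)
    rng.shuffle(shuffled)  # which clips get which direction is arbitrary
    shares = DIRECTION_SHARES[category]
    assignment: Dict[str, str] = {}
    start, cumulative = 0, 0.0
    for idx, (direction, share) in enumerate(shares):
        cumulative += share
        # The last direction takes whatever remains, so rounding never drops a clip.
        end = len(shuffled) if idx == len(shares) - 1 else round(cumulative * len(shuffled))
        for clip in shuffled[start:end]:
            assignment[clip] = direction
        start = end
    return assignment


clips = [f"ac_{i:04d}.wav" for i in range(1000)]
assignment = assign_directions(clips, "air_conditioner", rng=random.Random(0))
front_down = sum(1 for d in assignment.values() if d == "front-down")
front = sum(1 for d in assignment.values() if d == "front")
```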

And step 130, controlling the playing device to play the target playing data according to the playing control parameters.

The controller or processor of the vehicle may pass the playback control parameters to the playback device and control the playback device to play the target playback data according to them, for example, according to the determined playback volume, playback sound source direction and playback speed. The playback volume of the playback device can be controlled using existing volume adjustment techniques; the playback sound source direction can be adjusted using robot head rotation techniques.
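How the control parameters might be bundled and handed to the playing device can be sketched as below. The class names, the `play` interface and the two placeholder rules inside `simulate_pronunciation` are assumptions for illustration only; a real system would derive the parameters from its stored mapping tables and drive actual hardware.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PlaybackControlParams:
    volume_gear: int
    direction: str
    speed: float = 1.0  # the clip's own speed; no speed change is applied


class RecordingPlayer:
    """Stand-in for the playing device: it only records what it was asked to play."""

    def __init__(self) -> None:
        self.played: List[Tuple[str, PlaybackControlParams]] = []

    def play(self, clip: str, params: PlaybackControlParams) -> None:
        self.played.append((clip, params))


def simulate_pronunciation(clip: str, noise_db: float, category: str,
                           player: RecordingPlayer) -> PlaybackControlParams:
    # Step 120: derive control parameters (placeholder rules, assumed values).
    volume_gear = 3 if noise_db <= 58.0 else 4
    direction = "front-down" if category == "air_conditioner" else "front"
    params = PlaybackControlParams(volume_gear=volume_gear, direction=direction)
    # Step 130: hand the clip and its parameters to the playing device.
    player.play(clip, params)
    return params


player = RecordingPlayer()
params = simulate_pronunciation("turn_on_air_conditioner.wav", 55.0, "air_conditioner", player)
```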

According to the technical scheme of this embodiment, target playing data is acquired from a pre-stored voice data set; playing control parameters matched with the playing device in the vehicle are determined according to the vehicle environment sound and/or the semantic information of the target playing data; and the playing device is controlled to play the target playing data according to the playing control parameters. This solves the prior-art problems of high cost, low efficiency and acquisition difficulties caused by acquisition conditions when a speaker boards the vehicle for voice acquisition. Because the target playing data is played through the playing device instead of having a speaker board the vehicle, the voice simulation cost is reduced, the implementation is simple, and the voice simulation efficiency is high. Because the playing control parameters of the playing device are determined from the vehicle environment sound and/or the semantic information, the playback can reach the same quality as voice acquired from a speaker on board the vehicle.

Example two

Fig. 2 is a flowchart of a vehicle-based user pronunciation simulation method according to a second embodiment of the present invention. The present embodiment is a further refinement of the above technical solutions, and the technical solutions in the present embodiment may be combined with various alternatives in one or more of the above embodiments. As shown in fig. 2, the method includes:

step 210, obtaining target playing data from a pre-stored voice data set.

The voice data set is a set formed by high-fidelity voice data recorded in advance by at least one speaker in a recording studio.

And step 220, acquiring a noise decibel value of the environmental sound of the vehicle, and determining the playing volume matched with the playing equipment in the vehicle according to the noise decibel value.

In order to obtain the noise decibel value of the environmental sound of the vehicle accurately, a sound pressure detection device, such as a sound pressure meter, can be installed at the microphone in the vehicle. And acquiring a noise decibel value of the environmental sound of the vehicle through a sound pressure meter arranged at a microphone, and transmitting the noise decibel value to a processor or a controller of the vehicle. The sound pressure meter at the microphone is used for detecting the volume of sound collected at the microphone. When no voice data is played, the sound pressure meter at the microphone collects the noise decibel value of the environmental sound of the vehicle. The noise decibel value of the environmental sound of the vehicle can be obtained before the voice data are played.
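For reference, a sound pressure level in decibels is conventionally defined relative to a reference pressure of 20 micropascals. The description does not state how the sound pressure meter computes its reading, so the sketch below only shows that standard definition; the function name is hypothetical.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # conventional reference pressure for airborne sound


def spl_db(rms_pressure_pa: float) -> float:
    """Sound pressure level in dB for an RMS sound pressure given in pascals."""
    if rms_pressure_pa <= 0:
        raise ValueError("RMS pressure must be positive")
    return 20.0 * math.log10(rms_pressure_pa / REFERENCE_PRESSURE_PA)


# An RMS pressure 1000 times the reference corresponds to about 60 dB.
cabin_noise_db = spl_db(0.02)
```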

The processor or the controller of the vehicle can transmit the received noise decibel value of the environmental sound of the vehicle to the playing device, and the playing device is controlled to automatically adjust to the matched playing volume. The mapping relationship between the noise decibel value of the vehicle environment sound and the playing volume of the playing device can be preset and stored in a memory of the vehicle. The processor or the controller of the vehicle can directly call the mapping relation from the memory, and the playing volume of the corresponding playing device is determined according to the noise decibel value of the environmental sound of the vehicle, so that the playing device is controlled to play according to the playing volume. Specifically, the mapping relationship between the decibel value of the noise of the environmental sound of the vehicle and the playback volume of the playback device may be as shown in table 1.

In order to ensure that the actual playing volume of the playing device matches the decibel value of the noise of the environmental sound of the vehicle, in an optional implementation manner of this embodiment, determining the playing volume matching the playing device in the vehicle according to the decibel value includes: adjusting the playing volume of playing equipment in the vehicle according to the noise decibel value; controlling the playing equipment to play the set audio according to the playing volume and acquiring a playing decibel value matched with the set audio; if the playing decibel value is matched with the noise decibel value, determining that the current playing volume of the playing equipment corresponds to the noise decibel value; and if the playing decibel value is not matched with the noise decibel value, the playing volume of the playing equipment is readjusted, and the operation of controlling the playing equipment to play the set audio according to the playing volume is returned to be executed.

The processor or the controller of the vehicle can determine a noise decibel value according to a sound pressure meter installed at the microphone, determine the playing volume of the matched playing device according to the mapping relation between the noise decibel value and the playing volume, and adjust the playing volume of the playing device. The matching of the playing volume and the noise decibel value means that the two have a mapping relation. Then, the playing device can be controlled to play the set audio at the matched playing volume. The set audio may be a preset audio for testing, or may be any voice data in the voice data set, which is not specifically limited in this embodiment of the present invention.

The playing device may also be provided with a sound pressure detection device, such as a sound pressure meter. The playing decibel value of the playing device while it plays the set audio can be detected by the sound pressure meter at the playing device. The playing decibel value matches the noise decibel value when the playing decibel value measured while the playing device plays the set audio at the current playing volume meets the requirement that the current noise decibel value places on the playing decibel value. If the playing decibel value of the playing device matches the noise decibel value, the current playing volume meets the expected requirement, and the playing device can keep using it under the current environmental conditions of the vehicle without adjustment. If the playing decibel value does not match the noise decibel value, the matching playing decibel value can be determined from the noise decibel value, and it can then be determined whether the current playing decibel value of the playing device is too high or too low relative to it. The playing volume of the playing device is adjusted accordingly: when the playing decibel value is too high, the playing volume is decreased; when it is too low, the playing volume is increased. After the playing volume has been readjusted, the playing device plays the set audio again and the playing decibel value is measured again, and this is repeated until the playing decibel value measured at the current playing volume matches the noise decibel value.

The mapping relationship between the playing decibel value of the set audio played by the playing device and the noise decibel value can be obtained in advance through experimental means. Table 3 is a mapping table of playback decibel values and noise decibel values. As shown in table 3, when the sound pressure meter at the microphone detects a decibel value of noise, the playback volume of the playback device can be determined in advance according to the mapping table of table 3. And detecting a playing decibel value when the playing equipment plays the set audio according to the predetermined playing volume by using a sound pressure meter at the playing equipment, determining whether the playing volume is matched with the noise decibel value, and further determining whether the playing volume of the playing equipment is adjusted.

Normally, when the playing device plays the set audio at the predetermined playing volume, it reaches a playing decibel value that matches the noise decibel value. In some cases, however, the playing device may deviate from its nominal behaviour, for example because its output level has drifted after long-term use. The playing device then fails to reach the matching playing decibel value at the predetermined playing volume, and its playing volume needs to be adjusted.

TABLE 3

For example, as shown in Table 3, in the first scenario the vehicle is parked, and the sound pressure meter at the microphone detects that the noise decibel value of the vehicle environment sound is within the interval of 53-58 decibels. In this scenario, a real person speaking normally produces 72 to 77 decibels, at which the microphone can collect clear audio. Therefore, when the noise decibel value of the vehicle environment sound in the first scenario is within the interval of 53-58 decibels, the matched playing decibel value of the playing device can be set to the interval of 72-77 decibels. For a playing decibel value of 72-77 decibels, the playing volume of the playing device can be predetermined to be the 3rd gear. The playing device can be controlled to play the set audio at the 3rd gear, and the sound pressure meter at the playing device can detect whether the current playing decibel value of the playing device is within the interval of 72-77 decibels.

If the current playing decibel value of the playing device is within the interval of 72-77 decibels, it is determined that the current playing volume of the playing device matches the noise decibel value. In the first scenario, the playing device may then play the voice data in the voice data set at the 3rd gear. If the current playing decibel value of the playing device is not within the interval of 72-77 decibels, it is either greater than 77 decibels or less than 72 decibels.

If the current playing decibel value of the playing device is greater than 77 decibels, the current playing volume of the playing device is reduced, for example by playing at the 2nd gear. If the current playing decibel value of the playing device is less than 72 decibels, the current playing volume of the playing device is increased, for example by playing at the 4th gear. The sound pressure meter at the playing device then determines the playing decibel value at the adjusted playing volume. If the playing decibel value is within the interval of 72-77 decibels, it is determined that the adjusted playing volume matches the noise decibel value. If the playing decibel value is not within the interval of 72-77 decibels, the playing volume of the playing device continues to be adjusted until the playing decibel value detected by the sound pressure meter at the playing device is within the interval of 72-77 decibels.
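The lookup-and-adjust procedure described above can be illustrated with a minimal sketch. It assumes only the first-scenario row of Table 3 that is stated in the text; the function names, the gear range of 1 to 10, and the `measure_playback_db` callback (standing in for the sound pressure meter at the playing device) are hypothetical and are not part of the embodiment.

```python
from typing import Callable, List, Tuple

# (noise_low_db, noise_high_db, play_low_db, play_high_db, initial_gear)
# Only the first-scenario row is stated in the text; the other rows of
# Table 3 would come from the experimentally obtained mapping.
TABLE_3: List[Tuple[float, float, float, float, int]] = [
    (53.0, 58.0, 72.0, 77.0, 3),
]


def lookup(noise_db: float) -> Tuple[float, float, int]:
    """Return (play_low_db, play_high_db, initial_gear) for a noise level."""
    for n_lo, n_hi, p_lo, p_hi, gear in TABLE_3:
        if n_lo <= noise_db <= n_hi:
            return p_lo, p_hi, gear
    raise ValueError(f"no mapping row covers {noise_db} dB")


def calibrate_gear(
    noise_db: float,
    measure_playback_db: Callable[[int], float],  # hypothetical meter reading
    min_gear: int = 1,    # assumed gear range, not given in the text
    max_gear: int = 10,
    max_steps: int = 20,
) -> int:
    """Step the volume gear until the measured level is inside the target interval."""
    p_lo, p_hi, gear = lookup(noise_db)
    for _ in range(max_steps):
        level = measure_playback_db(gear)  # play the set audio and measure it
        if p_lo <= level <= p_hi:
            return gear                    # playing volume matches the noise level
        gear += -1 if level > p_hi else 1  # too loud: gear down; too quiet: gear up
        if not min_gear <= gear <= max_gear:
            raise RuntimeError("target interval unreachable within the gear range")
    raise RuntimeError("volume did not settle within max_steps")
```

With a simulated device whose output has drifted low, for example `calibrate_gear(55.0, lambda g: 55.0 + 5.0 * g)`, the sketch starts at the 3rd gear, measures 70 decibels, and settles on the 4th gear.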

Step 230, controlling the playing device to play the target playing data according to the playing volume.

The controller or the processor of the vehicle can control the playing device to play the target playing data at the playing volume matched with the noise decibel value of the vehicle environment sound. This achieves the same effect as a real person speaking, avoids having a real person get on the vehicle to speak for voice collection, reduces cost, and improves efficiency.

According to the technical scheme of the embodiment of the invention, the noise decibel value of the vehicle environment sound is determined by the sound pressure meter, the playing volume matched with the playing device in the vehicle is determined according to the noise decibel value, and the playing device is controlled to play the target playing data at the matched playing volume. This solves the problems in the prior art that having a speaker get on the vehicle for voice collection is costly and inefficient, and that the collection is constrained by collection conditions and therefore difficult. An accurate playing volume is determined for the playing device, so that playing the voice data through the playing device at that volume achieves the same effect as voice collection with a speaker on the vehicle, while also reducing the voice simulation cost and improving the voice simulation efficiency.

EXAMPLE III

Fig. 3 is a flowchart of a vehicle-based user pronunciation simulation method according to a third embodiment of the present invention. The present embodiment is a further refinement of the above technical solutions, and the technical solutions in the present embodiment may be combined with various alternatives in one or more of the above embodiments. As shown in fig. 3, the method includes:

Step 310, obtaining target playing data from the pre-stored voice data set.

Step 320, obtaining semantic information of the target playing data, and determining a playing sound source direction matched with the playing device in the vehicle according to the semantic information.

The semantic information may be acquired in various manners. For example, it may be set manually in advance, or it may be obtained by a semantic analysis technique. A chip carrying a semantic analysis program may be provided in the processor or controller of the vehicle, or in the playing device. The semantic analysis program can determine the semantic information of the voice data in the voice data set in real time, for example the content classification of the voice data. When the playing device acquires the target playing data, the semantic information of the target playing data can be determined at the same time by the semantic analysis technique.

The matching relationship between the semantic information and the playing sound source direction can be established in various ways. For example, it can be set manually in advance, or it can be determined in real time by the processor or controller of the vehicle according to the semantic information.

For example, when the semantic information of the target playing data is the air conditioner class, the processor or the controller of the vehicle may determine in real time that the playing sound source direction of the playing device is towards the front or is inclined downwards towards the front according to the air conditioner class. For a plurality of target playing data with semantic information being air conditioners, a processor or a controller of the vehicle can control the playing sound source direction of the playing equipment to be switched back and forth between the front and the lower front.

To make the playing sound source direction determined from the semantic information better reflect how a real user behaves, in an optional implementation of this embodiment, the semantic information includes at least one of the following: air conditioner class, canon class, weather class, smart home class, or navigation class. Determining the playing sound source direction matched with the playing device in the vehicle according to the semantic information comprises: if the semantic information is of the air conditioner class, determining that the playing sound source direction matched with the playing device in the vehicle is the lower front or the front; if the semantic information is of the canon class, determining that the playing sound source direction matched with the playing device in the vehicle is the vehicle terminal control direction or the front; if the semantic information is of the weather class, determining that the playing sound source direction matched with the playing device in the vehicle is outside the window, the front, or the vehicle terminal control direction; if the semantic information is of the smart home class, determining that the playing sound source direction matched with the playing device in the vehicle is the front or the vehicle terminal control direction; and if the semantic information is of the navigation class, determining that the playing sound source direction matched with the playing device in the vehicle is the front or the vehicle terminal control direction.

For the same semantic information, there may be multiple pieces of corresponding voice data, and the playing sound source directions of the voice data may be the same or different. For the same semantic information, there may be one or more play sound source directions. In order to determine the specific playing sound source direction of the voice data, the proportion can be set for the playing sound source direction, so that the real playing situation can be reflected better. Specifically, the mapping relationship between the semantic information and the direction of the playback sound source can be shown in table 2. When determining the semantic information of the target playing data, the playing sound source direction of the target playing data can be determined according to the playing sound source direction corresponding to the semantic information and the corresponding proportion.

The semantic information may represent the meaning classification of the voice data. For example, the air conditioner class means that the voice data relates to the air conditioner in the vehicle, such as adjusting the in-vehicle temperature or turning the air conditioner on or off; the canon class means that the voice data is generated during chatting and does not trigger any vehicle function, such as discussion of family or work; the weather class means that the voice data relates to weather, such as asking about the weather; the smart home class means that the voice data controls, through the vehicle, other home devices connected with the vehicle, such as turning on an air conditioner or a washing machine at home; the navigation class means that the voice data relates to navigation, such as navigating to a place or opening navigation software.

Illustratively, when the semantic information of the target playing data is of the weather class, the processor or the controller of the vehicle can control the playing device to switch among the outside-window direction, the front direction and the vehicle terminal control direction in a ratio of 5:3:2. For example, the memory of the vehicle may record the playing sound source direction of each piece of target playing data whose semantic information is of the weather class, and such data may be played in groups of 10. If no more than 4 pieces of weather-class target playing data in the current group have already been played toward the outside of the window, the current target playing data may still be played toward the outside of the window. If 5 pieces in the current group have already been played toward the outside of the window, the current target playing data may be played toward the front or toward the vehicle terminal control. If no more than 2 pieces in the current group have already been played toward the front, the current target playing data may still be played toward the front. If 3 pieces in the current group have already been played toward the front, the current target playing data may be played toward the vehicle terminal control.
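The group-based allocation of playing sound source directions can be sketched as follows. This is a minimal illustration assuming the 5:3:2 weather-class ratio and the group size of 10 given above; the class name `DirectionAllocator`, the direction labels, and the choice to fill the directions in a fixed order are assumptions made for the example, and the embodiment does not prescribe a particular order within a group.

```python
from collections import Counter
from typing import Dict

# Weather-class quota per group of 10 pieces, taken from the 5:3:2 ratio in the text.
WEATHER_QUOTA: Dict[str, int] = {
    "outside window": 5,
    "front": 3,
    "vehicle terminal control": 2,
}


class DirectionAllocator:
    """Hands out directions so that each full group respects the quota."""

    def __init__(self, quota: Dict[str, int]) -> None:
        self.quota = dict(quota)
        self.group_size = sum(quota.values())
        self.used: Counter = Counter()  # directions already used in the current group

    def next_direction(self) -> str:
        # Start a new group once the current one is full.
        if sum(self.used.values()) == self.group_size:
            self.used.clear()
        # Pick the first direction whose quota is not yet exhausted.
        for direction, limit in self.quota.items():
            if self.used[direction] < limit:
                self.used[direction] += 1
                return direction
        raise AssertionError("unreachable: a non-full group always has free quota")
```

Calling `next_direction()` ten times on `DirectionAllocator(WEATHER_QUOTA)` yields five outside-window, three front and two vehicle-terminal-control directions, after which the counts reset for the next group.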

Step 330, controlling the playing device to play the target playing data according to the playing sound source direction.

The processor or controller of the vehicle may control the playing device to play the target playing data according to the playing sound source direction matched with the semantic information of the target playing data. The playing sound source direction of the playing device can thus be consistent with the direction in which a real person would speak.

According to the technical scheme of the embodiment of the invention, the semantic information of the target playing data is obtained, the playing sound source direction matched with the playing device in the vehicle is determined according to the semantic information, and the playing device is controlled to play the target playing data according to the playing sound source direction. This solves the problems in the prior art that having a speaker get on the vehicle for voice collection is costly and inefficient, and that the collection is constrained by collection conditions and therefore difficult.

On the basis of the foregoing embodiment, optionally, before controlling the playback device to play the target playback data according to the playback control parameter, the method further includes: detecting the bumping state of the vehicle, and determining the playing frequency matched with playing equipment in the vehicle according to the bumping state; controlling the playing device to play the target playing data according to the playing control parameter, comprising: and controlling the playing equipment to play the target playing data according to the playing frequency.

The bumping state of the vehicle refers to how the vehicle bumps while running, such as whether the vehicle is bumping or the degree of bumping. The bumping state may be determined in various manners, such as by a detection instrument or by experimental test. The detection instrument can be a vehicle bump tester. In the experimental test, the vehicle transports a product under test placed on a tray, and the bumping state of the vehicle is determined by monitoring the state of the product under test in real time.

There may also be a mapping between the bumping state of the vehicle and the playing frequency of the playing device; for example, the more severely the vehicle bumps, the higher the playing frequency. For example, the bumping state can be divided into bump levels: for slight bumping, the playing frequency is 20 Hz to 200 Hz; for moderate bumping, the playing frequency is 200 Hz to 2000 Hz; for severe bumping, the playing frequency is 2000 Hz to 20000 Hz. For another example, the vibration frequency of the vehicle may be measured by an instrument and used as the bumping state, and the playing frequency may be determined from the vibration frequency: when the vibration frequency is lower than 1 Hz, the playing frequency is 20 Hz to 200 Hz; when the vibration frequency is 1 Hz to 10 Hz, the playing frequency is 200 Hz to 2000 Hz; when the vibration frequency is more than 10 Hz, the playing frequency is 2000 Hz to 20000 Hz.

The current playing frequency of the playing device in the current bumping state can be determined according to the mapping relation between the bumping state and the playing frequency; the playing device can play the target playing data under the current playing frequency, so that the playing of the playing device can be ensured to be clear under each jolting state of the vehicle, and the effect of real people broadcasting is achieved.
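The vibration-frequency variant of this mapping can be written as a short lookup. The sketch below uses the thresholds of 1 Hz and 10 Hz and the three playing frequency bands given above; the function name `playback_band_hz` and the treatment of the boundary values (exactly 1 Hz and exactly 10 Hz fall in the middle band) are assumptions made for the example.

```python
from typing import Tuple


def playback_band_hz(vibration_hz: float) -> Tuple[int, int]:
    """Map a measured vehicle vibration frequency to a playing frequency band."""
    if vibration_hz < 0:
        raise ValueError("vibration frequency cannot be negative")
    if vibration_hz < 1.0:      # lower than 1 Hz
        return (20, 200)
    if vibration_hz <= 10.0:    # 1 Hz to 10 Hz
        return (200, 2000)
    return (2000, 20000)        # more than 10 Hz
```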

Example four

Fig. 4 is a schematic structural diagram of a vehicle-based user pronunciation simulation apparatus according to a fourth embodiment of the present invention. With reference to fig. 4, the apparatus comprises: a target playing data obtaining module 410, a playing control parameter determining module 420 and a playing control module 430.

The target playing data obtaining module 410 is configured to obtain target playing data from a pre-stored voice data set;

a playing control parameter determining module 420, configured to determine, according to the vehicle environmental sound and/or the semantic information of the target playing data, a playing control parameter matching a playing device in the vehicle, where a position of the playing device in the vehicle matches a position of a real human mouth;

and a playing control module 430, configured to control the playing device to play the target playing data according to the playing control parameter.

Optionally, the playing control parameter determining module 420 includes:

and the playing volume determining unit is used for acquiring a noise decibel value of the environmental sound of the vehicle and determining the playing volume matched with the playing equipment in the vehicle according to the noise decibel value.

Optionally, the play control module 430 includes:

and the playing volume control unit is used for controlling the playing equipment to play the target playing data according to the playing volume.

Optionally, the playing volume determining unit is specifically configured to:

adjusting the playing volume of playing equipment in the vehicle according to the noise decibel value;

controlling the playing equipment to play the set audio according to the playing volume and acquiring a playing decibel value matched with the set audio;

if the playing decibel value is matched with the noise decibel value, determining that the current playing volume of the playing equipment corresponds to the noise decibel value;

and if the playing decibel value is not matched with the noise decibel value, the playing volume of the playing equipment is readjusted, and the operation of controlling the playing equipment to play the set audio according to the playing volume is returned to be executed.

Optionally, the playing control parameter determining module 420 includes:

and the playing sound source direction determining unit is used for acquiring the semantic information of the target playing data and determining the playing sound source direction matched with the playing equipment in the vehicle according to the semantic information.

Optionally, the play control module 430 includes:

and the playing sound source direction control unit is used for controlling the playing equipment to play the target playing data according to the playing sound source direction.

Optionally, the semantic information at least includes one of the following items: air-conditioning, canon, weather, smart home, or navigation;

a play sound source direction determining unit, specifically configured to:

if the semantic information is of an air conditioner type, determining that the direction of a playing sound source matched with playing equipment in the vehicle is the lower front or the front;

if the semantic information is of a canon type, determining that the playing sound source direction matched with the playing equipment in the vehicle is the vehicle terminal control direction or the right front;

if the semantic information is weather, determining that the direction of a playing sound source matched with playing equipment in the vehicle is out of a window, right ahead or the control direction of a vehicle terminal;

if the semantic information is of the intelligent home class, determining that the playing sound source direction matched with playing equipment in the vehicle is the right front direction or the vehicle terminal control direction;

and if the semantic information is of navigation type, determining that the playing sound source direction matched with the playing equipment in the vehicle is the front direction or the vehicle terminal control direction.

Optionally, the voice data set is a set of high fidelity voice data pre-recorded by at least one speaker at a recording studio.

Optionally, the apparatus further comprises:

the playing frequency determining module is used for detecting the bumping state of the vehicle before controlling the playing equipment to play the target playing data according to the playing control parameters and determining the playing frequency matched with the playing equipment in the vehicle according to the bumping state;

the play control module 430 includes:

and the playing frequency control unit is used for controlling the playing equipment to play the target playing data according to the playing frequency.

Optionally, the playing device is an artificial mouth.

The vehicle-based user pronunciation simulation device provided by the embodiment of the invention can execute the vehicle-based user pronunciation simulation method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
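The cooperation of the three modules can be outlined in a minimal sketch. It assumes that each module is modelled as a plain callable; the class name, field names and the dictionary used for the playing control parameters are hypothetical and only illustrate the order of operations of modules 410, 420 and 430.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class PronunciationSimulationDevice:
    # Module 410: obtains target playing data from the pre-stored voice data set.
    obtain_target_data: Callable[[], bytes]
    # Module 420: determines playing control parameters (e.g. volume, direction).
    determine_parameters: Callable[[bytes], Dict[str, Any]]
    # Module 430: controls the playing device with the data and the parameters.
    play: Callable[[bytes, Dict[str, Any]], None]

    def run_once(self) -> Dict[str, Any]:
        """Run one obtain -> determine -> play cycle and return the parameters used."""
        data = self.obtain_target_data()
        params = self.determine_parameters(data)
        self.play(data, params)
        return params
```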

EXAMPLE five

Fig. 5 is a schematic structural diagram of a vehicle-based user pronunciation simulation system according to a fifth embodiment of the present invention. In connection with fig. 5, the system comprises: a processor (or controller) 510, a playback device 520, a sound collection component 530, a sound card 540, a digital signal processor 550, and an audio detection module 560.

The playing device 520, the sound collection component 530, the sound card 540, the digital signal processor 550 and the audio detection module 560 are all electrically connected with the processor 510; the playing device 520, the sound collection component 530, the sound card 540, the digital signal processor 550 and the audio detection module 560 are electrically connected in sequence; the position of the playback device 520 in the vehicle matches the position of the real person's mouth.

A processor 510, configured to obtain target play data from a pre-stored voice data set; determining playing control parameters matched with playing equipment in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data; and sending the playing control parameters and the target playing data to the playing equipment.

Optionally, determining a playing control parameter matched with the playing device 520 in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data includes:

acquiring a noise decibel value of the environmental sound of the vehicle, and determining the playing volume matched with the playing equipment 520 in the vehicle according to the noise decibel value;

sending the playing control parameters and the target playing data to the playing device 520 includes:

the playback volume and the target playback data are sent to the playback device 520.

Optionally, determining the playing volume matched with the playing device 520 in the vehicle according to the decibel value of the noise includes:

adjusting the playing volume of the playing device 520 in the vehicle according to the decibel value of the noise;

controlling the playing device 520 to play the set audio according to the playing volume and obtaining a playing decibel value matched with the set audio;

if the playing decibel value matches the noise decibel value, determining that the current playing volume of the playing device 520 corresponds to the noise decibel value;

if the playback decibel value does not match the noise decibel value, readjusting the playback volume of the playback device 520, and returning to the operation of controlling the playback device 520 to play the set audio according to the playback volume.

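Optionally, determining a playing control parameter matched with the playing device 520 in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data includes:

acquiring semantic information of the target playing data, and determining a playing sound source direction matched with the playing device 520 in the vehicle according to the semantic information;

sending the playing control parameters and the target playing data to the playing device 520 includes:

sending the playing sound source direction and the target playing data to the playing device 520.

Optionally, determining a playing sound source direction matched with the playing device 520 in the vehicle according to the semantic information includes:

acquiring a mapping relation between preset semantic information and a playing sound source direction, and determining the playing sound source direction matched with the semantic information according to the mapping relation.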

The playing device 520 is configured to play the target playing data according to the received playing control parameter.

Optionally, the playing device 520 is specifically configured to: and playing the target playing data according to the received playing volume and/or playing sound source direction.

Optionally, the processor 510 is further configured to detect a bumping state of the vehicle, and determine a playing frequency matched with a playing device in the vehicle according to the bumping state; sending the playing frequency to the playing equipment;

the playing device 520 is further specifically configured to: and playing the target playing data according to the received playing frequency.

Optionally, the playback device 520 is an artificial mouth.

The sound collection component 530 is configured to collect audio from target playing data played by the playing device, obtain an audio signal corresponding to the target playing data, and transmit the audio signal to the sound card 540.

The sound collection component 530 may be a microphone, a microphone array, or the like. The sound collection component 530 may be installed in a vehicle terminal control accessory in the vehicle, which facilitates collecting voice data spoken by a real person or played by the playback device.

The sound card 540 is configured to convert the received audio signal into a digital signal and transmit the digital signal to the digital signal processor 550.

The digital signal processor 550 is configured to perform noise reduction processing and/or echo cancellation processing on the received digital signal to obtain a digital signal to be detected, and transmit the digital signal to be detected to the audio detection module 560.

The audio signal collected by the sound collection component 530 includes various noise signals, such as wind noise, tire noise, air conditioning noise, nearby vehicle noise, and external environment noise with different frequencies and intensities. The digital signal processor 550 can attenuate the noise to a certain extent and reduce noise interference, so that the signal-to-noise ratio of the digital signal to be detected is higher and the target playing data played by the playing device is clearer and easier to distinguish.

The audio detection module 560 is configured to display the received digital signal to be detected, so that whether the vehicle-based user pronunciation simulation system is normal can be determined from the displayed digital signal to be detected.

The audio detection module 560 may include a display, which may be a standalone display, a display installed in the vehicle, or a computer display. The audio detection module 560 can display the audio waveform and the spectrogram of the digital signal to be detected. The person monitoring the voice collection can judge from the displayed audio waveform, spectrogram and the like whether the digital signal to be detected is normal, for example whether the amplitude is clipped, whether the audio sampling rate is correct, and whether there is signal interference at a particular frequency. Based on whether the digital signal to be detected is normal, whether the vehicle-based user pronunciation simulation system is normal can be determined in real time.
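Two of the checks mentioned above, amplitude clipping and interference at a particular frequency, can also be computed automatically as an aid to visual inspection. The following is a minimal sketch and not the detection method of the embodiment, which relies on the displayed waveform and spectrogram; the function names, the normalised full-scale value of 1.0, and the use of the Goertzel algorithm to measure the energy at a single suspect frequency are assumptions made for the example.

```python
import math
from typing import Sequence


def clipping_ratio(samples: Sequence[float], full_scale: float = 1.0,
                   tolerance: float = 1e-3) -> float:
    """Fraction of samples at (or within `tolerance` of) full scale."""
    if not samples:
        return 0.0
    limit = full_scale * (1.0 - tolerance)
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


def goertzel_power(samples: Sequence[float], sample_rate: float,
                   freq_hz: float) -> float:
    """Squared magnitude of the DFT bin nearest to freq_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2
```

A large `goertzel_power` value at a frequency where no speech energy is expected, compared with neighbouring frequencies, would suggest steady interference of the kind the monitoring person looks for in the spectrogram; a non-negligible `clipping_ratio` would suggest that the amplitude is clipped.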

For example, the environment for voice collection inside a vehicle is relatively complex. The vehicle may jolt over a speed bump and cause an accident that disturbs the voice collection. The wiring among the devices in the vehicle-based user pronunciation simulation system may be disconnected when the engine stalls or the vehicle jolts, interrupting the voice collection. As a further example, the battery in a new energy automobile may generate steady interference at a certain frequency.

In the embodiment of the invention, the target playing data played by the playing device can be collected and detected in real time. Detecting as soon as the data is sampled avoids the drawbacks of batch processing, in which problems in voice collection cannot be found in time and the vehicle-based user pronunciation simulation system cannot be adjusted in time, so that voice signals must be collected repeatedly and efficiency is low.

According to the technical scheme of the embodiment of the invention, a vehicle-based user pronunciation simulation system is adopted: the processor obtains target playing data from the high-fidelity voice data set and determines the playing control parameters matched with the playing device in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data; the playing device plays the target playing data according to the playing control parameters; the sound collection component collects the target playing data played by the playing device to obtain an audio signal; the sound card converts the audio signal into a digital signal; the digital signal processor performs noise reduction processing and/or echo cancellation processing on the digital signal to obtain a digital signal to be detected; and the audio detection module displays the digital signal to be detected, from which whether the vehicle-based user pronunciation simulation system is normal is determined. This solves the problems of high cost, low efficiency and difficult sampling in the prior art, in which a real person gets on the vehicle to speak the voice data for collection and the voice collection system is then checked. The playing device replaces the real person on the vehicle and achieves the same effect as a real person speaking in the vehicle, while saving cost and sampling efficiently. The sampling scenes are not limited by the real person and are therefore richer, the voice data set can be reused many times, and problems such as unclear speech caused by fatigue when a real person works for a long time are avoided. Audio detection can be performed in real time, so that problems in the audio signal are found promptly and it can then be determined whether the voice sampling equipment has a problem.

EXAMPLE six

Fig. 6 is a flowchart of a vehicle-based user pronunciation simulation method according to a sixth embodiment of the present invention. In conjunction with fig. 6, the method includes:

step 610, acquiring target playing data from a pre-stored voice data set through a processor; determining playing control parameters matched with playing equipment in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data; and sending the playing control parameters and the target playing data to the playing equipment.

Optionally, determining a playing control parameter matched with a playing device in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data, including:

acquiring a noise decibel value of the environmental sound of the vehicle, and determining the playing volume matched with playing equipment in the vehicle according to the noise decibel value;

sending the playing control parameters and the target playing data to the playing device, including:

and sending the playing volume and the target playing data to the playing equipment.

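Optionally, determining a playing volume matched with a playing device in the vehicle according to the decibel value of the noise includes:

adjusting the playing volume of the playing device in the vehicle according to the noise decibel value;

controlling the playing device to play the set audio according to the playing volume and acquiring a playing decibel value matched with the set audio;

if the playing decibel value matches the noise decibel value, determining that the current playing volume of the playing device corresponds to the noise decibel value;

if the playing decibel value does not match the noise decibel value, readjusting the playing volume of the playing device, and returning to the operation of controlling the playing device to play the set audio according to the playing volume.

Optionally, determining a playing control parameter matched with a playing device in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data includes:

acquiring semantic information of the target playing data, and determining a playing sound source direction matched with the playing device in the vehicle according to the semantic information;

sending the playing control parameters and the target playing data to the playing device includes:

sending the playing sound source direction and the target playing data to the playing device.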

Optionally, determining a direction of a playback sound source matched with a playback device in a vehicle according to the semantic information includes:

and acquiring a mapping relation between preset semantic information and a playing sound source direction, and determining the playing sound source direction matched with the semantic information according to the mapping relation.

Step 620, playing the target playing data through the playing device according to the received playing control parameters.

Optionally, playing the target playing data by the playing device according to the received playing control parameter, including: and playing the target playing data according to the received playing volume and/or playing sound source direction.

Step 630, audio acquisition is performed on the target playing data played by the playing device through the sound acquisition component, so as to obtain an audio signal corresponding to the target playing data, and the audio signal is transmitted to the sound card.

Step 640, converting the received audio signal into a digital signal through the sound card, and transmitting the digital signal to the digital signal processor.

Step 650, performing noise reduction processing and/or echo cancellation processing on the received digital signal through the digital signal processor to obtain a digital signal to be detected, and transmitting the digital signal to be detected to the audio detection module.

Step 660, displaying the received digital signal to be detected through the audio detection module, so as to determine whether the vehicle-based user pronunciation simulation system is normal according to the displayed digital signal to be detected.
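The chain of steps 630 to 660 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a moving average stands in for the noise reduction of the digital signal processor, and an RMS threshold stands in for the inspection of the displayed signal, which the embodiment leaves to the audio detection module.

```python
import math
from typing import List, Sequence


def to_digital(audio: Sequence[float], bits: int = 16) -> List[int]:
    """Sound card stand-in (step 640): quantise samples in [-1, 1] to signed integers."""
    full_scale = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * full_scale) for s in audio]


def reduce_noise(digital: Sequence[int], window: int = 3) -> List[float]:
    """DSP stand-in (step 650): a causal moving average, not a real noise reducer."""
    smoothed = []
    for i in range(len(digital)):
        chunk = digital[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def looks_normal(signal: Sequence[float], min_rms: float = 100.0) -> bool:
    """Detection stand-in (step 660): treat a near-silent signal as abnormal."""
    if not signal:
        return False
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return rms >= min_rms


if __name__ == "__main__":
    # Made-up captured audio (step 630): a low-frequency tone at half amplitude.
    captured = [0.5 * math.sin(2 * math.pi * i / 32) for i in range(64)]
    to_be_detected = reduce_noise(to_digital(captured))
    print(looks_normal(to_be_detected))
```

Each function mirrors one hardware stage, so a stage can be replaced by a real driver or DSP routine without touching the others.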

The vehicle-based user pronunciation simulation method provided by the embodiment of the invention is an execution method corresponding to the vehicle-based user pronunciation simulation system, and has the same or similar technical characteristics and beneficial effects as the vehicle-based user pronunciation simulation system.

Example seven

Fig. 7 is a schematic structural diagram of a computer device according to a seventh embodiment of the present invention. As shown in Fig. 7, the computer device includes:

one or more processors 510, one processor 510 being illustrated in FIG. 7;

a playing device 520, configured to play the set playing data according to the set playing control parameter;

a memory 720;

the apparatus may further include: an input device 730 and an output device 740.

The processor 510, the memory 720, the input device 730 and the output device 740 of the apparatus may be connected by a bus or by other means; connection by a bus is taken as the example in Fig. 7.

The memory 720, as a non-transitory computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the vehicle-based user pronunciation simulation method according to an embodiment of the present invention (for example, the target play data acquiring module 410, the play control parameter determining module 420, and the play control module 430 shown in fig. 4). By running the software programs, instructions and modules stored in the memory 720, the processor 510 executes the various functional applications and data processing of the computer device, thereby implementing the vehicle-based user pronunciation simulation method of the above method embodiment, that is:

acquiring target playing data from a prestored voice data set;

The memory 720 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the computer device, and the like. Further, the memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 720 may optionally include memory located remotely from the processor 510, which may be connected to a terminal device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 730 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus. The output device 740 may include a display device such as a display screen.

Example eight

An eighth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a vehicle-based user pronunciation simulation method according to an embodiment of the present invention:

acquiring target playing data from a prestored voice data set;

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A vehicle-based user pronunciation simulation method, comprising:

acquiring target playing data from a prestored voice data set;

2. The method of claim 1, wherein determining the playing control parameter matched with the playing device in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data comprises:

obtaining a noise decibel value of the vehicle environment sound, and determining a playing volume matched with the playing equipment in the vehicle according to the noise decibel value;

controlling the playing device to play the target playing data according to the playing control parameter, including:

and controlling the playing equipment to play the target playing data according to the playing volume.

3. The method of claim 2, wherein determining a playing volume matched with the playing equipment in the vehicle according to the noise decibel value comprises:

controlling the playing equipment to play a set audio according to the playing volume and acquiring a playing decibel value matched with the set audio;

4. The method of claim 1, wherein determining the playing control parameter matched with the playing device in the vehicle according to the vehicle environment sound and/or the semantic information of the target playing data comprises:

obtaining semantic information of the target playing data, and determining a playing sound source direction matched with playing equipment in a vehicle according to the semantic information;

and controlling the playing equipment to play the target playing data according to the playing sound source direction.

5. The method of claim 4, wherein the semantic information includes at least one of: air-conditioning, canon, weather, smart home, or navigation;

determining a playing sound source direction matched with playing equipment in a vehicle according to the semantic information comprises:

if the semantic information is of the air-conditioning class, determining that the playing sound source direction matched with the playing equipment in the vehicle is the lower front direction or the front direction;

if the semantic information is of a canon type, determining that a playing sound source direction matched with playing equipment in the vehicle is a vehicle terminal control direction or a front direction;

if the semantic information is of the smart home class, determining that the playing sound source direction matched with the playing equipment in the vehicle is the front direction or the vehicle terminal control direction;

and if the semantic information is of a navigation type, determining that the playing sound source direction matched with the playing equipment in the vehicle is the front direction or the vehicle terminal control direction.

6. The method according to claim 1, before controlling the playback device to play back the target playback data according to the playback control parameter, further comprising:

detecting the bumping state of a vehicle, and determining the playing frequency matched with playing equipment in the vehicle according to the bumping state;

and controlling the playing equipment to play the target playing data according to the playing frequency.

7. The method of any of claims 1-6, wherein the playback device is an artificial mouth.

8. A vehicle-based user pronunciation simulation system, comprising: a processor, a playing device, a sound acquisition component, a sound card, a digital signal processor and an audio detection module;

9. A vehicle-based user pronunciation simulation apparatus, comprising:

10. The apparatus of claim 9, wherein the playback control parameter determining module comprises:

the playing volume determining unit is used for acquiring a noise decibel value of the environmental sound of the vehicle and determining the playing volume matched with playing equipment in the vehicle according to the noise decibel value;

the play control module includes:

11. The apparatus of claim 9, wherein the playback control parameter determining module comprises:

the playing sound source direction determining unit is used for acquiring semantic information of the target playing data and determining a playing sound source direction matched with playing equipment in the vehicle according to the semantic information;

the play control module includes:

12. A computer device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the vehicle-based user pronunciation simulation method as claimed in any one of claims 1 to 7.

13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a vehicle-based user pronunciation simulation method as claimed in any one of claims 1 to 7.