CN111402925A - Voice adjusting method and device, electronic equipment, vehicle-mounted system and readable medium


Info

Publication number
CN111402925A
CN111402925A
Authority
CN
China
Prior art keywords: vehicle, voice, information, speech, strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010172637.4A
Other languages
Chinese (zh)
Other versions
CN111402925B (en)
Inventor
李黎萍 (Li Liping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010172637.4A (granted as CN111402925B)
Publication of CN111402925A
Application granted
Publication of CN111402925B
Current legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/09 - Arrangements for giving variable traffic instructions
    • G08G1/0962 - Arrangements for giving variable traffic instructions having an indicator mounted inside the vehicle, e.g. giving voice messages
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

The embodiments of the disclosure provide a voice adjustment method and device. One embodiment includes: acquiring environmental information of a vehicle; acquiring state information of at least one person on the vehicle; determining a voice adjustment strategy based on the environmental information and the state information; and adjusting parameters of the voice to be played on the vehicle according to the voice adjustment strategy. In this embodiment, the urgency of the situation is judged from conditions inside and outside the vehicle and from road conditions; this is combined with the emotional state of the driver and passengers to determine whether the occupants need to be soothed, and the voice is processed with pitch and speed changes. While preserving the integrity of a single voice persona, the voice is given appropriate emotional feedback, yielding safer driving voice interaction. The method can be applied to assisted-driving and driverless scenarios.

Description

Voice adjusting method and device, electronic equipment, vehicle-mounted system and readable medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a voice adjusting method and device.
Background
With the rapid development of computer technology and artificial intelligence, vehicle navigation and driver assistance are increasingly widely used in automobile driving. Earlier vehicle-mounted voice systems generally used a single, mechanical voice prompt, i.e., one fixed voice (speech rate, pitch, and intonation), which can make prompts less effective and less preferred by users. For example, surveys show that some users prefer a sweet, soft female voice (slow speech rate, high pitch, gentle intonation changes), while others prefer broadcasts in the voice of a particular celebrity. To meet such personalized needs, speech synthesis can be used to imitate the voice of a specific person: a limited amount of that person's speech is collected and processed with artificial-intelligence techniques to synthesize target speech with that person's timbre.
Research shows that warning prompts during driving require an urgent delivery (fast speech rate, low pitch, flat intonation) to make the user react more quickly. The following solutions can be envisaged:
(1) Different functions call different voice packets, for example voice packet 1 when waking up the voice dialog system and voice packet 2 when broadcasting navigation information. However, in this scheme different functions cannot fully represent the urgency of the scene, and the large timbre differences between voice packets fragment the voice-persona experience.
(2) Real speakers record scenario-specific corpora; some vehicle-mounted voice systems record larger corpora in an attempt to obtain natural variation of pitch and speech rate across different scenes. However, this scheme must anticipate many situations when the corpus is recorded, and the recording workload is large.
It should be noted that the approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The embodiment of the disclosure provides a voice adjusting method and device.
According to a first aspect of the present disclosure, an embodiment of the present disclosure provides a voice adjusting method, including: acquiring environmental information of a vehicle; acquiring state information of at least one person on the vehicle; determining a voice adjustment strategy based on the environmental information and the state information; and adjusting parameters of the voice to be played on the vehicle according to the voice adjusting strategy.
In some embodiments, the environmental information comprises at least one of in-vehicle environmental information and out-of-vehicle environmental information, and the out-of-vehicle environmental information comprises road condition information and/or Advanced Driver Assistance System (ADAS) information.
In some embodiments, the parameters include at least one of pitch, speech rate, and intonation.
In some embodiments, determining a voice adjustment strategy based on the environment information and the state information comprises: determining, according to the environment information, the urgency level of a prompt event corresponding to the voice to be played, and determining, according to the state information, the emotional state of at least one person on the vehicle.
In some embodiments, where the environmental information includes in-vehicle environmental information, the in-vehicle environmental information is obtained by collecting vehicle component state information.
In some embodiments, when the environment information includes road condition information, the road condition information is obtained in one of the following ways: acquiring real-time high-precision road condition information from a cloud; or sensing the situation near the vehicle through a sensing camera and/or radar.
In some embodiments, the at least one person on the vehicle comprises a driver of the vehicle, and the status information is obtained in at least one of the following ways: collecting facial expressions of at least one person on the vehicle through a camera; collecting speech of at least one person on the vehicle through a voice receiver; collecting the driver's driving actions through a driving-action collector; or acquiring the duration of the driver's current drive from a clock record.
In some embodiments, the method further comprises pre-establishing a voice adjustment strategy model, wherein the voice adjustment strategy model comprises the correspondence of urgency level, emotional state, and voice adjustment strategy, and the voice adjustment strategy comprises a combination of a frequency, a speed, and an intonation model curve.
In some embodiments, the intonation model curves include a serious intonation model curve with a warning effect, a calm intonation model curve with a soothing effect, and an active intonation model curve with an uplifting effect.
In some embodiments, adjusting the parameters of the voice to be played on the vehicle according to the voice adjustment strategy comprises: adjusting the frequency and speech rate of the voice to be played, and applying the corresponding intonation model to it, according to the combination of frequency, speed, and intonation model curve included in the determined voice adjustment strategy.
According to a second aspect of the present disclosure, an embodiment of the present disclosure provides a voice adjusting apparatus, including: a first acquisition unit configured to acquire environmental information of a vehicle; a second acquisition unit configured to acquire status information of at least one person on the vehicle; a determining unit configured to determine a voice adjustment policy based on the environment information and the state information; and the adjusting unit is configured to adjust the parameters of the voice to be played on the vehicle according to the voice adjusting strategy.
In some embodiments, the environmental information comprises at least one of in-vehicle environmental information and out-of-vehicle environmental information, and the out-of-vehicle environmental information comprises road condition information and/or ADAS information.
In some embodiments, the parameters include at least one of pitch, speech rate, and intonation.
In some embodiments, the determining unit is configured to determine, according to the environment information, the urgency level of a prompt event corresponding to the prompt voice, and to determine, according to the state information, the emotional state of at least one person on the vehicle.
In some embodiments, when the environmental information includes in-vehicle environmental information, the first acquisition unit is configured to obtain the in-vehicle environmental information by collecting vehicle component state information; and when the environment information includes road condition information, the first acquisition unit is configured to obtain the road condition information in at least one of the following ways: acquiring real-time high-precision road condition information from a cloud; or sensing the situation near the vehicle through a sensing camera and/or radar.
In some embodiments, when the at least one person on the vehicle comprises a driver of the vehicle, the second acquisition unit is configured to obtain the status information in at least one of the following ways: collecting facial expressions of at least one person on the vehicle through a camera; collecting speech of at least one person on the vehicle through a voice receiver; collecting the driver's driving actions through a driving-action collector; or acquiring the duration of the driver's current drive from a clock record.
In some embodiments, the determining unit is further configured to determine the voice adjustment strategy according to a pre-established voice adjustment strategy model, the model comprising the correspondence of urgency level, emotional state, and voice adjustment strategy, and the voice adjustment strategy comprising a combination of frequency, speed, and intonation model curves.
According to a third aspect of the present disclosure, an embodiment of the present disclosure provides an electronic device including: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation of the first aspect.
According to a fourth aspect of the present disclosure, an embodiment of the present disclosure provides an in-vehicle system including the electronic device as described in the third aspect.
According to a fifth aspect of the present disclosure, an embodiment of the present disclosure provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
According to the voice adjustment method and device provided by the embodiments of the disclosure, the urgency of the situation is judged from conditions inside and outside the vehicle and from road conditions; this is then combined with the emotional state of the driver and passengers to determine whether the occupants need to be soothed, and the voice is processed with pitch and speed changes. While preserving the integrity of a single voice persona, the voice is given appropriate emotional feedback, yielding safer driving voice interaction. The present disclosure is applicable to assisted-driving and driverless scenarios.
Drawings
Other features, objects, and advantages of the disclosure will become apparent from a reading of the following detailed description of non-limiting embodiments which proceeds with reference to the accompanying drawings.
FIG. 1 is a flow chart of one embodiment of a method of speech adjustment according to the present disclosure.
FIG. 2 is a schematic block diagram of one embodiment of a speech adjustment apparatus according to the present disclosure.
FIG. 3 is a schematic block diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed description of the preferred embodiments
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 shows a flow diagram of a process 100 of one embodiment of the voice adjustment method according to the present disclosure. The voice adjustment method comprises the following steps:
In step 101, environmental information of a vehicle is acquired.
In this step, the execution subject of the voice adjustment method may acquire the environment information through a wired or wireless connection. In some embodiments, the environment information may include in-vehicle environment information or out-of-vehicle environment information. The in-vehicle environment information may include vehicle condition information, i.e., data related to the vehicle's operating condition, such as data on the vehicle's own components: tire pressure, water temperature, fuel level, battery level, vehicle speed, and the like. As an example, the corresponding component state data may be acquired by a tire pressure sensor, a temperature sensor, a fuel sensor, a level sensor, a vehicle speed sensor, and so on. The out-of-vehicle environment information can include road condition information and Advanced Driver Assistance System (ADAS) information. The road condition information may indicate, for example, traffic congestion. As an example, real-time high-precision road condition information may be obtained from a cloud. This information can be used in subsequent steps to determine whether there is a traffic emergency (bad weather, a road landslide, an ambulance passing, etc.) or a route problem (a wrong turn, speeding, etc.). The ADAS information may describe the vehicle's surroundings, such as the distance between the vehicle and surrounding obstacles. As an example, the situation around the vehicle may be sensed by a sensing camera, radar, and the like, and the distance to a surrounding obstacle may be acquired by, for example, an ultrasonic radar. In some embodiments, the present disclosure is applied to an intelligent cockpit system, and sensors such as the cockpit's sensing camera and radar can determine whether an accident is imminent (collision with another vehicle or an object, etc.) or a driving violation is occurring (running a red light, crossing lane markings, lane departure), so as to warn the user or issue a manual-takeover request. Although specific contents and acquisition methods of the environment information are described above, these are merely examples, and the present disclosure is not limited thereto. Those skilled in the art will appreciate that the contents and acquisition methods may be extended according to particular needs.
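As a purely illustrative, non-limiting sketch, the environment information of step 101 might be aggregated as follows; the sensor, cloud, and ADAS interfaces named here are hypothetical placeholders, not part of the disclosure:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class InVehicleInfo:
    """Vehicle condition data read from component sensors (step 101)."""
    tire_pressure_kpa: Optional[float] = None
    water_temp_c: Optional[float] = None
    fuel_level_pct: Optional[float] = None
    battery_pct: Optional[float] = None
    speed_kmh: Optional[float] = None

@dataclass
class OutOfVehicleInfo:
    """Road-condition and ADAS data."""
    traffic_status: Optional[str] = None         # e.g. "congested", from the cloud
    obstacle_distance_m: Optional[float] = None  # e.g. from ultrasonic radar

@dataclass
class EnvironmentInfo:
    in_vehicle: InVehicleInfo = field(default_factory=InVehicleInfo)
    out_of_vehicle: OutOfVehicleInfo = field(default_factory=OutOfVehicleInfo)

def collect_environment(sensors, traffic_client, adas) -> EnvironmentInfo:
    """Poll the (hypothetical) sensor bus, cloud client, and ADAS once."""
    return EnvironmentInfo(
        in_vehicle=InVehicleInfo(
            tire_pressure_kpa=sensors.read("tire_pressure"),
            fuel_level_pct=sensors.read("fuel_level"),
            speed_kmh=sensors.read("vehicle_speed"),
        ),
        out_of_vehicle=OutOfVehicleInfo(
            traffic_status=traffic_client.realtime_status(),
            obstacle_distance_m=adas.nearest_obstacle_distance(),
        ),
    )
```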
At step 102, status information of at least one person on the vehicle is obtained.
In this step, the execution subject of the voice adjustment method may acquire the state information through a wired or wireless connection, for example the state information of a person on the vehicle such as the driver of the current vehicle or a passenger. One or more persons on the vehicle may be regarded as users of the voice adjustment function. For simplicity, the term "user" is used hereinafter to refer to persons on the vehicle, including the driver and the passengers. It should be understood that the at least one person on the vehicle may at least comprise the driver of the vehicle.
In some embodiments, the status information of the user may be any effective information that helps analyze the user's emotional state, in particular whether the user is in a specific emotional state such as panic, fatigue, or sadness (a low mood). As an example, the emotional state may be analyzed from the user's facial expressions, the words the user speaks, the user's driving actions, the duration of the current drive, and so on. As an example, facial expressions may be collected by a camera, the user's speech by a voice receiver, the driver's driving actions by a driving-action collector, and the duration of the current drive from a clock record. In some embodiments, the camera, voice receiver, driving-action collector, and so on may be provided in the intelligent cabin system to which the present disclosure is applied. Although specific contents and acquisition methods of the user status information are described above, these are merely examples, and the present disclosure is not limited thereto. Those skilled in the art will appreciate that the contents and acquisition methods may be extended according to particular needs.
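The state information of step 102 could be bundled similarly; again a non-limiting sketch, with field names invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OccupantState:
    """Raw signals from which an emotional state is later estimated."""
    facial_expression: Optional[str] = None     # from the in-cabin camera
    spoken_text: Optional[str] = None           # from the voice receiver
    driving_actions: List[str] = field(default_factory=list)  # from the action collector
    driving_minutes: float = 0.0                # from the clock record
```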
Although step 101 is described above before step 102, this is not intended to limit the order of the two steps: they may be performed simultaneously or in the reverse order, and those skilled in the art will understand that the present disclosure is not limited in this respect.
In step 103, a voice adjustment strategy is determined based on the environment information and the state information.
In this step, the information acquired in the previous steps is analyzed: the urgency of the situation is analyzed from the environment information, and the emotional state of the user is analyzed from the user state information. Powerful analysis functions can be supported by a background server.
In some embodiments, the urgency is pre-divided into a plurality of levels, and the number of levels is chosen as required. As an example, situations may simply be divided into urgent and non-urgent. This is merely an example, and the disclosure is not limited thereto.
In some embodiments, the urgency level corresponding to a particular situation is set in advance. For example, the user may be prompted to refuel when the fuel level falls below a first or a second threshold: the urgency of the prompt event for falling below the larger first threshold may be set to a lower level, and the urgency for falling below the smaller second threshold to a higher level. For example, if a vehicle component fails and a failure notification is required, and failure of that component could cause a traffic accident, the urgency of the notification event may be set to a higher level. Likewise, the urgency of a warning event that the vehicle speed is high and a safe following distance to the preceding vehicle is not being kept should be set to a high level, while the urgency of a road condition notification about congestion may be set to a low level. In some embodiments, a database is established in advance in which the various prompt events are categorized and assigned urgency levels. As an example, when analyzing urgency from the vehicle's environment information, the urgency of the current situation, i.e., the urgency of the prompt event corresponding to the voice to be played, may be determined by searching this pre-established database based on the acquired environment information.
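A toy version of such a pre-established urgency database, mirroring the fuel, component-failure, following-distance, and congestion examples above; the event names and thresholds are assumptions, not values from the disclosure:

```python
from typing import Optional

# Hypothetical prompt-event catalogue with pre-assigned urgency levels.
URGENCY_DB = {
    "fuel_low_warning": "non-urgent",       # below the larger first threshold
    "fuel_critical_warning": "urgent",      # below the smaller second threshold
    "component_failure": "urgent",          # failure could cause an accident
    "unsafe_following_distance": "urgent",
    "traffic_congestion_notice": "non-urgent",
}

def classify_fuel_event(fuel_pct: float,
                        first_threshold: float = 20.0,
                        second_threshold: float = 5.0) -> Optional[str]:
    """Map a fuel reading to a prompt event; both thresholds are made up."""
    if fuel_pct < second_threshold:
        return "fuel_critical_warning"
    if fuel_pct < first_threshold:
        return "fuel_low_warning"
    return None

def lookup_urgency(event: str) -> str:
    """Search the pre-established database for the event's urgency level."""
    return URGENCY_DB.get(event, "non-urgent")
```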
In this step, the emotional state of the user is also analyzed. It can be analyzed from the collected facial expressions, speech, driving actions, driving duration, and so on. The judgment may rest on a single signal, e.g., the driver may be judged to be in a fatigued driving state from the driving duration alone, or on several signals combined, even together with other information: for example, the user's current emotional state may be judged comprehensively from the user's current facial expression, specific utterances, and even road condition information indicating congestion. In some embodiments, when analyzing the user's emotional state, a comprehensive analysis may be performed with the aid of a user feature database based on the collected facial expressions, speech, driving actions, and/or driving duration, so as to determine the emotional state more efficiently and accurately. Based on the user state information acquired in the previous step, it can be judged in a preset manner whether the user is panicked, fatigued, normal, distracted, sad, and so on. In some embodiments, this processing may be done in the cabin intelligence system or assisted by a cloud-based system.
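One possible, deliberately simplistic "preset manner" of fusing these signals is sketched below; the thresholds and labels are assumptions, and a real system would rely on the user feature database or cloud models mentioned above:

```python
from typing import Optional

def estimate_emotion(facial_expression: Optional[str],
                     spoken_text: Optional[str],
                     driving_minutes: float) -> str:
    """Rule-based fusion of the collected signals into an emotional state."""
    if driving_minutes > 180:                 # assumed fatigue threshold (minutes)
        return "fatigued"
    if facial_expression == "fear":
        return "panicked"
    if spoken_text and any(w in spoken_text.lower() for w in ("help", "watch out")):
        return "panicked"
    return "normal"
```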
In this step, a voice adjustment strategy is determined according to the urgency and the emotional state; the strategy comprises a combination of a frequency adjustment, a speed adjustment, and an intonation model curve. The frequency adjustment corresponds to the pitch of the voice, the speed adjustment to the speech rate, and the intonation model curve to the intonation. Intonation is a combination of tone and rhythm: a happy tone rises at the end of a sentence, for example, while an angry tone falls at the end. A sentence typically has two pitch peaks and three low points. Hence, applying different intonation model curves makes the adjusted voice express different intonations. As an example, the intonation model curves may include a serious intonation model curve with a warning effect, a calm intonation model curve with a soothing effect, an active intonation model curve with an uplifting effect, and so on.
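One way to represent such intonation model curves is as pitch-offset contours over the relative position within a sentence, following the two-peaks/three-lows shape just described. The anchor values below are invented for illustration; the disclosure does not specify numeric curves:

```python
from typing import List, Tuple

# (relative position in [0, 1], pitch offset in semitones)
Curve = List[Tuple[float, float]]

# Low-peak-low-peak-low shapes: the serious curve falls at the sentence end,
# the active curve rises. All offset values are illustrative assumptions.
SERIOUS_CURVE: Curve = [(0.0, -1.0), (0.2, 0.5), (0.5, -1.0), (0.8, 0.5), (1.0, -2.0)]
CALM_CURVE:    Curve = [(0.0, -0.5), (0.2, 0.3), (0.5, -0.5), (0.8, 0.3), (1.0, -1.0)]
ACTIVE_CURVE:  Curve = [(0.0,  0.0), (0.2, 2.0), (0.5,  0.5), (0.8, 2.5), (1.0,  1.5)]

def pitch_offset(t: float, curve: Curve) -> float:
    """Linearly interpolate the pitch offset at sentence position t in [0, 1]."""
    for (t0, v0), (t1, v1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return curve[-1][1]
```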
In some embodiments, a voice adjustment strategy model is pre-established; it may be linear, non-linear, or hierarchical. As an example, the voice adjustment strategy model includes a voice adjustment strategy table holding the correspondence of urgency, emotional state, and voice adjustment strategy. Table 1 below is an example of such a table (a minimal lookup sketch follows the table). As an example, the urgency levels are urgent and non-urgent, and the emotional states include normal, panicked, fatigued, and so on. For example, when the user is panicked in a non-urgent situation, the voice should be processed with a low-pitch, slow model curve combined with a calm intonation; when the user is fatigued in an urgent situation, the voice should be processed with a high-pitch, fast model curve combined with a serious intonation.
TABLE 1

Urgency | Emotional state | Voice adjustment strategy
Urgent | Normal | Low pitch, fast, serious intonation
Urgent | Fatigued | High pitch, fast, serious intonation
Urgent | Panicked | Low pitch, slow, calm intonation
Non-urgent | Normal | No adjustment
Non-urgent | Fatigued | High pitch, fast, serious intonation
Non-urgent | Panicked | Low pitch, slow, calm intonation
... | ... | ...
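A minimal dictionary-backed reading of Table 1 for step 103; the English keys and labels are stand-ins for the table's entries, and further rows would simply extend the dictionary:

```python
# (urgency, emotional state) -> (pitch, speed, intonation curve name),
# or None for the "no adjustment" row of Table 1.
STRATEGY_TABLE = {
    ("urgent",     "normal"):   ("low",  "fast", "serious"),
    ("urgent",     "fatigued"): ("high", "fast", "serious"),
    ("urgent",     "panicked"): ("low",  "slow", "calm"),
    ("non-urgent", "normal"):   None,
    ("non-urgent", "fatigued"): ("high", "fast", "serious"),
    ("non-urgent", "panicked"): ("low",  "slow", "calm"),
}

def determine_strategy(urgency: str, emotion: str):
    """Look up the voice adjustment strategy for the given situation."""
    return STRATEGY_TABLE.get((urgency, emotion))
```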
In some embodiments, voice adjustment strategies with appropriate prompting effects may also be employed for further emotional states (e.g., anxious, sad, distracted). More intonation model curves can likewise be designed, such as a soothing intonation with a comforting effect. The above are merely examples, and the present disclosure is not limited thereto.
In step 104, parameters of the voice to be played on the vehicle are adjusted according to the voice adjustment strategy.
In this step, one or more parameters of the voice to be played, such as pitch, intonation, and speech rate, are adjusted according to the determined voice adjustment strategy. In some embodiments, the frequency and speech rate of the voice to be played are adjusted, and the corresponding intonation model processing is applied, according to the combination of frequency, speed, and intonation model curve included in the determined strategy.
In some embodiments, the voice to be played may be the voice corresponding to a prompt event, called from a pre-made voice packet. For example, a prompt event reminding the user to refuel is generated when the fuel level falls below a certain threshold, and voice data with matching content is called from a pre-prepared voice packet as the voice to be played. In other embodiments, the voice to be played may be voice data acquired from a background server according to user preferences and the like. In some embodiments, it may also be a predetermined voice in an existing playing program, not necessarily associated with the environment information or user status obtained in real time. The above are merely examples of the content and acquisition of the voice to be played, and the disclosure is not limited thereto.
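As one hedged illustration of step 104, the pitch and speed parts of a strategy could be handed to a speech synthesizer as standard SSML prosody attributes; the percentage mappings below are assumptions, and the intonation-curve processing would happen inside the synthesizer itself:

```python
from xml.sax.saxutils import escape

def to_ssml(text: str, strategy) -> str:
    """Wrap prompt text in SSML <prosody> so a TTS engine applies the strategy."""
    if strategy is None:  # the "no adjustment" row of Table 1
        return f"<speak>{escape(text)}</speak>"
    pitch, speed, _curve = strategy
    pitch_attr = {"low": "-15%", "high": "+15%"}[pitch]  # assumed mapping
    rate_attr = {"slow": "slow", "fast": "fast"}[speed]
    return (f'<speak><prosody pitch="{pitch_attr}" rate="{rate_attr}">'
            f"{escape(text)}</prosody></speak>")

# Example: urgent situation, fatigued driver -> high pitch, fast, serious.
print(to_ssml("Please keep a safe distance from the vehicle ahead.",
              ("high", "fast", "serious")))
```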
The voice processed with the determined voice adjustment strategy carries appropriate emotional feedback when played, and thus achieves a better prompting effect.
With further reference to fig. 2, as an implementation of the method shown in fig. 1, the present disclosure provides an embodiment of a speech adjusting apparatus, which corresponds to the embodiment of the method shown in fig. 1, and which can be applied in various electronic devices.
As shown in fig. 2, the voice adjustment apparatus 200 provided by the present embodiment includes a first acquisition unit 201 configured to acquire environmental information of a vehicle; a second acquisition unit 202 configured to acquire status information of at least one person on the vehicle; a determining unit 203 configured to determine a voice adjustment policy based on the environment information and the state information; an adjusting unit 204 configured to adjust parameters of the voice to be played on the vehicle according to the voice adjusting strategy.
In the present embodiment, in the voice adjustment apparatus 200: the specific processing of the first obtaining unit 201, the second obtaining unit 202, the determining unit 203, and the adjusting unit 204 and the technical effects thereof can refer to the related descriptions of step 101, step 102, step 103, and step 104 in the corresponding embodiment of fig. 1, which are not described herein again.
In some optional implementations of this embodiment, the environmental information includes at least one of in-vehicle environmental information and out-of-vehicle environmental information, and the out-of-vehicle environmental information includes road condition information and/or ADAS information.
In some optional implementations of this embodiment, the parameters include at least one of pitch, speech rate, and intonation.
In some optional implementations of this embodiment, the determining unit 203 may be configured to determine, according to the environment information, the urgency level of a prompt event corresponding to the prompt voice, and to determine, according to the state information, the emotional state of at least one person on the vehicle.
In some optional implementations of this embodiment, when the environmental information includes in-vehicle environmental information, the first obtaining unit 201 may be configured to obtain the in-vehicle environmental information by collecting vehicle component state information; and when the environment information includes road condition information, the first obtaining unit 201 may be configured to obtain the road condition information in at least one of the following ways: acquiring real-time high-precision road condition information from a cloud; or sensing the situation near the vehicle through a sensing camera and/or radar.
In some optional implementations of this embodiment, the at least one person on the vehicle comprises a driver of the vehicle, and the second obtaining unit 202 may be configured to obtain the status information in at least one of the following ways: collecting facial expressions of at least one person on the vehicle through a camera; collecting speech of at least one person on the vehicle through a voice receiver; collecting the driver's driving actions through a driving-action collector; or acquiring the duration of the driver's current drive from a clock record.
In some optional implementations of this embodiment, the determining unit 203 may be further configured to determine the voice adjustment strategy according to a pre-established voice adjustment strategy model, where the model includes the correspondence of urgency level, emotional state, and voice adjustment strategy, and the voice adjustment strategy includes a combination of frequency, speed, and intonation model curves. The adjusting unit 204 may be configured to adjust the frequency and speech rate of the voice to be played, and to apply the corresponding intonation model to it, according to the combination of frequency, speed, and intonation model curve included in the determined voice adjustment strategy.
In some optional implementations of this embodiment, the intonation model curves may include a serious intonation model curve with a warning effect, a calm intonation model curve with a soothing effect, and an active intonation model curve with an uplifting effect.
The voice adjustment method and device of the present disclosure can process the voice to be played based on the vehicle-mounted scene: scene urgency is judged from conditions inside and outside the vehicle and from road conditions, this is combined with the emotional state of the driver and passengers to determine whether the occupants need to be soothed, and the voice is processed with pitch and speed changes. While preserving the integrity of a single voice persona, the voice is given appropriate emotional feedback, yielding safer driving voice interaction. The method can be applied to assisted-driving and driverless scenarios.
Referring now to FIG. 3, a block diagram of an electronic device 300 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a vehicle-mounted terminal (e.g., a vehicle navigation terminal), a mobile phone, a notebook computer, a PAD (tablet computer), and the like. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data necessary for the operation of the electronic apparatus 300. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 such as a touch screen, touch pad, camera, accelerometer, and gyroscope; output devices 307 such as a liquid crystal display (LCD), speaker, and vibrator; storage devices 308 such as flash memory; and communication devices 309. The communication devices 309 may allow the electronic apparatus 300 to communicate wirelessly or by wire with other devices to exchange data. Although fig. 3 illustrates an electronic apparatus 300 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may be implemented instead. Each block illustrated in fig. 3 may represent one device or multiple devices as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 309, or installed from the storage means 308, or installed from the ROM 302. The computer program, when executed by the processing apparatus 301, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring environmental information of a vehicle; acquiring state information of at least one person on the vehicle; determining a voice adjustment strategy based on the environmental information and the state information; and adjusting parameters of the voice to be played on the vehicle according to the voice adjusting strategy.
The voice adjusting device may be a part of an on-board system or a driving assistance system, for example, a part of an advanced driving assistance system ADAS, and is implemented as a function of the system.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first acquisition unit, a second acquisition unit, a determination unit, and an adjustment unit. Here, the names of these units do not constitute a limitation of the unit itself in some cases, and for example, the first acquisition unit may also be described as a "unit that acquires environmental information of the vehicle".
The foregoing description is merely a description of preferred embodiments of the disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also covers other solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, solutions in which the above features are interchanged with technical features of similar function disclosed (but not limited to those disclosed) in the embodiments of the present disclosure.

Claims (20)

1. A voice adjustment method, the method comprising:
acquiring environmental information of a vehicle;
acquiring state information of at least one person on the vehicle;
determining a voice adjustment strategy based on the environmental information and the state information;
and adjusting parameters of the voice to be played on the vehicle according to the voice adjusting strategy.
2. The voice adjustment method according to claim 1, wherein the environmental information includes at least one of in-vehicle environmental information and out-of-vehicle environmental information, and wherein the out-of-vehicle environmental information includes road condition information and/or Advanced Driver Assistance System (ADAS) information.
3. The voice adjustment method according to claim 1 or 2, wherein the parameters include at least one of pitch, speech rate, and intonation.
4. The voice adjustment method according to claim 1 or 2, wherein determining a voice adjustment strategy based on the environment information and the state information comprises:
determining, according to the environment information, the urgency level of a prompt event corresponding to the voice to be played, and determining, according to the state information, the emotional state of at least one person on the vehicle.
5. The voice adjustment method according to claim 2, wherein, when the environment information includes in-vehicle environment information, the in-vehicle environment information is acquired by collecting vehicle component state information.
6. The voice adjustment method according to claim 2, wherein, when the environment information includes road condition information, the road condition information is obtained in one of the following ways:
acquiring real-time high-precision road condition information through a cloud; or
sensing the situation near the vehicle through a sensing camera and/or a radar.
7. The voice adjustment method of claim 1, wherein the at least one person on the vehicle comprises a driver of the vehicle, and wherein the status information is obtained in at least one of the following ways:
collecting facial expressions of at least one person on the vehicle through a camera;
collecting speech of at least one person on the vehicle through a voice receiver;
collecting the driving action of the driver through a driving action collector; or
acquiring, through a clock record, the duration of the driver's current drive.
8. The voice adjustment method of claim 4, wherein the method further comprises: pre-establishing a voice adjustment strategy model, wherein the voice adjustment strategy model comprises the correspondence of the urgency level, the emotional state, and the voice adjustment strategy, and the voice adjustment strategy comprises a combination of a frequency, a speed, and an intonation model curve.
9. The voice adjustment method of claim 8, wherein the intonation model curves include a serious intonation model curve with a warning effect, a calm intonation model curve with a soothing effect, and an active intonation model curve with an uplifting effect.
10. The voice adjustment method according to claim 8 or 9, wherein adjusting the parameters of the voice to be played on the vehicle according to the voice adjustment strategy comprises:
adjusting the frequency and speech rate of the voice to be played, and applying the corresponding intonation model to the voice to be played, according to the combination of the frequency, the speed, and the intonation model curve included in the determined voice adjustment strategy.
11. A voice adjustment apparatus comprising:
a first acquisition unit configured to acquire environmental information of a vehicle;
a second acquisition unit configured to acquire status information of at least one person on the vehicle;
a determining unit configured to determine a voice adjustment policy based on the environment information and the state information;
and the adjusting unit is configured to adjust the parameters of the voice to be played on the vehicle according to the voice adjusting strategy.
12. The voice adjustment device of claim 11, wherein the environmental information includes at least one of in-vehicle environmental information and out-of-vehicle environmental information, and wherein the out-of-vehicle environmental information includes road condition information and/or ADAS information.
13. The voice adjustment device according to claim 11 or 12, wherein the parameters include at least one of pitch, speech rate, and intonation.
14. The voice adjustment device of claim 11 or 12, wherein the determining unit is configured to:
determine, according to the environment information, the urgency level of a prompt event corresponding to the prompt voice, and determine, according to the state information, the emotional state of at least one person on the vehicle.
15. The voice adjustment device according to claim 12, wherein, when the environmental information includes in-vehicle environmental information, the first acquisition unit is configured to acquire the in-vehicle environmental information by collecting vehicle component state information; and wherein, when the environment information includes road condition information, the first acquisition unit is configured to acquire the road condition information in at least one of the following ways:
acquiring real-time high-precision road condition information through a cloud; or
sensing the situation near the vehicle through a sensing camera and/or a radar.
16. The voice adjustment device according to claim 11, wherein the at least one person on the vehicle comprises a driver of the vehicle, and wherein the second acquisition unit is configured to acquire the status information in at least one of the following ways:
collecting facial expressions of at least one person on the vehicle through a camera;
collecting speech of at least one person on the vehicle through a voice receiver;
collecting the driving action of the driver through a driving action collector; or
acquiring, through a clock record, the duration of the driver's current drive.
17. The voice adjustment device according to claim 14, wherein the determining unit is configured to determine the voice adjustment strategy according to a pre-established voice adjustment strategy model, the model comprising the correspondence of the urgency level, the emotional state, and the voice adjustment strategy, and the voice adjustment strategy comprising a combination of frequency, speed, and intonation model curves.
18. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
19. An in-vehicle system comprising the electronic device of claim 18.
20. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN202010172637.4A (filed 2020-03-12, priority 2020-03-12): Voice adjustment method, device, electronic equipment, vehicle-mounted system and readable medium. Granted as CN111402925B (Active).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010172637.4A (granted as CN111402925B) | 2020-03-12 | 2020-03-12 | Voice adjustment method, device, electronic equipment, vehicle-mounted system and readable medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010172637.4A (granted as CN111402925B) | 2020-03-12 | 2020-03-12 | Voice adjustment method, device, electronic equipment, vehicle-mounted system and readable medium

Publications (2)

Publication Number | Publication Date
CN111402925A | 2020-07-10
CN111402925B | 2023-10-10

Family ID: 71430758

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010172637.4A (Active, granted as CN111402925B) | 2020-03-12 | 2020-03-12 | Voice adjustment method, device, electronic equipment, vehicle-mounted system and readable medium

Country Status (1)

Country | Link
CN | CN111402925B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100145695A1 (en) * 2008-12-08 2010-06-10 Electronics And Telecommunications Research Institute Apparatus for context awareness and method using the same
CN102874259A (en) * 2012-06-15 2013-01-16 浙江吉利汽车研究院有限公司杭州分公司 Automobile driver emotion monitoring and automobile control system
CN105895095A (en) * 2015-02-12 2016-08-24 哈曼国际工业有限公司 Adaptive interactive voice system
CN106652378A (en) * 2015-11-02 2017-05-10 比亚迪股份有限公司 Driving reminding method and system for vehicle, server and vehicle
US20180101354A1 (en) * 2016-10-11 2018-04-12 Honda Motor Co., Ltd. Service providing apparatus and method
CN106650633A (en) * 2016-11-29 2017-05-10 上海智臻智能网络科技股份有限公司 Driver emotion recognition method and device
CN106627589A (en) * 2016-12-27 2017-05-10 科世达(上海)管理有限公司 Vehicle driving safety auxiliary method and system and vehicle
CN106803423A (en) * 2016-12-27 2017-06-06 智车优行科技(北京)有限公司 Man-machine interaction sound control method, device and vehicle based on user emotion state
CN107117174A (en) * 2017-03-29 2017-09-01 昆明理工大学 A kind of driver's mood monitoring active safety guide device circuit system and its control method
CN108875682A (en) * 2018-06-29 2018-11-23 百度在线网络技术(北京)有限公司 Information-pushing method and device
CN109036405A (en) * 2018-07-27 2018-12-18 百度在线网络技术(北京)有限公司 Voice interactive method, device, equipment and storage medium
CN108847239A (en) * 2018-08-31 2018-11-20 上海擎感智能科技有限公司 Interactive voice/processing method, system, storage medium, engine end and server-side

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112033417A (en) * 2020-09-29 2020-12-04 北京深睿博联科技有限责任公司 Real-time navigation method and device for visually impaired people
CN112033417B (en) * 2020-09-29 2021-08-24 北京深睿博联科技有限责任公司 Real-time navigation method and device for visually impaired people
CN112349299A (en) * 2020-10-28 2021-02-09 维沃移动通信有限公司 Voice playing method and device and electronic equipment
CN112418162A (en) * 2020-12-07 2021-02-26 安徽江淮汽车集团股份有限公司 Method, apparatus, storage medium, and device for vehicle control
CN112418162B (en) * 2020-12-07 2024-01-12 安徽江淮汽车集团股份有限公司 Method, device, storage medium and apparatus for controlling vehicle
CN112837552A (en) * 2020-12-31 2021-05-25 北京梧桐车联科技有限责任公司 Voice broadcasting method and device and computer readable storage medium
CN112776710A (en) * 2021-01-25 2021-05-11 上汽通用五菱汽车股份有限公司 Sound effect adjusting method, sound effect adjusting system, vehicle-mounted device system and storage medium
CN114360241A (en) * 2021-12-10 2022-04-15 斑马网络技术有限公司 Vehicle interaction method, vehicle interaction device and storage medium
WO2023236691A1 (en) * 2022-06-08 2023-12-14 Pateo Connect+ Technology (Shanghai) Corporation Control method based on vehicle external audio system, vehicle intelligent marketing method, electronic apparatus, and storage medium
CN115460031A (en) * 2022-11-14 2022-12-09 深圳市听见时代科技有限公司 Intelligent sound control supervision system and method based on Internet of things
CN115460031B (en) * 2022-11-14 2023-04-11 深圳市听见时代科技有限公司 Intelligent sound control supervision system and method based on Internet of things

Also Published As

Publication number | Publication date
CN111402925B (en) | 2023-10-10

Similar Documents

Publication Publication Date Title
CN111402925B (en) Voice adjustment method, device, electronic equipment, vehicle-mounted system and readable medium
JP6953464B2 (en) Information push method and equipment
CN106803423B (en) Man-machine interaction voice control method and device based on user emotion state and vehicle
CN110214107B (en) Autonomous vehicle providing driver education
EP3610785B1 (en) Information processing device, information processing method, and program
CN111381673A (en) Bidirectional vehicle-mounted virtual personal assistant
JP6150077B2 (en) Spoken dialogue device for vehicles
JP2018059960A (en) Information providing device
JP7020098B2 (en) Parking lot evaluation device, parking lot information provision method and program
KR102403355B1 (en) Vehicle, mobile for communicate with the vehicle and method for controlling the vehicle
CN111402879A (en) Vehicle navigation prompt voice control method, device, equipment and medium
CN113401129B (en) Information processing apparatus, recording medium, and information processing method
CN115859219A (en) Multi-modal interaction method, device, equipment and storage medium
CN114248786B (en) Vehicle control method, system, device, computer equipment and medium
CN115628753A (en) Navigation voice broadcasting method and device, electronic equipment and storage medium
Lashkov et al. Dangerous state detection in vehicle cabin based on audiovisual analysis with smartphone sensors
EP3835106A1 (en) Agent management device, program, and agent management method
CN111652065B (en) Multi-mode safe driving method, equipment and system based on vehicle perception and intelligent wearing
CN113450788A (en) Method and device for controlling sound output
US11946762B2 (en) Interactive voice navigation
US11763831B2 (en) Output apparatus, output method and non-transitory computer-readable recording medium
CN117999211A (en) Vehicle, voice navigation method and device thereof and storage medium
CN115631550A (en) User feedback method and system
CN116161051A (en) Warning method, device, equipment, medium and vehicle for vehicle driver
KR20220057492A (en) Method and apparatus for controlling means of transportation, electronic device, computer readable storage medium and computer program

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right
Effective date of registration: 2021-10-14
Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing
Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.
Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085
Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.
GR01: Patent grant