WO2023246563A1 - A sound processing method and electronic device - Google Patents

A sound processing method and electronic device

Info

Publication number
WO2023246563A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
audio data
speaker
gain
picture
Prior art date
Application number
PCT/CN2023/099912
Other languages
English (en)
French (fr)
Other versions
WO2023246563A9 (zh)
Inventor
徐波
张超
马晓慧
余平
张丽梅
冯素梅
陈鹏
周秀敏
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023246563A1
Publication of WO2023246563A9


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation

Definitions

  • the present application relates to the field of terminal technology, and in particular, to a sound processing method and electronic equipment.
  • the present application provides a sound processing method, an electronic device, a computer storage medium and a computer program product, which can construct audio data to be played that is adapted to the current environment or the current user's status, so that the audio data to be played blends with the current environment or the current user's status, thereby enhancing the user experience.
  • the present application provides a sound processing method.
  • the method may include: obtaining environmental information associated with the target device, where the environmental information includes environmental data of the area where the target device is located; determining, based on the environmental data, N sound objects associated with the environmental data; obtaining the white noise corresponding to each sound object to obtain N pieces of audio data; obtaining target audio data based on the N pieces of audio data, where the target audio data matches the environmental information; and outputting the target audio data.
  • because the target audio data is obtained from the white noise corresponding to the N sound objects, it also matches the environmental data of the area where the target device is located. In this way, the user can have the experience of being in that environment when listening to the target audio data, thereby obtaining an immersive experience and improving the user experience.
  • the target device can be a vehicle or an electronic device in the vehicle.
  • the target device can be a device integrated in the vehicle, such as a vehicle-mounted terminal, or a device separated from the vehicle, such as a driver's mobile phone.
  • the environmental data may include one or more of environmental images, environmental sounds, weather information or seasonal information, etc.
  • the N sound objects may be sound objects identified based on the environmental data, or may be sound objects obtained after the user filters the sound objects identified based on the environmental data, for example, the sound objects remaining after certain sound objects are eliminated, or the sound objects obtained after some new sound objects are added, and so on.
  • obtaining the white noise corresponding to each sound object to obtain N pieces of audio data specifically includes: querying an atomic database based on the N sound objects to obtain the N pieces of audio data, where the atomic database is configured with audio data of each single object within a specific period of time.
  • audio data of a certain duration can be obtained by combining the audio data of multiple objects in the atomic database randomly or according to preset rules.
  • the atomic database can include: audio data of flowing water, audio data of cicadas, audio data of vegetation, and so on.
  • the white noise audio data in the atomic database can be configured in the vehicle in advance, or obtained from the server in real time.
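As an editorial illustration only (not part of the application), the query-and-combine step described above could be sketched as follows, assuming the atomic database is a simple mapping from sound-object names to short clips stored as NumPy arrays; `atomic_db`, `assemble_clip`, and `mix_objects` are hypothetical names.

```python
import numpy as np

# Hypothetical atomic database: each sound object maps to one or more short clips (48 kHz mono).
atomic_db = {
    "water": [np.random.randn(48000) * 0.1],
    "cicada": [np.random.randn(48000) * 0.1],
    "vegetation": [np.random.randn(48000) * 0.1],
}

def assemble_clip(obj, duration_s, sr=48000):
    """Randomly concatenate clips of one sound object until the target duration is reached."""
    rng = np.random.default_rng()
    clips = atomic_db[obj]
    out, total, target = [], 0, int(duration_s * sr)
    while total < target:
        clip = clips[rng.integers(len(clips))]
        out.append(clip)
        total += len(clip)
    return np.concatenate(out)[:target]

def mix_objects(objects, duration_s, sr=48000):
    """Obtain one piece of audio data per sound object and mix them into the target audio data."""
    pieces = [assemble_clip(o, duration_s, sr) for o in objects]
    mixed = np.sum(pieces, axis=0)
    return mixed / max(1e-9, np.max(np.abs(mixed)))  # simple peak normalization
```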
  • the environmental data includes environmental sounds.
  • obtaining the white noise corresponding to each sound object to obtain N pieces of audio data specifically includes: extracting the audio data of M sound objects from the environmental sound to obtain M pieces of audio data, where 0 < M ≤ N; and, when M < N, querying the atomic database to obtain the remaining (N−M) pieces of audio data.
  • the atomic database is configured with the audio data of each single object within a specific period of time. For example, when the audio data of a sound object extracted from the environmental sound does not meet the requirements, that audio data can be discarded and audio data corresponding to the sound object can be obtained from the atomic database instead, thereby improving the quality of the subsequently obtained target audio data.
  • the method further includes: adjusting the gain of each channel included in each of the M pieces of audio data to a target value. This can improve the loudness of the audio data and thereby more realistically restore the environmental sounds, improving the user experience.
  • the emotion expressed by each audio data is the same as the emotion expressed by the environmental data. This further matches the target audio data with the environmental information and improves the user experience.
  • this application provides a sound processing method.
  • the method may include: obtaining environmental information associated with the target device.
  • the environmental information includes first audio data and second audio data that need to be played simultaneously in the environment where the target device is located, and the first audio data and the second audio data are both played by the same device, where the first audio data is audio data that is played continuously in a first time period, and the second audio data is audio data that is played sporadically in the first time period.
  • the method may further include: performing target processing on the first audio data to obtain fourth audio data, where the second audio data and the fourth audio data correspond to the same playback time period, and the target processing includes vocal elimination or vocal reduction; determining, according to the second audio data, a first gain that needs to be adjusted for the second audio data, and adjusting, based on the first gain, the gain of each channel in the second audio data to obtain fifth audio data; determining, based on the fourth audio data or the fifth audio data, a second gain that needs to be adjusted for the fourth audio data, and adjusting, based on the second gain, the gain of each channel in the fourth audio data to obtain sixth audio data; obtaining target audio data based on the fifth audio data and the sixth audio data, where the target audio data matches the environmental information; and outputting the target audio data.
  • in this way, while the user can clearly perceive the information contained in the occasionally played audio data, the user can also clearly perceive the melody, background sounds, and the like of the other audio data, thereby better satisfying the user's sense of hearing and improving the user experience.
  • the audio data that is played continuously can be a certain type of music
  • the audio data that is played occasionally can be audio data of navigation that needs to be broadcast during navigation.
  • vocal elimination can be understood as eliminating vocals in audio data
  • vocal reduction can be understood as reducing vocals in audio data.
  • the target device can be a vehicle or an electronic device in the vehicle.
  • the target device can be a device integrated in the vehicle, such as a vehicle-mounted terminal, or a device separated from the vehicle, such as a driver's mobile phone.
  • the method may be, but is not limited to, applied to a first device, which may be a device that plays the first audio data and the second audio data.
  • the second audio data is the first data
  • the fourth audio data is the first data
  • determining the gain that needs to be adjusted for the first data according to the first data specifically includes: obtaining audio characteristics of the first data, where the audio characteristics include one or more of the following: time domain characteristics, frequency domain characteristics, or music theory characteristics; and determining, according to the audio characteristics, the gain that needs to be adjusted for the first data.
  • the audio characteristics can be processed based on a preset gain calculation formula to obtain the gain that needs to be adjusted.
  • the audio features may be, but are not limited to, time domain features, such as loudness, envelope energy, or short-term energy.
  • the loudness may be the loudness at each moment in the second audio data, or the maximum loudness, etc.
  • the audio features may be, but are not limited to, time domain features (such as loudness, envelope energy, or short-term energy), frequency domain features (such as the spectral energy of each of multiple frequency bands), or music theory features (such as beat, mode, chord, pitch, timbre, melody, or emotion).
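Purely as an illustration (the application does not disclose its gain calculation formula), a time-domain feature such as short-term energy or RMS loudness could feed a gain computation like this; the target level is an assumed constant.

```python
import numpy as np

def short_term_energy(x, frame=1024, hop=512):
    """Frame-wise short-term energy of a mono signal x."""
    return np.array([np.sum(x[i:i + frame] ** 2)
                     for i in range(0, len(x) - frame + 1, hop)])

def gain_from_loudness(x, target_db=-23.0):
    """Example of a preset gain formula: drive the RMS level towards an assumed target level."""
    rms = np.sqrt(np.mean(x ** 2)) + 1e-12
    current_db = 20 * np.log10(rms)
    return 10 ** ((target_db - current_db) / 20)  # linear gain applied to every channel
```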
  • determining the second gain that needs to be adjusted for the fourth audio data according to the fifth audio data specifically includes: obtaining the maximum loudness value of the fifth audio data; and determining the second gain according to the maximum loudness value of the fifth audio data and a first ratio, where the first ratio is the ratio between the maximum loudness value of the second audio data and the maximum loudness value of the fourth audio data.
  • the method further includes: correcting the second gain based on the first gain. This makes the sound generated by subsequent playing of the fifth audio data easier to perceive.
  • the second gain is corrected based on a preset linear relationship between the first gain and the second gain.
  • the method further includes: determining that the second gain is greater than a preset gain value, and updating the second gain to the preset gain value. For example, when the second gain is greater than the preset gain value, this indicates that the sound generated by playing the fourth audio data is relatively small and has little impact on the sound generated by playing the subsequently obtained fifth audio data, so the determined second gain can be updated to the preset gain value.
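A minimal sketch of the second-gain logic described above; the form of the loudness-based formula and the preset gain value are assumptions, not values from this application.

```python
def compute_second_gain(max_loudness_fifth, first_ratio, preset_gain=1.0):
    """first_ratio = max loudness of the second audio data / max loudness of the fourth audio data."""
    gain = max_loudness_fifth * first_ratio   # assumed form of the loudness-based gain formula
    # Optionally, the gain could be corrected here via a preset linear relation with the first gain.
    return min(gain, preset_gain)             # if greater than the preset gain value, use the preset value
```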
  • adjusting the gain of each channel in the fourth audio data specifically includes: after playback of the fourth audio data starts, within a first time period that is within a first preset duration of the playback start time, gradually adjusting the gain of each channel in the fourth audio data to the second gain according to a first preset step size; and, before playback of the fourth audio data ends, within a second time period that is within a second preset duration of the playback end time, gradually adjusting the gain of each channel in the fourth audio data from the second gain to the preset gain value according to a second preset step size. This avoids sudden changes in volume, so that the volume of the sound perceived by the user changes gradually, thereby improving the user experience.
  • adjusting the gain of each channel in the fourth audio data specifically includes: before playback of the fourth audio data starts, within a first time period that is within a first preset duration of the playback start time, gradually adjusting the gain of each channel in the fourth audio data to the second gain according to a first preset step size; and, after playback of the fourth audio data ends, within a second time period that is within a second preset duration of the playback end time, gradually adjusting the gain of each channel in the fourth audio data from the second gain to the preset gain value according to a second preset step size. This avoids sudden changes in volume, so that the volume of the sound perceived by the user changes gradually, thereby improving the user experience.
  • adjusting the gain of each channel in the fourth audio data based on the second gain specifically includes: after playback of the fourth audio data starts, within a first time period that is within a first preset duration of the playback start time, gradually adjusting the gain of each channel in the fourth audio data to the second gain according to a first preset step size; and, after playback of the fourth audio data ends, within a second time period that is within a second preset duration of the playback end time, gradually adjusting the gain of each channel in the fourth audio data from the second gain to the preset gain value according to a second preset step size. This avoids sudden changes in volume, so that the volume of the sound perceived by the user changes gradually, thereby improving the user experience.
  • adjusting the gain of each channel in the fourth audio data specifically includes: before playback of the fourth audio data starts, within a first time period that is within a first preset duration of the playback start time, gradually adjusting the gain of each channel in the fourth audio data to the second gain according to a first preset step size; and, before playback of the fourth audio data ends, within a second time period that is within a second preset duration of the playback end time, gradually adjusting the gain of each channel in the fourth audio data from the second gain to the preset gain value according to a second preset step size. This avoids sudden changes in volume, so that the volume of the sound perceived by the user changes gradually, thereby improving the user experience.
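The gradual gain change common to the four variants above amounts to a fade-in/fade-out ramp; a minimal sketch, assuming per-sample stepping and illustrative gain values (fade_samples must be shorter than half the signal):

```python
import numpy as np

def ramp_gains(num_samples, fade_samples, base_gain, second_gain):
    """Gain curve that fades from base_gain to second_gain at the start and back again at the end."""
    g = np.full(num_samples, second_gain, dtype=float)
    g[:fade_samples] = np.linspace(base_gain, second_gain, fade_samples)   # first preset step size
    g[-fade_samples:] = np.linspace(second_gain, base_gain, fade_samples)  # second preset step size
    return g

# Usage (illustrative): y = x * ramp_gains(len(x), fade_samples=4800, base_gain=1.0, second_gain=0.3)
```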
  • the present application provides a sound processing method.
  • the method may include: the first device obtains a first message sent by the second device, where the first message is sent when the second device needs to broadcast audio data; in response to the first message, the first device performs target processing on the audio data to be played, and plays the target-processed audio data.
  • the target processing is used to eliminate or reduce the target sound in the audio data; the first device obtains a second message sent by the second device, where the second message is sent when the second device finishes broadcasting the audio data; in response to the second message, the first device stops target processing of the audio data to be played, and plays audio data without target processing.
  • for example, the audio data played occasionally can be audio data during a call, and the audio data played continuously by the device can be a certain type of music.
  • this method can be applied in a home scenario.
  • the second device can be a mobile phone
  • the first device can be a smart speaker, a smart TV, etc.
  • the first device may be playing music, TV series, movies, etc.
  • the audio data that the second device needs to broadcast may be the audio data that the second device needs to play when the user uses the second device to make a call.
  • this method can also be applied in a driving scenario.
  • the second device can be a mobile phone and the first device can be a vehicle-mounted terminal.
  • the first device may be playing music, etc.
  • the audio data to be broadcast by the second device may be the audio data that the second device needs to play when the user uses the second device to navigate or make a call.
  • the target processing includes vocal elimination processing or vocal reduction processing.
  • the present application provides a sound processing method.
  • the method may include: obtaining environmental information associated with the target device.
  • the environmental information includes the target position of the target device in a target space, and at least one speaker is configured in the target space; determining the distances between the target device and N speakers to obtain N first distances, where N is a positive integer and the N speakers are in the same space as the target device; constructing a target virtual speaker group based on the N first distances and the N speakers, where the target virtual speaker group consists of M target virtual speakers and the M target virtual speakers are located around the location where the target device is located.
  • the value of M is equal to the number of speakers required to construct spatial surround sound, and the M target virtual speakers are arranged in the same way as the speakers required to construct spatial surround sound; each target virtual speaker is obtained by adjusting the gain of the audio signal corresponding to at least one of the N speakers; the gain of each channel in the original audio data is adjusted according to the gains that need to be adjusted for the audio signals which correspond to the speakers among the N speakers and are associated with the target virtual speakers, so as to obtain the target audio data, where the target audio data matches the environmental information; and the target audio data is output.
  • in this way, the gain of the audio signal output by each speaker in the space is adjusted according to the position of the target device in the space, so that the user can enjoy spatial surround sound anytime and anywhere.
  • the arrangement of the speakers required to build spatial surround sound may be the arrangement required in the requirements of 5.1.X or 7.1.X.
  • this method can be applied to the scenario described in Figure 9 or 10 below.
  • the target device may be the electronic device 100 in FIG. 10 .
  • one piece of audio data may, but is not limited to, include audio signals that each corresponding speaker needs to play.
  • each audio signal included in one audio data may correspond to one channel.
  • the target distance is the minimum value among the N first distances. This can virtualize the speakers to the area closest to the target device, improving the spatial surround sound effect.
  • constructing a target virtual speaker group based on the N first distances and the N speakers specifically includes: using the target distance as a benchmark, determining the gain that needs to be adjusted for the audio signal corresponding to each of the N speakers except the target speaker, so as to construct a first virtual speaker group.
  • the first virtual speaker group is a combination of speakers obtained by virtualizing the N speakers onto a circle with the target device as the center and the target distance as the radius.
  • the target speaker is the speaker corresponding to the target distance; the target virtual speaker group is determined according to the first virtual speaker group and the arrangement of the speakers required to build spatial surround sound, where the center speaker in the target virtual speaker group is located within a preset angle range of the current orientation of the target device.
  • for example, using the target distance as a benchmark and based on a preset gain calculation model, the target distance and the distance between the target device and each speaker other than the target speaker can be processed to obtain the gain that needs to be adjusted for the audio signal corresponding to each speaker other than the target speaker, thereby constructing the first virtual speaker group.
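As an illustration only (the preset gain calculation model is not disclosed here), a simple 1/r attenuation assumption would virtualize every speaker onto a circle whose radius equals the target (minimum) distance:

```python
def virtualization_gains(distances):
    """Gains that make each real speaker sound as if it stood at the target (minimum) distance.

    Assumes a 1/r attenuation model: a farther speaker is driven proportionally louder so that
    its level at the listener matches that of a speaker at the target distance.
    """
    target = min(distances)                  # target distance = minimum of the N first distances
    return [d / target for d in distances]   # gain >= 1 for speakers farther than the target
```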
  • a target virtual speaker group may be determined from the first virtual speaker group based on the arrangement of speakers required to build spatial surround sound.
  • the virtual speakers in the first virtual speaker group can be processed through the VBAP (vector base amplitude panning) algorithm to construct the virtual speakers in the target virtual speaker group.
  • for the method of determining the target virtual speaker group, please refer to the description of FIG. 11 below.
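For reference, a textbook two-dimensional VBAP step (panning a virtual speaker between two speakers on the circle) looks roughly like this; it is a generic sketch, not the specific procedure of FIG. 11.

```python
import numpy as np

def vbap_2d(target_az_deg, spk1_az_deg, spk2_az_deg):
    """Gains for two speakers so that their combined image points at the target azimuth."""
    def unit(az):
        a = np.deg2rad(az)
        return np.array([np.cos(a), np.sin(a)])
    L = np.column_stack([unit(spk1_az_deg), unit(spk2_az_deg)])  # speaker base vectors
    g = np.linalg.solve(L, unit(target_az_deg))                  # solve L @ g = target direction
    g = np.clip(g, 0.0, None)        # negative gains mean the target lies outside the speaker pair
    n = np.linalg.norm(g)
    return g / n if n > 0 else g     # power normalization

# Example: gains to place a virtual speaker at 15 degrees between real speakers at 0 and 45 degrees
# print(vbap_2d(15, 0, 45))
```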
  • constructing a target virtual speaker group based on the N first distances and the N speakers specifically includes: constructing a first virtual speaker group based on the N speakers, the N first distances, the arrangement of the speakers required for spatial surround sound, the orientation of the target device, and the location of the target device.
  • the first virtual speaker group includes M first virtual speakers, and each first virtual speaker is obtained by adjusting the gain of the audio signal corresponding to at least one speaker; the second distance between the target device and each first virtual speaker is determined to obtain M second distances; and the M first virtual speakers are virtualized onto a circle with the location of the target device as the center and one of the second distances as the radius, to obtain the target virtual speaker group.
  • a certain number of virtual speakers (that is, the number of speakers required to build spatial surround sound) can be determined first, and then these virtual speakers can be virtualized on the same circle to obtain the target virtual speaker group.
  • before determining the distances between the target device and the N speakers, the method further includes: selecting the N speakers from the speakers configured in the space where the target device is located, based on the speakers configured in that space, the orientation of the target device, the location of the target device, and the arrangement of the speakers required to construct spatial surround sound, where the N speakers are used to construct spatial surround sound. That is to say, the real speakers needed to build spatial surround sound can first be screened out, and the required virtual speakers can then be built from these real speakers.
  • for the method of determining the target virtual speaker group, please refer to the description of Figure 19 below.
  • the method further includes: determining the distance between the target device and each speaker in the target space; determining, based on the distance between the target device and each speaker in the target space, the delay time of each speaker in the target space when playing audio data; and controlling each speaker in the target space to play audio data according to the corresponding delay time. In this way, each speaker can be controlled to play synchronously, improving the user experience.
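The delay-compensation step reduces to aligning arrival times; a sketch that only accounts for propagation delay, with the speed of sound assumed to be 343 m/s:

```python
SPEED_OF_SOUND = 343.0  # m/s

def playback_delays(distances_m):
    """Delay each speaker so that sound from every speaker reaches the listening position together."""
    farthest = max(distances_m)
    return [(farthest - d) / SPEED_OF_SOUND for d in distances_m]

# Example: speakers 2 m, 3 m and 5 m away -> delays of roughly 8.7 ms, 5.8 ms and 0 ms respectively
```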
  • the present application provides a sound processing method.
  • the method may include: obtaining environmental information associated with the target device.
  • the environmental information includes the target position of the picture generated by the target device in the target space.
  • at least one speaker is configured in the target space; a virtual space is constructed at the target position, where the volume of the virtual space is smaller than the volume of the target space; and a target virtual speaker group is constructed in the virtual space according to the position of each speaker in the target space.
  • the target virtual speaker group includes at least one target virtual speaker, and each target virtual speaker is obtained by adjusting the gain of an audio signal corresponding to a speaker in the target space; the gain of each channel in the original audio data is adjusted according to the gain that needs to be adjusted for the audio signal which corresponds to a speaker in the target space and is associated with a target virtual speaker, so as to obtain the target audio data, where the target audio data matches the environmental information; and the target audio data is output.
  • in this way, a virtual speaker group is constructed at the target position, and the audio data in the target device is controlled to be played by the virtual speaker group, so that the picture played by the target device is synchronized with the audio data, improving the consistency of the user's listening and visual experience.
  • this method can be applied to the scenario described in Figure 20 below.
  • the target device may be the electronic device 100 in FIG. 20 .
  • the original audio data can be the audio data played by the user using the target device.
  • constructing a target virtual speaker group in the virtual space according to the position of each speaker in the target space specifically includes: determining the position of each target virtual speaker in the target virtual speaker group in the virtual space according to the ratio between the virtual space and the target space; and determining, according to the distance between each target virtual speaker and the target speaker corresponding to that target virtual speaker, the gain that needs to be adjusted for the audio signal corresponding to each target speaker, so as to obtain the target virtual speaker group, where a target speaker is a speaker in the target space.
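One possible (assumed) reading of the scaling step: each virtual speaker is placed at its real counterpart's position scaled towards the picture position by the virtual-to-target space ratio; the gain for each real speaker would then be derived from its distance to its virtual twin using a model not specified here.

```python
import numpy as np

def virtual_speaker_positions(real_positions, picture_pos, ratio):
    """Scale real speaker positions towards the picture position by the virtual/target space ratio."""
    picture = np.asarray(picture_pos, dtype=float)
    return [picture + ratio * (np.asarray(p, dtype=float) - picture) for p in real_positions]

# Example: with ratio 0.25, a speaker 4 m from the picture maps to a virtual speaker 1 m from it.
```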
  • the method further includes: determining the distance between the picture produced by the target device and each speaker in the target space; determining, according to the distance between the picture produced by the target device and each speaker in the target space, the delay time of each speaker in the target space when playing audio data; and controlling each speaker in the target space to play audio data according to the corresponding delay time. In this way, each speaker can be controlled to play synchronously, improving the user experience.
  • the method may also include: selecting a distance as a reference distance from the determined distances between the picture produced by the target device and each speaker in the target space; and determining, based on the reference distance, the appearance time of the picture produced by the target device.
  • the reference distance may be the largest distance among the determined distances between the picture generated by the target device and each speaker in the target space.
  • for example, the delay time of the picture to be generated relative to the sound generated by the speaker corresponding to the reference distance can be determined first; the target device is then controlled to display the corresponding picture after the speaker corresponding to the reference distance starts playing. For example, if the determined delay time is 3 seconds and the time when the speaker corresponding to the reference distance plays the corresponding audio data is t, then the time when the picture generated by the target device appears is (t+3).
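The scheduling in the worked example above is just an offset of the picture's appearance time by the determined delay:

```python
def picture_appearance_time(t_reference_play, delay_s):
    """If the speaker at the reference distance plays at time t, the picture appears at t + delay."""
    return t_reference_play + delay_s

# With the 3-second example above: picture_appearance_time(t, 3) == t + 3
```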
  • the present application provides a sound processing method.
  • the method may include: obtaining status information of a user associated with the target device.
  • the user's status information includes the target distance between the target device and the target user's head, and the target position of the target user's head in the target space, and at least one speaker is configured in the target space; a target virtual speaker group is constructed according to the target distance, the target position, and the position of each speaker in the target space, where the target virtual speaker group includes at least one target virtual speaker.
  • each target virtual speaker is obtained by adjusting the gain of the audio signal corresponding to a speaker in the target space.
  • each target virtual speaker is located on a circle with the target position as the center and the target distance as the radius; the gain of each channel in the original audio data is adjusted according to the gains that need to be adjusted for the audio signals which correspond to the speakers in the target space and are associated with the target virtual speakers, so as to obtain the target audio data, where the target audio data matches the user's status; and the target audio data is output.
  • in this way, a virtual speaker group is constructed around the target user, and the audio data in the target device is controlled to be played by the virtual speaker group, so that the picture and audio data played by the target device are synchronized, improving the consistency of the user's hearing and visual experience.
  • this method can be applied to the scenario described in Figure 24 below.
  • the target device may be the electronic device 100 in FIG. 24 .
  • the original audio data can be the audio data played by the user using the target device.
  • the method further includes: constructing a first virtual speaker group according to the target virtual speaker group, where the first virtual speaker group consists of M virtual speakers.
  • the M virtual speakers are located on a circle with the target position as the center and the target distance as the radius.
  • the value of M is equal to the number of speakers required to build spatial surround sound.
  • the arrangement method is the same as the arrangement method of speakers required to construct spatial surround sound.
  • Each of the M virtual speakers is obtained by adjusting the gain of the audio signal corresponding to at least one speaker in the target space.
  • adjusting the gain of each channel in the original audio data to obtain the target audio data specifically includes: adjusting the gain of each channel in the original audio data according to the gains that need to be adjusted for the audio signals which correspond to the speakers in the space and are associated with the M virtual speakers, so as to obtain the target audio data.
  • the target virtual speaker group includes S virtual speakers, the S virtual speakers are the speakers required to build spatial surround sound, and each of the S virtual speakers is obtained by adjusting the gain of the audio signal corresponding to at least one of the N speakers; the distance between the target position and each of the S virtual speakers is determined to obtain S distances; the S virtual speakers are virtualized onto a circle centered on the target position with one of the S distances as the radius, to obtain the required virtual speaker group; and the original audio data is adjusted based on the gains that need to be adjusted for the audio signals corresponding to the real speakers, as determined in the process of constructing the required virtual speaker group, to obtain the target audio data.
  • a certain number of virtual speakers (that is, the number of speakers required to build spatial surround sound) can be determined first, and then these virtual speakers can be placed on the same circle virtually to obtain the required virtual speaker group; Finally, the original audio data can be adjusted based on the gain that needs to be adjusted for the audio signal corresponding to each real speaker determined in the process of building the required virtual speaker group to obtain the target audio data.
  • the method may also include: selecting N speakers from the speakers configured in the space where the target device is located, based on the target distance, the target position, the position of each speaker in the target space, and the arrangement of the speakers required to build spatial surround sound, where the N speakers are used to build spatial surround sound; determining the required virtual speaker group based on the N speakers; and adjusting the original audio data based on the gains that need to be adjusted for the audio signals corresponding to the real speakers, as determined in the process of building the required virtual speaker group, to obtain the target audio data.
  • in other words, the real speakers needed to build spatial surround sound can be screened out first, and the required virtual speakers can then be constructed from these real speakers; finally, the original audio data can be adjusted based on the gains that need to be adjusted for the audio signals corresponding to the N real speakers, as determined in the process of building the required virtual speaker group, to obtain the target audio data.
  • the present application provides a sound processing method, which may include: obtaining environmental information associated with a target device, where the target device is located in a vehicle, and the environmental information includes the vehicle's driving speed, rotational speed, and accelerator pedal opening.
  • first audio data is determined according to a target parameter obtained based on the environmental information, where the target audio particles in the first audio data are obtained by performing a scaling (stretching) transformation; the acceleration of the vehicle is determined according to the driving speed, and, based on the acceleration, the gain of each channel in the first audio data is adjusted to obtain second audio data; the target speed at which the sound field in the vehicle moves in a target direction is determined; the virtual position of the sound source of the target audio data is determined according to the target speed; and, according to the virtual position, the target gains that need to be adjusted for the audio signals corresponding to the speakers in the vehicle are determined, so as to obtain and output the target audio data.
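As a rough editorial illustration (the mapping constants are assumptions, not disclosed values), the acceleration could drive a per-channel gain while the sound-field movement drives a virtual source position:

```python
def acceleration_gain_db(speed_now, speed_prev, dt, db_per_mps2=1.0, max_gain_db=6.0):
    """Map vehicle acceleration (m/s^2) to a bounded gain in dB; the scaling and cap are assumed."""
    accel = (speed_now - speed_prev) / dt
    return max(-max_gain_db, min(max_gain_db, db_per_mps2 * accel))

def virtual_source_position(start_pos, direction, target_speed, t):
    """Move the virtual sound source from start_pos along `direction` at the target speed."""
    return tuple(p + target_speed * t * d for p, d in zip(start_pos, direction))
```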
  • this method can be applied to the scenario of “controlling the acceleration of new energy vehicles” described below.
  • the target device may be a vehicle or an electronic device in the vehicle.
  • the target device can be a device integrated in the vehicle, such as a vehicle-mounted terminal, or a device separated from the vehicle, such as a driver's mobile phone.
  • before adjusting the gain of each channel in the first audio data according to the driving speed, the method further includes: determining that the change value of the driving speed exceeds a preset speed threshold; and/or determining that the adjustment value corresponding to the gain of each channel in the first audio data is less than or equal to a preset adjustment value.
  • when the target adjustment value corresponding to the gain of a target channel in the first audio data is greater than the preset adjustment value, the target adjustment value is updated to the preset adjustment value. This prevents the user from hearing sounds that suddenly become louder or softer, or sudden changes in the sound, and improves the user experience.
  • the target parameter also includes the acceleration duration of the vehicle
  • the method further includes: controlling the operation of the ambient light in the vehicle according to the acceleration duration.
  • the color change speed of the ambient light can also be controlled to be the same as the target speed of the sound field movement in the vehicle, so that the spatial hearing and spatial visual perception in the vehicle correspond to each other, improving the user experience.
  • the present application provides a sound processing method.
  • the method may include: obtaining status information of a user associated with the target device, where the status information includes the user's fatigue level; and determining a target adjustment value of the first characteristic parameter based on the fatigue level.
  • the first characteristic parameter is a characteristic parameter of the original audio data that currently needs to be played, and the first characteristic parameter includes pitch and/or loudness; according to the target adjustment value, the original audio data is processed to obtain the target audio data, where the value of the characteristic parameter of the target audio data is higher than the value of the first characteristic parameter, and the target audio data matches the user's status information; the target audio data is output.
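A hypothetical mapping from fatigue level to the target adjustment values, applied here only to loudness (the table values are placeholders; a real pitch adjustment would need a dedicated pitch-shifting routine):

```python
import numpy as np

# Hypothetical mapping: fatigue level -> (pitch adjustment in semitones, loudness gain in dB)
FATIGUE_ADJUSTMENTS = {0: (0, 0.0), 1: (1, 2.0), 2: (2, 4.0), 3: (3, 6.0)}

def raise_loudness(audio, fatigue_level):
    """Apply the loudness part of the target adjustment to the navigation audio."""
    _, gain_db = FATIGUE_ADJUSTMENTS.get(fatigue_level, (0, 0.0))
    return np.asarray(audio) * (10 ** (gain_db / 20))
```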
  • the target device can be a vehicle or an electronic device in the vehicle.
  • the target device can be a device integrated in the vehicle, such as a vehicle-mounted terminal, or a device separated from the vehicle, such as a driver's mobile phone.
  • the original audio data may be the audio data of the navigation sound to be played.
  • outputting the target audio data specifically includes: determining the first target prompt sound according to the fatigue level; and outputting the target audio data and the first target prompt sound according to a preset broadcast sequence. This will further impact the user's hearing, make the broadcasting method and language more life-like and humane, and improve the user experience.
  • the first target prompt voice may be the prompt voice shown in "Table 2" below.
  • the method further includes: determining a second target prompt sound based on the fatigue level and map information; and outputting the second target prompt sound.
  • This further creates an auditory impact on the user, thereby increasing the user's attention.
  • the second target prompt voice may be "Attention! Attention! The driver is extremely tired and can stop and rest at xxx intersection/supermarket/transfer station xxx meters away.”
  • the target device is located in a vehicle.
  • before outputting the target audio data, the method also includes: determining that the vehicle is in an autonomous driving state and that the road condition of the road section where the vehicle is located is lower than a preset road condition threshold, and/or determining that the road section where the vehicle is located is a preset road section.
  • the user's attention can be improved under certain conditions.
  • the method further includes: determining the flashing frequency and/or color of the warning light according to the fatigue level, and controlling the warning light to work according to the determined flashing frequency and/or color. This gives the user a visual impact and thereby increases the user's attention.
  • the present application provides a sound processing method, which may include: obtaining status information of a user associated with the target device, where the status information includes first audio data and second audio data selected by the user; determining a first audio feature of the first audio data, where the first audio feature includes the loudness at each moment and/or the position point of each beat; adjusting, according to the first audio feature, a second audio feature of the second audio data to obtain third audio data, where the second audio feature includes at least one of loudness, pitch, and sound speed; obtaining target audio data according to the first audio data and the third audio data, where the target audio data matches the user's status information; and outputting the target audio data.
  • this method can be applied to the scenario of "the user selects multiple types of audio data to be overlaid and played" described below.
  • the first audio data may be background sound
  • the second audio data may be white noise.
  • the first audio feature includes: loudness at each moment of the first audio data
  • the second audio feature includes loudness.
  • adjusting the second audio feature of the second audio data according to the target audio feature specifically includes: determining, according to the loudness at each moment in the first audio data and a preset loudness ratio, the target loudness corresponding to each moment in the second audio data; and adjusting the loudness at each moment in the second audio data to the target loudness corresponding to that moment. In this way, the loudness at each moment of the two pieces of audio data matches the preset loudness ratio, so that the two can be naturally blended together.
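A minimal per-frame sketch of holding the second audio data at a preset loudness ratio relative to the first (the frame size and the RMS loudness proxy are assumptions):

```python
import numpy as np

def match_loudness(first, second, ratio=0.5, frame=1024):
    """Scale each frame of `second` so its RMS is `ratio` times the RMS of the same frame of `first`."""
    out = second.astype(float)
    for i in range(0, min(len(first), len(second)) - frame + 1, frame):
        rms_first = np.sqrt(np.mean(first[i:i + frame] ** 2)) + 1e-12
        rms_second = np.sqrt(np.mean(second[i:i + frame] ** 2)) + 1e-12
        out[i:i + frame] *= (ratio * rms_first) / rms_second
    return out
```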
  • the target audio features include: position points of each beat
  • the second audio feature includes pitch and/or sound speed. Adjusting the second audio feature of the second audio data according to the target audio feature specifically includes: for any two adjacent beats in the first audio data, determining the target rhythm corresponding to those two adjacent beats based on the two adjacent beats; determining, according to the target rhythm, the target adjustment value of the second audio feature of the second audio data within the position points corresponding to the two adjacent beats; and adjusting, according to the target adjustment value, the second audio feature of the second audio data within the position points corresponding to the two adjacent beats.
  • the audio characteristics of the second audio data can match the rhythm of the first audio data, so that the two can be naturally blended together.
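Illustratively, the target rhythm between adjacent beats and the corresponding adjustment factor could be derived as follows (the mapping from local tempo to the adjustment value is an assumption):

```python
def segment_tempo_adjustments(beat_times_s, base_bpm=120.0):
    """For each pair of adjacent beats, derive a local BPM and a speed-adjustment factor."""
    adjustments = []
    for t0, t1 in zip(beat_times_s[:-1], beat_times_s[1:]):
        local_bpm = 60.0 / (t1 - t0)                        # target rhythm between the two beats
        adjustments.append((t0, t1, local_bpm / base_bpm))  # factor applied to that segment of the second audio data
    return adjustments
```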
  • this application provides a sound processing method.
  • the method may include: obtaining the status information of the user associated with the target device.
  • the user's status information includes one or more of the following: pictures or videos selected by the user, or audio data added by the user for a target object; the method may further include: determining N pictures, N ≥ 2; determining the target objects contained in each of the N pictures to obtain M target objects, M ≥ 1; determining the spatial position of each target object in each of the N pictures, and determining the duration for which each target object appears in a target video to obtain M first durations, where the target video is obtained based on the N pictures; determining the moving speed of each target object between adjacent pictures according to the spatial position of each target object and the moment at which each pair of adjacent pictures among the N pictures appears in the target video; obtaining Q pieces of first audio data according to the M target objects, 1 ≤ Q ≤ M, where one piece of first audio data is associated with at least one target object; adjusting the second duration of each piece of first audio data to be equal to the first duration corresponding to the corresponding target object, so as to obtain Q pieces of second audio data; processing the second audio data corresponding to each target object according to the spatial position of each target object and the moving speed of each target object between adjacent pictures, so as to obtain Q pieces of third audio data; and obtaining the target video according to the Q pieces of third audio data and the N pictures, where the target video includes target audio data, the target audio data is obtained based on the Q pieces of third audio data, and the target audio data matches the user's status information; the target audio data is output.
  • this method can be applied to the scenario of “making videos or dynamic pictures” described below.
  • the duration of the target video can be calculated by playing a picture for a fixed time, or can be obtained by the duration of a selected piece of audio data.
  • the method further includes: determining, based on the N pictures, fourth audio data matching the N pictures; and using the position points of at least some of the beats in the fourth audio data as the moments at which at least some of the N pictures appear, and/or using the starting or ending points of at least some of the sections in the fourth audio data as the moments at which at least some of the N pictures appear.
  • in this way, the moments at which at least some of the N pictures appear can coincide with the position points of certain beats or the position points of certain sections, so that visual changes are presented at the key points of the listening experience, that is, the user views the pictures at the key listening points, thereby creating a consistent audio-visual impact and improving the user experience.
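Snapping each picture's appearance moment to the nearest beat (or section boundary) is straightforward; a sketch:

```python
def snap_to_beats(picture_times_s, beat_times_s):
    """Move each picture's appearance time to the nearest beat position of the matched audio data."""
    return [min(beat_times_s, key=lambda b: abs(b - t)) for t in picture_times_s]

# Example: snap_to_beats([1.1, 2.6], [0.0, 1.0, 2.0, 3.0]) -> [1.0, 3.0]
```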
  • determining the spatial position of each target object in each of the N pictures specifically includes: for the k-th target object in the i-th picture, based on the preset three-dimensional coordinate system, determine The first spatial position of the k-th target object in the i-th picture, where the center point of the three-dimensional coordinate system is the center position of the i-th picture, the i-th picture is any picture in the N pictures, and the k-th The target object is any target object in the i-th picture.
  • the method further includes: determining that the k-th target object does not exist in the (i+1)-th picture; and using a first position on a first boundary of the (i+1)-th picture as the second spatial position of the k-th target object in the (i+1)-th picture. This avoids the sudden disappearance of the sound of the k-th target object in the (i+1)-th picture.
  • the first boundary is the boundary in the target direction of the k-th target object in the i-th picture, and the first position is the intersection point of the first boundary and a straight line that starts from the first spatial position in the (i+1)-th picture and extends in the target direction.
  • the method further includes: determining that the k-th target object does not exist in the (i+2)-th picture; determining the first moving speed and first moving direction of the k-th target object according to the first spatial position, the second spatial position, and the time interval between the i-th picture and the (i+1)-th picture; and using a second position outside the (i+2)-th picture as the third spatial position of the k-th target object in the (i+2)-th picture; where the second position is in the first moving direction and is a first target distance away from the second spatial position in the (i+2)-th picture, and the first target distance is obtained based on the first moving speed and the time interval between the (i+1)-th picture and the (i+2)-th picture. As a result, the sound of the k-th target object gradually moves away in the target direction instead of disappearing suddenly, which improves the user experience.
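The moving speed and the extrapolated position reduce to simple kinematics; a sketch assuming positions are 3-D coordinates:

```python
import numpy as np

def extrapolate_position(p_i, p_i1, dt_i, dt_i1):
    """From positions in pictures i and i+1 and the time gaps, extrapolate the position for picture i+2."""
    p_i, p_i1 = np.asarray(p_i, float), np.asarray(p_i1, float)
    velocity = (p_i1 - p_i) / dt_i      # first moving speed and first moving direction
    return p_i1 + velocity * dt_i1      # lands outside picture i+2, along the moving direction
```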
  • the method further includes: determining that the k-th target object does not exist in the (i-1)-th picture, where i ≥ 2; and using a third position on a second boundary of the (i-1)-th picture as the fourth spatial position of the k-th target object in the (i-1)-th picture. This avoids the sudden appearance of the sound of the k-th target object in the i-th picture.
  • the second boundary is the boundary of the k-th target object in the opposite direction of the target direction in the i-th picture
  • the third position is the intersection point of the second boundary and a straight line that starts from the first spatial position in the (i-1)-th picture and extends in the opposite direction of the target direction.
  • the method further includes: determining that the k-th target object does not exist in the (i-2)-th picture, where i ≥ 3; determining the second moving speed and second moving direction of the k-th target object according to the first spatial position, the fourth spatial position, and the time interval between the i-th picture and the (i-1)-th picture; and using a fourth position outside the (i-2)-th picture as the fifth spatial position of the k-th target object in the (i-2)-th picture; where the fourth position is in the opposite direction of the second moving direction and is a second target distance away from the fourth spatial position in the (i-2)-th picture, and the second target distance is obtained based on the second moving speed and the time interval between the (i-1)-th picture and the (i-2)-th picture.
  • the method also includes: determining that the k-th target object does not exist in the (i+1)-th picture to the (i+j)-th picture, j ≥ 2, and that the k-th target object exists in the (i+j+1)-th picture, (i+j+1) ≤ N; determining, based on the i-th picture, the spatial position of the k-th target object in each of the (i+1)-th picture to the (i+j)-th picture, so as to obtain a first spatial position set {P_(i+1), ..., P_(i+j)}, where P_(i+j) is the spatial position of the k-th target object in the (i+j)-th picture; and determining, based on the (i+j+1)-th picture, the spatial position of the k-th target object in each of the (i+1)-th picture to the (i+j)-th picture, so as to obtain a second spatial position set.
  • determining the spatial position specifically includes: determining, according to the first spatial position set and the second spatial position set, the two candidate positions of the k-th target object in each picture from the (i+1)-th picture to the (i+j)-th picture, so as to obtain j distances, where the (i+c)-th picture is the picture corresponding to one of the j distances, 1 ≤ c ≤ j; and determining, according to the spatial position of the k-th target object in the i-th picture, the spatial position of the k-th target object in the (i+j+1)-th picture, the spatial position of the k-th target object in the (i+c)-th picture, and the moment at which each picture from the i-th picture to the (i+j+1)-th picture appears in the target video, the spatial position of the k-th target object in each picture between the i-th picture and the (i+c)-th picture, and the spatial position of the k-th target object in each picture between the (i+c)-th picture and the (i+j+1)-th picture.
  • the present application provides an electronic device, including: at least one memory for storing a program; and at least one processor for executing the program stored in the memory; wherein, when the program stored in the memory is executed, the processor is configured to perform the method provided in any one of the first to tenth aspects.
  • the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program. When the computer program is run on an electronic device, the electronic device is caused to perform the method provided in any one of the first to tenth aspects.
  • the present application provides a computer program product, which when the computer program product is run on an electronic device, causes the electronic device to execute the method provided in any one of the first to tenth aspects.
  • the present application also provides a chip, including a processor, the processor being coupled to a memory and configured to read and execute program instructions stored in the memory, so that the chip implements the method provided in any one of the first to tenth aspects.
  • it can be understood that, for the beneficial effects of the eleventh to fourteenth aspects above, reference can be made to the relevant descriptions in the first to tenth aspects, and details are not described again here.
  • Figure 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Figure 2 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a display interface of an electronic device provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Figure 5 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 6 is a time domain waveform schematic diagram and envelope schematic diagram of audio data provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of a spectrogram obtained by performing short-time Fourier transform on audio data according to an embodiment of the present application
  • Figure 8 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Figure 11 is a schematic flow chart of a sound processing method provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of the orientation of an electronic device provided by an embodiment of the present application.
  • Figure 13 is a schematic diagram of constructing a virtual speaker provided by an embodiment of the present application.
  • Figure 14 is a schematic diagram of a process of constructing a virtual speaker provided by an embodiment of the present application.
  • Figure 15 is a schematic diagram of a process of constructing a virtual speaker group provided by an embodiment of the present application.
  • Figure 16 is a schematic diagram of another process of constructing a virtual speaker group provided by an embodiment of the present application.
  • Figure 17 is a schematic flow chart of a sound processing method provided by an embodiment of the present application.
  • Figure 18 is a schematic diagram of a process of constructing a virtual speaker provided by an embodiment of the present application.
  • Figure 19 is a schematic flowchart of yet another sound processing method provided by an embodiment of the present application.
  • Figure 20 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Figure 21 is a schematic diagram of a three-point positioning provided by an embodiment of the present application.
  • Figure 22 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 23 is a schematic diagram of constructing a virtual space provided by an embodiment of the present application.
  • Figure 24 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 25 is a schematic diagram of constructing a virtual speaker group in a virtual space according to an embodiment of the present application.
  • Figure 26 is a schematic flow chart of a sound processing method provided by an embodiment of the present application.
  • Figure 27 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 28 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 29 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 30 is a schematic diagram of the hardware structure of a vehicle provided by an embodiment of the present application.
  • Figure 31 is a schematic flow chart of a sound processing method provided by an embodiment of the present application.
  • Figure 32 is a schematic diagram of sound field movement provided by an embodiment of the present application.
  • Figure 33 is a schematic diagram of sound field movement provided by an embodiment of the present application.
  • Figure 34 is a schematic diagram in which the color of the ambient light in a vehicle gradually changes along with the acceleration duration of the vehicle according to an embodiment of the present application;
  • Figure 35 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Figure 36 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 37 is a schematic diagram of a process for processing sound with variable speed and constant pitch according to an embodiment of the present application.
  • Figure 38 is a schematic flow chart of a sound processing method provided by an embodiment of the present application.
  • Figure 39 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 40 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 41 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 42 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 43 is a schematic diagram of adjusting the moment when the picture appears to the position point of the beat provided by an embodiment of the present application.
  • Figure 44 is a schematic diagram of determining the spatial position of a target object in a picture provided by an embodiment of the present application.
  • Figure 45 is a schematic diagram of determining the spatial position of a target object in a picture provided by an embodiment of the present application.
  • Figure 46 is a schematic flowchart of a sound processing method provided by an embodiment of the present application.
  • Figure 47 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
  • Figure 48 is a software structure block diagram of an electronic device provided by an embodiment of the present application.
  • The term "A and/or B" can represent three situations: A exists alone, A and B exist simultaneously, or B exists alone.
  • The symbol "/" herein indicates an "or" relationship between the associated objects; for example, A/B means A or B.
  • The terms "first", "second", etc. in the description and claims herein are used to distinguish different objects, rather than to describe a specific order of objects.
  • For example, the first response message and the second response message are used to distinguish different response messages, but are not used to describe a specific sequence of response messages.
  • The term "multiple" refers to two or more; for example, multiple processing units refers to two or more processing units, and multiple components refers to two or more components.
  • embodiments of the present application provide a sound processing method, which can process original audio data based on external information input to construct audio data to be played.
  • The method can construct audio data to be played that is adapted to the current environment or the current user's status based on the environmental information associated with the electronic device and/or the user's status information, so that the audio data to be played is integrated with the current environment or the current user's status, thereby enhancing the user experience.
  • When constructing the audio data to be played that is adapted to the current environment or the current user's status, the required audio data can be obtained by adjusting the audio characteristics of the audio data to be played (such as gain, pitch or loudness), and/or by combining the audio data of a target object adapted to the current environment.
  • the method can construct audio data to be played that is adapted to the captured image or video based on information related to the image or video captured by the electronic device.
  • The environmental information associated with the electronic device may include one or more of the following: environmental data of the area where the electronic device is located (such as environmental images, environmental sounds, weather information or seasonal information), whether different audio data needs to be played simultaneously in the environment where the electronic device is located, the position of the electronic device in space, the position in space of the picture generated by the electronic device, or, when the electronic device is located in a vehicle, the driving parameters of the vehicle (such as driving speed).
  • The user's status information associated with the electronic device may include one or more of the following: the user's fatigue level, the distance between the electronic device and the user's head, the position of the user's head in space, audio data selected by the user, or pictures or videos selected by the user, etc.
  • the sound processing method mainly involves the following scenarios:
  • In this scenario, the electronic device in the vehicle can determine, from a pre-configured atomic database of white noise and in combination with the environmental data of the area where the electronic device is located, the audio data of each sound object that is adapted to the current environment. The determined audio data of each sound object can then be synthesized to obtain target audio data, and the target audio data can be played. In this way, the driver or other users in the vehicle can hear sounds that match the external environment, giving them an immersive experience.
  • the atomic database of white noise can be configured with the audio data of each single object within a specific period of time, such as the audio data of water flow, the audio data of cicadas, the audio data of vegetation, etc.
  • the audio data to be played can be constructed according to the environmental information associated with the electronic device, where the environmental information associated with the electronic device can be environmental data of the area where the electronic device is located.
  • This scenario can include two scenarios.
  • the first scenario is that the audio data played continuously and the audio data played occasionally are played through the same electronic device.
  • In this scenario, the electronic device can be used to perform vocal cancellation or vocal reduction processing on the audio data that is played continuously, and the audio data that is played occasionally and the processed audio data that needs to be played continuously can be broadcast at the same time.
  • the audio data that is played continuously can be a certain type of music
  • the audio data that is played occasionally can be navigation audio data that needs to be broadcast during navigation.
  • the second scenario is that the audio data played continuously and the audio data played occasionally are played through different electronic devices.
  • one electronic device (hereinafter referred to as the "first device") can continuously play one type of audio data, and another electronic device can sporadically play another type of audio data.
  • When the electronic device that plays audio data sporadically needs to play audio data, it can instruct the electronic device that plays audio data continuously to perform a vocal cancellation or vocal reduction operation; and after the sporadic broadcast ends, it can instruct the electronic device that plays audio data continuously to stop performing the vocal cancellation or vocal reduction operation.
  • For example, the audio data played occasionally can be audio data during a call, and the audio data played continuously can be a certain type of music.
  • the audio data to be played can be constructed according to the environmental information associated with the electronic device.
  • the environmental information associated with the electronic device can be whether different audio data needs to be played simultaneously in the environment where the electronic device is located.
  • This scenario can include two scenarios.
  • the first scenario may be: multiple speakers are configured in the space, and at least some of the speakers are arranged according to certain requirements (such as 5.1.X, or 7.1.X, etc.).
  • the electronic device or other device is using the speaker to play audio data.
  • the gain of the audio signals output by each speaker can be adjusted based on the location of the electronic device, so that users can enjoy spatial surround sound anytime and anywhere.
  • the audio data to be played can be constructed according to the environmental information associated with the electronic device, where the environmental information associated with the electronic device can be the location of the electronic device in space.
  • the second scenario may be: multiple speakers are configured in the space, and the electronic device can generate pictures (for example, the user uses the electronic device to watch movies, etc.), and the electronic device plays the audio data thereon through the speakers arranged in the space.
  • In this scenario, a virtual speaker group can be constructed around the electronic device, or around the picture generated by the electronic device, based on the location of the electronic device, so that the audio data in the electronic device can be played by the virtual speaker group. This synchronizes the picture and the audio data played by the electronic device, improving the consistency of the user's listening and visual experience.
  • In this scenario, the audio data to be played can be constructed based on the environmental information associated with the electronic device or the user's status information, where the environmental information associated with the electronic device can be the position in space of the picture generated by the electronic device, and the user's status information can include the distance between the electronic device and the user's head, the position of the user's head in space, etc.
  • new energy vehicles refer to vehicles that use unconventional vehicle fuels as power sources (or use conventional vehicle fuels and adopt new vehicle-mounted power devices).
  • the audio data to be played can be constructed based on the environmental information associated with the electronic device, where the environmental information associated with the electronic device can be the driving parameters of the vehicle.
  • In this scenario, the characteristic parameters (such as pitch, gain, etc.) of the audio data broadcast by the navigation can be changed according to the driver's fatigue level, so that the played audio data produces an auditory impact on the driver, thereby improving the driver's attention and achieving safe driving.
  • the audio data to be played can be constructed according to the user's status information associated with the electronic device, where the user's status information associated with the electronic device can be the user's fatigue level.
  • the user selects a scenario where multiple audio data are overlaid and played.
  • In this scenario, other audio data selected by the user can be modified based on at least one type of audio data selected by the user, so that the two can be more naturally integrated, providing the user with a better listening experience.
  • the audio data selected by the user may include background sound, white noise, etc.
  • the audio data to be played can be constructed according to the user's status information associated with the electronic device, where the user's status information associated with the electronic device can be audio data selected by the user.
  • the audio data to be played can be constructed based on the user's status information associated with the electronic device.
  • Here, the audio data to be played is the audio data of the target object in the completed video or dynamic picture.
  • the user's status information associated with the electronic device may be pictures, videos selected by the user, and/or audio data added for the target object.
  • Figure 1 shows an application scenario in some embodiments of the present application.
  • driver A is located in vehicle 200 .
  • the electronic device 100 and the speaker 230 are configured in the vehicle 200, and the electronic device 100 is in a powered-on state.
  • the electronic device 100 may be a device integrated in the vehicle 200, such as a vehicle-mounted terminal, or may be a device separated from the vehicle 200, such as driver A's mobile phone, etc., which is not limited here.
  • the electronic device 100 can directly use the speaker 230 in the vehicle 200 to broadcast the audio data it needs to broadcast.
  • the connection between the electronic device 100 and the vehicle 200 may be established through, but is not limited to, short-range communication (such as Bluetooth, etc.).
  • After the connection is established, the electronic device 100 can transmit the audio data it needs to broadcast to the vehicle 200 and broadcast it through the speaker 230 on the vehicle 200, or the electronic device 100 can broadcast the audio data it needs to broadcast through its own built-in speaker.
  • an image collection device 210 such as a camera may be provided outside the vehicle 200 to collect images of the environment outside the vehicle 200 .
  • a pickup 220 for collecting sounds in the environment, such as a microphone, may also be provided outside the vehicle 200 .
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the vehicle 200 .
  • the vehicle 200 may include more or fewer components than shown in the figures, or some components may be combined, or some components may be separated, or may be arranged differently.
  • Figure 2 shows a sound processing method.
  • the electronic device 100 may be a device integrated in the vehicle 200 , such as a vehicle-mounted terminal, or may be a device separated from the vehicle 200 , such as driver A's mobile phone.
  • The method shown in Figure 2 can be, but is not limited to, applied to driving scenes, or to outdoor camping scenes, such as camping scenes in valleys or by a lakeside.
  • the electronic device 100 in Figure 2 may be, but is not limited to, provided with a control for initiating execution of the method.
  • the name of the control may be "Camping Mode".
  • When this control is triggered, the method shown in Figure 2 may be executed. As shown in Figure 2, the method includes the following steps:
  • the electronic device 100 obtains environmental data of the area where the vehicle 200 is located.
  • the environmental data includes one or more of environmental images, environmental sounds, weather information or seasonal information.
  • the image collection device 210 on the vehicle 200 can collect environmental images of the area where the vehicle 200 is located in real time or periodically, and transmit the collected data to the electronic device 100 .
  • the pickup 220 on the vehicle 200 can collect environmental sounds in the area where the vehicle 200 is located in real time or periodically, and transmit the collected data to the electronic device 100 .
  • the electronic device 100 can obtain weather information and/or seasonal information of the area where the vehicle 200 is located through the network in real time or periodically.
  • the electronic device 100 determines each currently required sound object based on the environment data.
  • the electronic device 100 can input environmental data into a pre-trained sound object detection model, so that the sound object detection model outputs each currently required sound object.
  • the sound object detection model may be, but is not limited to, trained based on a convolutional neural network (CNN).
  • For example, when the vehicle 200 is driving on a road in the woods during the daytime with clear weather, it can be determined from the environment image that the vehicle 200 is in the woods, it can be determined from the environment sound that there is the sound of birds chirping in the current environment, and it can be determined from the weather information that the weather is currently sunny and it is daytime. In this way, the determined sound objects are trees, birdsong, daytime, and clear weather.
  • A sound theme adapted to the environmental data can also be determined based on the environmental data; then, the sound objects contained in the sound theme can be used as the currently required sound objects.
  • Each sound theme includes at least one sound object associated with the sound theme.
  • the sound theme can be "cicadas chirping on a summer night”
  • the sound objects included in this sound theme can include “cicadas chirping", “night and clear”, “breeze", and “flowing water”
  • The sound theme can also be "Heavy Rain on a Summer Night".
  • the sound objects included in this sound theme can include “Storm Wind”, “Heavy Rain”, and “Thunder”.
  • the electronic device 100 determines the audio data of each sound object from the atomic database of white noise based on each sound object.
  • the electronic device 100 can query the atomic database of white noise, thereby acquiring the audio data of each sound object within a specific period of time.
  • the atomic database of white noise is configured with the audio data of each single object within a specific period of time, such as the audio data of water flow, the audio data of cicadas, the audio data of vegetation, etc. Audio data of a certain duration can be obtained by randomly combining the audio data of multiple objects in the atomic database or combining them according to preset rules.
  • the white noise audio data in the atomic database can be configured in the vehicle in advance, or obtained from the server in real time.
  • the atomic database may include audio data of a sound object in different time periods, and the audio data in different time periods may have different emotions. For example, when the sound object is a bird call, the atomic database may include a cheerful bird call and a sad bird call.
  • When determining the audio data of each sound object, the audio data of each sound object that is adapted to the emotion expressed by the current environmental data can be determined based on the current environmental data. For example, when the weather is sunny, it can be determined that the emotion expressed by the current environmental data is happiness. In this case, the audio data of each currently required sound object can be filtered out from the atomic database such that the emotions expressed by these audio data are all happiness.
  • the electronic device 100 synthesizes the audio data of each sound object to obtain target audio data, and plays the target audio data.
  • the electronic device 100 can synthesize the audio data of each sound object to obtain target audio data, and play the target audio data.
  • When the electronic device 100 plays the target audio data, it may play it through the speaker of the vehicle 200. In this way, the driver can hear sounds in the vehicle that match the external environment, giving users an immersive experience.
  • audio data of each sound object may be mixed using a mixing algorithm to obtain target audio data.
  • Based on the type of each sound object, a mixing algorithm suitable for that type can be selected for processing.
  • each audio data can be directly superimposed and mixed to obtain the target audio data.
  • mixing algorithms such as adaptive weighted mixing algorithm and linear superposition averaging can be used to process each audio data to obtain the target audio data.
  • the number of times of mixing can be selected during the mixing process based on the type of sound object. For example, for sound objects such as cicadas and birds, their sounds are relatively short, so during the mixing process, the audio data of these sound objects can be input multiple times at random times for mixing processing.
  • When the duration of the corresponding audio data is long enough, it can be input once during the mixing process; when the duration of the corresponding audio data is short, it can be input multiple times during the mixing process, with two adjacent pieces of audio data connected end to end, that is, the playback end time of the first piece is the playback start time of the second piece, thereby obtaining a noise-like sound of sufficient duration.
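  • As an illustrative sketch only (the embodiments above do not mandate a specific implementation), the following Python snippet shows how short clips could be repeated end to end to a common length and then mixed by linear superposition averaging with optional per-object weights; all function and variable names here are hypothetical.

```python
import numpy as np

def tile_to_length(clip, length):
    """Repeat a short clip end to end (the end time of one copy is the start time
    of the next) until it reaches the requested number of samples."""
    reps = int(np.ceil(length / len(clip)))
    return np.tile(clip, reps)[:length]

def mix_sound_objects(clips, weights=None, length=None):
    """Mix the audio data of several sound objects into one target signal.

    clips   : list of 1-D float arrays (one per sound object, same sample rate)
    weights : optional per-object weights (a simple form of weighted mixing)
    length  : target length in samples; defaults to the longest clip
    """
    length = length or max(len(c) for c in clips)
    weights = weights or [1.0] * len(clips)
    aligned = [w * tile_to_length(c, length) for w, c in zip(weights, clips)]
    # Linear superposition averaging: sum the aligned clips and divide by their count
    mixed = np.sum(aligned, axis=0) / len(aligned)
    return np.clip(mixed, -1.0, 1.0)  # keep the result in a valid amplitude range
```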
  • the electronic device 100 may display to the user the identification of each sound object that makes up the target audio data, as well as the identification of the sound object that the user can currently add. In this way, users can choose to add sound objects or delete sound objects according to their own needs. For example, as shown in FIG. 3 , the electronic device 100 may display the currently played sound object (ie, the sound object that constitutes the target audio data) at the control 31 , and display the addable sound object at the control 32 . Continuing to refer to FIG. 3 , the user can select to delete the sound object in the sub-control 33 of the control 31 , and/or select to add the sound object in the sub-control 34 of the control 32 .
  • After the user's selection, the electronic device 100 can re-synthesize the sound objects that the user selects and needs to play to obtain the audio data that the user needs. For example, continuing to refer to Figure 3, when the user deletes "Beep", "Bird Call", and "Breeze" and chooses to add "Falling Rocks" and "Storm Wind", the sound objects that the user expects to play at this time are: "Clear day", "rustling leaves", "strong wind", "flowing water", "falling rocks".
  • the electronic device 100 can synthesize the audio data of the sound objects that the user desires to play (i.e., “clear day”, “rustling leaves”, “strong wind”, “flowing water”, “falling rocks”) to Get new target audio data and play it.
  • the electronic device 100 may determine whether to transparently transmit the environmental sound based on the set transparent transmission policy.
  • transparently transmitting environmental sounds can be understood as playing environmental sounds.
  • the transparent transmission strategy may include: isolating all environmental sounds, isolating part of the environmental sounds, or not isolating any of the environmental sounds.
  • the transparent transmission strategy can be selected by the user.
  • the electronic device 100 can be provided with a mechanical button or a virtual button for selecting the transparent transmission strategy, and the user can select according to his or her own needs.
  • the transparent transmission strategy can also be determined by the electronic device 100.
  • For example, when the environmental noise is greater than a first noise value, the transparent transmission strategy chosen by the electronic device 100 can be to isolate all environmental sounds; when the environmental noise is greater than a second noise value but smaller than the first noise value, the transparent transmission strategy chosen by the electronic device 100 can be to isolate part of the environmental sounds; and when the environmental noise is smaller than the second noise value, the transparent transmission strategy chosen by the electronic device 100 can be not to isolate the environmental sounds.
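  • A minimal sketch of such a threshold-based selection is shown below, assuming two hypothetical noise thresholds (the first greater than the second); the strategy names are illustrative only.

```python
def select_transparency_strategy(noise_level, first_noise_value, second_noise_value):
    """Pick a transparent-transmission strategy from the measured environmental noise.
    Assumes first_noise_value > second_noise_value."""
    if noise_level > first_noise_value:
        return "isolate_all"    # discard all environmental sounds
    if noise_level > second_noise_value:
        return "isolate_part"   # play only part of the environmental sounds
    return "isolate_none"       # play the environmental sounds without isolation
```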
  • When all environmental sounds are to be isolated, the electronic device 100 can discard the environmental sounds, that is, not play the environmental sounds.
  • When only part of the environmental sounds are to be isolated, the electronic device 100 can input the environmental sound into a pre-trained sound separation model, so that the sound separation model extracts the audio data corresponding to each sound object contained in the environmental sound. After the electronic device 100 obtains the audio data corresponding to each sound object contained in the environmental sound, it can discard the audio data corresponding to some of the sound objects, synthesize the audio data corresponding to the remaining sound objects with the previously determined audio data of each sound object to obtain the target audio data, and play the target audio data, thereby integrating the audio data in the real environment with the audio data determined from the atomic database and allowing the user to experience the external environment more realistically.
  • the electronic device 100 can determine a sound theme adapted to the environmental data according to the environmental data.
  • Each sound theme contains at least one sound object associated with the sound theme.
  • When a sound object extracted from the environmental sounds is not associated with the determined sound theme, the electronic device 100 may discard the audio data corresponding to that sound object.
  • When a sound object extracted from the environmental sounds is associated with the determined sound theme, the electronic device 100 may retain the audio data corresponding to that sound object. For example, if the determined sound theme is "summer night cicada chirping", the sound objects included in this sound theme include "cicada chirping", "night and clear", "breeze", and "flowing water".
  • In this case, the electronic device 100 can retain the audio data corresponding to the "chirping cicadas" in the environmental sounds and discard the audio data corresponding to the "falling stones" in the environmental sounds.
  • the electronic device 100 can adjust the gain of each channel in the audio data corresponding to the extracted sound object. For example, when the extracted audio data of the sound object is wind sound, the electronic device 100 may increase the loudness of the wind sound.
  • The electronic device 100 can also mark the sound object corresponding to each piece of extracted audio data. At the same time, the electronic device 100 can eliminate, from the previously determined currently required sound objects, the objects that are the same as the marked sound objects. This avoids subsequently synthesizing similar audio data and improves the quality of the synthesized audio data. For example, when the currently required sound objects determined above are trees, bird calls, daytime and clear weather, and the sound object corresponding to the audio data extracted from the environmental sounds is bird calls, the electronic device 100 can remove "birdsong" from the previously determined currently required sound objects.
  • In some embodiments, the electronic device 100 may also determine whether the amplitude of the audio data corresponding to a sound object in the environmental sound meets the requirements. When the requirements are met, the corresponding sound object determined above can be eliminated; otherwise, the audio data corresponding to the sound object determined above is retained and the audio data corresponding to the sound object in the environmental sound is eliminated, or the audio data corresponding to the sound object in the environmental sound is adjusted to meet the requirements and the corresponding sound object determined above is eliminated.
  • For example, if the electronic device 100 determines that the amplitude of the audio data corresponding to the "cicada chirping" extracted from the environmental sound is lower than a preset value, the electronic device 100 can discard the audio data corresponding to the "cicada chirping" extracted from the environmental sound and retain the previously determined audio data corresponding to that sound object. Alternatively, the electronic device 100 can adjust the amplitude of the audio data corresponding to the "cicada chirping" extracted from the environmental sound so that its amplitude is higher than the preset value, and discard the previously determined audio data corresponding to that sound object. If the electronic device 100 determines that the amplitude of the audio data corresponding to the "cicada chirping" extracted from the environmental sound is higher than the preset value, the electronic device 100 can retain the audio data corresponding to the "cicada chirping" extracted from the environmental sound and discard the audio data corresponding to the previously determined sound object.
  • the electronic device 100 can synthesize the environmental sounds and the audio data of each sound object determined above to obtain target audio data, and play the target audio data.
  • Audio data played continuously and audio data played occasionally are played through the same electronic device.
  • Figure 4 shows an application scenario in some embodiments of the present application.
  • When driver A drives vehicle 200 to a destination, driver A can navigate to the destination using the electronic device 100 located in vehicle 200.
  • In addition, driver A can use the electronic device 100 to play music. That is to say, the electronic device 100 is configured with navigation-related software (such as Google Maps, etc.) and with software related to playing music.
  • the electronic device 100 may be a device integrated in the vehicle 200 , such as a vehicle-mounted terminal, or may be a device separated from the vehicle 200 , such as driver A's mobile phone, etc., which is not limited here.
  • the electronic device 100 can directly use the speakers in the vehicle 200 to broadcast the audio data it needs to broadcast.
  • the connection between the electronic device 100 and the vehicle 200 may be established through, but is not limited to, short-range communication (such as Bluetooth, etc.).
  • After the connection is established, the electronic device 100 can transmit the audio data it needs to broadcast to the vehicle 200 and broadcast it through the speakers on the vehicle 200, or the electronic device 100 can broadcast the audio data it needs to broadcast through its own built-in speaker.
  • Currently, when the navigation sound needs to be broadcast while music is playing, the electronic device 100 can reduce the volume of the music playback and play the navigation sound at a normal volume.
  • Here, the normal volume can be understood as the volume at which the music was played before its volume was reduced. After the navigation sound has been broadcast, the electronic device 100 can restore the music playback volume to the normal volume.
  • an embodiment of the present application provides a sound processing method.
  • With this method, the user can clearly hear the navigation announcement and at the same time have a better listening experience of the music being played.
  • Figure 5 shows a sound processing method in some embodiments of the present application.
  • the electronic device 100 may be a device integrated in the vehicle 200 , such as a vehicle-mounted terminal; or it may be a device separated from the vehicle 200 , such as driver A's mobile phone.
  • In addition, the electronic device 100 is configured with navigation-related software (such as Google Maps, etc.) and with software related to playing music (such as Apple Music, etc.).
  • the user is using the electronic device 100 to navigate from one location to another location and is using the electronic device 100 to play music.
  • the method may include the following steps:
  • the electronic device 100 acquires the second audio data to be played.
  • In the process of playing the first audio data, the electronic device 100 can obtain another piece of audio data to be played, that is, the second audio data.
  • the first audio data may be music data played by the electronic device 100
  • the second audio data may be navigation data that the electronic device 100 needs to play.
  • the electronic device 100 extracts the third audio data to be played from the first audio data according to the second audio data, where the second audio data and the third audio data correspond to the same playback time period.
  • In some embodiments, the electronic device 100 can extract the third audio data to be played from the first audio data according to the initial play time and data length of the second audio data, wherein the initial play time of the third audio data is the same as the initial play time of the second audio data, and the data length of the third audio data is equal to the data length of the second audio data. That is to say, the playback time periods corresponding to the second audio data and the third audio data are the same.
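  • The extraction step can be illustrated with the following minimal sketch, which simply slices a time-aligned segment out of the continuously played signal; the helper name and the sample-based interface are assumptions made for illustration.

```python
import numpy as np

def extract_aligned_segment(first_audio, start_time_s, duration_s, sample_rate):
    """Extract the segment of the first audio data whose start time and length match
    those of the second audio data (times given in seconds)."""
    start = int(start_time_s * sample_rate)
    length = int(duration_s * sample_rate)
    return first_audio[start:start + length]
```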
  • the electronic device 100 performs vocal cancellation or vocal reduction processing on the third audio data to obtain fourth audio data.
  • When vocal cancellation processing is required, the electronic device 100 can input the third audio data into the pre-trained vocal cancellation model and perform vocal cancellation processing on the third audio data to obtain the fourth audio data.
  • When vocal reduction processing is required, the electronic device 100 can input the third audio data into the pre-trained vocal reduction model and perform vocal reduction processing on the third audio data to obtain the fourth audio data.
  • Whether to select vocal cancellation processing or vocal reduction processing can be, but is not limited to being, preset. In addition, since the fourth audio data is obtained by processing the third audio data, and the playback time periods corresponding to the second audio data and the third audio data are the same, the playback time periods corresponding to the second audio data and the fourth audio data are also the same.
  • In some embodiments, the electronic device 100 may also first input the third audio data into a high-pass filter to filter out data of specific frequencies. Then, the electronic device 100 can perform channel mixing on the data output by the high-pass filter to remove the vocals. Finally, the electronic device 100 can input the channel-mixed data into a low-pass filter to filter out data of specific frequencies, thereby obtaining the fourth audio data.
  • During channel mixing, the proportion of the audio signals of the two original channels in each new channel can be set. For example, the percentage of the original left channel in the new left channel is a1; the percentage of the original right channel in the new left channel is a2; the percentage of the original left channel in the new right channel is b1; and the percentage of the original right channel in the new right channel is b2.
  • For vocal cancellation, the four values of the channel mixing can be: 100, -100, -100, 100. This generates a stereo waveform in which the left and right channel waveforms are opposite to each other. When the two waveforms in one channel are added together, they cancel each other out, and the human voice is completely eliminated.
  • the four values of the channel mix can be: 100, -50, -50, and 100. In this way, when the two waveforms in one channel are added together, they cancel by half, and the reduction of the human voice is completed.
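  • As a sketch of the channel-mixing step under the percentage convention described above (the coefficient handling is an assumption made for illustration), the mixing can be written as:

```python
def channel_mix(left, right, a1, a2, b1, b2):
    """Build new stereo channels from the original ones using percentage coefficients:
    new_left  = a1% of the original left + a2% of the original right
    new_right = b1% of the original left + b2% of the original right
    """
    new_left = (a1 * left + a2 * right) / 100.0
    new_right = (b1 * left + b2 * right) / 100.0
    return new_left, new_right

# (100, -100, -100, 100): components common to both channels (typically the centred
# vocal) cancel, corresponding to the vocal-cancellation setting described above.
# (100, -50, -50, 100): the common component is reduced by half (vocal reduction).
```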
  • The electronic device 100 determines, based on the second audio data, the first gain that needs to be adjusted for the second audio data, and adjusts the gain of each channel in the second audio data based on the first gain to obtain the fifth audio data.
  • the electronic device 100 may first extract audio features of the second audio data, such as time domain features. Then, based on the determined audio characteristics, the first gain that needs to be adjusted for the second audio data is determined.
  • time domain features may include loudness, envelope energy, or short-term energy, etc.
  • For the loudness, the amplitude of the waveform at each moment can be determined from the waveform diagram of the second audio data in the time domain, and the loudness at each moment can then be determined, where the amplitude at a moment is the loudness at that moment. In addition, a specific loudness can be selected as needed, such as the maximum loudness.
  • For the envelope energy, the envelope corresponding to the second audio data can be constructed based on the waveform diagram of the second audio data in the time domain; the area of the figure enclosed by the envelope is then calculated through integration to obtain the average envelope energy of the second audio data in the time domain, and this average envelope energy is the required envelope energy. For example, the envelope can be constructed by comparing the amplitudes corresponding to the moments on the time domain waveform diagram.
  • the envelope can be understood as a change curve of the amplitude of the second audio data over time on a time domain waveform graph.
  • Figure 6a is a time domain waveform diagram of the second audio data.
  • The curve of the envelope corresponding to the second audio data in Figure 6a can be as shown in Figure 6b, where the area between the envelope curve and the horizontal axis in Figure 6b is the average envelope energy corresponding to the second audio data.
  • For the short-term energy, the amplitude of the waveform at each moment can be determined from the waveform diagram of the second audio data in the time domain, and the squares of these amplitudes are summed to obtain the short-term energy of the second audio data.
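  • The following sketch illustrates these time domain features in a simplified, frame-based form; the frame length and the peak-based envelope approximation are assumptions made for illustration rather than the exact definitions used in the embodiments.

```python
import numpy as np

def time_domain_features(x, frame_len=1024):
    """Illustrative time domain features of a mono audio signal x (float samples)."""
    amplitude = np.abs(x)                    # amplitude at each moment, taken as loudness
    max_loudness = amplitude.max()

    n_frames = len(x) // frame_len
    # Envelope approximated by the per-frame peak amplitude; its mean approximates
    # the average envelope energy (area under the envelope divided by the duration).
    envelope = np.array([amplitude[i*frame_len:(i+1)*frame_len].max() for i in range(n_frames)])
    avg_envelope_energy = envelope.mean()

    # Short-term energy: sum of squared amplitudes within each frame
    short_term_energy = np.array([np.sum(x[i*frame_len:(i+1)*frame_len] ** 2) for i in range(n_frames)])
    return max_loudness, avg_envelope_energy, short_term_energy
```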
  • the electronic device 100 may determine the first gain based on the determined audio characteristics and the preset first gain calculation formula.
  • where g is the gain, w n is the preset nth weight value, K n is the preset nth threshold value, and x n is the maximum value of the nth audio feature (such as the maximum value of the loudness).
  • the electronic device 100 may also first perform frame processing on the second audio data to obtain at least one audio frame. Then, the electronic device 100 can obtain the loudness and/or short-term energy corresponding to each audio frame in the aforementioned manner.
  • the maximum loudness can be selected from the loudness corresponding to each audio frame and substituted into the above "Formula 1", that is, the first gain can be obtained.
  • the maximum envelope energy can be selected from the envelope energies corresponding to each audio frame and substituted into the above "Formula 1" to obtain the first gain.
  • the maximum short-term energy can be selected from the short-term energy corresponding to each audio frame and substituted into the above "Formula 1" to obtain the first gain.
  • the electronic device 100 can adjust the gain of each channel in the second audio data based on the first gain to obtain fifth audio data.
  • In some embodiments, when the maximum loudness value corresponding to the second audio data exceeds a certain value, it indicates that the loudness of the second audio data can meet the requirements.
  • the value of the first gain can be set to 0 to reduce the amount of subsequent calculations. In this way, the fifth audio data obtained subsequently is the second audio data.
  • The electronic device 100 determines, based on the fourth audio data, the second gain that needs to be adjusted for the fourth audio data, and adjusts the gain of each channel in the fourth audio data based on the second gain to obtain the sixth audio data.
  • the electronic device 100 may first extract audio features of the fourth audio data, such as time domain features, music theory features, or frequency domain features. Then, based on the determined audio characteristics, the second gain that needs to be adjusted for the fourth audio data is determined.
  • the time domain features may include loudness and/or short-term energy, etc.
  • Music theory features can include beat, mode, chord, pitch, timbre, melody, emotion, and more.
  • Frequency domain features may include spectral energy of multiple preset frequency bands, etc.
  • the electronic device 100 may input the fourth audio data to a pre-trained music theory feature determination model to obtain the music theory features of the fourth audio data.
  • the music theory feature determination model can be obtained by training the audio data used for training using a Gaussian process model, a neural network model, a support vector machine, etc.
  • the mode contained in the fourth audio data can also be determined based on the Krumhansl-Schmuckler tonal analysis algorithm.
  • the emotion contained in the fourth audio data can also be determined based on the Thayer emotion model.
  • For the frequency domain features, the electronic device 100 can perform a short time Fourier transform (STFT) on the fourth audio data, convert the audio data from the time domain to the frequency domain, and obtain the spectrogram corresponding to the fourth audio data. From the spectrogram corresponding to the fourth audio data, the spectral energy corresponding to the fourth audio data can be obtained.
  • For example, the fourth audio data can be divided into n frequency bands, and each frequency in each frequency band corresponds to a spectrum energy; the spectrum energy corresponding to each frequency band can be obtained by summing or averaging the spectrum energies corresponding to the frequencies in that band.
  • As shown in Figure 7, this figure is a spectrogram obtained after performing a short-time Fourier transform on the fourth audio data, in which the horizontal axis is the frequency and the vertical axis is the spectrum energy value. The fourth audio data is divided into 3 frequency bands, and each frequency in each frequency band corresponds to a spectrum energy; by summing or averaging these spectrum energies, the spectrum energy corresponding to the corresponding frequency band (such as frequency band 1) can be obtained.
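  • A simplified sketch of computing per-band spectral energy with a framed FFT is shown below; the window, frame length, hop size, and equal-width band split are illustrative assumptions.

```python
import numpy as np

def band_spectral_energy(x, n_bands=3, frame_len=1024, hop=512):
    """Accumulate the power spectrum of x over STFT frames and sum it within
    n_bands equal-width frequency bands."""
    window = np.hanning(frame_len)
    n_frames = max(1, (len(x) - frame_len) // hop + 1)
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len]
        if len(frame) < frame_len:
            frame = np.pad(frame, (0, frame_len - len(frame)))
        power += np.abs(np.fft.rfft(frame * window)) ** 2
    edges = np.linspace(0, len(power), n_bands + 1, dtype=int)
    # Spectrum energy of each band: sum (or average) of the bin energies in the band
    return [power[edges[k]:edges[k + 1]].sum() for k in range(n_bands)]
```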
  • the second gain that needs to be adjusted for the fourth audio data can be determined based on the preset second gain calculation formula.
  • where g is the gain, w n is the preset nth weight value, and x n is the value of the nth audio feature.
  • the electronic device 100 can adjust the gain of each channel in the fourth audio data based on the second gain to obtain sixth audio data.
  • In some embodiments, the electronic device 100 may also first perform framing processing on the fourth audio data to obtain at least one audio frame. Then, the electronic device 100 can perform a short time Fourier transform (STFT) on each audio frame, convert the audio data of the frame from the time domain to the frequency domain, and obtain a spectrogram corresponding to each audio frame. From the spectrogram corresponding to each audio frame, the spectrum energy corresponding to each audio frame can be obtained. Then, the audio frame with the largest spectral energy can be selected as the required audio frame, and this audio frame can be processed using the aforementioned method of determining time domain features, music theory features, or frequency domain features, so as to obtain the second gain that needs to be adjusted for the fourth audio data.
  • In some embodiments, the second gain may also be corrected based on a preset linear relationship between the first gain and the second gain to obtain the required second gain.
  • where g is the second gain after correction, g 1 is the first gain, g 2 is the second gain before correction, and K is a constant.
  • the electronic device 100 plays the fifth audio data and the sixth audio data simultaneously.
  • the electronic device 100 can play the fifth audio data and the sixth audio data at the same time.
  • In this way, while the user can clearly perceive the information contained in the original second audio data, the user can also clearly perceive the melody, background sound, etc. of the original first audio data, which better satisfies the user's sense of hearing and improves the user experience.
  • When determining the second gain that needs to be adjusted for the fourth audio data, in addition to the method described in S505, the second gain can also be determined based on the fifth audio data (that is, the data obtained after adjusting the second audio data based on the first gain).
  • For example, the second gain may be determined based on the maximum loudness value of the fifth audio data and the ratio, calculated in real time, between the maximum loudness value of the second audio data and the maximum loudness value of the fourth audio data.
  • Alternatively, the second gain may be determined based on the maximum loudness value of the fifth audio data and a preset ratio between the maximum loudness value of the second audio data and the maximum loudness value of the fourth audio data.
  • For example, assume that the ratio between the maximum loudness value of the second audio data and the maximum loudness value of the fourth audio data is f, that the current maximum loudness value of the second audio data is A, and that the maximum loudness value of the fifth audio data is B. From f and B, the target maximum loudness value of the sixth audio data (that is, the data obtained by adjusting the fourth audio data based on the second gain) can be determined. Based on this, the loudness value that needs to be adjusted for the fourth audio data can be determined, and then the second gain that needs to be adjusted for the fourth audio data can be determined.
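  • The exact relationship among f, A and B is not reproduced here; one plausible reading, shown purely as an illustration, is that the maximum-loudness ratio f is preserved between the adjusted pair of signals:

```python
def second_gain_from_ratio(f, fifth_max_loudness, fourth_max_loudness):
    """Hypothetical ratio-preserving rule: the sixth audio data keeps the same
    maximum-loudness ratio f with respect to the fifth audio data."""
    target_sixth_max = fifth_max_loudness / f        # assumed target, e.g. B / f
    return target_sixth_max / fourth_max_loudness    # gain applied to the fourth audio data
```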
  • the second gain may be compared with a preset gain value (such as 0, 0.1, etc.).
  • When the second gain is greater than the preset gain value, it indicates that the sound generated by playing the fourth audio data is relatively small and has a relatively small impact on the sound generated by playing the fifth audio data obtained in S504. Therefore, the determined value of the second gain is updated to the preset gain value. For example, if the second gain is expressed as a standardized value (such as an amplification factor), the determined value of the second gain is 0.2 and the preset gain value is 0.1, then the value of the second gain can be adjusted from 0.2 to 0.1.
  • In some embodiments, the gain adjustment may be performed gradually after the start of playback and within a preset time period from the start of playback. Specifically, after playback starts and within a preset time period from the start of playback, the gain to be adjusted is adjusted from a preset value (such as 0, 1, etc.) to the second gain in a certain step size; and after playback ends and within a preset time period from the end of playback, the gain to be adjusted is adjusted from the second gain back to the preset value (such as 0, 1, etc.) in a certain step size. Therefore, when transitioning to playing the sixth audio data, or transitioning from playing the sixth audio data to playing other data in the first audio data, a sudden change in volume can be avoided and the user experience can be improved.
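  • A minimal sketch of such a stepwise ramp is given below; the linear ramp shape and the sample-based interface are assumptions made for illustration.

```python
import numpy as np

def ramped_gain(num_samples, target_gain, ramp_samples, default_gain=1.0):
    """Per-sample gain curve that moves from default_gain to target_gain over the
    first ramp_samples, holds the target, and returns to default_gain over the
    last ramp_samples, avoiding sudden volume changes at the transitions."""
    gain = np.full(num_samples, target_gain, dtype=float)
    ramp = np.linspace(default_gain, target_gain, ramp_samples)
    gain[:ramp_samples] = ramp
    gain[-ramp_samples:] = ramp[::-1]
    return gain
```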
  • Audio data played continuously and audio data played occasionally are played through different electronic devices.
  • FIG. 8 shows another sound processing method in some embodiments of the present application.
  • the first device and the second device are separate devices, and the connection between the first device and the second device may be established through, but is not limited to, short-range communication methods such as Bluetooth.
  • The first device is configured with software related to the continuous playback of audio data (such as Apple Music, etc.), or the first device may be a device that can continuously play audio data, such as a smart TV or a smart speaker, and the first device is playing the audio data through its own speaker.
  • The second device is configured with software related to audio data that is played occasionally (such as call applications, Google Maps, etc.), and the sound generated on the second device is played through its own speaker.
  • the first device may be a smart TV, a smart speaker, a car terminal, etc.; the second device may be a mobile phone, a tablet computer, etc.
  • the method may include the following steps:
  • When the second device needs to broadcast audio data, the second device sends a first message to the first device.
  • the first message is used to instruct the first device to perform a vocal cancellation or vocal reduction operation.
  • the second device when the second device needs to broadcast audio data, the second device may send a first message to the first device to instruct the first device to perform a vocal cancellation or vocal reduction operation.
  • the second device may be a mobile phone, and the first device may be a smart speaker, a smart TV, etc.
  • the first device may be playing music, TV series, movies, etc.
  • The audio data that the second device needs to broadcast may be the audio data that the second device needs to play when the user uses the second device to make a call. That is to say, in a home scenario, when the user needs to use the second device to make a call (for example, when the second device receives an incoming call, or when the user answers an incoming call on the second device), the second device can send, to the first device, a message instructing the first device to perform a vocal cancellation or vocal reduction operation.
  • the second device may be a mobile phone (for example, the electronic device 100 shown in FIG. 5 ), and the first device may be a vehicle-mounted terminal (for example, the vehicle 200 shown in FIG. 4 ).
  • the first device may be playing music, etc.
  • The audio data to be broadcast by the second device may be the audio data that the second device needs to play when the user uses the second device to navigate or make a call. That is to say, in the driving scene, when the second device needs to play navigation audio data, or the user needs to use the second device to make a call, the second device can send, to the first device, a message instructing the first device to perform the vocal cancellation or vocal reduction operation.
  • the first device performs a vocal cancellation or vocal reduction operation on the audio data to be played.
  • When the vocal cancellation operation is selected, the first device can perform the vocal cancellation operation on the audio data to be played through the aforementioned vocal cancellation method.
  • When the vocal reduction operation is selected, the first device may perform the vocal reduction operation on the audio data to be played through the aforementioned vocal reduction method.
  • the first message when the audio data to be played by the second device is navigation audio data, the first message may include the initial play time and data length of the navigation audio data.
  • In this case, the first device can extract, from the audio data to be played, sub-data corresponding to the initial play time and data length, and perform the vocal cancellation operation on the sub-data, wherein the initial playback time of the sub-data is the same as the initial playback time of the navigation audio data, and the data length of the sub-data is equal to the data length of the navigation audio data.
  • the second device plays the audio data
  • the first device plays the audio data after the human voice has been eliminated or reduced.
  • When the second device finishes broadcasting the audio data, the second device sends a second message to the first device.
  • the second message is used to instruct the first device to stop performing the human voice cancellation or human voice reduction operation.
  • When the second device finishes broadcasting the audio data, the second device sends a second message to the first device, and the second message is used to instruct the first device to stop performing the vocal cancellation or vocal reduction operation.
  • For example, when the user uses the second device to make a call and the user ends the call (for example, when the user hangs up the phone), the second device can notify the first device of the call-ended state, so that the first device can stop performing the vocal cancellation or vocal reduction operation.
  • Similarly, when the navigation broadcast ends, the second device can notify the first device of the navigation-broadcast-ended state, so that the first device can stop performing the vocal cancellation or vocal reduction operation.
  • the first device stops performing the vocal cancellation or vocal reduction operation on the audio data to be played, and plays the audio data without vocal cancellation or vocal reduction.
  • In this way, the interference from the audio data played by the first device can be reduced, so that the user can clearly perceive the audio data played by the second device.
  • Multiple speakers are configured in the space, at least some of the speakers are arranged according to certain requirements (such as 5.1.X or 7.1.X), and the electronic device or another device is using the speakers to play audio data.
  • FIG. 9 shows an application scenario in some embodiments of the present application.
  • speakers can be configured at fixed locations in the room in accordance with the requirements of 5.1.X, so that users can enjoy the ultimate cinema-level sound.
  • In "5.1.X", 5 represents the number of speakers used to build the spatial surround sound, and 1 represents the subwoofer.
  • The speaker 201 is arranged directly in front of the location where user A is located. The speaker 202 is arranged in front of the right side of user A; for example, the speaker 202 can be arranged at a position 30 degrees to the right, with the line connecting user A's position and the speaker 201 as the baseline and user A's position as the center of the circle. The speaker 203 is arranged at the right rear of user A; for example, the speaker 203 can be arranged at a position 120 degrees to the right, with the line connecting user A's position and the speaker 201 as the baseline and user A's position as the center of the circle. The speaker 204 is arranged at the left rear of user A; for example, the speaker 204 can be arranged at a position 120 degrees to the left, with the same baseline and center of the circle. The speaker 205 is arranged in front of the left side of user A; for example, the speaker 205 can be arranged at a position 30 degrees to the left, with the line connecting user A's position and the speaker 201 as the baseline and user A's position as the center of the circle.
  • FIG. 9 shows another application scenario in some embodiments of the present application.
  • speakers can be configured at fixed positions in the room in accordance with the requirements of 7.1.X, so that users can enjoy the ultimate cinema-level sound.
  • The speaker 201 is arranged directly in front of the location where user A is located. The speaker 202 is arranged in front of the right side of user A; for example, the speaker 202 can be arranged at a position 30 degrees to the right, with the line connecting user A's position and the speaker 201 as the baseline and user A's position as the center of the circle. The speaker 203 is arranged directly to the right of user A; for example, the speaker 203 can be arranged at a position 90 degrees to the right, with the same baseline and center of the circle. The speaker 204 is arranged at the right rear of user A; for example, the speaker 204 can be arranged at a position 150 degrees to the right, with the same baseline and center of the circle. The speaker 205 is arranged at the left rear of user A; for example, the speaker 205 can be arranged at a position 150 degrees to the left, with the same baseline and center of the circle. The speaker 206 is arranged directly to the left of user A; for example, the speaker 206 can be arranged at a position 90 degrees to the left, with the same baseline and center of the circle. The speaker 207 is arranged in front of the left side of user A; for example, the speaker 207 can be arranged at a position 30 degrees to the left, with the line connecting user A's position and the speaker 201 as the baseline and user A's position as the center of the circle.
  • embodiments of the present application provide a sound processing method that can adjust the gain of the audio signal output by each speaker based on the distance between the user and each speaker, so that the user can enjoy spatial surround sound anytime, anywhere.
  • FIG. 10 shows yet another application scenario in some embodiments of the present application.
• The main difference between the scene shown in FIG. 10 and the scene shown in FIG. 9 is that the space shown in (A) of FIG. 10 is equipped with an image acquisition device, such as a camera 300, and/or user A carries an electronic device 100.
  • the camera 300 can collect images of user A in the space to determine the distance between user A and each speaker based on the collected images.
• For example, the camera 300 can establish a connection with a controller (not shown in the figure) used to control each speaker through a wired network or a wireless network (such as Bluetooth, etc.), so that the camera 300 can transmit the images it collects to the controller, and the controller processes the images, for example, by inputting them into a pre-trained image processing model that outputs the distance between user A and each speaker.
• The image processing model can be, but is not limited to, trained based on a convolutional neural network (CNN).
• Alternatively, the camera 300 can establish a connection with the electronic device 100 through a wireless network (such as Bluetooth, etc.), so that the camera 300 can transmit the images it collects to the electronic device 100, and the electronic device 100 processes the images, for example, by inputting them into a pre-trained image processing model that outputs the distance between user A and each speaker.
  • the electronic device 100 can establish a connection with each speaker through a wireless network (such as Bluetooth, etc.).
• The distance can also be determined based on the wireless communication signal between the electronic device 100 and each speaker, for example, based on the strength of the received signal: a received signal strength indication (RSSI) ranging method can determine the distance between the electronic device 100 and each speaker. Since the electronic device 100 is carried by user A, determining the distance between the electronic device 100 and each speaker also determines the distance between user A and each speaker.
• The execution subject that determines the distance between user A and each speaker may be the electronic device 100 or a controller (not shown in the figure) used to control each speaker, which is not limited here. When the electronic device 100 is the execution subject that determines the distance between it and a speaker, the distance between the electronic device 100 and a certain speaker can be determined through the following "Formula 1":
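• As an illustration only, the following sketch uses a commonly applied log-distance path-loss model for RSSI-based ranging; it is not necessarily the "Formula 1" referred to above, and the 1 m reference power and path-loss exponent are assumed values.

```python
def rssi_to_distance(rssi_dbm: float, tx_power_dbm: float = -59.0,
                     path_loss_exponent: float = 2.0) -> float:
    """Estimate the distance (in meters) to a speaker from an RSSI measurement.

    Assumed log-distance path-loss model (illustrative only):
        d = 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))
    tx_power_dbm is the RSSI expected at 1 m; path_loss_exponent depends on the room.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

# Example: an RSSI of -69 dBm maps to roughly 3.2 m with the assumed parameters.
print(rssi_to_distance(-69.0))
```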
  • the three-point positioning method can be used to process the distances between the electronic device 100 and at least three speakers to obtain the position of the electronic device 100 .
  • the movement distance of the electronic device 100 can be obtained from the positions of the electronic device 100 at different times. Since the electronic device 100 is carried by the user, the moving distance of the electronic device 100 is the moving distance of the user.
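• As an illustration of the three-point positioning idea, the following sketch estimates a two-dimensional position from the distances to at least three speakers with known positions by a least-squares intersection of circles; the solver choice and function names are assumptions, since the exact way the distances are combined is not specified here.

```python
import numpy as np

def trilaterate(anchors, distances):
    """Estimate the 2-D position of the electronic device from distances to speakers.

    anchors: list of (x, y) speaker positions; distances: measured distances to them.
    Linearizes the circle equations and solves them in the least-squares sense.
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    x1, y1 = anchors[0]
    A = 2.0 * (anchors[1:] - anchors[0])                      # rows: [2(xk-x1), 2(yk-y1)]
    b = np.sum(anchors[1:] ** 2, axis=1) - x1**2 - y1**2 + d[0]**2 - d[1:]**2
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)               # intersection of the circles
    return pos                                                # estimated (x, y)

# Example: speakers at known positions, device actually at (1.0, 1.0).
print(trilaterate([(0, 0), (4, 0), (0, 4)],
                  [np.hypot(1, 1), np.hypot(3, 1), np.hypot(1, 3)]))
```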
  • FIG. 10 shows yet another application scenario in some embodiments of the present application.
• The main difference between the scene shown in (B) of FIG. 10 and the scene shown in (A) of FIG. 10 is that, in the space shown in (B) of FIG. 10, other speakers, such as speakers 208 and 209, are also configured outside the area surrounded by the speakers 201 to 205.
• As shown in (B) of FIG. 10, when user A moves to an area surrounded by speakers 202, 208, 209, 210 and 203, spatial surround sound can be controlled to be generated in this area.
• The speakers additionally configured in (B) of FIG. 10 may be in the same space as the speakers 201 to 205, or in a space adjacent to the space where the speakers 201 to 205 are located, which is not limited here. In addition, what is shown in FIG. 10 is a scenario in which speakers are configured according to the requirements of 5.1.X.
  • user A can configure the position of the camera and/or each speaker in the space on the electronic device 100 , and/or configure the identity of the camera and/or each speaker, etc. , so as to facilitate the subsequent determination of the distance between user A and each speaker, and the subsequent selection of the required speaker.
• The electronic device 100 may be installed with an application (APP) for configuring the camera and/or speakers, and user A can log in to the APP to perform the configuration.
  • each speaker can automatically identify its position in space based on the distance from the electronic device, and display it in the APP interface installed on the electronic device 100 .
  • User A can also adjust the position of each speaker in the space in the APP.
  • Figure 11 shows the flow of a sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the audio signal played by the speaker may be an audio signal in the electronic device 100 or an audio signal in other devices, which is not limited here.
• The user's movement area can be an area surrounded by the speakers that construct spatial surround sound, such as the area surrounded by speakers 201 to 205 in (A) of FIG. 10, or it can be another area, such as the area outside the area surrounded by speakers 201 to 205 in (A) of FIG. 10, which is not limited here.
  • the method shown in Figure 11 can be executed in real time, or can be executed when certain conditions are met. For example, it can be executed when it is detected that the user's movement distance is greater than a certain threshold, which is not limited here.
  • the sound processing method includes the following steps:
  • the electronic device 100 determines the distance between it and N speakers to obtain N first distances, where N is a positive integer.
  • the electronic device 100 can determine the distance between the user and each speaker based on the image collected by the image acquisition device configured in the space where the user is located, so as to obtain the N first distances.
  • the electronic device 100 may also determine the distance between the electronic device 100 and each speaker based on the wireless communication signal between the electronic device 100 and each speaker to obtain N first distances.
  • the N speakers may be speakers configured according to certain requirements (such as 5.1.X or 7.1.X, etc.) to build spatial surround sound.
  • the N speakers may be the speakers 201 to 205 shown in (A) of FIG. 10 .
  • the N speakers may be all speakers in the space, such as speakers 201 to 205 and speakers 208 to 210 shown in (B) of Figure 10 .
  • the electronic device 100 selects the target speaker from the N speakers based on the N first distances.
  • the distance between the target speaker and the electronic device 100 is the shortest.
• For example, the electronic device 100 can sort the N first distances, such as from large to small or from small to large, select the smallest first distance from them, and use the speaker corresponding to the smallest first distance as the target speaker.
  • the target speaker may also be other speakers, such as the speaker farthest from the electronic device 100, etc.
  • the specific situation may be determined according to the actual situation and is not limited here.
• The electronic device 100 uses the distance between it and the target speaker as a reference to determine the gain that needs to be adjusted for the audio signal corresponding to each speaker other than the target speaker, so as to construct a first speaker group, where the first speaker group is a combination of speakers obtained by virtualizing all of the N speakers onto a circle centered on the electronic device 100 with the distance between the electronic device 100 and the target speaker as the radius.
  • one piece of audio data may, but is not limited to, include audio signals that each corresponding speaker needs to play.
  • each audio signal included in one audio data may correspond to one channel.
• For example, the electronic device 100 can take the distance between itself and the target speaker as a reference, and determine the gain that needs to be adjusted for the audio signal corresponding to each of the other speakers based on this reference distance and the distance between the corresponding speaker and the electronic device 100, so that all the other speakers are virtualized onto a circle with the distance between the electronic device 100 and the target speaker as the radius, thereby constructing the first speaker group.
• For example, if the reference distance is d1 and the distance between one of the other speakers and the electronic device 100 is d2, the gain that needs to be adjusted for the audio signal corresponding to that speaker is gi = d2/d1.
  • the electronic device 100 may record the gain that needs to be adjusted for the audio signal corresponding to each real speaker to obtain the first gain set.
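• A minimal sketch of this step, assuming the gi = d2/d1 relation above and that the nearest speaker is chosen as the target; the function name and data structures are illustrative, not part of this application.

```python
def build_first_gain_set(distances):
    """Virtualize every real speaker onto a circle whose radius is the distance
    to the nearest (target) speaker.

    distances: {speaker_id: distance to the electronic device}.
    Returns the target speaker and the first gain set (one linear gain per speaker),
    following the gi = d2 / d1 relation described above.
    """
    target_id = min(distances, key=distances.get)        # nearest speaker is the target
    d_ref = distances[target_id]                          # reference distance d1
    first_gain_set = {sid: d / d_ref for sid, d in distances.items()}  # gi = d2 / d1
    return target_id, first_gain_set

# Example: the speaker 1.8 m away is the target; a speaker 3.6 m away gets gain 2.0.
print(build_first_gain_set({"SP1": 1.8, "SP2": 3.6, "SP3": 2.7}))
```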
  • the electronic device 100 takes its current orientation as a reference and builds a virtual speaker group based on the first speaker group.
  • the virtual speaker group consists of M virtual speakers, and the value of M is equal to the number of speakers required to build spatial surround sound.
  • the individual virtual speakers in the virtual speaker group are arranged in the same way as the speakers required to build spatial surround sound.
• For example, the electronic device 100 can first determine a virtual speaker in its orientation based on the first speaker group, and then determine the remaining virtual speakers based on the preset arrangement of speakers required to build spatial surround sound (such as the 5.1.X or 7.1.X arrangement, etc.), thereby constructing a virtual speaker group.
• Here, a virtual speaker can be understood as a speaker that does not physically exist and is obtained through virtualization.
• For example, when there is a speaker of the first speaker group in the orientation of the electronic device 100, or there is a speaker of the first speaker group within a preset angle range of that orientation, that speaker can be determined as the center speaker in the virtual speaker group.
  • the center speaker can be understood as a speaker in the 0-degree direction in the orientation of the electronic device 100, such as the speaker 201 shown in (A) of FIG. 10 .
  • the orientation of the electronic device 100 can be understood as a direction from the bottom of the electronic device 100 to the top thereof.
• For example, the position of the earpiece 1201 of a mobile phone can be regarded as the top of the mobile phone, the position 1202 on the mobile phone opposite to the earpiece 1201 can be regarded as the bottom of the mobile phone, and the direction pointed to by the arrow 1203 is the orientation of the mobile phone.
• In some embodiments, the orientation of the electronic device 100 can be determined by the projection of the electronic device 100 on the horizontal plane. At this time, the orientation of the electronic device 100 can be the direction from the projection of its bottom on the horizontal plane to the projection of its top on the horizontal plane.
• When there is no speaker in the orientation of the electronic device 100, the two speakers in the first speaker group adjacent to the left and right of that orientation can be used to virtualize a virtual speaker, and this virtual speaker is used as the center speaker in the virtual speaker group. For example, the virtual speaker can be created by adjusting the gains of the audio signals corresponding to these two speakers in the first speaker group.
• For the preset angle range in the orientation of the electronic device 100, continue to refer to FIG. 12: the preset angle range is the region spanned by the angle shown in the figure.
• For example, a vector base amplitude panning (VBAP) algorithm can be used to virtualize one speaker from two speakers. It should be understood that when there is a speaker in the orientation of the electronic device 100, it can also be regarded as a virtual speaker, but this virtual speaker is essentially a real speaker in the first speaker group.
• The speakers SP1 and SP2 are on a circle centered on the user U11, and the current orientation of the user U11 is the direction pointed to by the vector P.
  • the positions of the speakers SP1 and SP2 can be used to fix the sound at the position of the virtual speaker VSP1.
• Taking the position of the user U11 as the origin O, a two-dimensional coordinate system is established in which the vertical direction and the horizontal direction are the x-axis direction and the y-axis direction, respectively. In this two-dimensional coordinate system, the position of the virtual speaker VSP1 can be represented by the vector P. The vector P is a two-dimensional vector and can be expressed as a linear combination of the vectors pointing from the origin O towards the speakers SP1 and SP2 (for example, P = g1·L1 + g2·L2, where L1 and L2 are the vectors towards the speakers SP1 and SP2). By using the coefficient g1 as the gain of the audio signal corresponding to the speaker SP1 and the coefficient g2 as the gain of the audio signal corresponding to the speaker SP2, the sound can be fixed at the position of the virtual speaker VSP1.
  • the virtual speaker VSP1 can be located at any position on the arc AR11 connecting the speakers SP1 and SP2.
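• The following sketch shows a basic two-dimensional VBAP gain computation of the kind referred to above, solving P = g1·L1 + g2·L2 for the two gains; the constant-power normalization (g1² + g2² = 1) is a common convention assumed here, not a detail stated in this application.

```python
import numpy as np

def vbap_2d(target_angle_deg, sp1_angle_deg, sp2_angle_deg):
    """Compute gains g1, g2 that place a virtual speaker in the target direction
    between two real speakers on the circle around the listener (2-D VBAP)."""
    def unit(deg):
        rad = np.deg2rad(deg)
        return np.array([np.cos(rad), np.sin(rad)])

    L = np.column_stack([unit(sp1_angle_deg), unit(sp2_angle_deg)])  # base vectors L1, L2
    g = np.linalg.solve(L, unit(target_angle_deg))                    # solve P = L @ g
    return g / np.linalg.norm(g)                                      # assumed power normalization

# Example: a virtual center speaker at 0 degrees between speakers at +45 and -45 degrees
# gives equal gains for the two real speakers.
print(vbap_2d(0.0, 45.0, -45.0))
```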
• After the center speaker is determined, the remaining virtual speakers can be determined according to the preset arrangement of speakers required to construct spatial surround sound, thereby constructing the virtual speaker group. For example, taking the construction of the virtual speaker group required by 5.1.X as an example: the virtual speaker at the right front of user U11 is located at a position 30 degrees to the right, taking the line connecting the position of user U11 and the speaker VSP1 as the baseline and the position of user U11 as the center of the circle; the virtual speaker at the right rear of user U11 is located at a position 120 degrees to the right with the same baseline and center; the virtual speaker at the left rear of user U11 is located at a position 120 degrees to the left with the same baseline and center; and the virtual speaker at the left front of user U11 is located at a position 30 degrees to the left with the same baseline and center.
• When determining the remaining virtual speakers, if there is no speaker at a specific angle or within a specific angle range, a virtual speaker is virtualized from the speakers adjacent to its left and right. For details, please refer to the aforementioned method of determining the center speaker, which will not be repeated here.
  • the gain that needs to be adjusted for the audio signal corresponding to each speaker in the first speaker group may be recorded, thereby obtaining the second gain set.
  • the required speakers can also be virtualized from the obtained virtual speakers.
  • the method of virtualizing the required speaker from the obtained virtual speaker can refer to the aforementioned method of determining the center speaker, which will not be described again here.
  • the electronic device 100 controls the virtual speaker group to play audio data.
• After the electronic device 100 constructs the virtual speaker group, it can control the virtual speaker group to play audio data.
  • the audio data played by the virtual speaker group can be obtained by adjusting the gain of each channel in the audio data according to the first gain set and the second gain set determined above.
• Take as an example that two speakers SP1 and SP2 are arranged in the space, the distance between the speaker SP1 and the user U11 (i.e., the electronic device 100) is d1, and the distance between the speaker SP2 and the user U11 is d2.
• As shown in (B) of FIG. 14, when constructing the first speaker group, if d1 is used as the reference, the speaker SP2 can be virtually placed on the circle C1 to obtain the speaker SP2'.
• As shown in (C) of FIG. 14, when constructing the virtual speaker group, a virtual speaker VSP1 can be virtualized from the speakers SP1 and SP2'.
  • the second gain set obtained in (C) of Figure 14 is: the gain that needs to be adjusted for the audio signal corresponding to speaker SP1 is g2, and the gain that needs to be adjusted for the channel corresponding to speaker SP2' is g3.
• In this case, the final gain that needs to be adjusted for the audio signal corresponding to the speaker SP1 is g2, and the final gain for the speaker SP2 is obtained by combining the gain recorded in the first gain set (the gain used to virtualize the speaker SP2 onto the circle C1) with g3: when the unit of the gain that needs to be adjusted is decibels, the addition method can be used, and when the unit of the gain is a normalized value, the multiplication method can be used.
• Based on the final gains, the electronic device 100 can adjust the gain of the corresponding channel in the audio data to obtain the required audio data, and send the audio signal of the corresponding channel to the corresponding speaker, so that the sound is perceived as being played through the virtual speaker group. In this way, the sound perceived by the user is approximately generated around the user, allowing the user to enjoy spatial surround sound anytime and anywhere.
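• A hedged sketch of combining the first and second gain sets and applying them to the per-speaker channels, following the rule above (add when the gains are in decibels, multiply when they are normalized values); the function names and dictionary layout are illustrative assumptions.

```python
import numpy as np

def combine_gains(first_gain_set, second_gain_set, unit="linear"):
    """Combine the per-speaker gains from the two gain sets.

    Multiply when the gains are normalized (linear) values, add when they are in
    decibels. Speakers missing from a set keep a neutral gain (1.0 linear / 0 dB).
    """
    neutral = 1.0 if unit == "linear" else 0.0
    combined = {}
    for sid in set(first_gain_set) | set(second_gain_set):
        g1 = first_gain_set.get(sid, neutral)
        g2 = second_gain_set.get(sid, neutral)
        combined[sid] = g1 * g2 if unit == "linear" else g1 + g2
    return combined

def apply_gains(channels, gains):
    """Scale each speaker's channel (a numpy array of samples) by its linear gain."""
    return {sid: gains.get(sid, 1.0) * samples for sid, samples in channels.items()}

# Example: SP2 was boosted to reach the reference circle and then re-weighted by VBAP.
print(combine_gains({"SP1": 1.0, "SP2": 2.0}, {"SP1": 0.7, "SP2": 0.7}))
```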
  • the delay corresponding to each speaker can be determined respectively, so that each speaker can play the same audio data synchronously.
  • the electronic device 100 can control each speaker to play audio data according to the corresponding time delay.
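• The text only states that a delay can be determined for each speaker; one plausible way to do so, shown purely as an assumption, is to delay closer speakers by the difference in acoustic propagation time relative to the farthest speaker.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature

def playback_delays(distances):
    """Compute a per-speaker delay (in seconds) so that sound from all speakers
    arrives at the listener at about the same time.

    Assumed rule: the farthest speaker plays immediately and closer speakers are
    delayed by the difference in propagation time.
    """
    d_max = max(distances.values())
    return {sid: (d_max - d) / SPEED_OF_SOUND_M_S for sid, d in distances.items()}

# Example: a speaker 1 m closer than the farthest one is delayed by about 2.9 ms.
print(playback_delays({"SP1": 2.0, "SP2": 3.0}))
```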
• In this way, as the user moves, the gain of each speaker can be adjusted at any time, so that the user can enjoy spatial surround sound anytime and anywhere.
  • five speakers are arranged in the space, namely speakers SP1, SP2, SP3, SP4 and SP5.
• The electronic device 100 used by the user is at position a1, and the user moves within the area surrounded by the five speakers.
  • the electronic device 100 can construct a virtual speaker group according to the requirements of 5.1.X.
• The virtual speaker group consists of the speakers SP2, VSP1, VSP2, SP4' and SP1'.
  • the speaker VSP1 is virtually obtained by the speakers SP2 and SP3'
  • the speaker VSP2 is obtained virtually by the speakers SP2 and SP1'. It can be understood that the speakers SP2, SP4' and SP1' are located within an angle or angle range that satisfies the condition.
  • the electronic device 100 can control the virtual speaker group to play audio data.
• A plurality of speakers are arranged in the space, namely speakers SP1, SP2, SP3, SP4, SP5, SP6 and SP8, and the electronic device 100 used by the user is at position a1.
  • the electronic device 100 can construct a virtual speaker group according to the requirements of 5.1.X.
  • the virtual speaker group consists of speakers SP5, VSP1, SP6’, VSP2, and SP3’.
  • the speaker VSP1 is virtually obtained from the speakers SP1’ and SP6’
  • the speaker VSP2 is obtained virtually from the speakers SP8’ and SP4’. It can be understood that the speakers SP5, SP6' and SP3' are located within an angle or angle range that satisfies the condition.
  • the electronic device 100 can control the virtual speaker group to play audio data.
  • Figure 17 shows the flow of a sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the audio signal played by the speaker may be an audio signal in the electronic device 100 or an audio signal in other devices, which is not limited here.
• The user's movement area can be an area surrounded by the speakers that construct spatial surround sound, such as the area surrounded by speakers 201 to 205 in (A) of FIG. 10, or it can be another area, such as the area outside the area surrounded by speakers 201 to 205 in (A) of FIG. 10, which is not limited here.
  • the method shown in Figure 17 can be executed in real time, or can be executed when certain conditions are met. For example, it can be executed when it is detected that the user's movement distance is greater than a certain threshold, which is not limited here.
  • the sound processing method includes the following steps:
  • the electronic device 100 determines the distance between it and N speakers to obtain N first distances, where N is a positive integer.
  • the electronic device 100 constructs a first virtual speaker group based on its orientation.
  • the first virtual speaker group is composed of M virtual speakers, and the value of M is the same as the number of speakers required to construct spatial surround sound.
• For example, the electronic device 100 can first determine a virtual speaker in its orientation, and then sequentially determine the remaining virtual speakers based on the preset arrangement of the speakers required to build spatial surround sound (such as the 5.1.X or 7.1.X arrangement, etc.), thereby constructing the first virtual speaker group.
• For example, when there is a speaker in the orientation of the electronic device 100, or there is a speaker within a preset angle range of that orientation, that speaker can be determined as the center speaker in the virtual speaker group.
• When there is no speaker in the orientation of the electronic device 100, a speaker can be virtualized from the two speakers adjacent to the left and right of that orientation, and this virtual speaker serves as the center speaker in the virtual speaker group.
• In this case, a virtual speaker is virtualized from two real speakers, which can be done by adjusting the gains of the two real speakers; see the description of FIG. 13 for details, which will not be repeated here.
• For example, the distances between the speakers SP1 and SP2 and the user U11 are d1 and d2 respectively, where d1 is smaller than d2.
  • d1 can be used as the radius of the required circle C1.
  • the gain that needs to be adjusted for the audio signal corresponding to speaker SP2 can be adjusted in the manner described in S1103.
  • the speaker SP2 is virtualized on the circle C1 with the user U11 as the center and d1 as the radius.
  • the speaker SP2' in the figure is the speaker virtualized by the speaker SP2.
• That is, d2' = d1, where d2' is the distance between the virtualized speaker SP2' and the user U11.
• Then, a virtual speaker VSP1 can be virtualized from the speakers SP1 and SP2' in the manner described in FIG. 14, where the gain corresponding to the speaker SP1 is g1 and the gain corresponding to the speaker SP2' is g2.
• Since the speaker SP2' is virtualized from the speaker SP2, and the gain that needs to be adjusted for the audio signal corresponding to the speaker SP2 for this virtualization is gi, while the gain that needs to be adjusted for the audio signal corresponding to the speaker SP1 is g1 and the gain that needs to be adjusted for the channel corresponding to the speaker SP2' is g2, the final gain of the speaker SP1 is g1, and the total gain of the speaker SP2 is the product of gi and g2, or the sum of gi and g2, etc.
• When the unit of the gain that needs to be adjusted is decibels, the addition method can be used; when the unit of the gain is a normalized value (such as an amplification factor), the multiplication method can be used.
  • d2 can also be used as the radius of the required circle C1.
  • the gain that needs to be adjusted for the audio signal corresponding to SP1 can be adjusted through the method described in S1103, and the speaker VSP1 can be further virtualized.
• Alternatively, any value between d1 and d2 can be selected as the radius of the required circle C1. At this time, the gains of the audio signals corresponding to the speakers SP1 and SP2 can be adjusted in a similar manner as described above, so as to finally virtualize the speaker VSP1.
  • the electronic device 100 may record the gain that needs to be adjusted for the audio signal corresponding to each real speaker to obtain the first gain set.
  • the electronic device 100 determines the distance between it and each virtual speaker in the first virtual speaker group to obtain M second distances.
• It can be understood that when the electronic device 100 constructs the first virtual speaker group, it selects the distance between itself and a certain speaker as the reference and virtualizes the speakers onto a circle centered on itself with that distance as the radius. Therefore, the distance between a certain virtual speaker and the electronic device 100 is the distance corresponding to the reference selected to construct that virtual speaker. For example, if the finally determined virtual speaker is VSP1' and the distance d1 between the speaker SP1 and the user U11 (i.e., the electronic device 100) is used as the reference, the distance between the virtual speaker VSP1' and the user U11 (i.e., the electronic device 100) is d1.
  • the electronic device 100 selects the target speaker from the M virtual speakers based on the M second distances.
  • the distance between the target speaker and the electronic device 100 is the shortest.
• For example, the electronic device 100 can sort the M second distances, such as from large to small or from small to large, select the smallest second distance, and use the virtual speaker corresponding to the smallest second distance as the target speaker.
  • the target speaker may also be other virtual speakers, such as the virtual speaker farthest from the electronic device 100, etc.
  • the specifics may be determined according to the actual situation and are not limited here.
• The electronic device 100 takes the distance between it and the target speaker as a reference and constructs a second virtual speaker group based on the first virtual speaker group, where the second virtual speaker group is a combination of virtual speakers obtained by virtualizing all of the M virtual speakers in the first virtual speaker group onto a circle centered on the electronic device 100 with the distance between the electronic device 100 and the target speaker as the radius.
• For example, the electronic device 100 can select the distance between itself and the target speaker as a reference, and determine the gain that needs to be adjusted for the audio signal corresponding to each of the other virtual speakers based on this reference distance and the distance between the corresponding virtual speaker and the electronic device 100, so as to adjust the other virtual speakers onto a circle with the distance between the electronic device 100 and the target speaker as the radius, thereby constructing the second virtual speaker group.
  • the electronic device 100 can also select other linear models for determination.
  • the gain that needs to be adjusted for the audio signal corresponding to each virtual speaker in the first virtual speaker group may be recorded to obtain the second gain set.
  • the electronic device 100 controls the second virtual speaker group to play audio data.
• After the electronic device 100 constructs the second virtual speaker group, it can control the second virtual speaker group to play audio data.
  • the audio data played by the second virtual speaker group can be obtained by adjusting the gain of each channel in the audio data according to the first gain set and the second gain set determined above.
• For example, the speaker SP3 is a speaker located in the orientation of the user U11 (i.e., the electronic device 100), and another required speaker needs to be virtualized from the speakers SP1 and SP2.
• When constructing the first virtual speaker group, if d1 is used as the reference, the speaker SP2 can be virtually placed on the circle C1, that is, the speaker SP2' is obtained.
  • a virtual speaker VSP1 can be virtualized from the speakers SP1 and SP2'. At this time, two speakers in the first virtual speaker group, namely the speaker SP3 and the virtual speaker VSP1, are constructed.
• When constructing the second virtual speaker group, the virtual speaker VSP1' can be virtualized onto the circle C2 based on d3, thereby constructing two speakers in the second speaker group, namely the speaker SP3 and the virtual speaker VSP1'.
• When the unit of the gain that needs to be adjusted is decibels, the addition method can be used; when the unit of the gain is a normalized value, the multiplication method can be used.
• It can be understood that virtualizing the virtual speaker VSP1' is equivalent to first virtualizing the speakers SP1 and SP2 onto the circle C2, and then virtualizing the virtual speaker VSP1' from these two speakers. Since the speakers SP1, SP2' and VSP1 are on the same circle C1, when the three are virtually placed onto the circle C2, the gains required to be adjusted for their corresponding channels are equal. Therefore, from the second gain set obtained in (D) of FIG. 18, it is possible to determine that, when constructing the second virtual speaker group, the gain that needs to be adjusted for the channels corresponding to the real speakers associated with the virtual speaker VSP1 (i.e., the speakers SP1 and SP2) is also g4.
• In this way, the final gain required to be adjusted for the audio signal corresponding to the speaker SP1 is (g0+g2+g4) or (g0*g2*g4), the final gain that needs to be adjusted for the audio signal corresponding to the speaker SP2 is (gi+g4) or (gi*g4), and the gain that needs to be adjusted for the audio signal corresponding to the speaker SP3 is g0.
• That is, when the unit of the gain that needs to be adjusted is decibels, the final required adjusted gain can be obtained by adding the individual gains; when the unit of the gain is a normalized value, the final required adjusted gain can be obtained by multiplying the individual gains.
• Based on the final gains, the electronic device 100 can adjust the gain of the corresponding channel in the audio data to obtain the required audio data, and send the audio signal of the corresponding channel to the corresponding speaker, so that the sound is perceived as being played through the second virtual speaker group.
  • the sound perceived by the user is approximately generated around the user, allowing the user to enjoy spatial surround sound anytime and anywhere.
  • Figure 19 shows the flow of a sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the audio signal played by the speaker may be an audio signal in the electronic device 100 or an audio signal in other devices, which is not limited here.
• The user's movement area can be an area surrounded by the speakers that construct spatial surround sound, such as the area surrounded by speakers 201 to 205 in (A) of FIG. 10, or it can be another area, such as the area outside the area surrounded by speakers 201 to 205 in (A) of FIG. 10, which is not limited here.
  • the method shown in Figure 19 can be executed in real time or when certain conditions are met. For example, it can be executed when it is detected that the user's movement distance is greater than a certain threshold, which is not limited here.
  • the sound processing method includes the following steps:
  • the electronic device 100 selects K speakers from N speakers based on its orientation, and the K speakers are used to construct spatial surround sound.
• For example, the electronic device 100 can first determine a speaker in its orientation, and then sequentially determine the remaining required speakers based on the preset arrangement of the speakers required to build spatial surround sound (such as a 5.1.X or 7.1.X arrangement, etc.), thereby obtaining the K speakers.
• For example, when there is a speaker in the orientation of the electronic device 100, or there is a speaker within a preset angle range of that orientation, that speaker can be used as one of the required speakers.
• When there is no such speaker, the two speakers adjacent to the left and right of that orientation can be used as the required speakers.
  • the electronic device 100 constructs a virtual speaker group based on the K speakers, where the virtual speaker group is a combination of virtual speakers obtained by virtualizing the K speakers on a circle centered on the electronic device 100.
  • the distance between the electronic device 100 and one of the K speakers can be used as the radius.
  • the process of constructing a virtual speaker group can be referred to the description in FIG. 11 or FIG. 17 , and will not be described again here.
  • the electronic device 100 controls the virtual speaker group to play audio data.
  • the process of controlling the playing of audio data by the electronic device 100 is detailed in the description in FIG. 11 or FIG. 17 , and will not be described again here.
  • the sound perceived by the user is approximately generated around the user, allowing the user to enjoy spatial surround sound anytime and anywhere.
  • the electronic device 100 may send instruction information for adjusting the volume to each speaker respectively based on the determined gain that needs to be adjusted for the audio signal corresponding to each speaker.
• For example, the mapping relationship between the gain that needs to be adjusted for the audio signal corresponding to a speaker and the adjustment value of the volume can be preset, and the electronic device 100 can query this mapping relationship to determine the adjustment value of the volume of the speaker, and then send instruction information to the speaker, where the instruction information may include the adjustment value of the volume.
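• A minimal sketch of this volume-adjustment instruction, assuming a hypothetical preset mapping table and message format; neither the table values nor the message fields are specified in this application.

```python
def volume_adjustment(gain_db, mapping=((-6.0, -4), (-3.0, -2), (0.0, 0), (3.0, 2), (6.0, 4))):
    """Look up a volume-step adjustment for a required gain via a preset mapping.

    The (gain_db, volume_step) pairs are purely hypothetical; the closest entry
    to the required gain is returned.
    """
    return min(mapping, key=lambda entry: abs(entry[0] - gain_db))[1]

def build_instruction(speaker_id, gain_db):
    """Build the instruction information sent to a speaker (hypothetical format)."""
    return {"speaker": speaker_id, "volume_adjustment": volume_adjustment(gain_db)}

# Example: a speaker that needs roughly +3 dB is told to raise its volume by 2 steps.
print(build_instruction("SP2", 2.8))
```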
• In addition, the electronic device 100 can also control the loudness of the audio signals played by each real speaker that is not related to the virtual speaker group to be lower than a preset loudness value, so as to reduce the interference from these speakers while still allowing them to be used later without lag when needed.
  • the electronic device 100 can control each real speaker unrelated to the virtual speaker group to adjust the volume to the minimum, or adjust the gain of the audio signals corresponding to these speakers to the minimum, etc.
  • the electronic device 100 can also control each real speaker independent of the virtual speaker group to suspend operation.
  • the methods described in the above embodiments can also process speakers in other directions to construct corresponding surround sound.
  • the speakers arranged at the top of the space can be processed.
  • the processing method can refer to the aforementioned method, which will not be described here.
  • the electronic device can generate pictures (for example, the user uses the electronic device to watch movies, etc.), and the electronic device plays the audio data on it through the speakers arranged in the space.
  • FIG. 20 shows an application scenario in some embodiments of the present application.
  • six speakers namely speakers SP1, SP2, SP3, SP4, SP5, and SP6, are arranged in vehicle 200.
  • the user U11 uses the electronic device 100 to watch a movie in the right rear seat of the vehicle 200 , and the electronic device 100 and the vehicle 200 establish a connection through a short-range communication method such as Bluetooth.
  • the audio data in the electronic device 100 can be played through the speakers in the vehicle 200 to obtain a better listening experience.
  • FIG. 20 shows another application scenario in some embodiments of the present application.
• Speakers, namely speakers SP1, SP2, SP3, SP4 and SP5, are configured at fixed positions in the room according to certain requirements (for example, 5.1.X, etc.).
  • User U11 uses the electronic device 100 to watch movies on a seat in the room, and a connection is established between the electronic device 100 and the speakers in the room through short-range communication methods such as Bluetooth.
  • the audio data in the electronic device 100 can be played through speakers in the room, thereby obtaining a better listening experience.
  • FIG. 20 shows yet another application scenario in some embodiments of the present application.
• Speakers SP1, SP2, SP3 and SP4 are arranged in the room, and a projection device 400 is also arranged in the room.
  • the projection device 400 can be used to project movies and other content in the electronic device 100 onto the wall 500 .
  • the electronic device 100 can establish a connection with the speakers in the room through short-range communication methods such as Bluetooth.
  • the audio data in the electronic device 100 can be played through speakers in the room, thereby obtaining a better listening experience.
• For the above scenarios, embodiments of the present application provide a sound processing method, which can construct, based on the speakers arranged in the space, a virtual speaker group surrounding the electronic device 100 (or the picture generated based on the electronic device 100), so that the audio data in the electronic device 100 is played by the virtual speaker group, thereby solving the problem of audio and video being out of sync and improving the consistency of the user's listening and visual experience.
  • a camera 300 may also be configured.
• The camera 300 can collect images of the user U11 and the electronic device 100 in the space, so as to determine, from the collected images, the positions of the user U11's head and the electronic device 100 in the space, and/or the distance between the electronic device 100 and each speaker, etc.
  • the camera 300 may also be arranged.
• The camera 300 can collect images of the user U11, the electronic device 100 and the picture generated based on the electronic device 100 in the space, so as to determine, from the collected images, the positions of the user U11's head and the electronic device 100 in the space, the distance between the electronic device 100 and each speaker, the distance between the picture generated based on the electronic device 100 and each speaker, or the position of the picture generated by the electronic device 100, etc.
• For example, the camera 300 can establish a connection with a controller (not shown in the figure) used to control each speaker through a wired network or a wireless network (such as Bluetooth, etc.), so that the camera 300 can transmit the images it collects to the controller, and the controller processes the images, for example, by inputting them into a pre-trained image processing model, so that the model outputs the positions of the user U11's head and the electronic device 100 in the space, and/or the distance between the electronic device 100 and each speaker, etc.
• The image processing model can be, but is not limited to, trained based on a convolutional neural network (CNN).
• Alternatively, the camera 300 can establish a connection with the electronic device 100 through a wireless network (such as Bluetooth, etc.), so that the camera 300 can transmit the images it collects to the electronic device 100, and the electronic device 100 processes the images, for example, by inputting them into a pre-trained image processing model, so that the model outputs the positions of the user U11's head and the electronic device 100 in the space, the distance between the electronic device 100 and each speaker, the distance between the picture generated based on the electronic device 100 and each speaker, or the position of the picture generated by the electronic device 100, etc.
  • the electronic device 100 can establish a connection with each speaker through a wireless network (such as Bluetooth, etc.).
• In addition, the distance between the electronic device 100 and each speaker can also be determined based on the wireless communication signal between them. For example, the position of the electronic device 100 in the space, and/or the distance between the electronic device 100 and each speaker, can be determined through a ranging method based on the received signal strength indication (RSSI).
• The execution subject that determines the distance between user A and each speaker may be the electronic device 100, a controller (not shown in the figure) used to control each speaker, or another device located in the scene shown in FIG. 1, which is not limited here.
• When the electronic device 100 is the execution subject that determines the distance between it and a speaker, the distance between the electronic device 100 and a certain speaker can be determined through "Formula 1" described in scenario 3.1.
• When the controller used to control each speaker is the execution subject that determines the distance between the electronic device 100 and a speaker, reference may be made to the manner in which the electronic device 100 determines this distance as the execution subject, which will not be described again here.
• Furthermore, the location of the electronic device 100 can be determined based on the distances between the electronic device 100 and at least three speakers. For example, as shown in FIG. 24, if the distance between the electronic device 100 and the speaker SP1 is d1, the distance between the electronic device 100 and the speaker SP2 is d2, and the distance between the electronic device 100 and the speaker SP3 is d3, then, since the positions of the speakers SP1, SP2 and SP3 are known and fixed, a circle can be drawn with the location of each speaker as the center and the distance between the corresponding speaker and the electronic device 100 as the radius; the intersection of these three circles (i.e., the position E in the figure) is the location of the electronic device 100.
  • Figure 22 shows the flow of a sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the method shown in Figure 22 can be, but is not limited to, applied in the scenario shown in (A) or (B) of Figure 20 .
  • the execution subject of the method shown in FIG. 22 may be the electronic device 100.
  • the sound processing method includes the following steps:
  • the electronic device 100 determines its target position in the target space, and at least one speaker is configured in the target space.
• For example, the electronic device 100 can determine its position in the target space based on the image collected by the camera in the space where the user is located, or it can also determine its position in the target space based on the wireless communication signals between it and each speaker.
  • the electronic device 100 constructs a virtual space that matches the target space according to the target position, and the volume of the virtual space is smaller than the volume of the target space.
• For example, the electronic device 100 can place the target position in a preset space model and associate the target position with a certain component or area of the target space in the space model, that is, take the target position as the position of that component or area of the target space in the space model, thereby constructing a virtual space that matches the target space.
  • the virtual space can be understood as a miniaturized target space.
  • the virtual space can be formed by reducing the target space according to a certain proportion.
  • the virtual space may be a preset space that can surround the user.
  • the space model may be a small virtual vehicle, and in the virtual vehicle, the target position may be placed at the position of the display screen of the vehicle 200 .
  • the space model may be a small virtual room, in which the target position may be placed on the wall directly in front of the user U11 in the room.
  • the electronic device 100 constructs a virtual speaker group in the virtual space according to the position of each speaker in the target space.
  • the virtual speaker group includes virtual speakers corresponding to each speaker in the target space.
  • the electronic device 100 can determine the position of the virtual speaker corresponding to each speaker in the target space in the virtual space based on the ratio between the virtual space and the target space.
• The virtual space at this time is the virtual vehicle 41, and the position of the electronic device 100 is the position of the display screen of the head unit in the virtual vehicle 2301. In the vehicle 200, the distance and angle between the display screen of the head unit and each speaker are fixed. If the ratio between the virtual vehicle 2301 and the vehicle 200 is 1:10, the distance between the speaker SP1 and the display screen 210 of the vehicle 200 is d1, and the angle is a fixed angle, then in the virtual vehicle 2301 a virtual speaker VSP1 is arranged at a position that is d1/10 away from the position of the electronic device 100 and at the same angle.
  • the method of arranging other virtual speakers in the virtual vehicle 2301 reference may be made to the method of arranging the virtual speaker VSP1, which will not be described again here.
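• A small sketch of the scaling described above, mapping a real speaker's distance and angle relative to the reference point (the display screen / the electronic device) into the miniaturized virtual space; the coordinate convention and function name are illustrative assumptions.

```python
import math

def place_virtual_speaker(distance_m, angle_deg, scale=10.0):
    """Map a real speaker into the scaled-down virtual space.

    The angle relative to the reference point is preserved and the distance is
    divided by the scale factor (1:10 in the example above). Returns the virtual
    speaker's (x, y) offset from the electronic device.
    """
    d_virtual = distance_m / scale
    rad = math.radians(angle_deg)
    return (d_virtual * math.cos(rad), d_virtual * math.sin(rad))

# Example: a speaker 3 m away at 30 degrees becomes a virtual speaker 0.3 m away at 30 degrees.
print(place_virtual_speaker(3.0, 30.0))
```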
• Alternatively, the distance between a virtual speaker and the corresponding speaker in the target space can be used to determine the gain that needs to be adjusted for the audio signal corresponding to that speaker in the target space, and the gain of the audio signal corresponding to each speaker in the target space can be adjusted accordingly to construct the virtual speaker group, thereby mapping each speaker in the target space into the virtual space.
  • the virtual speaker group includes virtual speakers corresponding to each speaker in the target space.
• Here, the virtual speakers can be understood as speakers that are not physically present, and the speakers configured in the vehicle 200 can be understood as real speakers.
  • the electronic device 100 can determine the gain that needs to be adjusted for the audio signal corresponding to each speaker based on the distance between the speakers in the target space and the virtual speakers in the virtual space and a preset distance model.
• For example, the gain of the audio signal corresponding to the speaker SP1 is adjusted with an adjustment value of g1, that is, the speaker SP1 can be mapped into the virtual vehicle 41, thereby constructing a virtual speaker corresponding to the speaker SP1 in the virtual vehicle 41.
  • the electronic device 100 can also record the gain value that needs to be adjusted for the audio signal corresponding to each speaker, and then adjust the audio data to be played later.
  • the electronic device 100 uses the virtual speaker group to play the target audio data.
  • the gain of each channel in the target audio data is obtained by the gain that needs to be adjusted based on the audio signal corresponding to each speaker in the target space during the process of constructing the virtual speaker group.
  • the electronic device 100 can use the virtual speaker group to play target audio data.
  • the electronic device 100 can transmit audio signals corresponding to different channels included in the target audio data to corresponding speakers for playing.
  • the gain of each channel in the target audio data is obtained by the gain that needs to be adjusted based on the audio signal corresponding to each speaker in the target space during the process of constructing the virtual speaker group.
• In this way, the sound heard by the user is perceived as approximately being generated from the electronic device and surrounding the user, so that the picture played by the electronic device is synchronized with the sound, improving the consistency of the user's listening and visual experience.
  • Figure 24 shows the flow of another sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the method shown in Figure 24 can be, but is not limited to, applied in the scenario shown in (A) or (B) of Figure 20 .
  • the execution subject of the method shown in FIG. 24 may be the electronic device 100.
  • the sound processing method includes the following steps:
  • the electronic device 100 determines the first distance between it and the user's head, and determines the first position of the user's head in the target space. At least one speaker is configured in the target space.
• For example, the electronic device 100 can determine the first distance between it and the user's head based on the image collected by the camera in the space where the user is located, and determine the first position of the user's head in the target space.
  • the electronic device 100 constructs a virtual speaker group according to the first distance, the first position and the position of each speaker in the target space.
• The virtual speaker group includes virtual speakers corresponding to each speaker in the target space, and each virtual speaker is located on a circle with the first position as the center and the first distance as the radius.
  • the electronic device 100 can construct a circle with the first distance as the radius and the first position as the center, and virtualize each speaker in the target space onto the circle.
• The electronic device 100 can virtualize each speaker in the target space onto the circle it constructed based on the distance between the first position and the position of each speaker. For example, a distance model can be preset, and the distance between the first position and the position of each speaker is input into the distance model to obtain the gain that needs to be adjusted for the audio signal corresponding to that speaker; the gain of the audio signal corresponding to each speaker in the target space is then adjusted to construct the virtual speaker group.
• Here, g is the gain that needs to be adjusted for the audio signal corresponding to a speaker, and k and b are constants; for example, the distance model may take the linear form g = k·d + b, where d is the distance between the first position and the position of the speaker.
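• A hedged sketch of such a distance model, assuming the linear form g = k·d + b; the constants k and b below are illustrative placeholders, not values given in this application.

```python
def distance_model_gain(distance_m, k=0.5, b=0.0):
    """Map the distance between the listener's head and a speaker to the gain that
    needs to be adjusted for that speaker's audio signal (assumed linear model)."""
    return k * distance_m + b

def build_head_centered_gains(head_to_speaker_distances):
    """Compute the gain for every speaker so that each can be virtualized onto the
    circle centered on the user's head (the first position)."""
    return {sid: distance_model_gain(d) for sid, d in head_to_speaker_distances.items()}

# Example: speakers 1.2 m and 2.0 m from the user's head.
print(build_head_centered_gains({"SP1": 1.2, "SP2": 2.0}))
```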
• The electronic device 100 adjusts the gain of the audio signal corresponding to the speaker SP1 in the audio data to be played, with the adjustment value being g1.
  • the speaker SP1 can be virtualized on the circle constructed by it, and the virtual speaker VSP1 is obtained.
  • the electronic device 100 can virtualize other speakers in the vehicle 200 onto the circle it constructs, that is, construct a virtual speaker group.
  • the electronic device 100 uses the virtual speaker group to play the target audio data.
  • the gain of each channel in the target audio data is obtained by the gain that needs to be adjusted based on the audio signal corresponding to each speaker in the target space during the process of constructing the virtual speaker group.
  • the electronic device 100 can use the virtual speaker group to play target audio data.
  • the electronic device 100 can transmit audio signals corresponding to different channels included in the target audio data to corresponding speakers for playing.
  • the gain of each channel in the target audio data is obtained by the gain that needs to be adjusted based on the audio signal corresponding to each speaker in the target space during the process of constructing the virtual speaker group.
  • a vector base amplitude panning (VBAP) algorithm can also be used to construct another virtual speaker group from the virtual speaker group.
  • the newly constructed virtual speaker group can be composed of M virtual speakers.
• The value of M is equal to the number of speakers required to construct spatial surround sound, and the virtual speakers in the newly constructed virtual speaker group are arranged in the same way as the speakers required to construct spatial surround sound.
  • the electronic device 100 can use the virtual speaker group to play target audio data. In this way, users can enjoy spatial surround sound, improving user experience.
• In this way, the sound heard by the user is perceived as approximately being generated from the electronic device and surrounding the user, so that the picture played by the electronic device is synchronized with the sound, improving the consistency of the user's listening and visual experience.
  • FIG. 26 shows the flow of yet another sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the method shown in Figure 26 can be, but is not limited to, applied in the scenario shown in (A) or (B) of Figure 20 .
  • the execution subject of the method shown in Figure 26 may be an audio equipment control system, which may be used to control each speaker.
• The specific contents of S2602 and S2603 can refer to the descriptions of S2202 and S2203 in FIG. 22; in addition, in S2603 the audio equipment control system records the gain that needs to be adjusted for the audio signal corresponding to each speaker, whereas in S2203 the electronic device 100 can not only record the gain that needs to be adjusted for the audio signal corresponding to each speaker, but can also directly adjust the gain of the corresponding channel in the audio data to be played.
  • the sound processing method includes the following steps:
  • the audio equipment control system determines the target position of the electronic device 100 in the target space, and at least one speaker is configured in the target space.
• For example, the audio equipment control system can determine the position of the electronic device 100 in the target space based on the images collected by the camera in the space where the user is located, or it can also determine the position of the electronic device 100 in the target space based on the wireless communication signals between the electronic device 100 and each speaker.
  • the audio equipment control system constructs a virtual space that matches the target space according to the target position.
  • the volume of the virtual space is smaller than the volume of the target space.
  • the audio equipment control system constructs a virtual speaker group in the virtual space according to the position of each speaker in the target space.
  • the virtual speaker group includes virtual speakers corresponding to each speaker in the target space.
  • the audio equipment control system obtains the target audio data sent by the electronic device 100 and uses the gain that needs to be adjusted for the audio signal corresponding to each speaker, adjusts the gain of each channel in the target audio data, and plays the adjusted target audio data.
• After the audio equipment control system obtains the target audio data sent by the electronic device 100, it can adjust the gain of each channel in the target audio data according to the gain that needs to be adjusted for the audio signal corresponding to each speaker, as recorded during the process of constructing the virtual speaker group, and play the adjusted target audio data.
• In this way, the sound heard by the user is perceived as approximately being generated from the electronic device and surrounding the user, so that the picture played by the electronic device is synchronized with the sound, improving the consistency of the user's listening and visual experience.
  • Figure 27 shows the flow of yet another sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the method shown in Figure 27 can be, but is not limited to, applied in the scenario shown in (A) or (B) of Figure 20 .
  • the execution subject of the method shown in Figure 27 may be an audio equipment control system, which may be used to control each speaker.
  • the sound processing method includes the following steps:
  • the audio equipment control system determines the first distance between the electronic device 100 and the user's head, and determines the first position of the user's head in the target space. At least one speaker is configured in the target space.
• the audio equipment control system can determine the first distance between the electronic device 100 and the user's head based on the images collected by the camera in the space where the user is located, and determine the first position of the user's head in the target space.
  • the audio equipment control system constructs a virtual speaker group based on the first distance, the first position and the position of each speaker in the target space.
• the virtual speaker group includes virtual speakers corresponding to each speaker in the target space, and each virtual speaker is located on a circle with the first position as the center and the first distance as the radius.
• the audio equipment control system obtains the target audio data sent by the electronic device 100, adjusts the gain of each channel in the target audio data using the gain that needs to be adjusted for the audio signal corresponding to each speaker, and plays the adjusted target audio data.
  • a vector base amplitude panning (VBAP) algorithm may also be used to construct another virtual speaker group from the virtual speaker group.
  • the newly constructed virtual speaker group can be composed of M virtual speakers.
• the value of M is equal to the number of speakers required to build spatial surround sound, and the virtual speakers in the newly constructed virtual speaker group are arranged in the same way as the speakers required to build spatial surround sound.
  • the audio equipment control system can use the virtual speaker group to play target audio data. In this way, users can enjoy spatial surround sound, improving user experience.
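• As an illustration of the kind of calculation a VBAP-based construction involves, the following is a minimal 2-D vector base amplitude panning sketch that derives the gains of two real speakers so that a virtual speaker appears between them; the speaker angles and the constant-power normalization are illustrative assumptions, not values specified by this application.

```python
import numpy as np

def vbap_gains_2d(source_angle_deg, spk1_angle_deg, spk2_angle_deg):
    """2-D VBAP: solve p = g1*l1 + g2*l2 for the two loudspeaker gains."""
    def unit(deg):
        rad = np.deg2rad(deg)
        return np.array([np.cos(rad), np.sin(rad)])

    L = np.column_stack([unit(spk1_angle_deg), unit(spk2_angle_deg)])
    p = unit(source_angle_deg)
    g = np.linalg.solve(L, p)          # raw gain factors g1, g2
    g = np.clip(g, 0.0, None)          # negative gains mean the source lies outside the pair
    return g / np.linalg.norm(g)       # constant-power normalization

# Example: real speakers at +30 and -30 degrees, virtual speaker desired at +10 degrees
g1, g2 = vbap_gains_2d(10, 30, -30)
```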
• in this way, the sound the user hears seems to be generated from the electronic device and surrounds the user, so that the picture played by the electronic device is synchronized with the sound, improving the consistency between the user's auditory and visual experience.
  • Figure 28 shows the flow of yet another sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the method shown in Figure 28 can be, but is not limited to, applied in the scenario shown in (C) of Figure 20 .
  • the execution subject of the method shown in FIG. 28 may be the electronic device 100.
• For S2803 to S2804, reference can be made to the description of S2203 to S2204 in Figure 22, which will not be repeated here.
  • the sound processing method includes the following steps:
  • the electronic device 100 determines the target position in the target space based on the picture generated by the electronic device 100. At least one speaker is configured in the target space.
  • the electronic device 100 can obtain the target position of the picture it generates in the target space through the image captured by the camera.
  • the user can also configure the target location in the electronic device 100 in advance, and the details can be determined according to the actual situation, which is not limited here.
  • the electronic device 100 constructs a virtual space that matches the target space according to the target position, and the volume of the virtual space is smaller than the volume of the target space.
• the electronic device 100 can place the target position in a preset space model and associate the target position with a certain component or area of the target space in the space model, that is, use the target position in the space model as the position of a certain component or area of the target space, thereby constructing a virtual space that matches the target space.
  • the virtual space can be understood as a miniaturized target space.
  • the virtual space can be formed by reducing the target space according to a certain proportion.
  • the virtual space may be a preset space.
• for example, the space model can be a small virtual room, in which the target position can be placed at a certain location corresponding to the wall 500 directly in front of the user U11.
  • the electronic device 100 constructs a virtual speaker group in the virtual space according to the position of each speaker in the target space.
  • the virtual speaker group includes virtual speakers corresponding to each speaker in the target space.
  • the electronic device 100 uses the virtual speaker group to play the target audio data.
• the gain of each channel in the target audio data is adjusted according to the gain that needs to be adjusted for the audio signal corresponding to each speaker in the target space, obtained during the process of constructing the virtual speaker group.
• in this way, when the user uses the projection device to watch the picture generated by the electronic device and uses external speakers to play the audio data on the electronic device, the sound the user hears seems to be generated from the picture projected by the projection device. This synchronizes the picture and sound generated by the electronic device, improving the consistency between the user's auditory and visual experience.
  • FIG. 29 shows the flow of yet another sound processing method in some embodiments of the present application.
  • the connection between the electronic device 100 and each speaker may be established through, but is not limited to, Bluetooth.
  • the method shown in Figure 29 can be, but is not limited to, applied in the scenario shown in (C) of Figure 20 .
  • the execution subject of the method shown in Figure 29 may be an audio equipment control system, which may be used to control each speaker.
  • S2901 to S2903 can refer to the description of S2801 to S2803 in Figure 28, and S2904 in Figure 29 can refer to the description of S2604 in Figure 26.
  • the sound processing method includes the following steps:
  • the audio equipment control system determines the target position in the target space based on the picture generated by the electronic device 100. At least one speaker is configured in the target space.
  • the audio equipment control system constructs a virtual space that matches the target space according to the target position.
  • the volume of the virtual space is smaller than the volume of the target space.
  • the audio equipment control system constructs a virtual speaker group in the virtual space according to the position of each speaker in the target space.
  • the virtual speaker group includes virtual speakers corresponding to each speaker in the target space.
• the audio equipment control system obtains the target audio data sent by the electronic device 100, adjusts the gain of each channel in the target audio data using the gain that needs to be adjusted for the audio signal corresponding to each speaker, and plays the adjusted target audio data.
• in this way, when the user uses the projection device to watch the picture generated by the electronic device and uses external speakers to play the audio data on the electronic device, the sound the user hears seems to be generated from the picture projected by the projection device. This synchronizes the picture and sound generated by the electronic device, improving the consistency between the user's auditory and visual experience.
• in some embodiments, the corresponding delay of each speaker in the target space can also be determined, so that the picture the user sees and the sound the user hears match, thereby improving the user experience.
  • the target distance between the user and the picture generated by the electronic device 100 can be used as a reference, and the time delay of each speaker in the target space can be determined based on the target distance.
• for example, the distance between the user and the picture generated by the electronic device 100 may be the distance between the user U11 and the electronic device 100; in the scenario of Figure 20 where the picture is projected onto the wall 500, the distance between the user and the picture generated by the electronic device 100 can be the distance between the user U11 and the wall 500. This distance can be, but is not limited to, obtained through the camera 300 in the room.
• when the calculated delay of a speaker is a positive number, it indicates that the speaker is farther away from the user, so the speaker can be controlled to play in advance, and the advance time can be the determined delay; when the calculated delay of a speaker is a negative number, it indicates that the speaker is closer to the user, so the speaker can be controlled to delay playback, and the delay time can be the absolute value of the determined delay.
  • the electronic device 100 or the audio equipment control system can control each speaker to play the audio data in advance or delay according to the corresponding time delay. This allows the images the user sees and the sounds they hear to match, thereby improving the user experience.
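• As a rough sketch of how the per-speaker delay could be computed from the distances described above (positive delay meaning the speaker is farther than the reference and should play earlier), assuming sound travels at roughly 343 m/s; the distances in the example are arbitrary.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate

def speaker_delays(reference_distance_m, speaker_distances_m):
    """Delay of each speaker relative to the reference (user-to-picture) distance.
    Positive -> speaker is farther than the reference, play it earlier by this amount.
    Negative -> speaker is closer, delay its playback by the absolute value."""
    return {name: (d - reference_distance_m) / SPEED_OF_SOUND
            for name, d in speaker_distances_m.items()}

# Illustrative distances (metres) from the user to each speaker
delays = speaker_delays(3.0, {"front_left": 3.5, "front_right": 2.4})
# {'front_left': ~+0.0015 s (play earlier), 'front_right': ~-0.0017 s (delay playback)}
```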
• a distance can be selected as a reference distance from the determined distances between the picture produced by the target device and each speaker in the target space; and based on the reference distance, the appearance time of the picture produced by the target device is determined.
  • the reference distance may be the largest distance among the determined distances between the picture generated by the target device and each speaker in the target space.
• specifically, the delay time of the sound corresponding to the generated picture relative to the sound generated by the speaker corresponding to the reference distance can be determined; then, the target device is controlled to display the corresponding picture after the speaker corresponding to the reference distance has played the corresponding audio data and the delay time has elapsed. For example, if the determined delay time is 3 seconds and the time when the speaker corresponding to the reference distance plays the corresponding audio data is t, then the time when the picture generated by the target device appears is (t+3).
• when a user is using a new energy vehicle (hereinafter referred to as a "vehicle"), the vehicle can loop and play sound waves according to its own driving status. For example, a vehicle can gradually increase the volume of its speakers to the maximum value when accelerating, and gradually reduce the volume of its speakers to the minimum value when decelerating. However, in this way, the sound wave only changes in volume but does not change in space, that is, no spatial sound wave is formed, which makes the sound wave played by the vehicle very different from the actual driving state.
  • the sound wave still only changes in volume, but does not change in space, that is, no spatial sound wave is formed, and the user experience is poor.
  • embodiments of the present application provide a sound processing method.
• this method can cause spatial changes in the sound waves when the user uses the vehicle, so that the Doppler effect appears inside the vehicle, thereby making the sound waves played by the vehicle consistent with the real driving conditions, making the listening experience more realistic and improving the user experience.
  • FIG. 30 shows the hardware structure of a vehicle.
  • the vehicle 200 may be equipped with an electronic device 100 and a speaker 210 .
• the electronic device 100 can transmit the sound waves to the speaker 210 so that they are played through the speaker 210.
  • the electronic device 100 may be, but is not limited to, a vehicle-mounted terminal.
  • the number and position of the speakers 210 can be configured according to requirements and are not limited here.
  • the vehicle 200 may also be equipped with components necessary for its normal operation, such as various sensors, etc., which are not limited here.
  • the vehicle 200 may be configured with sensors for sensing the motion state of the vehicle, such as speed sensors, acceleration sensors, etc.
  • Figure 31 shows a sound processing method. It can be understood that this method can be, but is not limited to, executed by an electronic device (such as a vehicle-mounted terminal, etc.) configured in the vehicle. As shown in Figure 31, the sound processing method may include the following steps:
  • the electronic device 100 determines the current driving parameters of the vehicle 200.
  • the driving parameters include one or more of driving speed, rotational speed, and accelerator pedal opening.
  • the driving parameters can be transmitted to the electronic device 100.
  • the electronic device 100 determines the first audio data corresponding to the driving speed according to the driving parameters.
  • the electronic device 100 can determine the first audio data corresponding to the driving parameters based on the driving parameters and the preconfigured original audio data.
  • the electronic device 100 may first obtain audio particles obtained from original audio data.
  • each audio particle can correspond to a driving speed of the vehicle.
  • audio particles can be understood as data formed by dividing original audio data into very short segments (such as segments measured in milliseconds, etc.).
  • the original audio data can be default audio data, or audio data selected by the user, which is not limited here.
  • a selection portal may be configured on the electronic device 100 for the user to make a selection.
  • the electronic device 100 can determine the audio particles corresponding to the current driving parameters based on the mapping relationship between the driving parameters and the audio particles.
  • the current acceleration of the vehicle 200 is used to perform a stretching transformation on the determined audio particles to adjust the data length of the audio particles so that the playback speed of the audio particles matches the current driving state.
  • the first audio data are audio particles after scaling transformation.
• for example, when the driving parameter is driving speed, the mapping relationship between driving speed and audio particles may be: when the speed is a1, the audio particle is audio particle b1; when the speed is a2, the audio particle is audio particle b2.
• if the vehicle's driving speed determined at time t1 is a2, it can be determined, based on the mapping relationship between driving speed and audio particles, that the currently required audio particle is audio particle b2. If the traveling speed of vehicle 200 determined at time t0 is a0, the acceleration of vehicle 200 at time t1 is (a2-a0)/(t1-t0).
• the determined acceleration can then be used to query the mapping relationship between the preset acceleration and the scaling change value, and the scaling change value corresponding to the current acceleration can be determined.
• the audio particle b2 can then be processed through a time-scale modification (TSM) algorithm to complete the scaling transformation of the audio particle b2 and obtain the first audio data.
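• The following sketch shows one way such a selection and scaling transformation could be implemented, using librosa's time-scale modification as the TSM step; the speed-to-particle and acceleration-to-stretch mappings, the placeholder waveforms, and the nearest-key lookup are all hypothetical details for illustration only.

```python
import numpy as np
import librosa

SR = 22050
# Hypothetical audio particles (placeholder waveforms a few hundred ms long);
# in practice these would be cut from the original engine-sound audio data.
particle_b1 = np.sin(2 * np.pi * 110 * np.arange(int(0.2 * SR)) / SR).astype(np.float32)
particle_b2 = np.sin(2 * np.pi * 220 * np.arange(int(0.2 * SR)) / SR).astype(np.float32)

speed_to_particle = {40: particle_b1, 60: particle_b2}   # km/h -> particle (assumed mapping)
accel_to_stretch = {0.0: 1.0, 1.0: 1.1, 2.0: 1.25}       # m/s^2 -> TSM rate (assumed mapping)

def first_audio_data(speed_kmh, accel_ms2):
    """Pick the particle whose speed key is nearest, then time-stretch it
    (time-scale modification changes duration/playback speed, not pitch)."""
    particle = speed_to_particle[min(speed_to_particle, key=lambda s: abs(s - speed_kmh))]
    rate = accel_to_stretch[min(accel_to_stretch, key=lambda a: abs(a - accel_ms2))]
    return librosa.effects.time_stretch(particle, rate=rate)

audio = first_audio_data(speed_kmh=55, accel_ms2=1.2)
```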
  • different scaling transformation values can be first used to perform scaling transformation on the original audio data. Then, the scaled audio data is segmented separately.
• each audio particle segmented in this way corresponds to an audio particle in the original audio data, and each audio particle in the original audio data corresponds to a particle group, where the particle group includes at least one audio particle after scaling transformation and different audio particles in the particle group correspond to different scaling change values. Since each traveling speed can correspond to one audio particle in the original audio data, each traveling speed can correspond to one of the aforementioned particle groups.
  • an audio particle can correspond to a speed range, that is, the speeds in the speed range all correspond to the same particle.
  • the speed interval corresponding to audio particle a can be (20km/h, 25km/h).
• for example, if the scaling change values x1 and x2 are used to perform scaling transformation on the original audio data respectively, and the scaled audio data is segmented, then the audio particle b0 in the original audio data corresponds to the audio particle b1 obtained by scaling transformation with x1 and to the audio particle b2 obtained by scaling transformation with x2. In this case, the particle group corresponding to the audio particle b0 consists of audio particles b1 and b2. Among them, the audio particles b0, b1 and b2 correspond to the same time points; in other words, the audio particle b1 is obtained by scaling the audio particle b0 with the scaling change value x1, and the audio particle b2 is obtained by scaling the audio particle b0 with the scaling change value x2.
• in this case, a particle group can first be determined based on the traveling speed. Then, based on the current acceleration, the mapping relationship between the preset acceleration and the scaling change value is queried, and the scaling change value corresponding to the current acceleration is determined. Finally, based on the scaling change value, the relationship between each audio particle in the particle group and the scaling change value can be queried, and the required audio particle can be determined from the particle group; this audio particle is the first audio data.
• when the driving parameter is the rotational speed or the accelerator pedal opening, the first audio data can be determined in a similar manner.
  • the electronic device 100 adjusts the gain of each channel in the first audio data according to the driving parameters to obtain the second audio data.
  • the electronic device 100 can determine the gain that needs to be adjusted based on the driving speed and the preset gain adjustment model, and adjust the gain of each channel in the first audio data to obtain the second audio data.
• the acceleration in the linear model can be determined by the relationship between driving speed, time and acceleration. In this case, it can be understood that the acceleration of the vehicle is first determined based on the driving speed, and then the gain of each channel in the first audio data is adjusted based on the acceleration.
• the range of each gain adjustment can be set; when the gain that needs to be adjusted exceeds the preset range, the maximum value of the preset range can be used as the gain for this adjustment.
• in order to prevent the volume from suddenly rising or falling, a condition for adjusting the gain can be set. For example, the gain can be adjusted when the change in driving speed exceeds a preset speed value (such as 3 km/h, etc.); otherwise the gain is not adjusted. In other words, the gain may be adjusted when the variation of the traveling speed of the vehicle 200 exceeds a certain speed value.
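• A minimal sketch of the gating condition described above, assuming a 3 km/h speed-change threshold and a per-adjustment clamp; both values are illustrative, not prescribed by this application.

```python
MAX_STEP_DB = 2.0       # assumed limit for a single gain adjustment
SPEED_THRESHOLD = 3.0   # km/h, assumed threshold before the gain is touched

def maybe_adjust_gain(current_gain_db, desired_gain_db, speed_change_kmh):
    """Only adjust the gain when the driving speed has changed enough,
    and limit each adjustment to the preset range."""
    if abs(speed_change_kmh) <= SPEED_THRESHOLD:
        return current_gain_db                           # keep gain, avoid sudden jumps
    step = desired_gain_db - current_gain_db
    step = max(-MAX_STEP_DB, min(MAX_STEP_DB, step))     # clamp to the preset range
    return current_gain_db + step
```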
  • the electronic device 100 determines the target speed at which the sound field moves in the target direction based on the driving parameters.
  • the electronic device 100 can use the traveling speed of the vehicle 200 to determine the acceleration of the vehicle 200 . Then, the determined acceleration is used to query the mapping relationship between the preset acceleration and the speed at which the sound field moves toward the target direction, and the target speed at which the sound field moves toward the target direction is determined.
• the target direction may be from the front of the vehicle 200 toward the rear.
  • the electronic device 100 plays the second audio data using the speakers in the target speaker group, and adjusts the gain of each channel in the second audio data according to the target speed.
  • the target speaker group includes at least two speakers.
• the target speaker group is used to control the sound field to move in the target direction at the target speed.
  • the initial position of the sound field may be preset.
• for example, the initial position of the sound field may be a certain position in front of the driver in the vehicle 200.
  • the position of the sound field can be gradually controlled to move from the initial position to the rear of the vehicle 200 according to the target speed.
  • the position of the sound field can be understood as the position of the sound source perceived by the user.
  • the speaker SP1 is arranged in front of the left side of the driver
• the speaker SP2 is arranged in front of the right side of the driver.
  • the area where position 3201 is located may be the initial position of the sound field.
• the initial position may be the position of the sound field when the speakers SP1 and SP2 play sound with the default gains of their corresponding audio signals.
  • the position of the sound field at the next moment can be determined based on the target speed of the movement of the sound field, such as the area where the position 3202 in FIG. 32(B) is located.
  • a virtual speaker VSP1 can be created at position 3202 by adjusting the gains of the audio signals corresponding to speakers SP1 and SP2.
• the gain that needs to be adjusted for the audio signal corresponding to the speaker SP1 is used to adjust the gain of the corresponding channel in the second audio data, and the gain that needs to be adjusted for the audio signal corresponding to the speaker SP2 is used to adjust the gain of the corresponding channel in the second audio data, thereby completing the adjustment of the gain of each channel in the second audio data.
  • the electronic device 100 may use the speakers SP1 and SP2 to play the second audio data. In this way, the driver hears the sound equivalently played at position 3202.
  • speaker SP1 and speaker SP2 are the target speaker group.
• in other words, the electronic device 100 can adjust the volume of the corresponding speaker according to the gain that needs to be adjusted for the audio signal corresponding to each speaker and play the second audio data, thereby realizing movement of the sound field.
• the construction may be performed using, but is not limited to, a vector base amplitude panning (VBAP) algorithm.
  • the process of constructing a virtual speaker based on the VBAP algorithm can be referred to the description in the aforementioned scenario 3.1, and will not be described again here.
• where x1 is the distance between the initial position of the sound field and the reference point, x2 is the distance between the current position of the sound field and the reference point, and the position of the user U11 can be used as the reference point.
  • the movement can also be carried out in other ways, which are not limited here.
  • a virtual speaker can be virtualized on each side, and then the virtual speaker can be used to play the second audio data.
  • the speaker SP1 is arranged in front of the left side of the driver
  • the speaker SP2 is arranged in front of the right side of the driver
  • speaker SP3 is arranged directly to the left of the driver
  • speaker SP4 is arranged directly to the right of the driver.
  • the area where position 3301 is located may be the initial position of the sound field.
• the initial position may be the position of the sound field when the speakers SP1, SP2, SP3 and SP4 play sound with the default gains of their corresponding audio signals.
  • the position of the sound field at the next moment can be determined based on the target speed of the movement of the sound field, such as the area where the position 3302 in FIG. 33(B) is located.
• by adjusting the gains of the audio signals corresponding to the speakers SP1 and SP3, a virtual speaker VSP1 can be created on the left side of the vehicle 200; by adjusting the gains of the audio signals corresponding to the speakers SP2 and SP4, a virtual speaker VSP2 can be created on the right side of the vehicle 200.
• the way to determine the gain that needs to be adjusted for the audio signals corresponding to the speakers SP1, SP2, SP3 and SP4 can refer to the determination method described for Figure 32, for example, determination based on the distance gain model, etc.; see the above description for details, which will not be repeated here.
• the gain that needs to be adjusted for the audio signal corresponding to speaker SP1 can be used to adjust the gain of the corresponding channel in the second audio data; the gain that needs to be adjusted for the audio signal corresponding to speaker SP2 can be used to adjust the gain of the corresponding channel in the second audio data; the gain that needs to be adjusted for the audio signal corresponding to speaker SP3 can be used to adjust the gain of the corresponding channel in the second audio data; and the gain that needs to be adjusted for the audio signal corresponding to speaker SP4 can be used to adjust the gain of the corresponding channel in the second audio data.
  • the electronic device 100 may play the second audio data through the speakers SP1, SP2, SP3, and SP4. In this way, the driver hears the sound equivalently played by the virtual speakers VSP1 and VSP2. This achieves the movement of the sound field in space.
  • speaker SP1, speaker SP2, speaker SP3 and speaker SP4 are the target speaker group.
• in other words, the electronic device 100 can adjust the volume of the corresponding speaker according to the gain that needs to be adjusted for the audio signal corresponding to each speaker, thereby realizing the movement of the sound field.
• in some embodiments, the virtual position of the sound source of the target audio data may be determined first according to the target speed. Then, based on the virtual position, the speakers that control the movement of the sound field are selected from the vehicle. Next, based on the virtual position, the target gains that need to be adjusted for the audio signals corresponding to the selected speakers can be determined, obtaining F target gains, where F ≥ 2. Then, the gain of each channel in the second audio data can be adjusted according to the F target gains to obtain the target audio data. Finally, the target audio data can be played using the selected speakers. Among them, the selected speakers are the target speaker group.
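• To make the idea of a sound field moving toward the rear at the target speed concrete, the following rough sketch moves a virtual source backwards over time and crossfades between a front and a rear speaker with constant-power gains; the cabin geometry and the crossfade law are assumptions for illustration and are not the gain model of this application.

```python
import numpy as np

def front_rear_gains(t, target_speed_mps, start_y=0.0, cabin_length=2.0):
    """Virtual source position after t seconds, mapped to constant-power
    gains for a front speaker and a rear speaker."""
    y = min(start_y + target_speed_mps * t, cabin_length)   # distance travelled toward the rear
    x = y / cabin_length                                     # 0 = fully front, 1 = fully rear
    g_front = np.cos(0.5 * np.pi * x)
    g_rear = np.sin(0.5 * np.pi * x)
    return g_front, g_rear

# With a 2 m cabin and a 2 m/s target speed, the sound field reaches the rear in 1 s
for t in (0.0, 0.5, 1.0):
    print(t, front_rear_gains(t, target_speed_mps=2.0))
```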
• Doppler processing can also be performed on the second audio data according to the target speed, the user's position, the initial position of the sound field, etc., so that the sound heard by the user has a pitch change, improving the user experience.
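• One possible way to realize such Doppler processing is to scale all frequencies by the classic stationary-listener Doppler factor via resampling, as sketched below; the use of librosa and the receding-source formula are implementation assumptions.

```python
import numpy as np
import librosa

SPEED_OF_SOUND = 343.0  # m/s

def doppler_shift(y, sr, source_speed_mps):
    """Pitch the signal down as if its source were moving away from a
    stationary listener: every frequency is scaled by c / (c + v)."""
    k = SPEED_OF_SOUND / (SPEED_OF_SOUND + source_speed_mps)
    # Resampling to sr/k and playing the result back at sr multiplies all frequencies by k
    return librosa.resample(y, orig_sr=sr, target_sr=sr / k)

sr = 22050
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
shifted = doppler_shift(tone, sr, source_speed_mps=20.0)   # ~416 Hz when played at sr
```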
• in this way, the movement of the sound field in the vehicle is controlled using the speakers in the vehicle, so that the sound waves can change spatially and the Doppler effect can appear inside the vehicle, thereby making the sound waves played by the vehicle consistent with the real driving conditions, making the listening experience more realistic and improving the user experience.
  • the color of the ambient light in the vehicle 200 may also be controlled to gradually change along with the acceleration duration of the vehicle 200 .
• for example, the color of the ambient light can be controlled to gradually change from light to dark.
  • the color change speed of the ambient light can be controlled to be the same as the target speed of the sound field movement, so that the spatial hearing and spatial visual perception in the vehicle 200 correspond to each other and improve the user experience.
  • the ambient light in the vehicle 200 may be a light strip that may present gradient colors.
• in addition, different noise floors (i.e., background noise) can be used in different speed ranges, that is, different audios can be selected as the noise floor for mixed playback in different speed ranges.
  • audio 1 and audio 2 can be preset audios, and different speed ranges can correspond to different audio particles. These audio particles are mainly used as noise floor.
  • the foregoing method in addition to being executed by electronic devices configured in the vehicle (such as vehicle-mounted terminals, etc.), the foregoing method can also be executed by electronic devices located in the vehicle and separated from the vehicle (such as mobile phones, etc.).
  • the arrangement positions of the speakers in the vehicle can be configured in advance in the electronic device, so that the electronic device can determine the gain that needs to be adjusted for the audio signal corresponding to each speaker.
  • the vehicle's driving speed can be transmitted from the vehicle to the electronic device, or it can be sensed by the electronic device itself, which is not limited here.
  • the electronic device can first adjust the gain of each channel in the second audio data, and then send the adjusted audio data to the vehicle for playback.
• in addition, part of the aforementioned method can be executed by the vehicle or an electronic device integrated in the vehicle (such as a vehicle-mounted terminal, etc.), and the other part can be executed by an electronic device separated from the vehicle (such as a mobile phone, etc.); that is, the execution entity of each step in the aforementioned method can be adaptively adjusted according to needs, and the adjusted solution is still within the protection scope of this application.
  • Figure 35 shows an application scenario in some embodiments of the present application.
• when driver A drives vehicle 200 to a destination, driver A can navigate to the destination using the electronic device 100 located in vehicle 200.
• in this scenario, the characteristic parameters (such as pitch, gain, etc.) of the audio data broadcast by the electronic device 100 can be changed, so that the driver can increase his attention under the auditory impact and achieve safe driving.
• the electronic device 100 is located in the vehicle 200. It can be a device integrated in the vehicle 200, such as a vehicle-mounted terminal, or it can be a device separated from the vehicle 200, such as driver A's mobile phone, etc., which is not limited here.
  • the electronic device 100 can directly use the speakers in the vehicle 200 to broadcast the audio data it needs to broadcast.
  • the connection between the electronic device 100 and the vehicle 200 may be established through, but is not limited to, short-range communication (such as Bluetooth, etc.).
• the electronic device 100 can transmit the audio data it needs to broadcast to the vehicle 200 and broadcast it through the speakers on the vehicle 200, or the electronic device 100 can use its built-in speaker to broadcast the audio data it needs to broadcast.
  • an image collection device such as a camera may be provided inside the vehicle 200 to collect the driver's facial data.
  • a speaker may be provided inside the vehicle 200 , and the navigation sound to be broadcast in the electronic device 100 may be broadcast through the speaker on the vehicle 200 .
• sensors for collecting road condition information (such as radar, cameras, etc.) may be provided outside the vehicle 200.
  • Figure 36 shows a sound processing method in some embodiments of the present application.
  • the electronic device 100 may be a device integrated in the vehicle 200 , such as a vehicle-mounted terminal, or may be a device separated from the vehicle 200 , such as driver A's mobile phone.
  • the method may include the following steps:
  • the electronic device 100 determines the driver's fatigue level.
  • the vehicle 200 may be equipped with an image collection device, such as a camera. Through this image acquisition device, driver A's facial data, such as eyes, mouth, etc., can be collected in real time or periodically (for example, every 2 seconds, 3 seconds, or 5 seconds, etc.).
  • the vehicle 200 may transmit the driver A's facial data collected by the image collection device on the vehicle 200 for a certain period of time (such as 5 seconds, etc.) to the electronic device 100 .
  • the vehicle 200 can cache facial data collected in a short time (5s or 10s) based on a dynamic sliding window.
  • the vehicle 200 can use the data of a certain time period (such as: 1s to 5s, 2s to 6s, or 3s to 7s, etc.) in the video it collects as the required facial data of driver A.
• after the electronic device 100 obtains driver A's facial data, it can input the obtained facial data into a pre-trained fatigue monitoring model, so that the fatigue monitoring model outputs driver A's fatigue level.
  • the fatigue monitoring model can be, but is not limited to, trained based on a convolutional neural network (CNN).
  • the fatigue level of driver A can be determined based on the mapping relationship between the number of blinks, yawns, or nods of driver A and the fatigue level within a preset period of time.
  • Table 1 shows the mapping relationship between the number of blinks and the fatigue level.
• based on this mapping relationship, driver A's fatigue level can be determined, for example, as level 3. It can be understood that the higher the fatigue level, the more fatigued driver A is within the preset time period.
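• As a toy illustration of a Table-1-style lookup, the blink-count thresholds below are invented; the real mapping would come from the preset table.

```python
# Hypothetical thresholds: blink count within the preset time window -> fatigue level
FATIGUE_LEVELS = [(5, 1), (10, 2), (15, 3)]   # (maximum blink count, level)

def fatigue_level(blink_count):
    for max_blinks, level in FATIGUE_LEVELS:
        if blink_count <= max_blinks:
            return level
    return 4  # most fatigued

print(fatigue_level(12))  # -> 3 with these made-up thresholds
```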
  • the electronic device 100 determines the target adjustment value of the first characteristic parameter according to the fatigue level.
  • the first characteristic parameter is the characteristic parameter of the audio data that currently needs to be played.
  • the electronic device 100 determines the fatigue level of driver A, it can query the mapping relationship between the preset fatigue level and the adjustment value of the characteristic parameter to determine the characteristic parameter of the audio data that currently needs to be played. target adjustment value.
  • the characteristic parameters may include: pitch and/or loudness, etc.
  • the electronic device 100 can determine the target adjustment value based on the relational expression corresponding to the fatigue level and the preset characteristic parameters.
  • the electronic device 100 processes the audio data that currently needs to be played according to the target adjustment value to obtain the target audio data, where the value of the characteristic parameter of the target audio data is higher than the value of the first characteristic parameter.
  • the electronic device 100 may adjust the pitch and/or loudness of the audio data of the currently required navigation sound to obtain the target audio data.
  • the value of the characteristic parameter of the target audio data is higher than the value of the first characteristic parameter.
• for example, when the characteristic parameter is loudness, if the target adjustment value is 1.5 and the unit of loudness is represented by a standardized value (such as an amplification factor, etc.), the electronic device 100 can adjust the loudness of the audio data of the navigation sound that currently needs to be broadcast so that the adjusted loudness is 1.5 times the original; if the target adjustment value is 10 and the unit of loudness is expressed in decibels, the electronic device 100 can adjust the loudness of the audio data of the navigation sound that currently needs to be broadcast so that the adjusted loudness is the sum of the original loudness and the target adjustment value.
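• A small sketch of both loudness conventions mentioned above, treating the amplitude scale as the loudness scale in the same way the description does: multiplying the samples when the adjustment value is a factor, and applying a decibel gain when it is expressed in dB; the clipping step is an added safeguard, not part of the described method.

```python
import numpy as np

def scale_loudness(pcm, factor):
    """Adjustment value given as a factor (e.g. 1.5 -> 1.5x the original)."""
    return np.clip(pcm * factor, -1.0, 1.0)

def boost_loudness_db(pcm, gain_db):
    """Adjustment value given in decibels (e.g. +10 dB added to the original)."""
    return np.clip(pcm * (10.0 ** (gain_db / 20.0)), -1.0, 1.0)

nav_audio = np.random.uniform(-0.1, 0.1, 22050).astype(np.float32)  # placeholder waveform
louder = scale_loudness(nav_audio, 1.5)
```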
• for another example, when the characteristic parameter is pitch, the electronic device 100 can increase the pitch of the audio data of the navigation sound currently to be broadcast to 1.2 times the original pitch based on a pitch shifting algorithm.
• the pitch modification algorithm may be a time domain method such as synchronized overlap-add (SOLA), synchronized overlap-add and fixed synthesis (SOLAFS), time-domain pitch synchronized overlap-add (TD-PSOLA), or waveform similarity overlap-and-add (WSOLA), or a frequency domain method such as pitch-synchronized overlap-add (PSOLA).
• in some embodiments, the audio data of the navigation sound that currently needs to be broadcast can be processed to change the pitch without changing the speed (i.e., variable pitch at constant speed). The following takes using the time domain method to achieve variable pitch at constant speed as an example.
  • the method of "variable speed and constant pitch + resampling" can generally be used to achieve the effect of changing pitch and constant speed.
  • the audio data of the navigation sound currently required to be broadcast can be processed with variable speed and constant pitch, and then resampled.
• specifically, the audio data of the navigation sound currently to be broadcast can be framed in the original time domain x; then one frame of data (that is, x_m) can be added to the output time domain y, after which another frame of data (that is, x_(m+1)) can be added to the time domain y, and so on.
• the corresponding resampling factor P/Q can be selected to achieve P/Q times resampling, so that the speech speed and pitch after resampling become Q/P times the original, where P is the upsampling factor and Q is the downsampling factor.
  • the resampling process may include an upsampling process and a downsampling process.
  • the upsampling process is: interpolate (P-1) sampling points between each two adjacent sampling points in the original signal, so that the pitch period of the original signal becomes P times the original, and the duration becomes P times the original, that is, the fundamental frequency becomes 1/P times the original, the pitch drops to 1/P times the original, and the speaking speed becomes 1/P times the original.
  • the process of downsampling is: in the original signal, a sampling point is extracted every (Q-1) sampling points, so that the length of the pitch period becomes 1/Q times the original, and the duration becomes 1/Q times the original. That is, the fundamental frequency becomes Q times the original, the pitch rises to the original Q times, and the speaking speed becomes Q times the original.
  • both the speaking speed and pitch of the audio data can be modulated to the original Q/P times.
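• A minimal sketch of the "variable speed and constant pitch + resampling" route, raising the pitch to 1.2 times the original while keeping the duration roughly unchanged; the use of librosa for both steps is an implementation assumption.

```python
import numpy as np
import librosa

def raise_pitch_constant_speed(y, sr, factor=1.2):
    """Raise pitch by `factor` while keeping the playing duration unchanged."""
    # Step 1: variable speed, constant pitch - slow the signal down to
    # `factor` times its original duration without changing the pitch.
    y_slow = librosa.effects.time_stretch(y, rate=1.0 / factor)
    # Step 2: P/Q resampling - played back at `sr`, this restores the
    # original duration and raises every frequency by `factor`.
    return librosa.resample(y_slow, orig_sr=sr, target_sr=sr / factor)

sr = 16000
speech = np.random.randn(sr).astype(np.float32) * 0.05   # placeholder 1 s "navigation sound"
higher = raise_pitch_constant_speed(speech, sr, 1.2)
```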
  • the electronic device 100 plays the target audio data.
  • the electronic device 100 can play the target audio data after obtaining the target audio data from the audio data of the currently required navigation sound. Since the value of the characteristic parameter of the target audio data is higher than the value of the first characteristic parameter (that is, the characteristic parameter of the audio data of the currently required navigation sound), the purpose of reminding the driver can be achieved. For example, when the played audio data has a higher pitch and/or a higher sound loudness, the sound heard by the driver will be harsher, thus stimulating the driver and thereby increasing the driver's attention.
• the electronic device 100 may play the target audio data through its own speaker, or may transmit the target audio data to the vehicle 200 to be played through the speaker of the vehicle 200. When the electronic device 100 is integrated in the vehicle 200, the electronic device 100 may play the target audio data through the speaker of the vehicle 200.
  • the characteristic parameters (such as pitch, loudness, etc.) of the audio data broadcast by the electronic device 100 can be changed according to the driver's fatigue level, so that the played audio data can be heard in the auditory sense. It has an impact on the driver, thereby improving the driver's attention and achieving safe driving.
• in some embodiments, the electronic device 100 can also determine the corresponding prompt voice according to the fatigue level, and broadcast the target audio data and the prompt voice based on a preset broadcast sequence, thereby making the broadcast method and language more life-like and humane and improving the user experience.
  • the electronic device 100 can directly play the prompt voice.
  • the electronic device 100 can query the mapping relationship between the preset fatigue level and the prompt voice according to the fatigue level, and determine the currently required prompt voice.
  • the prompt voice corresponding to each fatigue level may be preset by the user, or may be a template sentence preset in the electronic device 100 .
• for example, based on the fatigue level, the prompt voice can be determined to be "Attention! The driver is moderately fatigued, please open the window for ventilation."
  • the target audio data is "Please turn left 50 meters ahead”
• the audio data to be broadcast by the electronic device 100 can then be "Please turn left 50 meters ahead. Attention! The driver is moderately fatigued, please open the window for ventilation."
  • the electronic device 100 can determine the corresponding prompt voice based on the fatigue level, and broadcast the prompt voice.
  • the electronic device 100 can also determine the prompt voice to be broadcast based on the fatigue level and the map information in the navigation. For example, continue to refer to Table 3.
• for example, the required prompt voice determined according to the fatigue level is "Attention! Attention! The driver is extremely tired and can stop and rest at xxx intersection/supermarket/transfer station xxx meters away."
• if the electronic device 100 determines, based on the map information in the navigation, that there is a service area 500 meters away, the electronic device 100 can determine that the prompt voice to be broadcast is "Attention! Attention! The driver is extremely tired and can stop and rest in the service area 500 meters away." Among them, the electronic device 100 can combine the "service area 500 meters away" determined from the map information in the navigation with the prompt voice "Attention! Attention! The driver is extremely tired and can stop and rest at xxx intersection/supermarket/transfer station xxx meters away" determined according to the fatigue level, to obtain the final prompt voice that needs to be broadcast.
  • the pulse code modulation (PCM) data of one piece of audio data is inserted into the PCM data of another piece of audio data at a certain time point, that is, the splicing of the two audio data is completed.
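• A tiny sketch of splicing two pieces of PCM data at a chosen time point with NumPy; the sample rate, insertion time, and placeholder waveforms are arbitrary examples.

```python
import numpy as np

def splice_pcm(base_pcm, insert_pcm, insert_time_s, sample_rate=16000):
    """Insert one piece of PCM data into another at a given time point."""
    idx = int(insert_time_s * sample_rate)
    return np.concatenate([base_pcm[:idx], insert_pcm, base_pcm[idx:]])

nav = np.zeros(32000, dtype=np.float32)          # navigation sound placeholder (2 s)
prompt = np.ones(8000, dtype=np.float32) * 0.1   # prompt voice placeholder (0.5 s)
spliced = splice_pcm(nav, prompt, insert_time_s=2.0)
```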
• in some embodiments, the electronic device 100 can also determine the color and/or flashing frequency of the signal lights provided in the vehicle 200 according to the fatigue level, and control the signal lights in the vehicle 200 to work with a certain color and/or flashing frequency, thereby visually impacting the driver, improving the driver's attention, achieving safe driving, and achieving visual warnings synchronized with the auditory warning of the navigation sound.
  • the electronic device 100 can query the mapping relationship between the preset fatigue level and the signal light, and determine the color and/or flashing frequency of the signal light, etc.
  • Table 2 shows the mapping relationship between the fatigue level and the color and flashing frequency of the signal light.
• when the vehicle 200 is in the autonomous driving state, the driver's attention is generally not required. But when the road conditions are poor (such as accident-prone roads), or when the user is on a critical road section (such as an intersection that requires a turn), the driver is often required to control the vehicle. Therefore, in order to improve the safety of the vehicle during autonomous driving, when the vehicle 200 is in the autonomous driving state, the electronic device 100 can broadcast the target audio data in combination with the road condition information outside the vehicle 200. In addition, when the vehicle 200 is not in the autonomous driving state, if the road condition on which the vehicle 200 is currently traveling is poor or the vehicle is on a critical road section about which the user needs to be reminded, the electronic device 100 can also play the target audio data to remind the driver to pay attention.
  • the vehicle 200 can notify the electronic device 100 of the information that it is in the automatic driving state. In this way, the electronic device 100 can learn that the vehicle 200 is in an autonomous driving state.
  • the vehicle 200 can use its external sensors (such as radar, camera, etc.) to collect its external road condition information, and transmit the collected information to the electronic device 100 .
• after the electronic device 100 obtains the road condition information outside the vehicle 200, it can broadcast the target audio data when the road conditions are poor.
  • Figure 38 shows a sound processing method in some embodiments of the present application.
  • the electronic device 100 is a device separated from the vehicle 200 , such as a mobile phone, etc., and a connection is established between the electronic device 100 and the vehicle 200 through a short-distance communication method such as Bluetooth.
  • the driver uses electronic device 100 for navigation.
  • S3801, S3802, S3804, and S3805 can refer to the relevant description in Figure 36, and will not be described again here.
  • the method may include the following steps:
  • the vehicle 200 obtains the driver's facial data.
  • the vehicle 200 determines the driver's fatigue level based on the driver's facial data.
  • the vehicle 200 sends the driver's fatigue level to the electronic device 100.
  • the vehicle 200 may send the fatigue level to the electronic device 100 .
  • the vehicle 200 may also directly send the driver's facial data obtained in step S3801 to the electronic device 100 . Further, the electronic device 100 may determine the driver's fatigue level based on the driver's facial data.
  • the electronic device 100 determines the target adjustment value of the first characteristic parameter according to the fatigue level.
  • the first characteristic parameter is the characteristic parameter of the audio data that currently needs to be played.
  • the electronic device 100 processes the audio data that currently needs to be played according to the target adjustment value to obtain target audio data, where the value of the characteristic parameter of the target audio data is higher than the value of the first characteristic parameter.
  • the electronic device 100 sends the target audio data to the vehicle 200.
  • the electronic device 100 may send the target audio data to the vehicle 200 .
  • the vehicle 200 plays the target audio data.
  • the vehicle 200 can play the target audio data.
  • the electronic device 100 may also play the target audio data through its own speaker, that is, the electronic device 100 does not need to send the target audio data to the vehicle 200.
  • the characteristic parameters (such as pitch, loudness, etc.) of the audio data broadcast by the electronic device 100 can be changed according to the driver's fatigue level, so that the played audio data can be heard in the auditory sense. It has an impact on the driver, thereby improving the driver's attention and achieving safe driving.
  • Figure 39 shows a sound processing method in some embodiments of the present application.
  • the electronic device 100 is a device separated from the vehicle 200 , such as a mobile phone, etc., and a connection is established between the electronic device 100 and the vehicle 200 through a short-distance communication method such as Bluetooth.
  • the driver uses electronic device 100 for navigation.
  • S3901 to S3906 please refer to the related descriptions mentioned above and will not be described again here.
  • the method may include the following steps:
  • the vehicle 200 obtains the driver's facial data.
  • the vehicle 200 determines the driver's fatigue level based on the driver's facial data.
  • the vehicle 200 determines the target adjustment value of the first characteristic parameter according to the fatigue level.
  • the first characteristic parameter is the characteristic parameter of the audio data that currently needs to be played.
  • the electronic device 100 sends the audio data to be played to the vehicle 200.
  • the vehicle 200 processes the audio data to be played according to the target adjustment value to obtain the target audio data, where the value of the characteristic parameter of the target audio data is higher than the value of the first characteristic parameter.
  • the vehicle 200 plays the target audio data.
  • the vehicle 200 may also send the target audio data to the electronic device 100, so that the electronic device 100 plays the target audio data.
  • the characteristic parameters (such as pitch, loudness, etc.) of the audio data broadcast by the electronic device 100 can be changed according to the driver's fatigue level, so that the played audio data can be heard in the auditory sense. It has an impact on the driver, thereby improving the driver's attention and achieving safe driving.
  • Figure 40 shows a sound processing method in some embodiments of the present application.
  • the electronic device 100 is a device separated from the vehicle 200 , such as a mobile phone, etc., and a connection is established between the electronic device 100 and the vehicle 200 through a short-distance communication method such as Bluetooth.
  • the driver uses electronic device 100 for navigation.
  • the method may include the following steps:
  • the vehicle 200 obtains the driver's facial data.
  • the vehicle 200 determines the driver's fatigue level based on the driver's facial data.
  • the vehicle 200 determines the target adjustment value of the first characteristic parameter according to the fatigue level.
  • the first characteristic parameter is the characteristic parameter of the audio data that currently needs to be played.
  • the vehicle 200 sends the target adjustment value to the electronic device 100.
  • the electronic device 100 processes the audio data to be played according to the target adjustment value to obtain target audio data, where the value of the characteristic parameter of the target audio data is higher than the value of the first characteristic parameter.
  • the electronic device 100 sends the target audio data to the vehicle 200.
  • the vehicle 200 plays the target audio data.
• as an alternative to step S4005 and step S4006, the electronic device 100 may also play the target audio data through its own speaker, that is, the electronic device 100 does not need to send the target audio data to the vehicle 200.
  • the characteristic parameters (such as pitch, loudness, etc.) of the audio data broadcast by the electronic device 100 can be changed according to the driver's fatigue level, so that the played audio data can be heard in the auditory sense. It has an impact on the driver, thereby improving the driver's attention and achieving safe driving.
• it should be noted that the data that can be exchanged between the electronic device 100 and the vehicle 200 includes, but is not limited to, the driver's facial data, the driver's fatigue level, and the target adjustment value of the first characteristic parameter.
  • the above-mentioned processes of determining the driver's fatigue level, determining the target adjustment value of the first characteristic parameter, and processing the audio data to be played can be completed on the electronic device 100 or on the vehicle 200 . For example, after the vehicle 200 obtains the driver's facial data, the vehicle 200 can determine the driver's fatigue level.
  • the vehicle 200 can also send the driver's facial data to the electronic device 100 , and the electronic device determines the driver's fatigue level.
  • the vehicle 200 can determine the target adjustment value of the first characteristic parameter according to the driver's fatigue level, and send the target adjustment value to the electronic device 100.
• the electronic device 100 can also itself determine the target adjustment value of the first characteristic parameter according to the driver's fatigue level; the possible combinations are not listed one by one here.
  • each step in the above embodiments can be adaptively adjusted to the execution subject according to the actual situation, and the adjusted solution is still within the protection scope of this application.
• consider a scenario where the user selects multiple audio data to be overlaid and played.
  • embodiments of the present application provide a sound processing method.
• this method can transform the white noise selected by the user based on the background sound selected by the user (i.e., the other sound mentioned above), so that the two can be blended together more naturally, thereby giving the user a better listening experience.
  • Figure 41 shows a sound processing method. It can be understood that this method can be executed by any device, device, platform, or device cluster with computing and processing capabilities. For example, it can be executed by, but is not limited to, speakers, mobile phones, etc.
  • the method may be executed when the user turns on a target function (such as a white noise function, etc.) and the user has a need to play audio data.
  • the user can turn on the target function in the system of the device or an application (APP) on the device, and the user can use the device to play songs.
• for devices such as speakers, the target function can be enabled through other devices connected to them, and the user can use the speakers to play songs.
  • the sound processing method may include the following steps:
  • the first audio data may be background sound
  • the second audio data may be white noise
  • the background sound may be, but is not limited to, a certain song.
  • the first audio data and the second audio data can be obtained from the network or the local database based on the user's selection.
• for example, an application (APP) can be installed on an electronic device (such as a mobile phone, etc.).
  • the user can select background sound and white noise on the APP.
  • the background sound can be obtained from the network or a local database based on the user's selection.
  • the mapping relationship between the preset background sound and the white noise can also be queried, and the white noise adapted to the background sound can be obtained from the network or a local database.
  • the first duration of the first audio data may be equal to the second duration of the second audio data, so that the two may be played synchronously.
• when the second duration of the second audio data is greater than the first duration of the first audio data, data with a duration equal to the first duration can be intercepted from the second audio data, and the intercepted data can be used as the required second audio data.
• for example, if the first duration is 10 seconds and the second duration is 20 seconds, the first 10 seconds of data in the second audio data can be used as the required data, or the data from the 5th second to the 15th second in the second audio data can be used as the required data.
• when the second duration of the second audio data is less than the first duration, multiple copies of the second audio data can be spliced, data with a duration equal to the first duration can be intercepted from the spliced data, and the intercepted data can be used as the required second audio data.
  • the target audio features include: loudness at each moment and position points of each beat.
  • the amplitude of the waveform at each moment can be determined from the waveform diagram of the first audio data in the time domain, and then the loudness at each moment can be determined.
• the amplitude at a moment represents the loudness at that moment.
  • the first audio data can be input into a pre-trained machine learning model to obtain the position points of each beat; the machine learning model can be trained based on a deep learning neural network.
  • the first audio data can also be processed based on a beat detection algorithm (such as librosa, etc.) to obtain the position points of each beat in the first audio data.
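• A short sketch of the beat-detection route using librosa's beat tracker; the file name is a placeholder.

```python
import librosa

# Placeholder path for the first audio data (the background sound)
y, sr = librosa.load("background.wav", sr=None, mono=True)

# Beat tracking: estimated tempo plus the frame indices of each beat
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)   # position points (seconds) of each beat
```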
• in some embodiments, the target loudness corresponding to each moment in the second audio data can be determined based on the loudness at each moment in the first audio data, in combination with the preset proportional relationship between noise loudness and music loudness. Further, the loudness at each moment in the second audio data can be adjusted to the determined target loudness corresponding to that moment. For example, if the loudness at the first moment in the first audio data is 10 decibels and the preset ratio between the noise loudness and the music loudness is 1/2, the target loudness at the first moment in the second audio data can be determined to be 5 decibels. Further, the loudness of the second audio data at the first moment can be adjusted to 5 decibels.
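• One way to realize the loudness rule above is to measure the background's short-time loudness and scale the white noise frame by frame so that it stays at the preset ratio; the RMS loudness proxy, the frame length, and the 1/2 ratio are illustrative assumptions.

```python
import numpy as np

def match_loudness(background, white_noise, ratio=0.5, frame=2048):
    """Scale the white noise so its per-frame loudness is `ratio` times the
    background's per-frame loudness (RMS is used as the loudness proxy)."""
    out = np.copy(white_noise[:len(background)])
    for start in range(0, len(out), frame):
        bg = background[start:start + frame]
        wn = out[start:start + frame]
        bg_rms = np.sqrt(np.mean(bg ** 2)) + 1e-12
        wn_rms = np.sqrt(np.mean(wn ** 2)) + 1e-12
        out[start:start + frame] = wn * (ratio * bg_rms / wn_rms)
    return out
```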
  • the pitch of the second audio data may be adjusted based on the position points of each beat, so that the pitch of the second audio data matches the rhythm of the first audio data. For example, when the first audio data is soothing in a certain time period, the pitch of the second audio data in the time period can be lowered, so that the second audio data is also gradually soothing.
  • whether the pitch of the second audio data needs to be adjusted, and whether it should be raised or lowered, can be determined by comparing the rhythm of the first audio data with a preset reference rhythm.
  • for example, suppose the preset reference rhythm is 30 beats per minute. If the time interval between two adjacent beats in the first audio data is 1 second, the rhythm corresponding to these two adjacent beats can be determined to be 60 beats per minute. Since this rhythm is greater than the reference rhythm, the rhythm of the first audio data is faster between these two adjacent beats; therefore, the pitch of the second audio data can be raised within the same time period, so that the emotions expressed by the first audio data and the second audio data within that time period are consistent.
  • specifically, based on the rhythm determined by two adjacent beats and a preset mapping relationship between rhythm and pitch adjustment, the target pitch adjustment value to be applied to the second audio data between the position points corresponding to these two beats can be determined.
  • then, based on the target pitch adjustment value, the pitch of the data between the position points corresponding to the two beats in the second audio data can be adjusted. For example, when the target pitch adjustment value is 0.8, the pitch of that data can be reduced to 0.8 times the original pitch based on a pitch-shifting algorithm.
  • conversely, the purpose of raising the pitch can be achieved by extracting (i.e., downsampling) a certain number of sample points from the data that needs to be adjusted.
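  • For illustration, the following sketch shows one possible pitch-shifting step using librosa; converting the multiplicative pitch adjustment value to semitones is an assumption about how such a value could be applied, not the patented algorithm itself.

```python
import numpy as np
import librosa

def shift_pitch(segment, sr, pitch_ratio):
    """Shift the pitch of one beat-to-beat segment by a multiplicative ratio.

    pitch_ratio = 0.8 lowers the pitch to 0.8x; ratios > 1 raise it.
    """
    n_steps = 12.0 * np.log2(pitch_ratio)       # ratio -> semitones
    return librosa.effects.pitch_shift(segment, sr=sr, n_steps=n_steps)
```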
  • similarly, the sound speed (i.e., the audio playback speed) of the second audio data can also be adjusted based on the position points of each beat.
  • specifically, based on the rhythm determined by two adjacent beats and a preset mapping relationship between rhythm and sound speed adjustment, the target sound speed adjustment value to be applied between the position points corresponding to these two beats can be determined. Then, based on the target sound speed adjustment value, the sound speed of the data between the position points corresponding to the two beats in the second audio data can be adjusted. For example, when the target sound speed adjustment value is 0.8, the sound speed of that data can be reduced to 0.8 times the original sound speed.
  • conversely, a certain number of sampling points can be extracted (i.e., the data can be downsampled) from the data that needs to be adjusted to achieve the purpose of increasing the sound speed.
  • the pitch and the sound speed of the second audio data can be adjusted at the same time or selectively, and are not limited here.
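  • As a sketch only, the time-stretching below changes playback speed without changing pitch; whether speed and pitch are adjusted jointly or separately is a design choice, as noted above.

```python
import librosa

def change_speed(segment, speed_ratio):
    """Change playback speed without changing pitch.

    speed_ratio = 0.8 slows the segment to 0.8x of its original speed
    (the result becomes longer); values > 1 speed it up.
    """
    return librosa.effects.time_stretch(segment, rate=speed_ratio)
```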
  • the third audio data can be obtained, and S4104 can be executed.
  • the target audio data is obtained based on the first audio data and the third audio data.
  • the first audio data and the third audio data can be mixed using a mixing algorithm to obtain target audio data.
  • for example, when the types of the first audio data and the third audio data are both floating point (float) types, the first audio data and the third audio data can be directly superimposed and mixed to obtain the target audio data. When the types of the first audio data and the third audio data are not float types, the first audio data and the third audio data can be processed using mixing algorithms such as adaptive weighted mixing or linear superposition averaging to obtain the target audio data.
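  • A minimal mixing sketch is shown below; the fixed 0.5/0.5 weighting and the clipping to [-1, 1] are simplifying assumptions, whereas an adaptive weighted mixing algorithm would vary the weights over time.

```python
import numpy as np

def mix(first, third, weight=0.5):
    """Mix two float-type audio arrays into target audio data.

    A linear weighted average is used so that direct superposition
    does not clip; the result is kept in the valid float range.
    """
    n = min(len(first), len(third))
    mixed = weight * first[:n] + (1.0 - weight) * third[:n]
    return np.clip(mixed, -1.0, 1.0)
```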
  • the second audio data is transformed based on the audio characteristics of the first audio data, so that the two can be more naturally integrated, thereby giving the user a better listening experience.
  • making a video may involve editing the original video, or generating a video from multiple pictures, which is not limited here.
  • Dynamic pictures can be understood as graphics interchange format (GIF) files.
  • embodiments of the present application also provide a sound processing method.
  • when users create videos or dynamic pictures on electronic devices, they can add spatial audio to the target objects in the videos or dynamic pictures according to their own needs, so that in the videos or dynamic pictures the sound of a target object moves with the movement of that target object, making the user's listening experience more realistic and improving the viewing experience.
  • This sound processing method has no requirements on the environment or on information collection equipment, and the audio position of an object in the video or dynamic picture is consistent with the object's actual position, so that when users subsequently watch the video there is no separation between listening and viewing, which improves the user experience.
  • Figure 42 shows a sound processing method. It can be understood that this method can be executed by any device, device, platform, or device cluster with computing and processing capabilities. As shown in Figure 42, the method may include the following steps:
  • the N pictures may be pictures selected by the user.
  • the user can select N pictures from electronic devices such as mobile phones to create a video from these N pictures.
  • the N pictures can also be pictures taken by users within a period of time. For example, when a user takes pictures using an electronic device such as a mobile phone, the pictures of a week, a month or a year can be determined as the required N pictures.
  • the N pictures may also be pictures extracted from the target video selected by the user according to a preset sampling frequency. In the process of extracting the N pictures from the target video, the moment corresponding to each extracted picture can be recorded. For example, if the sampling frequency is one picture per second (s) and the moment of the first collected picture is 0 s, then the moment of the second collected picture is 1 s, the moment of the third collected picture is 2 s, and so on.
  • the N pictures can also be pictures extracted from dynamic pictures.
  • a dynamic picture can be understood as being formed by splicing multiple pictures. Therefore, the N pictures can be multiple pictures that make up the dynamic picture.
  • S4202. Determine the moment when each of the N pictures appears in the target video.
  • the target video is obtained based on the N pictures.
  • the preset order may be based on the time order of taking pictures or extracting pictures, or it may be a user-specified order, and so on.
  • the duration of the target video can be the default playback duration required to play N pictures, or the playback duration set by the user.
  • the corresponding time of each picture in the video can be used as the time when each picture appears in the target video.
  • the target video can be the same as the video selected by the user.
  • the time when each picture appears in the dynamic picture can be used as the time when each picture appears in the target video.
  • the target video can be understood as the dynamic picture.
  • the duration of the target video can be the duration required to play the dynamic picture.
  • audio data suitable for these pictures can be selected, and, based on the selected audio data, the moment when each of the N pictures appears in the target video can be determined.
  • the N pictures can be input into an artificial intelligence (AI) model (such as a machine learning model, a neural network model, etc.), so that the AI model processes the N pictures to obtain audio data adapted to these pictures.
  • the audio data may be data stored in a local database or audio data on the network, which is not limited here.
  • the duration of the target video may be the duration of the filtered audio data.
  • if the duration of the acquired audio data is too long, a piece of data can be intercepted from it as the required audio data. For example, but not limited to, the climax part of the audio data may be used as the required audio data.
  • the audio data can be analyzed to determine the position points of each beat in the audio data and/or the position points of each section.
  • the position point of each beat can be understood as the time point of the starting position of each beat
  • the position point of each section can be understood as the time point of the starting position of each section.
  • the position points of each beat in the audio data and/or the position points of each section can be extracted through an AI model, a beat extraction algorithm, etc.
  • after the audio data is determined, its playback duration can be obtained, and the N pictures can be evenly spaced over that duration. Then, based on the determined position points of each beat and/or each section, the appearance time of at least some of the N pictures can be adjusted so that it coincides with the position points of certain beats or certain sections. In this way, visual changes are presented at the key points of the listening experience, that is, the picture switches at the auditory key points, creating a consistent audio-visual sense of impact and thereby improving the user experience.
  • when the position points of each section are used to adjust the appearance time of at least part of the N pictures, for any picture, if no picture has been placed at the start position point of the section closest to the moment when the picture appears, the moment when the picture appears can be adjusted to this position point.
  • if the distance between the moment when the picture appears and the end point of the section is less than the distance between that moment and the start point of the section, the moment when the picture appears can also be adjusted to the end point of the section.
  • the specific adjustment method can refer to the adjustment method when using the position points of each beat, and will not be described again here.
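  • The sketch below illustrates the snapping of picture appearance times to beat (or section) position points described above; unlike the method, which may adjust only some of the pictures, this simplified version snaps every picture to the nearest unused position point.

```python
import numpy as np

def snap_to_positions(picture_times, position_points):
    """Move each picture's appearance time to the nearest position point
    (beat or section start) that does not already hold a picture."""
    remaining = sorted(position_points)
    snapped = []
    for t in picture_times:
        if not remaining:
            snapped.append(t)              # keep the original time
            continue
        nearest = min(remaining, key=lambda p: abs(p - t))
        remaining.remove(nearest)
        snapped.append(nearest)
    return snapped

# Example: 4 pictures evenly spaced over 12 s, snapped to beat position points
times = list(np.linspace(0, 12, 4))
beats = [0.0, 2.9, 6.1, 8.8, 11.7]
print(snap_to_positions(times, beats))     # -> [0.0, 2.9, 8.8, 11.7]
```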
  • certain audio data specified by the user can also be used as the required audio data. And, according to the method described above, determine the moment when each of the N pictures appears in the target video.
  • each of the N pictures can be input into a pre-trained target detection model, so that the target objects contained in each picture are detected through the target detection model, thereby obtaining the target objects contained in each picture. The target detection model can be, but is not limited to, trained based on a convolutional neural network (CNN).
  • the target object can be understood as an object in the picture that can produce sound.
  • the target object in the picture can be an airplane.
  • each picture can also be processed based on a target detection algorithm (such as YOLOv4, etc.) to obtain the target objects contained in each picture.
  • a target object selection interface can also be displayed to the user so that the user can select the target object he or she needs. At this time, the target object is the target object required by the user.
  • the target object contained in each picture can also be obtained based on the user's selection operation on the picture. For example, after N pictures are determined, each picture can be displayed to the user. When a user views a picture, he or she can mark the target object in the picture through manual marking.
  • S4204. Determine the spatial positions of the M target objects in each picture to obtain (M*N) spatial positions, and determine the duration of each target object appearing in the target video to obtain M first durations.
  • a three-dimensional coordinate system can be constructed with the position of the device that captured the picture as the center.
  • the center position of each of the N pictures is the origin of the three-dimensional coordinate system.
  • the plane composed of the x-axis and the y-axis can be the plane where the picture is located.
  • the z-axis can represent depth, which describes the actual distance from the target object to the device that takes the picture.
  • the position of the target object in the three-dimensional coordinate system can be expressed as (x i , y i , z i ).
  • the values of x i and y i can be determined.
  • z i can be obtained through the time of flight (ToF) camera on the device that takes the picture, or through a pre-trained depth detection model.
  • the depth detection model can be, but is not limited to, trained based on a convolutional neural network.
  • the spatial position of the target object may refer to the position of the target object in the three-dimensional coordinate system.
  • when the N pictures are pictures selected by the user, or pictures taken by the user within a period of time, the N pictures can be sorted according to the shooting time of each picture; then, the spatial position of the target object contained in each picture can be determined picture by picture, from the earliest shooting time to the latest.
  • the i-th picture can be any one of the N pictures, and the k-th target object can be any target object in the i-th picture.
  • For example, referring to Figure 44, (A) of Figure 44 shows the (i-1)-th picture, (B) of Figure 44 shows the i-th picture, and (C) of Figure 44 shows the (i+1)-th picture; the determined target object is the bird 4301 shown in (B) of Figure 44.
  • the bird 4301 does not exist in the picture shown in (A) of Figure 44, whose shooting time is earlier than that of the picture shown in (B) of Figure 44. Since there is no other picture before the picture shown in (A) of Figure 44, the spatial position of the bird 4301 for the picture shown in (A) of Figure 44 can be set at an infinite distance.
  • when the k-th target object determined in the i-th picture does not exist in the (i+1)-th picture, a position on a certain boundary of the (i+1)-th picture can be used as the spatial position of the k-th target object in the (i+1)-th picture. The position on the boundary can be a certain position on a specified boundary, or it can be determined based on the position of the k-th target object in the i-th picture. Similarly, when the k-th target object does not exist in the (i+2)-th picture, its moving direction can be determined from its spatial positions in the i-th picture and the (i+1)-th picture, and the position on the boundary of the (i+2)-th picture in that moving direction can be used as the spatial position of the k-th target object in the (i+2)-th picture.
  • For example, referring to Figure 45, the determined target object is the bird 4501 shown in (A) and (B) of Figure 45.
  • Bird 4501 exists in both (A) and (B) of FIG. 45 , but bird 4501 does not exist in (C) of FIG. 45 .
  • the position of the boundary in the direction pointed by the arrow in Figure 45(C) is area 42, so area 42 can be regarded as the spatial position of the bird 4501 in the (i+2)th picture.
  • further, when the k-th target object does not exist in the (i+3)-th picture, a position outside the (i+3)-th picture can be determined according to the moving direction, the moving speed, and the time interval between the (i+2)-th picture and the (i+3)-th picture, and this position can be used as the spatial position of the k-th target object in the (i+3)-th picture.
  • (D) of Figure 45 shows the (i+3)th picture.
  • the moving direction (i.e., the direction pointed by the arrow in the figure) and the moving speed of the bird 4501 can be determined.
  • the position shown in area 43 can be regarded as the spatial position of bird 4501 in the (i+3)th picture.
  • for the time interval between two adjacent pictures, please refer to the description below for details.
  • the spatial position of the k-th target object in the (i+j)-th picture can also be determined by the aforementioned method, where j ≥ 1.
  • for the spatial position of the k-th target object in the (i-1)-th picture, it can be placed at a certain position on the boundary of the (i-1)-th picture.
  • taking the i-th picture as the basis, and using the aforementioned method of determining the spatial position of the k-th target object in the (i+j)-th picture, the spatial position of the k-th target object in each picture from the (i+1)-th picture to the (i+j)-th picture can be determined, and a spatial position set {P_{i+1}, ..., P_{i+j}} is obtained, where P_{i+j} is the spatial position of the k-th target object in the (i+j)-th picture.
  • similarly, taking the (i+j+1)-th picture as the basis and using the aforementioned method of determining the spatial position of the k-th target object in the (i+j)-th picture, a second spatial position set {P′_{i+1}, ..., P′_{i+j}} can be obtained, where P′_{i+j} is the spatial position of the k-th target object in the (i+j)-th picture determined in this way.
  • then, based on the spatial position set {P_{i+1}, ..., P_{i+j}} and the spatial position set {P′_{i+1}, ..., P′_{i+j}}, the spatial position of the k-th target object in each picture from the (i+1)-th picture to the (i+j)-th picture can be determined.
  • the two spatial positions of the k-th target object in the same picture can be weighted and averaged, and the obtained result can be used as the spatial position of the k-th target object in the picture.
  • for example, with equal weights, the position in the (i+1)-th picture can be (P_{i+1} + P′_{i+1})/2.
  • the k-th target object has two spatial positions in each of the (i+1)-th picture to the (i+j)-th picture. Therefore, for each picture, the distance between the two spatial positions of the k-th target object in the picture can be determined, so that j distances can be obtained.
  • a weighted average of the two spatial positions of the k-th target object in the target picture can be performed, and the obtained result is used as the spatial position of the k-th target object in the target picture.
  • the spatial position of the k-th target object in the i-th picture can be connected with its spatial position in the target picture to obtain a target connection line, and the position of the k-th target object in each picture between the i-th picture and the target picture can then be determined on this target connection line. Specifically, the moving speed of the k-th target object can be determined based on its spatial position in the i-th picture, its spatial position in the target picture, and the time interval between the i-th picture and the target picture.
  • then, for any picture between the i-th picture and the target picture, the moving distance of the k-th target object within the corresponding time interval can be determined. Taking the spatial position of the k-th target object in the i-th picture as the starting point, the position point on the target connection line whose distance from the starting point equals this moving distance can be found; this position point is the spatial position of the k-th target object in that picture.
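  • For illustration, the sketch below interpolates a position on such a connection line under the assumption of constant-speed movement; the function name and the example coordinates are hypothetical.

```python
import numpy as np

def interpolate_position(p_start, p_target, t_start, t_target, t_query):
    """Position of a target object at time t_query, assuming it moves at a
    constant speed along the straight connection line from p_start
    (its position in the i-th picture) to p_target (its position in the target picture)."""
    p_start = np.asarray(p_start, dtype=float)
    p_target = np.asarray(p_target, dtype=float)
    velocity = (p_target - p_start) / (t_target - t_start)   # constant velocity vector
    return p_start + velocity * (t_query - t_start)

# Example: the object moves from (0, 0, 5) to (2, 0, 1) between t = 0 s and t = 4 s
print(interpolate_position((0, 0, 5), (2, 0, 1), 0.0, 4.0, 1.0))   # -> [0.5 0.  4. ]
```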
  • when the N pictures are pictures extracted from the target video selected by the user according to the preset sampling frequency, or pictures extracted from a dynamic picture, the spatial position of the target object contained in each picture can be determined by referring to the aforementioned method for the case where the N pictures are pictures selected by the user, which will not be described again here.
  • for each target object, the duration between the moment when it first appears and the moment when the last picture finishes playing can be used as the duration of its appearance in the target video. Alternatively, the duration of the target video can be used as the duration of its appearance in the target video.
  • S4205. Determine the moving speed of each target object between adjacent pictures according to the target positions of the M target objects in each picture and the time intervals between the occurrence of each adjacent picture in the N pictures.
  • the moving speed of each of the M target objects between adjacent pictures can be calculated based on the speed calculation formula, using the positions of the target objects in each picture.
  • for example, suppose the position of the target object p in the i-th picture is P_i(x_i, y_i, z_i), its position in the (i+1)-th picture is P_{i+1}(x_{i+1}, y_{i+1}, z_{i+1}), the moment when the i-th picture appears is t_i, and the moment when the (i+1)-th picture appears is t_{i+1}. Then the moving speed of the target object p between the i-th picture and the (i+1)-th picture can be calculated as v = (P_{i+1} - P_i) / (t_{i+1} - t_i), i.e., each coordinate component of the displacement is divided by the time interval (t_{i+1} - t_i).
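  • A one-line NumPy version of this speed calculation is sketched below; the coordinates and times in the example are made up for illustration.

```python
import numpy as np

def moving_speed(p_i, p_next, t_i, t_next):
    """Moving speed (a velocity vector) of a target object between two adjacent
    pictures, from its positions and the moments at which the pictures appear."""
    p_i = np.asarray(p_i, dtype=float)
    p_next = np.asarray(p_next, dtype=float)
    return (p_next - p_i) / (t_next - t_i)

# Example: the object moves from (1, 2, 8) to (3, 2, 6) between t = 0 s and t = 2 s
print(moving_speed((1, 2, 8), (3, 2, 6), 0.0, 2.0))   # -> [ 1.  0. -1.]
```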
  • the audio library may include at least one piece of audio data.
  • the user can also select Q target objects from M target objects, and add their respective associated first audio data to the Q target objects.
  • the first audio data added by the user is the audio data selected by the user based on his or her own needs. For example, the user can add the sound of a train to an airplane, or the sound of an airplane to an airplane, etc.
  • the first audio data added by the user may be data in the local audio library or data on the network, which is not limited here.
  • S4207. Adjust the second duration of each piece of first audio data to be equal to the first duration corresponding to the associated target object, so as to obtain Q pieces of second audio data.
  • when the second duration of the first audio data is greater than the first duration for which the corresponding target object appears in the target video, data of a duration equal to the first duration can be intercepted from the first audio data, thereby obtaining the second audio data.
  • for example, if the first duration is 10 seconds and the second duration is 20 seconds, the first 10 seconds of data in the first audio data can be used as the second audio data, or the data from the 5th second to the 15th second of the first audio data can be used as the second audio data.
  • when the second duration of the first audio data is less than the first duration for which the corresponding target object appears in the target video, multiple copies of the first audio data can be spliced, and data of a duration equal to the first duration can be intercepted from the spliced audio data, thereby obtaining the second audio data.
  • the third audio data is audio data with spatial sound effects.
  • the audio parameters of the second audio data may include sampling rate, number of channels, bit rate, etc.
  • taking the k-th target object in the i-th picture as an example, assume that the k-th target object does not appear before the i-th picture, and that after the i-th picture it moves in a direction away from the origin of the three-dimensional coordinate system.
  • if the position of the k-th target object in the pictures before the i-th picture is set to infinity, the audio data corresponding to the k-th target object may not be played before the i-th picture appears; playback of that audio data starts from the i-th picture, and after the i-th picture the sound of the target object is controlled to gradually fade away at a certain speed.
  • alternatively, before the i-th picture the sound of the target object can be controlled to gradually approach the user at a certain speed, and after the i-th picture the sound of the target object is controlled to gradually fade away at a certain speed.
  • the sound volume of the audio data corresponding to the target object can be preset, or can be determined based on the spatial position in the picture where the target object is located. For example, based on the distance between the spatial position of the target object in the picture and the origin of the three-dimensional coordinate system, the mapping relationship between the preset distance and the sound volume can be queried to determine the audio corresponding to the target object in the picture. The sound size of the data.
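  • The sketch below shows one way the distance-to-volume mapping mentioned above could be looked up; the mapping table, the linear interpolation between its entries, and the example position are all assumptions for illustration.

```python
import numpy as np

def volume_from_distance(distance, mapping=((1.0, 1.0), (5.0, 0.5), (20.0, 0.1))):
    """Look up a playback gain from the distance between the target object's
    spatial position and the origin of the three-dimensional coordinate system.

    `mapping` is an assumed preset distance-to-gain table; distances between
    the listed values are linearly interpolated.
    """
    distances, gains = zip(*mapping)
    return float(np.interp(distance, distances, gains))

# Example: an object about 3 m from the origin (the viewer) is played at ~0.75 gain
position = np.array([1.0, 2.0, 2.0])
print(volume_from_distance(np.linalg.norm(position)))
```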
  • Q third audio data can be mixed based on a mixing algorithm, thereby obtaining spatial environment audio related to N pictures.
  • if other audio data has been determined for the N pictures (for example, the audio data selected for the pictures as described above), that audio data can also be mixed with the Q pieces of third audio data to obtain the spatial environment audio related to the N pictures.
  • the spatial environment audio can be combined with the N pictures through ffmpeg technology or javaCV technology to generate a video with spatial audio, that is, the target video is obtained.
  • alternatively, the obtained spatial environment audio can be synthesized with the video corresponding to the N pictures to generate a video with spatial sound effects, that is, the target video is obtained.
  • since the target video finally obtained is a video with spatial sound effects, the sound related to a target object heard by the user moves with the movement of that target object, so that the sound moves with the animation, giving the user an immersive feeling.
  • embodiments of the present application also provide a sound processing method.
  • Figure 46 shows a sound processing method. It can be understood that this method can be executed by any device, device, platform, or device cluster with computing and processing capabilities. As shown in Figure 46, the method may include the following steps:
  • the target parameters include environmental information associated with the target device and/or user status information.
  • the environment information associated with the target device may include one or more of the following:
  • the data is audio data played continuously in the first time period, and the second audio data is audio data played sporadically in the first time period;
  • the target position of the target device in the target space, and at least one speaker is configured in the target space;
  • the target position of the picture generated by the target device in the target space, and at least one speaker is configured in the target space; or, the driving speed of the vehicle equipped with the target device.
  • Status information of the user associated with the target device can include one or more of the following:
  • S4602. Process the original audio data according to the target parameters to obtain target audio data.
  • the target audio data matches the environment information and/or status information.
  • after the target parameters are obtained, the original audio data can be processed according to the target parameters so that it matches the target parameters, thereby constructing audio data to be played that is adapted to the current environment or the current user's status; this allows the audio data to be played to be integrated with the current environment or the current user's status, thereby improving the user experience.
  • target audio data can be obtained, and the target audio data can match the environment information and/or status information.
  • the target audio data can be output.
  • since the target audio data is adapted to the current environment or the current user's state, it can be integrated with the current environment or the current user's state, thereby improving the user experience.
  • the numbering of the steps in the above embodiments does not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • each step in the above embodiments may be selectively executed according to actual conditions, may be partially executed, or may be executed in full, which is not limited here. All or part of any features of any embodiment of the present application can be combined freely and in any way provided that there is no contradiction. The combined technical solutions are also within the scope of this application.
  • the electronic device 100 involved in the embodiments of the present application can be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device and/or a smart city device, etc.
  • Exemplary embodiments of electronic devices include, but are not limited to, electronic devices equipped with iOS, android, Windows, Harmony OS, or other operating systems.
  • the specific type of the electronic device is not particularly limited in the embodiments of this application.
  • FIG. 47 shows a schematic structural diagram of the electronic device 100.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, and a battery 142.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have been recently used or recycled by processor 110 . If the processor 110 needs to use the instructions or data again, it can be called directly from the memory. Repeated access is avoided and the waiting time of the processor 110 is reduced, thus improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • processor 110 may include multiple sets of I2C buses.
  • the processor 110 can separately couple the touch sensor 180K, charger, flash, camera 193, etc. through different I2C bus interfaces.
  • the processor 110 can be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • processor 110 may include multiple sets of I2S buses.
  • the processor 110 can be coupled with the audio module 170 through the I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface to implement the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • audio module 170 may be used to encode and decode audio signals.
  • the audio module 170 can also be used to perform audio processing on the audio signal, such as adjusting the gain of the audio signal, etc.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is generally used to connect the processor 110 and the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface to implement the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 and the camera 193 communicate through the CSI interface to implement the shooting function of the electronic device 100 .
  • the processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100 .
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, display screen 194, wireless communication module 160, audio module 170, sensor module 180, etc.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through them. This interface can also be used to connect other electronic devices, such as AR devices, etc.
  • the interface connection relationships between the modules illustrated in the embodiment of the present invention are only schematic illustrations and do not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142, it can also provide power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, the wireless communication module 160, and the like.
  • the power management module 141 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters.
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example: Antenna 1 can be reused as a diversity antenna for a wireless LAN. In other embodiments, antennas may be used in conjunction with tuning switches.
  • the mobile communication module 150 can provide solutions for wireless communication including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs sound signals through audio devices (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194.
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS) and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • Display 194 includes a display panel.
  • the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, a quantum dot light emitting diode (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can implement the shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a photo, the shutter is opened, the light is transmitted to the camera sensor through the lens, the optical signal is converted into an electrical signal, and the camera sensor passes the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other format image signals.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • Electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • NPU is a neural network (NN) computing processor.
  • Intelligent cognitive applications of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function. Such as saving music, videos, etc. files in external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the internal memory 121 may include a program storage area and a data storage area.
  • the stored program area can store an operating system, at least one application program required for a function (such as a sound playback function, an image playback function, etc.).
  • the storage data area may store data created during use of the electronic device 100 (such as audio data, phone book, etc.).
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • Speaker 170A also called “speaker” is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to hands-free calls.
  • Receiver 170B also called “earpiece” is used to convert audio electrical signals into sound signals.
  • the electronic device 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the human ear.
  • Microphone 170C, also called a "mike" or "sound transmitter", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak close to the microphone 170C and input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which in addition to collecting sound signals, may also implement a noise reduction function. In other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions, etc.
  • the headphone interface 170D is used to connect wired headphones.
  • the headphone interface 170D may be a USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals and can convert the pressure signals into electrical signals.
  • pressure sensor 180A may be disposed on display screen 194 .
  • the capacitive pressure sensor may be composed of at least two parallel plates of conductive material.
  • the electronic device 100 determines the intensity of the pressure based on the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch location but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation with a touch operation intensity less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold is applied to the short message application icon, an instruction to create a new short message is executed.
  • the gyro sensor 180B may be used to determine the motion posture of the electronic device 100 .
  • in some embodiments, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) can be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor 180B detects the angle at which the electronic device 100 shakes, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to offset the shake of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • Air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • Magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may utilize the magnetic sensor 180D to detect opening and closing of the flip holster.
  • the electronic device 100 may detect the opening and closing of the flip according to the magnetic sensor 180D. Then, based on the detected opening and closing status of the leather case or the opening and closing status of the flip cover, features such as automatic unlocking of the flip cover are set.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices and be used in horizontal and vertical screen switching, pedometer and other applications.
  • Distance sensor 180F for measuring distance.
  • Electronic device 100 can measure distance via infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may utilize the distance sensor 180F to measure distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light outwardly through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
  • the electronic device 100 can use the proximity light sensor 180G to detect when the user holds the electronic device 100 close to the ear for talking, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touching.
  • Fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to achieve fingerprint unlocking, access to application locks, fingerprint photography, fingerprint answering of incoming calls, etc.
  • Temperature sensor 180J is used to detect temperature.
  • the electronic device 100 utilizes the temperature detected by the temperature sensor 180J to execute the temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection.
  • in other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the low temperature from causing the electronic device 100 to shut down abnormally; in still other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by the low temperature.
  • Touch sensor 180K also known as "touch device”.
  • the touch sensor 180K can be disposed on the display screen 194.
  • the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near the touch sensor 180K.
  • the touch sensor can pass the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194 .
  • Bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human body's vocal part.
  • the bone conduction sensor 180M can also contact the human body's pulse and receive blood pressure beating signals.
  • the bone conduction sensor 180M can also be provided in an earphone and combined into a bone conduction earphone.
  • the audio module 170 can analyze the voice signal based on the vibration signal of the vocal vibrating bone obtained by the bone conduction sensor 180M to implement the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M to implement the heart rate detection function.
  • the electronic device 100 can process data collected by at least one sensor based on a pedestrian dead reckoning (PDR) algorithm to obtain the user's motion status, such as the moving direction and moving speed.
  • PDR pedestrian dead reckoning
  • the buttons 190 include a power button, a volume button, etc.
  • The buttons 190 may be mechanical buttons or touch buttons.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • touch operations for different applications can correspond to different vibration feedback effects.
  • the motor 191 can also provide different vibration feedback effects for touch operations acting on different areas of the display screen 194 .
  • Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to or separated from the electronic device 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
  • the electronic device 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 is also compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communications.
  • the electronic device 100 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100 .
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of this application takes the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100 .
  • Figure 48 is a software structure block diagram of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • the Android system is divided into four layers, from top to bottom: application layer, application framework layer, Android runtime (Android runtime) and system libraries, and kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the application framework layer provides an application programming interface (API) and programming framework for applications in the application layer.
  • API application programming interface
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make this data accessible to applications.
  • The data may include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, and so on.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, etc.
  • a view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the electronic device 100 .
  • For example, call state management, including connected and hung-up states.
  • the resource manager provides various resources to applications, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager allows applications to display notification information in the status bar, which can be used to convey notification-type messages and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also present notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications for applications running in the background, or notifications that appear on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt tone sounds, the electronic device vibrates, or the indicator light blinks.
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • the core library contains two parts: one part is the functions that need to be called by the Java language, and the other part is the core library of Android.
  • the application layer and application framework layer run in virtual machines.
  • the virtual machine converts the Java files of the application layer and the application framework layer into binary files and executes them.
  • the virtual machine is used to perform object life cycle management, stack management, thread management, security and exception management, and garbage collection and other functions.
  • System libraries can include multiple functional modules. For example: surface manager (surface manager), media libraries (media libraries), 3D graphics processing libraries (for example: OpenGL ES), 2D graphics engines (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as static image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, and layer processing.
  • 2D Graphics Engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • The processor in the embodiments of the present application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • CPU central processing unit
  • DSP digital signal processor
  • ASIC Application specific integrated circuit
  • FPGA field programmable gate array
  • a general-purpose processor can be a microprocessor or any conventional processor.
  • the method steps in the embodiments of the present application can be implemented by hardware or by a processor executing software instructions.
  • Software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage media may be located in an ASIC.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted over a computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, through a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, through infrared, radio, or microwaves).
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Stereophonic System (AREA)

Abstract

一种声音处理方法,包括:获取目标参数,目标参数包括与目标设备关联的环境信息和/或用户的状态信息;根据目标参数,对原始音频数据进行处理,得到目标音频数据,目标音频数据与环境信息和/或状态信息相匹配;输出目标音频数据。这样,根据目标参数对原始音频数据进行处理,以使得原始音频数据能够与目标参数相匹配,由此以构建出与当前环境或当前用户的状态适配的待播放的音频数据,从而使得待播放的音频数据能够与当前环境或当前用户的状态相融合,提升了用户体验。

Description

一种声音处理方法及电子设备
本申请要求于2022年6月24日提交中国国家知识产权局、申请号为202210727150.7、申请名称为“一种声音处理方法及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及终端技术领域,尤其涉及一种声音处理方法及电子设备。
背景技术
目前,手机、音箱等具备音频播放功能的电子设备已逐渐进入到人们的生活中。通过这种类型的电子设备,用户可以随时随地播放其所需的音频数据。例如,用户可以在家庭中使用音箱播放其喜欢的音乐,也可以在车辆中使用手机进行导航或播放音乐,亦可以在车辆中使用配置在车辆中的车载终端进行导航或播放音乐等。但目前电子设备在播放音频数据过程中,仅能播放够原汁原味的音频数据,用户体验较差。
发明内容
本申请提供了一种声音处理方法、电子设备、计算机存储介质及计算机程序产品,能够构建出与当前环境或当前用户的状态适配的待播放的音频数据,从而使得待播放的音频数据能够与当前环境或当前用户的状态相融合,提升了用户体验。
第一方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的环境信息,环境信息包括目标设备所处区域的环境数据;根据环境数据,确定与环境数据相关联的N个声音对象,N≥1;获取各个声音对象对应的白噪音,得到N个音频数据,每个音频数据均与一个声音对象关联;将N个音频数据合成,得到目标音频数据,其中,目标音频数据与环境信息相匹配;输出目标音频数据。这样,由于N个声音对象是与目标设备所处区域的环境数据相关联的,因此,由N个声音对象对应的白噪音得到的目标音频数据也是与目标设备所处区域的环境数据相匹配的,这样,用户在收听目标音频数据时即可以有身处环境中的体验,从而具有身临其境的感受,提升了用户体验。
在一些实施例中,该方法可以应用于下文图1所描述的场景中。此时,目标设备可以为车辆,也可以为车辆中的电子设备。示例性的,目标设备可以为集成在车辆中的设备,比如车载终端等,也可以为与车辆分离的设备,比如驾驶员的手机等。另外,环境数据可以包括环境图像,环境声音,天气信息或季节信息等中的一项或多项。
在一些实施例中,N个声音对象可以为基于环境数据识别出的声音对象,也可以为用户对基于环境数据识别出的声音对象进行筛选后得到的声音对象,比如,剔除某些声音对象所剩的声音对象,或者,添加一些新的声音对象所得到的声音对象等等。
在一种可能的实现方式中,获取各个声音对象对应的白噪音,得到N个音频数据,具体包括:基于N个声音对象,查询原子数据库,得到N个音频数据,其中,原子数据库中配置有各个单一对象在特定的一段时间内的音频数据。示例性的,将原子数据库中的多个对象的音频数据随机组合或者按照预设规律组合,可以获取到一定时长的音频数据。示例性的,原 子数据库中可以包括:水流的音频数据、蝉鸣的音频数据、草木的音频数据等。示例性的,原子数据库中的白噪音的音频数据可以提前配置在车辆中,或者实时从服务器中获取等。
在一种可能的实现方式中,环境数据中包括环境声音。获取各个声音对象对应的白噪音,得到N个音频数据,具体包括:从环境声音中提取出M个声音对象的音频数据,以得到M个音频数据,0≤M≤N;其中,当M<N时,基于N个声音对象中剩余的声音对象,查询原子数据库,得到(N-M)个音频数据,其中,原子数据库中配置有各个单一对象在特定的一段时间内的音频数据。示例性的,当从环境声音中提取出的声音对象的音频数据不满足要求时,可以舍弃该音频数据,并从原子数据库中得到相应的声音对象对应的音频数据,由此以提升后续得到的目标音频数据的质量。可以预先设定一些策略,比如,隔绝全部的环境声音,隔绝环境声音中的部分声音,不隔绝环境声音,或者,当提取到的声音对象的音频数据的幅值大于预设值时保留该音频数据等等。其中,当隔绝全部的环境声音时,则M=0;当隔绝部分的环境声音时,则0<M≤N;当不隔绝环境声音时,则M=N。
在一种可能的实现方式中,在得到M个音频数据之后,还包括:将M个音频数据中各个音频数据所包含的声道的增益均调整至目标值。由此以提升音频数据的响度等,从而更能真实的还原环境声音,提升用户体验。
在一种可能的实现方式中,每个音频数据所表达的情感均与环境数据所表达的情感相同。由此以进一步使目标音频数据与环境信息相匹配,提升用户体验。
第二方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的环境信息,环境信息包括目标设备所处的环境中需同时播放第一音频数据和第二音频数据,且第一音频数据和第二音频数据均通过同一设备播放,其中,第一音频数据为第一时间段内持续性播放的音频数据,第二音频数据为第一时间段内偶发性播放的音频数据;获取待播放的第二音频数据;根据第二音频数据,从第一音频数据中提取出待播放的第三音频数据,以及,对第三音频数据进行目标处理,得到第四音频数据,其中,第二音频数据和第四音频数据对应的播放时间段相同,目标处理包括人声消除或人声降低;根据第二音频数据,确定第二音频数据所需调整的第一增益,以及,基于第一增益,对第二音频数据中各个声道的增益进行调整,得到第五音频数据;根据第四音频数据或者第五音频数据,确定第四音频数据所需调整的第二增益,以及,基于第二增益,对第四音频数据中各个声道的增益进行调整,得到第六音频数据;基于第五音频数据和第六音频数据,得到目标音频数据,其中,目标音频数据与环境信息相匹配;输出目标音频数据。
这样,通过对持续性播放的音频数据进行人声消除或人声降低处理等,并同时播报偶发性播放的音频数据和经处理后的需持续性播放的音频数据,使得用户在能够清楚感知到偶发性播放的音频数据中所包含的信息的同时,也可以清楚的感知到其他的音频数据的曲调、背景声等,从而更加有效的满足了用户听感,提升了用户体验。示例性的,持续性播放的音频数据(即第一音频数据)可以为某种类型的音乐,偶发性播放的音频数据(即第二音频数据)可以为导航时需播报的导航的音频数据。示例性的,人声消除可以理解为是消除音频数据中的人声,人声降低可以理解为是降低音频数据中的人声。
在一些实施例中,该方法可以应用于下文图4所描述的场景中。此时,目标设备可以为车辆,也可以为车辆中的电子设备。示例性的,目标设备可以为集成在车辆中的设备,比如车载终端等,也可以为与车辆分离的设备,比如驾驶员的手机等。
在一些实施例中,该方法可以但不限于应用于第一设备,该第一设备可以为播放第一音频数据和第二音频数据的设备。
在一种可能的实现方式中,第二音频数据为第一数据,或者,第四音频数据为第一数据;其中,根据第一数据,确定第一数据所需调整的增益,具体包括:获取第一数据的音频特征,音频特征包括以下一项或多项:时域特征,频域特征,或者,乐理特征;根据音频特征,确定第一数据所需调整的增益。示例性的,可以基于预先设定的增益计算公式,对音频特征进行处理,以得到所需调整的增益。
在一些实施例中,当第一数据为第二音频数据时,音频特征可以但不限于为时域特征,比如响度,包络能量,或者,短时能量等。响度可以为第二音频数据中各个时刻的响度,或者,最大的响度等。
在一些实施例中,当第一数据为第四音频数据时,音频特征可以但不限于为时域特征(比如响度,包络能量,或者,短时能量等)、频域特征(比如:多个频段的频谱能量等)、乐理特征(比如:节拍,调式,和弦,音高,音色,旋律,情感等)。
在一种可能的实现方式中,根据第五音频数据,确定第四音频数据所需调整的第二增益,具体包括:获取第五音频数据的最大响度值;根据第五音频数据的最大响度值和第一比例,确定第二增益,其中,第一比例为第二音频数据的最大响度值和第四音频数据的最大响度值间的比例。
在一种可能的实现方式中,根据第五音频数据,确定第四音频数据所需调整的第二增益,具体包括:获取第五音频数据的最大响度值;根据第五音频数据的最大响度值和第一比例,确定第二增益,其中,第一比例为第二音频数据的最大响度值和第四音频数据的最大响度值间的比例。
在一种可能的实现方式中,在确定出第二增益之后,方法还包括:基于第一增益,对第二增益进行修正。由此以使得在后续播放第五音频数据产生的声音更容易被感知。示例性的,基于预先设定的第一增益和第二增益之间的线性关系,对第二增益进行修正。
在一种可能的实现方式中,在确定出第二增益之后,方法还包括:确定第二增益大于预设增益值;将第二增益更新为预设增益值。示例性的,当第二增益大于预先设定的增益值时,表明播放第四音频数据产生的声音较小,其对播放后续得到的第五音频数据产生的声音造成影响较小,因此可以将确定出的第二增益的值更新为预先设定的增益值。
在一种可能的实现方式中,基于第二增益,对第四音频数据中各个声道的增益进行调整,具体包括:在第四音频数据播放开始之后,且与第四音频数据播放开始的时刻相距第一预设时间的第一时长内,按照第一预设步长将第四音频数据中各个声道的增益逐渐调整至第二增益;以及,在第四音频数据播放结束之前,且与第四音频数据播放结束的时刻相距第二预设时间的第二时长内,按照第二预设步长将第四音频数据中各个声道的增益逐渐由第二增益调整至预设增益值。由此以避免出现音量突变的情况,进而使得用户感知到的声音的音量等是逐渐变化的,提升用户体验。
在一种可能的实现方式中,基于第二增益,对第四音频数据中各个声道的增益进行调整,具体包括:在第四音频数据播放开始之前,且与第四音频数据播放开始的时刻相距第一预设时间的第一时长内,按照第一预设步长将第四音频数据中各个声道的增益逐渐调整至第二增益;以及,在第四音频数据播放结束之后,且与第四音频数据播放结束的时刻相距第二预设时间的第二时长内,按照第二预设步长将第四音频数据中各个声道的增益逐渐由第二增益调 整至预设增益值。由此以避免出现音量突变的情况,进而使得用户感知到的声音的音量等是逐渐变化的,提升用户体验。
在一种可能的实现方式中,基于第二增益,对第四音频数据中各个声道的增益进行调整,具体包括:在第四音频数据播放开始之后,且与第四音频数据播放开始的时刻相距第一预设时间的第一时长内,按照第一预设步长将第四音频数据中各个声道的增益逐渐调整至第二增益;以及,在第四音频数据播放结束之后,且与第四音频数据播放结束的时刻相距第二预设时间的第二时长内,按照第二预设步长将第四音频数据中各个声道的增益逐渐由第二增益调整至预设增益值。由此以避免出现音量突变的情况,进而使得用户感知到的声音的音量等是逐渐变化的,提升用户体验。
在一种可能的实现方式中,基于第二增益,对第四音频数据中各个声道的增益进行调整,具体包括:在第四音频数据播放开始之前,且与第四音频数据播放开始的时刻相距第一预设时间的第一时长内,按照第一预设步长将第四音频数据中各个声道的增益逐渐调整至第二增益;以及,在第四音频数据播放结束之前,且与第四音频数据播放结束的时刻相距第二预设时间的第二时长内,按照第二预设步长将第四音频数据中各个声道的增益逐渐由第二增益调整至预设增益值。由此以避免出现音量突变的情况,进而使得用户感知到的声音的音量等是逐渐变化的,提升用户体验。
第三方面,本申请提供一种声音处理方法,该方法可以包括:第一设备获取第二设备发送的第一消息,第一消息为第二设备需要播报音频数据时发送;响应于第一消息,第一设备对其待播放的音频数据进行目标处理,以及播放经目标处理的音频数据,目标处理用于消除或降低音频数据中的目标声音;第一设备获取第二设备发送的第二消息,第二消息为第二设备结束播报音频数据时发送;响应于第二消息,第一设备停止对其待播放的音频数据进行目标处理,以及播放未经目标处理的音频数据。
这样,在偶发性播放音频数据的电子设备播报音频数据的过程中,可以降低持续性播放音频数据的电子设备所播放的音频数据的干扰,使得用户能够清楚的感知到偶发性播放音频数据的电子设备所播放的音频数据。示例性的,偶发性播放音频数据可以为通话时的音频数据,持续性播放的音频数据可以为某种类型的音乐。
在一些实施例中,该方法可以应用于家居场景中,此时,第二设备可以为手机,第一设备可以为智能音箱、智能电视等。在该场景下,第一设备可以正在播放音乐、电视剧或者电影等,第二设备需播报的音频数据可以是用户使用第二设备进行通话时第二设备需播放的音频数据。另外,该方法也可以应用于驾车场景中,此时,第二设备可以为手机,第一设备可以为车载终端。在该场景下,第一设备可以正在播放音乐等,第二设备需播报的音频数据可以是用户使用第二设备进行导航或通话时第二设备需播放的音频数据。
在一种可能的实现方式中,目标处理包括人声消除处理或者人声降低处理。
第四方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的环境信息,环境信息包括目标设备在目标空间中的目标位置,目标空间中配置有至少一个扬声器;确定目标设备与N个扬声器间的距离,以得到N个第一距离,N为正整数,其中,N个扬声器与目标设备处于同一空间中;根据N个第一距离和N个扬声器,构建目标虚拟扬声器组,目标虚拟扬声器组由M个目标虚拟扬声器组成,M个目标虚拟扬声器位于以目标设备所处的 位置为中心,且以N个第一距离中的目标距离为半径的圆上,M的值与构建空间环绕声所需的扬声器的数量相等,M个目标虚拟扬声器的布置方式与构建空间环绕声所需的扬声器的布置方式相同,每个目标虚拟扬声器均通过调整N个扬声器中的至少一个扬声器对应的音频信号的增益得到;根据在N个扬声器中且与目标虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对原始音频数据中各个声道的增益进行调整,得到目标音频数据,其中,目标音频数据与环境信息相匹配;输出目标音频数据。这样,目标电子设备在空间中所处的位置,调整空间中各个扬声器输出的音频信号的增益,从而使得用户可以随时随地享受到空间环绕声。示例性的,构建空间环绕声所需的扬声器的布置方式可以为5.1.X或者7.1.X的要求中所需的布置方式。在一些实施例中,该方法可以应用于下文图9或10所描述的场景中。其中,目标设备可以为图10中的电子设备100。
在一些实施例中,一个音频数据中可以但不限于包括各个相应的扬声器所需播放的音频信号。示例性的,一个音频数据中所包含的每个音频信号均可以与一个声道相对应。在一种可能的实现方式中,目标距离为N个第一距离中的最小值。这样可以将扬声器均虚拟至与目标设备距离最近的区域,提升空间环绕声效果。
在一种可能的实现方式中,根据N个第一距离和N个扬声器,构建目标虚拟扬声器组,具体包括:以目标距离为基准,确定N个扬声器中除目标扬声器之外的各个扬声器对应的音频信号所需调整的增益,以构建出第一虚拟扬声器组,第一虚拟扬声器组为将N个扬声器均虚拟至以目标设备为中心,且以目标距离为半径的圆上得到的扬声器的组合,目标扬声器为目标距离对应的扬声器;根据第一虚拟扬声器组和构建空间环绕声所需的扬声器的布置方式,确定目标虚拟扬声器组,其中,目标虚拟扬声器组中的中置扬声器位于目标设备当前的朝向上的预设角度范围内。
示例性的,可以以目标距离为基准,并基于预先设定的增益计算模型,对目标距离和除目标扬声器之外的各个扬声器与目标设备间的距离进行处理,以得到除目标扬声器之外的各个扬声器对应的音频信号所需调整的增益,从而构建出第一虚拟扬声器组。接着,可以基于构建空间环绕声所需的扬声器的布置方式,从第一虚拟扬声器组中确定出目标虚拟扬声器组。其中,当目标虚拟扬声器组中的某个虚拟扬声器未在第一虚拟扬声器组中时,可以通过VBAP算法对第一虚拟扬声器组中的虚拟扬声器进行处理,以构建出目标虚拟扬声器组中的虚拟扬声器。其中,该确定目标虚拟扬声器组的方式可以参阅下文图11中的描述。
在一种可能的实现方式中,根据N个第一距离和N个扬声器,构建目标虚拟扬声器组,具体包括:根据N个扬声器,N个第一距离,构建空间环绕声所需的扬声器的布置方式,目标设备的朝向,以及目标设备所处的位置,构建第一虚拟扬声器组,第一虚拟扬声器组中包括M个第一虚拟扬声器,每个第一虚拟扬声器均通过调整N个扬声器中的至少一个扬声器对应的音频信号的增益得到;确定目标设备与各个第一虚拟扬声器间的第二距离,以得到M个第二距离;将M个第一虚拟扬声器均虚拟至以目标设备所处的位置为中心,且以第二距离中的一个距离为半径的圆上,以得到目标虚拟扬声器组。也即是说,可以先确定出一定数量(即构建空间环绕声所需的扬声器的数量)的虚拟扬声器,然后,再将这些虚拟扬声器虚拟至同一个圆上,以得到目标虚拟扬声器组。其中,该确定目标虚拟扬声器组的方式可以参阅下文图17中的描述。
在一种可能的实现方式中,在确定目标设备与N个扬声器间的距离之前,方法还包括:根据目标设备所处空间中配置的扬声器,目标设备的朝向,目标设备所处的位置,以及构建 空间环绕声所需的扬声器的布置方式,从目标设备所处空间中配置的扬声器中筛选出N个扬声器,N个扬声器用于构建空间环绕声。也即是说,可以先筛选出构建空间环绕声所需的真实的扬声器,然后再由这些真实的扬声器构建出所需的虚拟扬声器。其中,该确定目标虚拟扬声器组的方式可以参阅下文图19中的描述。
在一种可能的实现方式中,方法还包括:确定目标设备与目标空间中的各个扬声器间的距离;根据目标设备与目标空间中的各个扬声器间的距离,确定目标空间中的各个扬声器在播放音频数据时的延迟时间;控制目标空间中的各个扬声器按照相应的延迟时间播放音频数据。由此以控制各个扬声器同步播放,提升用户体验。
第五方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的环境信息,环境信息包括目标设备产生的画面在目标空间中的目标位置,目标空间中配置有至少一个扬声器;根据目标位置,构建与目标空间匹配的虚拟空间,虚拟空间的体积小于目标空间的体积;根据目标空间中各个扬声器的位置,在虚拟空间中构建出目标虚拟扬声器组,目标虚拟扬声器组中包括至少一个目标虚拟扬声器,且每个目标虚拟扬声器均通过调整目标空间中的一个扬声器对应的音频信号的增益得到;根据在目标空间中且与目标虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对原始音频数据中各个声道的增益进行调整,得到目标音频数据,其中,目标音频数据与环境信息相匹配;输出目标音频数据。
这样,结合目标设备设备产生的画面在空间中的目标位置,在目标位置处构建出一个虚拟的扬声器组,并控制目标设备中的音频数据由该虚拟扬声器组播放,从而使得目标设备播放的画面和音频数据同步,提升用户的听感和视感一致性体验。在一些实施例中,该方法可以应用于下文图20所描述的场景中。其中,目标设备可以为图20中的电子设备100。此时,原始音频数据可以为用户使用目标设备所播放的音频数据。
在一种可能的实现方式中,根据目标空间中各个扬声器的位置,在虚拟空间中构建出目标虚拟扬声器组,具体包括:根据虚拟空间和目标空间间的比例,在虚拟空间中确定出目标虚拟扬声器组中各个目标虚拟扬声器的位置;根据各个目标虚拟扬声器和与各个目标虚拟扬声器对应的目标扬声器间的距离,确定出各个目标扬声器对应的音频信号所需调整的增益,以得到目标虚拟扬声器组,目标扬声器为目标空间中的扬声器。
在一种可能的实现方式中,方法还包括:确定目标设备产生的画面与目标空间中的各个扬声器间的距离;根据目标设备产生的画面与目标空间中的各个扬声器间的距离,确定目标空间中的各个扬声器在播放音频数据时的延迟时间;控制目标空间中的各个扬声器按照相应的延迟时间播放音频数据。由此以控制各个扬声器同步播放,提升用户体验。
进一步地,该方法还可以包括:从确定出的目标设备产生的画面与目标空间中的各个扬声器间的距离中,选取一个距离作为基准距离;并根据该基准距离,确定目标设备产生的画面的出现时间。由此以提升音画同步的效果。示例性的,该基准距离可以为确定出的目标设备产生的画面与目标空间中的各个扬声器间的距离中的最大的一个距离。示例性的,可以基于该基准距离和声音的传播速度,确定出产生的画面相对于该基准距离对应的扬声器产生的声音出现的延时时间;然后,在控制目标设备在该基准距离对应的扬声器播放相应的音频数据的时刻之后,且达到该延时时间时,在显示出相应的画面。例如,若确定出的延时时间为3s,该基准距离对应的扬声器播放相应的音频数据的时刻为t,则目标设备产生的画面出现的时刻为(t+3)。
第六方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的用户的状态信息,用户的状态信息包括目标设备与目标用户的头部间的目标距离,目标用户的头部在目标空间中的目标位置,目标空间中配置有至少一个扬声器;根据目标距离、目标位置和目标空间中各个扬声器的位置,构建目标虚拟扬声器组,目标虚拟扬声器组中包括至少一个目标虚拟扬声器,每个目标虚拟扬声器均通过调整目标空间中的一个扬声器对应的音频信号的增益得到,每个目标虚拟扬声器均处于以目标位置为圆心且以目标距离为半径的圆上;根据在目标空间中且与目标虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对原始音频数据中各个声道的增益进行调整,得到目标音频数据,其中,目标音频数据与用户的状态相匹配;输出目标音频数据。这样,结合目标设备与目标用户的头部间的目标距离,目标用户的头部在目标空间中的目标位置等,在目标用户的周围构建出一个虚拟的扬声器组,并控制目标设备中的音频数据由该虚拟扬声器组播放,从而使得目标设备播放的画面和音频数据同步,提升用户的听感和视感一致性体验。在一些实施例中,该方法可以应用于下文图24所描述的场景中。其中,目标设备可以为图24中的电子设备100。此时,原始音频数据可以为用户使用目标设备所播放的音频数据。
在一种可能的实现方式中,根据目标距离、目标位置和目标空间中各个扬声器的位置,构建目标虚拟扬声器组之后,还包括:根据目标虚拟扬声器组,构建第一虚拟扬声器组,第一虚拟扬声器组由M个虚拟扬声器组成,M个虚拟扬声器位于以目标位置为中心,且以目标距离为半径的圆上,M的值与构建空间环绕声所需的扬声器的数量相等,M个虚拟扬声器的布置方式与构建空间环绕声所需的扬声器的布置方式相同,M个虚拟扬声器中每个虚拟扬声器均通过调整目标空间中的至少一个扬声器对应的音频信号的增益得到。
此时,根据在目标空间中且与目标虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对原始音频数据中各个声道的增益进行调整,得到目标音频数据,具体包括:根据在目标空间中且与M个虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对原始音频数据中各个声道的增益进行调整,得到目标音频数据。由此,以构建出播放空间环绕声所需的虚拟扬声器,并可以通过这些虚拟扬声器播放目标音频数据,从而使得用户可以收听到空间环绕声,提升用户体验。
在一种可能的实现方式中,目标虚拟扬声器组中包括S个虚拟扬声器,S个虚拟扬声器为构建空间环绕声所需的扬声器,S个虚拟扬声器中的每个虚拟扬声器均通过调整N个扬声器中的至少一个扬声器对应的音频信号的增益得到;确定目标位置与S个虚拟扬声器中各个虚拟扬声器间的距离,以得到S个距离;将S个虚拟扬声器均虚拟至以目标位置为中心,且以S个距离中的一个距离为半径的圆上,以得到所需的虚拟扬声器组,以及基于构建所需的虚拟扬声器组过程中确定出的各个真实的扬声器对应的音频信号所需调整的增益,对原始音频数据进行调整,以得到目标音频数据。也即是说,可以先确定出一定数量(即构建空间环绕声所需的扬声器的数量)的虚拟扬声器,然后,再将这些虚拟扬声器虚拟至同一个圆上,以得到所需虚拟扬声器组;最后,可以基于构建所需的虚拟扬声器组过程中确定出的各个真实的扬声器对应的音频信号所需调整的增益,对原始音频数据进行调整,以得到目标音频数据。
在一种可能的实现方式中,该方法还可以包括:根据目标距离、目标位置、目标空间中各个扬声器的位置,以及构建空间环绕声所需的扬声器的布置方式,从目标设备所处空间中 配置的扬声器中筛选出N个扬声器,N个扬声器用于构建空间环绕声;根据N个扬声器,确定所需的虚拟扬声器组,以及基于构建所需的虚拟扬声器组过程中确定出的各个真实的扬声器对应的音频信号所需调整的增益,对原始音频数据进行调整,以得到目标音频数据。也即是说,可以先筛选出构建空间环绕声所需的真实的扬声器,然后再由这些真实的扬声器构建出所需的虚拟扬声器;最后,可以基于构建所需的虚拟扬声器组过程中确定出的N个真实的扬声器对应的音频信号所需调整的增益,对原始音频数据进行调整,以得到目标音频数据。
第七方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的环境信息,其中,目标设备位于车辆中,环境信息包括车辆的行驶速度、转速和加速踏板的开度中的一项或多项;根据行驶速度、转速和加速踏板的开度中的至少一个,从原始音频数据中,确定出第一音频数据,其中,第一音频数据为基于行驶速度对原始音频数据中的目标音频粒子进行伸缩变换得到;根据行驶速度,确定车辆的加速度,并根据加速度,调整第一音频数据中各个声道的增益,以得到第二音频数据,以及,确定车辆中的声场向目标方向移动的目标速度;根据目标速度,确定目标音频数据的声源的虚拟位置;根据虚拟位置,确定车辆中多个扬声器对应的音频信号的所需调整的目标增益,得到F个目标增益,F≥2;根据F个目标增益,调整第二音频数据中各个声道的增益,以得到目标音频数据,其中,目标音频数据与环境信息相匹配;输出目标音频数据。这样,驾驶员在车辆中听到的声音可以是与车辆的行驶速度相关联的,使得听感更真实,提升了用户体验。
在一些实施例中,该方法可以应用于下文所描述的“控制新能源车辆加速行驶”的场景。此时,在用户驾驶车辆过程中,根据车辆中的扬声器控制车辆中声场的移动,使得声浪声音可以产生空间上的变化,从而使得车辆的内部可以出现多普勒效应,进而使得车辆所播放的声浪声音与真实驾驶状态相符,使得听感更真实,提升了用户体验。另外,在该场景下,目标设备可以为车辆,也可以为车辆中的电子设备。示例性的,目标设备可以为集成在车辆中的设备,比如车载终端等,也可以为与车辆分离的设备,比如驾驶员的手机等。
在一种可能的实现方式中,在根据行驶速度,调整第一音频数据中各个声道的增益之前,还包括:确定行驶速度的变化值超过预设速度阈值;和/或,确定第一音频数据中每个声道的增益对应的调整值均小于或等于预设调整值,其中,当第一音频数据中目标声道的增益对应的目标调整值大于预设调整值时,将目标调整值更新为预设调整值。由此以避免用户听到的声音忽大忽小或者声音产生突变,提升用户体验。
在一种可能的实现方式中,目标参数还包括车辆的加速时长,方法还包括:根据加速时长,控制车辆中的氛围灯工作。由此以为用户带来视觉上的体验。另外,还可以控制氛围灯颜色颜色变化的速度与车辆中声场移动的目标速度相同,以使得车辆中的空间听感和空间视感相对应,提升用户体验。
第八方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的用户的状态信息,状态信息包括用户的疲劳等级;根据疲劳等级,确定第一特征参数的目标调整值,第一特征参数为当前所需播放的原始音频数据的特征参数,第一特征参数包括音调和/或响度;根据目标调整值,对原始音频数据进行处理,得到目标音频数据,其中,目标音频数据的特征参数的值高于第一特征参数的值,目标音频数据与用户的状态信息相匹配;输出目标音频数据。这样,当检测到用户出现驾驶疲劳时,可以根据用户的疲劳等级改变原始音 频数据的特征参数(比如音调、响度等),从而使得播放的音频数据能够在听觉上对用户产生冲击,进而提高用户的注意力。在一些实施例中,该方法可以应用于下文图35所描述的场景中。在该场景下,目标设备可以为车辆,也可以为车辆中的电子设备。示例性的,目标设备可以为集成在车辆中的设备,比如车载终端等,也可以为与车辆分离的设备,比如驾驶员的手机等。另外,在该场景下,原始音频数据可以为待播放的导航音的音频数据。
在一种可能的实现方式中,输出目标音频数据,具体包括:根据疲劳等级,确定第一目标提示音;根据预先设定的播报顺序,输出目标音频数据和第一目标提示语音。由此以进一步在听觉上对用户产生冲击,并使得播报方式和语言更具生活化和人性化,提升用户体验。示例性的,第一目标提示语音可以下文“表2”中所示的提示语音。
在一种可能的实现方式中,方法还包括:根据疲劳等级和地图信息,确定第二目标提示音;输出第二目标提示音。由此以进一步在听觉上对用户产生冲击,进而提高用户的注意力。示例性的,第二目标提示语音可以为“注意!注意!驾驶人员已极度疲劳,可于xxx米远的xxx路口/超市/中转站停车休息”。
在一种可能的实现方式中,目标设备位于车辆中。此时,在输出目标音频数据之前,方法还包括:确定车辆处于自动驾驶状态,且车辆所处的路段的路况低于预设路况阈值,和/或,确定车辆所处的路段为预设路段。由此,以在特定的条件下提高用户的注意力。
在一种可能的实现方式中,方法还包括:根据疲劳等级,确定警示灯的闪烁频率和/或颜色,以及控制警示灯按照确定出的闪烁频率和/颜色工作。由此以给予用户在视觉上的冲击,进而提高用户的注意力。
第九方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的用户的状态信息,状态信息包括用户选择的第一音频数据和第二音频数据;确定第一音频数据的第一音频特征,第一音频特征包括:各个时刻的响度和/或各个节拍的位置点;根据第一音频特征,调整第二音频数据的第二音频特征,以得到第三音频数据,第二音频特征包括响度、音调和音速中的至少一项;根据第一音频数据和第三音频数据,得到目标音频数据,其中,目标音频数据与用户的状态信息相匹配;输出目标音频数据。这样,可以基于用户选择的一种音频数据对另一种音频数据进行处理,从而使得两种音频数据能够自然的融合到一起,进而给用户带来更好的听觉体验。在一些实施例中,该方法可以应用于下文所描述的“用户选择多种音频数据叠加播放”的场景。在该场景下,第一音频数据可以为背景音,第二音频数据可以为白噪音。
在一种可能的实现方式中,第一音频特征包括:第一音频数据的各个时刻的响度,第二音频特征包括响度。根据目标音频特征,调整第二音频数据的第二音频特征,具体包括:根据第一音频数据的各个时刻的响度和预设响度比例,确定第二音频数据中各个时刻对应的目标响度;将第二音频数据中各个时刻的响度,调整至第二音频数据中各个时刻对应的目标响度。由此以使得两个音频数据中各个时刻的响度与预先设定的响度比例相符,从而使得两者可以自然的融合到一起。
在一种可能的实现方式中,目标音频特征包括:各个节拍的位置点,第二音频特征包括音调和/或音速。根据目标音频特征,调整第二音频数据的音调,具体包括:针对第一音频数据中任意相邻的两个节拍,根据任意相邻的两个节拍,确定任意相邻的两个节拍对应的目标节奏;根据目标节奏,确定第二音频数据在任意相邻的两个节拍对应的位置点内的第二音频 特征的目标调整值;根据目标调整值,对第二音频数据在任意相邻的两个节拍对应的位置点内的第二音频特征进行调整。由此,以使得第二音频数据的音频特征能够与第一音频数据的节奏相匹配,从而使得两者可以自然的融合到一起。
第十方面,本申请提供一种声音处理方法,该方法可以包括:获取与目标设备关联的用户的状态信息,用户的状态信息包括以下一项或多项:用户选择的图片,视频,或者,用户为目标对象所添加的音频数据;确定N张图片,N≥2;确定N张图片中各张图片内包含的目标对象,以得到M个目标对象,M≥1;确定各个目标对象在N张图中每张图片中的空间位置,以及,确定各个目标对象在目标视频中出现的时长,以得到M个第一时长,目标视频基于N张图片得到;根据各个目标对象的空间位置,以及N张图片中各个相邻的图片在目标视频中出现的时刻,确定各个目标对象在各个相邻的图片间的移动速度;根据M个目标对象,得到Q个第一音频数据,1≤Q≤M,其中,一个第一音频数据至少与一个目标对象相关联;将各个第一音频数据的第二时长均调整至与相应的目标对象对应的第一时长相等,以得到Q个第二音频数据;根据各个目标对象的空间位置,以及各个目标对象在各个相邻的图片间的移动速度,分别对各个目标对象对应的第二音频数据进行处理,以得到Q个第三音频数据;根据Q个第三音频数据和N张图片,得到目标视频,其中,目标视频中包括目标音频数据,目标音频数据基于Q个第三音频数据得到,其中,目标音频数据与用户的状态信息相匹配;输出目标音频数据。这样,基于用户所选择的数据,为数据中的目标对象添加空间音频,从而使得在制作完成的视频中目标对象的声音可以随着目标对象的运动而移动,进而使得用户听感更加真实,提升了观看体验。在一些实施例中,该方法可以应用于下文所所描述的“制作视频或动态图片”的场景。在一些实施例中,目标视频的时长可以是按照固定时间播放一张图片计算得到,也可以是通过选取的一段音频数据的时长得到。
在一种可能的实现方式中,方法还包括:根据N张图片,确定出与N张图片匹配的第四音频数据;将第四音频数据中至少一部分节拍的位置点作为N张图片中至少一部分图片出现的时刻,和/或,将第四音频数据中至少一部分小节的开始或结束的位置点作为N张图片中至少一部分图片出现的时刻。由此以使得N张图片中的至少一部分图片出现的时刻可以与某些节拍的位置点或者某些小节的位置点一致,使得在听感的关键点处呈现视觉的冲击变化,即在听感的关键点处用户可以观看到图片,从而在视听上产生一致的冲击感,进而提升用户体验。
在一种可能的实现方式中,确定各个目标对象在N张图中每张图片中的空间位置,具体包括:针对第i张图片内的第k个目标对象,基于预先设定的三维坐标系,确定第k个目标对象在第i张图片中的第一空间位置,其中,三维坐标系的中心点为第i张图片的中心位置,第i张图片为N张图中的任意一张图片,第k个目标对象为第i张图片中的任意一个目标对象。
在一种可能的实现方式中,方法还包括:确定第(i+1)张图片中不存在第k个目标对象;将第(i+1)张图片的第一边界上的第一位置,作为第k个目标对象在第(i+1)张图片中的第二空间位置。由此以避免在第(i+1)张图片中第k个目标对象的声音突然消失。
在一种可能的实现方式中,第一边界为第k个目标对象在第i张图片中的目标朝向上的边界,第一位置在第(i+1)张图片中以第一空间位置为起点,且在目标朝向上延伸的直线与第一边界的交点。
在一种可能的实现方式中,方法还包括:确定第(i+2)张图片中不存在第k个目标对象; 根据第一空间位置,第二空间位置,以及第i张图片和第(i+1)张图片间的时间间隔,确定第k个目标对象的第一移动速度和第一移动方向;将第(i+2)张图片之外的第二位置,作为第k个目标对象在第(i+2)张图片中的第三空间位置;其中,第二位置为在第一移动方向上,且与在第(i+2)张图片中的第二空间位置相距第一目标距离的位置点,第一目标距离根据第一移动速度,以及第(i+1)张图片和第(i+2)张图片间的时间间隔得到。由此以使得第k个目标对象的声音是逐渐向目标方向远去,而不是突然消失,提升用户体验。
在一种可能的实现方式中,方法还包括:确定第(i-1)张图片中不存在第k个目标对象,其中,i≥2;将第(i-1)张图片的第二边界上的第三位置,作为第k个目标对象在第(i-1)张图片中的第四空间位置。由此以避免在第i张图片中第k个目标对象的声音突然出现。
在一种可能的实现方式中,第二边界为第k个目标对象在第i张图片中的目标朝向的反方向上的边界,第三位置在第(i-1)张图片中以第一空间位置为起点,且在目标朝向的反方向上延伸的直线与第二边界的交点。
在一种可能的实现方式中,方法还包括:确定第(i-2)张图片中不存在第k个目标对象,其中,i≥3;根据第一空间位置,第四空间位置,以及第i张图片和第(i-1)张图片间的时间间隔,确定第k个目标对象的第二移动速度和第二移动方向;将第(i-2)张图片之外的第四位置,作为第k个目标对象在第(i-2)张图片中的第五空间位置;其中,第四位置为在第二移动方向的反方向上,且与在第(i-2)张图片中的第四空间位置相距第二目标距离的位置点,第二目标距离根据第二移动速度,以及第(i-1)张图片和第(i-2)张图片间的时间间隔得到。由此以使得第k个目标对象的声音是逐渐向目标方向靠近,而不是在第i张图片中突然出现,提升用户体验。
在一种可能的实现方式中,方法还包括:确定第(i+1)张图片至第(i+j)张图片中均不存在第k个目标对象,j≥2,且第(i+j+1)张图片中存在第k个目标对象,(i+j+1)≤N;以第i张图片为基准,分别确定第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置,以得到第一空间位置集合{Pi+1,...,Pi+j},其中,Pi+j为第k个目标对象在第(i+j)张图片中的空间位置,以及,以第(i+j+1)张图片为基准,分别确定第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置,以得到第二空间位置集合{P′i+1,...,P′i+j},其中,P′i+j为第k个目标对象在第(i+j)张图片中的空间位置;根据第一空间集合和第二空间集合,确定第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置。由此以提升第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置中的准确度。
在一种可能的实现方式中,根据第一空间集合和第二空间集合,确定第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置,具体包括:根据第一空间集合和第二空间集合,分别确定第k个目标对象在第(i+1)张图片至第(i+j)张图片中每张图片内的两个空间位置之间的距离,以得到j个距离;根据第一空间集合和第二空间集合,确定第k个目标对象在第(i+c)张图片中的空间位置,第(i+c)张图片为j个距离的一个距离对应的图片,1≤c≤j;根据第k个目标对象在第i张图片中的空间位置,第k个目标对象在第(i+j+1)张图片中的空间位置,第k个目标对象在第(i+c)张图片中的空间位置,以及,第i张图片至第(i+j+1)张图片中各张图片在目标视频中出现的时刻,确定第k个目标对象第i张图片至第(i+c)张图片间的各张图片中的空间位置,以及确定第k个目标对象第第(i+c)张图片至第(i+j+1)张图片间的各张图片中的空间位置。
第十一方面,本申请提供一种电子设备,包括:至少一个存储器,用于存储程序;至少一个处理器,用于执行存储器存储的程序;其中,当存储器存储的程序被执行时,处理器用于执行第一方面至第十方面中所提供的任意一方面中所提供的方法。
第十二方面,本申请提供一种计算机可读存储介质,计算机可读存储介质存储有计算机程序,当计算机程序在电子设备上运行时,使得电子设备执行第一方面至第十方面中所提供的任意一方面中所提供的方法。
第十三方面,本申请提供一种计算机程序产品,当计算机程序产品在电子设备上运行时,使得电子设备执行第一方面至第十方面中所提供的任意一方面中所提供的方法。
第十四方面,本申请还提供了一种芯片,包括处理器,所述处理器与存储器耦合,用于读取并执行所述存储器中存储的程序指令,以使所述芯片实现上述第一方面至第十方面中所提供的任意一方面中所提供的方法。可以理解的是,上述第十一方面至第十四方面的有益效果可以参见上述第一方面至第十方面中的相关描述,在此不再赘述。
附图说明
下面对实施例或现有技术描述中所需使用的附图作简单地介绍。
图1是本申请一实施例提供的一种应用场景的示意图;
图2是本申请一实施例提供的一种声音处理方法的流程示意图;
图3是本申请一实施例提供的一种电子设备的显示界面示意图;
图4是本申请一实施例提供的一种应用场景的示意图;
图5是本申请一实施例提供的一种声音处理方法的流程示意图;
图6是本申请一实施例提供的一种音频数据的时域波形示意图和包络示意图;
图7是本申请一实施例提供的一种对音频数据进行短时傅里叶变换后得到的频谱图的示意图;
图8是本申请一实施例提供的一种声音处理方法的流程示意图;
图9是本申请一实施例提供的一种应用场景的示意图;
图10是本申请一实施例提供的一种应用场景的示意图;
图11是本申请一实施例提供的一种声音处理方法的流程示意图;
图12是本申请一实施例提供的一种电子设备的朝向示意图;
图13是本申请一实施例提供的一种构建虚拟扬声器的示意图;
图14是本申请一实施例提供的一种构建虚拟扬声器的过程示意图;
图15是本申请一实施例提供的一种构建虚拟扬声器组的过程示意图;
图16是本申请一实施例提供的另一种构建虚拟扬声器组的过程示意图;
图17是本申请一实施例提供的一种声音处理方法的流程示意图;
图18是本申请一实施例提供的一种构建虚拟扬声器的过程示意图;
图19是本申请一实施例提供的又一种声音处理方法的流程示意图;
图20是本申请一实施例提供的一种应用场景的示意图;
图21是本申请一实施例提供的一种三点定位的示意图;
图22是本申请一实施例提供的一种声音处理方法的流程示意图;
图23是本申请一实施例提供的一种构建虚拟空间的示意图;
图24是本申请一实施例提供的一种声音处理方法的流程示意图;
图25是本申请一实施例提供的一种在虚拟空间中构建虚拟扬声器组的示意图;
图26是本申请一实施例提供的一种声音处理方法的流程示意图;
图27是本申请一实施例提供的一种声音处理方法的流程示意图;
图28是本申请一实施例提供的一种声音处理方法的流程示意图;
图29是本申请一实施例提供的一种声音处理方法的流程示意图;
图30是本申请一实施例提供的一种车辆的硬件结构示意图;
图31是本申请一实施例提供的一种声音处理方法的流程示意图;
图32是本申请一实施例提供的一种声场移动的示意图;
图33是本申请一实施例提供的一种声场移动的示意图;
图34是本申请一实施例提供的一种车辆中氛围灯的颜色跟随车辆的加速时长逐渐变化的示意图;
图35是本申请一实施例提供的一种应用场景的示意图;
图36是本申请一实施例提供的一种声音处理方法的流程示意图;
图37是本申请一实施例提供的一种对声音进行变速不变调处理的过程示意图;
图38是本申请一实施例提供的一种声音处理方法的流程示意图;
图39是本申请一实施例提供的一种声音处理方法的流程示意图;
图40是本申请一实施例提供的一种声音处理方法的流程示意图;
图41是本申请一实施例提供的一种声音处理方法的流程示意图;
图42是本申请一实施例提供的一种声音处理方法的流程示意图;
图43是本申请一实施例提供的一种将图片出现的时刻调整至节拍的位置点上的示意图;
图44是本申请一实施例提供的一种确定图片中目标对象的空间位置的示意图;
图45是本申请一实施例提供的一种确定图片中目标对象的空间位置的示意图;
图46是本申请一实施例提供的一种声音处理方法的流程示意图;
图47是本申请一实施例提供的一种电子设备的硬件结构示意图;
图48是本申请一实施例提供的一种电子设备的软件结构框图。
具体实施方式
本文中术语“和/或”,是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。本文中符号“/”表示关联对象是或者的关系,例如A/B表示A或者B。
本文中的说明书和权利要求书中的术语“第一”和“第二”等是用于区别不同的对象,而不是用于描述对象的特定顺序。例如,第一响应消息和第二响应消息等是用于区别不同的响应消息,而不是用于描述响应消息的特定顺序。
在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
在本申请实施例的描述中,除非另有说明,“多个”的含义是指两个或者两个以上,例如,多个处理单元是指两个或者两个以上的处理单元等;多个元件是指两个或者两个以上的元件 等。
示例性的,本申请实施例提供了一种声音处理方法,该方法可以根据外部信息输入,对原始音频数据进行处理,构建出待播放的音频数据。例如,该方法可以根据与电子设备关联的环境信息和/或用户的状态信息等,构建出与当前环境或当前用户的状态适配的待播放的音频数据,从而使得待播放的音频数据能够与当前环境或当前用户的状态相融合,提升了用户体验。其中,在构建与当前环境或当前用户的状态适配的待播放的音频数据时,可以通过调整待播放的音频数据的音频特征(比如:增益,音调或响度等),得到所需的音频数据,和/或,通过将与当前环境适配的目标对象的音频数据进行组合,得到所需的音频数据。又例如,该方法可以根据电子设备拍摄的图片或视频相关的信息,构建出与拍摄的图片或视频适配的待播放的音频数据。
在一些实施例中,与电子设备关联的环境信息可以包括以下一项或多项:电子设备所处区域的环境数据(比如:环境图像,环境声音,天气信息或季节信息等),电子设备所处的环境中是否需要同时播放不同的音频数据,电子设备在空间中的位置,电子设备产生的画面在空间中的位置,或者,当电子设备位于车辆中时,车辆的行驶参数(比如:行驶速度等),等。
与电子设备关联的用户的状态信息可以包括以下一项或多项:用户的疲劳等级,电子设备与与用户的头部间的距离和用户的头部在空间中的位置,用户选择的音频数据,或者,用户选择的图片或视频,等。
在本申请实施例中,该声音处理方法主要涉及以下几个场景:
1、在车辆中融合环境声音的场景。在该场景下,可以通过车辆中的电子设备,并结合与该电子设备所处区域的环境数据,从预先配置的白噪音的原子数据库中,确定出与当前环境适配的各个声音对象的音频数据。以及,可以将确定出的各个声音对象的音频数据合成,得到目标音频数据,并播放该目标音频数据。这样,驾驶员或其他的用户在车辆中即可以听到与外部环境相匹配的声音,从而使得用户可以有身临其境的体验。其中,白噪音的原子数据库中可以配置各个单一对象在特定的一段时间内的音频数据,比如水流的音频数据、蝉鸣的音频数据、草木的音频数据等。在该场景中,可以根据电子设备关联的环境信息构建出待播放的音频数据,其中,与电子设备关联的环境信息可以为电子设备所处区域的环境数据。
2、持续播放一种音频数据,且,偶发性播放另一种音频数据的场景。该场景可以包括两种场景。
第一种场景是,持续性播放的音频数据和偶发性播放的音频数据,是通过同一个电子设备播放。在该场景下,可以通过电子设备,对持续性播放的音频数据进行人声消除或人声降低处理等,并可以同时播报偶发性播放的音频数据和经处理后的需持续性播放的音频数据。这样,用户在能够清楚感知到偶发性播放的音频数据中所包含的信息的同时,也可以清楚的感知到其他的音频数据的曲调、背景声等,从而更加有效的满足了用户听感,提升了用户体验。示例性的,持续性播放的音频数据可以为某种类型的音乐,偶发性播放的音频数据可以为导航时需播报的导航的音频数据。
第二种场景是,持续性播放的音频数据和偶发性播放的音频数据,是通过不同的电子设备播放。在该场景下,一个电子设备(以下简称“第一设备”)可以持续性播放一种音频数据,另一个电子设备可以偶发性播放另一种音频数据。在该场景下,当偶发性播放音频数据的电子设备需要播放音频数据时,该电子设备可以指示持续性播放音频数据的电子设备执行人声消除或人声降低操作;以及在偶发性播放音频数据的电子设备播报结束后,该电子设备可以 指示持续性播放音频数据的电子设备停止执行人声消除或人声降低操作。这样,在偶发性播放音频数据的电子设备播报音频数据的过程中,可以降低持续性播放音频数据的电子设备所播放的音频数据的干扰,使得用户能够清楚的感知到偶发性播放音频数据的电子设备所播放的音频数据。示例性的,偶发性播放音频数据可以为通话时的音频数据,持续性播放的音频数据可以为某种类型的音乐。
在上述两种场景中,可以根据电子设备关联的环境信息构建出待播放的音频数据,其中,与电子设备关联的环境信息可以为电子设备所处的环境中是否需要同时播放不同的音频数据。
3、利用空间中设置的扬声器播放音频数据的场景。该场景可以包括两种场景。
其中,第一种场景可以是:在空间中配置有多个扬声器,且至少有一部分扬声器是按照一定的要求(比如:5.1.X,或,7.1.X等)布置。另外,在该场景下,电子设备或者其他的设备正在使用扬声器播放音频数据。在该场景下,可以结合电子设备所处的位置,调整各个扬声器输出的音频信号的增益,从而使得用户可以随时随地享受到空间环绕声。在该场景中,可以根据电子设备关联的环境信息构建出待播放的音频数据,其中,与电子设备关联的环境信息可以为电子设备在空间所处的位置。
第二种场景可以是:在空间中配置有多个扬声器,且电子设备可以产生画面(比如:用户使用电子设备观看影片等),以及电子设备通过空间中布置的扬声器播放其上的音频数据。在该场景下,可以结合电子设备所处的位置,在电子设备或者电子设备所产生的画面的的周围构建出一个虚拟的扬声器组,使得电子设备中的音频数据可以由该虚拟扬声器组播放,使得电子设备播放的画面和音频数据同步,提升用户的听感和视感一致性体验。在该场景中,可以根据电子设备关联的环境信息或者用户的状态信息,构建出待播放的音频数据,其中,与电子设备关联的环境信息可以为电子设备产生的画面在空间中的位置;用户的状态信息包括电子设备与用户的头部间的距离,用户的头部在空间中的位置等。
4、控制新能源车辆加速行驶的场景。在该场景下,可以通过车辆中的电子设备,并结合车辆的行驶速度等行驶参数,控制车辆中声场的移动,使得声浪声音(比如:模仿的燃油车辆的引擎的声音等)可以产生空间上的变化,从而使得车辆的内部可以出现多普勒效应,进而使得车辆所播放的声浪声音与真实驾驶状态相符,使得听感更真实,提升了用户体验。应理解的是,在本申请实施例中,新能源车辆是指采用非常规的车用燃料作为动力来源(或使用常规的车用燃料、采用新型车载动力装置)的车辆。比如:混合动力电动汽车、纯电动汽车、燃料电池电动汽车、其他新能源(如超级电容器、飞轮等高效储能器)汽车等。其中,非常规的车用燃料指除汽油、柴油之外的燃料。在该场景中,可以根据电子设备关联的环境信息构建出待播放的音频数据,其中,电子设备关联的环境信息可以为车辆的行驶参数。
5、驾车,并利用车辆中的电子设备进行导航,且驾驶员出现驾驶疲劳的场景。在该场景下,当检测到驾驶员出现驾驶疲劳时,可以根据驾驶员的疲劳等级改变导航播报的音频数据的特征参数(比如音调、增益等),从而使得播放的音频数据能够在听觉上对驾驶员产生冲击,进而提高驾驶员的注意力,实现安全驾驶。在该场景中,可以根据与电子设备关联的用户的状态信息构建出待播放的音频数据,其中,与电子设备关联的用户的状态信息可以为用户的疲劳等级。
6、用户选择多种音频数据叠加播放的场景。在该场景下,可以基于用户选择的至少一种音频数据,对用户所选择的其他的音频数据进行改造,从而使得两者可以更自然的融合在一 起,进而给用户带来更好的听觉体验。示例性的,用户所选择的音频数据可以包括背景音和白噪音等。在该场景中,可以根据与电子设备关联的用户的状态信息构建出待播放的音频数据,其中,与电子设备关联的用户的状态信息可以为用户选择的音频数据。
7、制作视频或动态图片的场景。在该场景下,可以在制作视频或动态图片过程中,基于电子设备拍摄的图片或视频,为电子设备拍摄的图片或视频中的目标对象添加空间音频,从而使得在制作完成的视频或动态图片中目标对象的声音可以随着目标对象的运动而移动,进而使得用户听感更加真实,提升了观看体验。在该场景中,可以根据与电子设备关联的用户的状态信息构建出待播放的音频数据,该音频数据为制作完成的视频或动态图片中目标对象的音频数据。其中,与电子设备关联的用户的状态信息可以为用户选择的图片,视频,和/或,为目标对象所添加的音频数据。
接下来,基于上述各个场景的顺序,依次分场景对本申请实施例中提供的声音处理方法进行介绍。
1、在车辆中融合环境声音的场景。
示例性的,图1示出了本申请一些实施例中的一种应用场景。如图1所示,驾驶员A位于车辆200中。在车辆200中配置有电子设备100和扬声器230,且电子设备100处于开机状态。其中,电子设备100可以为集成在车辆200中的设备,比如车载终端,也可以为与车辆200分离的设备,比如驾驶员A的手机等,此处不做限定。
当电子设备100集成在车辆200中时,电子设备100可以直接利用车辆200中的扬声器230播报其所需播报的音频数据。当电子设备100与车辆200分离布置时,电子设备100与车辆200间可以但不限于通过短距通信(比如蓝牙等)的方式建立连接。其中,当电子设备100与车辆200间分离布置时,电子设备100可以将其所需播报的音频数据传输至车辆200,并通过车辆200上的扬声器230进行播报,或者,电子设备100可以通过其内置的扬声器播报其所需播报的音频数据。
另外,车辆200的外部可以设置有摄像头等图像采集装置210,以采集车辆200外部的环境图像。车辆200的外部还可以设置有用于采集环境中声音的拾音器220,比如麦克风等。
可以理解的是,本申请实施例示意的结构并不构成对车辆200的具体限定。在本申请另一些实施例中,车辆200可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。
示例性的,图2示出了一种声音处理方法。在图2中,电子设备100可以为集成在车辆200中的设备,比如车载终端等,也可以为与车辆200分离的设备,比如驾驶员A的手机等。另外,图2中所示的方法,可以但不限于应用于驾车场景,比如开车时的场景等,或者户外露营场景,比如在山谷或湖畔露营的场景等。此外,在图2中的电子设备100上可以但不限于设置有用于启动执行该方法的控件,比如:该控件的名称可以为“露营模式”,当用户选择开启露营模式时,可以执行图2中所示的方法。如图2所示,该方法包括以下步骤:
S201、电子设备100获取车辆200所处区域的环境数据,环境数据包括:环境图像,环境声音,天气信息或季节信息等中的一项或多项。
本实施例中,车辆200上的图像采集装置210可以实时或周期性采集200所处区域的环境图像,并将采集到的数据传输至电子设备100。车辆200上的拾音器220可以实时或周期性采集200所处区域的环境声音,并将采集到的数据传输至电子设备100。另外,电子设备100可以实时或周期性通过网络获取车辆200所处区域的天气信息和/或季节信息。
S202、电子设备100根据环境数据,确定当前所需的各个声音对象。
本实施例中,电子设备100可以将环境数据输入至预先训练的声音对象检测模型中,以由声音对象检测模型输出当前所需的各个声音对象。在一些实施例中,声音对象检测模型可以但不限于是基于卷积神经网络(convolutional neural network,CNN)训练得到。
举例来说,当车辆200在树林中的道路行驶,当前为白天且天气晴朗时,由环境图像可以确定出车辆200处于树林中,由环境声音可以确定出当前环境中存在鸟叫的声音,由天气信息可以确定出当前的天气是晴朗的,且是白天。这样,确定出的各个声音对象为树木、鸟叫、白天且晴朗。
在一些实施例中,除了由声音对象检测模型得到当前所需的声音对象外,还可以根据环境数据,确定出与环境数据适配的声音主题。然后,再将该声音主题中包含的声音对象,作为当前所需的声音对象。其中,每个声音主题下均包含有至少一个与该声音主题相关联的声音对象。示例性的,声音主题可以为“夏夜蝉鸣”,在该声音主题下所包含的声音对象可以有“蝉鸣”、“夜晚且晴朗”、“微风”、“流水”;另外,声音主题也可以为“夏夜暴雨”,在该声音主题下所包含的声音对象可以有“狂风”、“暴雨”、“雷鸣”。
S203、电子设备100基于各个声音对象从白噪音的原子数据库中,确定出各个声音对象的音频数据。
本实施例中,电子设备100在获取到各个声音对象后,可以查询白噪音的原子数据库,从而获取到各个声音对象在特定的一段时间内的音频数据。其中,白噪音的原子数据库中配置的是各个单一对象在特定的一段时间内的音频数据,比如水流的音频数据、蝉鸣的音频数据、草木的音频数据等。将原子数据库中的多个对象的音频数据随机组合或者按照预设规律组合,可以获取到一定时长的音频数据。示例性的,原子数据库中的白噪音音频数据可以提前配置在车辆中,或者实时从服务器中获取等。
在一些实施例中,原子数据库中可以包括一个声音对象在不同的时间段内的音频数据,且,不同时间段内的音频数据可以具有不同的情感。例如,当声音对象为鸟叫时,原子数据库中可以包括一段欢快的鸟叫声和一段悲伤的鸟叫声。
进一步地,在确定各个声音对象的音频数据时,可以基于当前的环境数据,确定出与当前的环境数据所表达的情感适配的各个声音对象的音频数据。例如,当天气晴朗时,可以确定当前的环境数据所表达的情感为快乐,此时可以从原子数据库中筛选出当前所需的各个声音对象中音频数据,且这些音频数据所表达的情感均为快乐。
S204、电子设备100将各个声音对象的音频数据合成,得到目标音频数据,以及播放目标音频数据。
本实施例中,电子设备100可以将各个声音对象的音频数据进行合成,得到目标音频数据,以及播放该目标音频数据。其中,电子设备100在播放该目标音频数据时,可以通过车辆200的扬声器进行播放。这样,驾驶员在车辆中即可以听到与外部环境相匹配的声音,从而使得用户可以有身临其境的体验。
在一些实施例中,可以通过混音算法对各个声音对象的音频数据进行混音处理,以得到目标音频数据。其中,可以根据音频数据的类型,选择使用与该类型相适配的混音算法进行处理。例如,当音频数据的类型为浮点(float)型时,可以直接将各个音频数据叠加混合,以得到目标音频数据。当音频数据的类型不是float型时,可以采用自适应加权混音算法、线性叠加求平均等混音算法对各个音频数据进行处理,以得到目标音频数据。
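示例性的，下述代码给出了一种混音处理的示意性实现（仅为示意，并非对具体实现方式的限定；其中假设各声音对象的音频数据已对齐为等长的浮点型采样序列，函数名与参数均为示例性假设）：

```python
import numpy as np

def mix_tracks(tracks, mode="float"):
    """将多个声音对象的音频数据混合为一个音频数据（示意实现）。
    tracks：若干等长的一维采样序列，取值范围约为[-1, 1]。
    mode="float"：浮点型数据直接叠加混合；否则采用线性叠加求平均。
    """
    tracks = [np.asarray(t, dtype=np.float32) for t in tracks]
    if mode == "float":
        mixed = np.sum(tracks, axis=0)                 # 直接叠加混合
    else:
        mixed = np.sum(tracks, axis=0) / len(tracks)   # 线性叠加求平均
    return np.clip(mixed, -1.0, 1.0)                   # 简单限幅，防止幅值越界

# 用法示例：将“流水”“蝉鸣”两个声音对象的音频数据混合
water = np.random.uniform(-0.3, 0.3, 48000)            # 示意数据
cicada = np.random.uniform(-0.2, 0.2, 48000)           # 示意数据
target_audio = mix_tracks([water, cicada], mode="float")
```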
此外,在混音处理过程中,可以根据声音对象的类型,在混音过程中选择混音的次数。例如,对于蝉鸣、鸟叫类的声音对象,其声音较为短促,因此在混音过程中可以采用随机时间下多次输入这些声音对象的音频数据进行混音处理。
对于底噪类的声音对象,当其对应的音频数据的时长足够长时,在混音过程中可以输入一次即可;当其对应的音频数据的时长较短时,在混音过程中可以输入多次,且相邻的两个音频数据间头尾相连,即第一个音频数据的播放结束时间是第二个音频数据的播放起始时间,由此以得到足够时长的底噪类声音。
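示例性的，对于底噪类声音对象，将较短的音频数据头尾相连拼接至所需时长的过程可以用下述代码示意（仅为示意，假设音频数据为一维采样序列）：

```python
import numpy as np

def loop_to_length(clip, target_len):
    """将底噪类音频数据多次输入并头尾相连，拼接到所需时长（示意实现）。
    第一段数据的播放结束时间即为下一段数据的播放起始时间。
    """
    clip = np.asarray(clip, dtype=np.float32)
    repeats = int(np.ceil(target_len / len(clip)))
    return np.tile(clip, repeats)[:target_len]

# 用法示例：将0.1秒的底噪拼接为1秒（按48kHz采样率计）
noise_clip = np.random.uniform(-0.1, 0.1, 4800)
long_noise = loop_to_length(noise_clip, 48000)
```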
在一些实施例中,电子设备100在播放目标音频数据时,可以向用户展示组成目标音频数据的各个声音对象的标识,以及用户当前可以添加的声音对象的标识。这样,用户可以根据自身需求选择添加声音对象或者删减声音对象。例如,如图3所示,电子设备100可以在控件31处显示当前播放的声音对象(即组成目标音频数据的声音对象),以及在控件32处显示可添加的声音对象。继续参阅图3,用户可以在控件31中的子控件33选择删除声音对象,和/或,在控件32中的子控件34选择添加声音对象。
当用户选择删除一个或多个声音对象,或者,选择添加一个或多个声音对象时,电子设备100可以对用户所选择的其所需播放的声音对象重新进行合成,以得到用户所需的音频数据。举例来说,继续参阅图3,当用户删除“蜂鸣”、“鸟叫”、“微风”,并选择添加“落石”、“狂风”后,此时用户所期望播放的声音对象是:“白天晴朗”、“树叶窸窣”、“狂风”、“流水”、“落石”。在用户选择完成后,电子设备100可以将用户所期望播放的声音对象(即“白天晴朗”、“树叶窸窣”、“狂风”、“流水”、“落石”)的音频数据进行合成,以得到新的目标音频数据,并进行播放。
在一些实施例中,电子设备100获取到环境声音后,可以基于设定的透传策略,确定是否透传环境声音。示例性的,透传环境声音可以理解为是播放环境声音。
示例性的,透传策略可以包括:隔绝全部的环境声音,隔绝环境声音中的部分声音,或者,不隔绝环境声音中的任意一项。其中,该透传策略可以为用户自行选择,此时在电子设备100上可以设置有用于选择透传策略的机械按键或虚拟按键,用户可以根据自身需求进行选择。另外,该透传策略也可以由电子设备100自行决定,例如,当环境噪声大于第一噪声值时,电子设备100可以选择的透传策略可以为隔绝全部的环境声音;当环境噪声大于第二噪声值,且小于第一噪声值时,电子设备100可以选择的透传策略可以为隔绝环境声音中的部分声音;当环境噪声小于第二噪声值时,电子设备100可以选择的透传策略可以为不隔绝环境声音。
当透传策略为隔绝全部分环境声音时,电子设备100可以舍弃该环境声音,即不播放环境声音。
当透传策略为隔绝环境声音中的部分声音时,电子设备100可以将环境声音输入至预先训练好的声音分离模型,以由该声音分离模型提取到环境声音中所包含的各个声音对象对应的音频数据。电子设备100获取到环境声音中所包含的各个声音对象对应的音频数据后,可以从中舍弃一部分声音对象对应的音频数据,并将剩余的声音对象对应的音频数据与前述确定出的各个声音对象的音频数据进行合成,以得到目标音频数据,以及播放该目标音频数据,从而将真实环境中的音频数据与从原子数据库中确定出的音频数据相融合,使得用户能够更真实的感受外部的环境。
示例性的,电子设备100可以根据环境数据,确定出与环境数据适配的声音主题。其中, 每个声音主题下均包含有至少一个与该声音主题相关联的声音对象。当与环境数据适配的声音主题中未包含环境声音中所包含的某个声音对象时,电子设备100可以舍弃该声音对象对应的音频数据。当与环境数据适配的声音主题中包含环境声音中所包含的某个声音对象时,电子设备100可以保留该声音对象对应的音频数据。例如,若确定出的声音主题为“夏夜蝉鸣”,在该声音主题下所包含的声音对象有“蝉鸣”、“夜晚且晴朗”、“微风”、“流水”,如果由环境声音中包含的声音对象为“蝉鸣”、“落石”,电子设备100则可以保留环境声音中的“蝉鸣”对应的音频数据,并舍弃环境声音中的“落石”对应的音频数据。
进一步地,为了能够真实还原环境声音,电子设备100可以对提取到的声音对象对应的音频数据中各个声道的增益进行调整。例如,当提取到的声音对象的音频数据为风声时,电子设备100可以提升该风声的响度。
另外,电子设备100在从环境声音中提取出声音对象的音频数据后,还可以对各个音频数据对应的声音对象进行标记。同时,电子设备100可以从前述确定出的当前所需的各个声音对象中剔除与此时标记的各个声音对象相同的对象。由此以避免后续将相似的音频数据进行合成,提升合成后的音频数据的质量。例如,当前述确定出的当前所需的各个声音对象为:树木、鸟叫、白天且晴朗,而从环境声音中提取出的音频数据对应的声音对象为鸟叫时,电子设备100可以将前述确定出的当前所需的声音对象中的“鸟叫”剔除。
作为一种可能的实现方式,电子设备100在剔除前述确定出的某个声音对象之前,还可以判断在环境声音中的该声音对象对应的音频数据幅值等是不是满足要求。当满足要求时,则可以剔除前述确定出的某个声音对象,否则,则保留前述确定出的某个声音对象对应的音频数据,并剔除环境声音中的该声音对象对应的音频数据,或者,对环境声音中的该声音对象对应的音频数据进行调整,以使其满足要求,并剔除前述确定出的某个声音对象。
举例来说,若前述确定出的声音对象(即前述S202中根据环境数据得到的声音对象)为“蝉鸣”,且从环境声音中可以提取到“蝉鸣”对应的音频数据。此时,若电子设备100确定出从环境声音中可以提取到“蝉鸣”对应的音频数据的幅值低于预设值,电子设备100则可以舍弃从环境声音中可以提取到“蝉鸣”对应的音频数据;或者,电子设备100可以对从环境声音中可以提取到“蝉鸣”对应的音频数据的幅值进行调整,以使其幅值高于预设值,并舍弃前述确定出的声音对象对应的音频数据。若电子设备100确定出从环境声音中可以提取到“蝉鸣”对应的音频数据的幅值高于预设值,电子设备100则可以保留从环境声音中可以提取到“蝉鸣”对应的音频数据,并舍弃前述确定出的声音对象对应的音频数据。
当透传策略为不隔绝环境声音时,电子设备100可以将环境声音与前述确定出的各个声音对象的音频数据进行合成,以得到目标音频数据,以及播放该目标音频数据。
2、持续播放一种音频数据,且,偶发性播放另一种音频数据的场景。
2.1、持续性播放的音频数据和偶发性播放的音频数据,是通过同一个电子设备播放。
示例性的,图4示出了本申请一些实施例中的一种应用场景。如图4所示,在驾驶员A驾驶车辆200前往目的地的过程中,驾驶员A可以利用位于车辆200中的电子设备100导航至目的地。同时,驾驶员A可以利用电子设备100播放音乐。也即是说,在电子设备100上同时开启有与导航相关的软件(比如Google等),和,与播放音乐相关的软件(比如等)。在图4中,电子设备100可以为集成在车辆200中的设备,比如车载终端,也可以为与车辆200分离的设备,比如驾驶员A的手机等,此处不做限定。当电子设备100 集成在车辆200中时,电子设备100可以直接利用车辆200中的扬声器播报其所需播报的音频数据。当电子设备100与车辆200分离布置时,电子设备100与车辆200间可以但不限于通过短距通信(比如蓝牙等)的方式建立连接。其中,当电子设备100与车辆200间分离布置时,电子设备100可以将其所需播报的音频数据传输至车辆200,并通过车辆200上的扬声器进行播报,或者,电子设备100可以通过其内置的扬声器播报其所需播报的音频数据。
一般地,当导航播报的声音和音乐播放的声音并发时,即需要同时播放这两种声音时,电子设备100可以降低音乐播放的音量,并以正常的音量播放导航的声音。其中,正常的音量可以理解为:在播放音乐过程中未降低音乐播放的声音之前的音量。当导航的声音播报完毕后,电子设备100可以将音乐播放的音量恢复至正常音量。当这种方式是以降低音乐播放声音为基础,使得用户听感上只有导航播报的声音,而对音乐播放的声音几乎无法感知,即大幅牺牲了用户的音乐体验。
有鉴于此,本申请实施例中提供了一种声音处理方法,在导航播报的声音和音乐播放的声音并发时,使得用户在获得导航播报的声音的同时,可以对音乐播放的声音拥有更好的听感体验。
示例性的,图5示出了本申请一些实施例中的一种声音处理方法。在图5中,电子设备100可以为集成在车辆200中的设备,比如车载终端;也可以为与车辆200分离的设备,比如驾驶员A的手机等。另外,在图5中,电子设备100上同时运行有与导航相关的软件(比如Google等),和,与播放音乐相关的软件(比如Apple等),且用户正在使用电子设备100由一个位置导航至另一个位置,以及正在使用电子设备100播放音乐。如图5所示,该方法可以包括以下步骤:
S501、电子设备100在播放第一音频数据的过程中,获取待播放的第二音频数据。
本实施例中,电子设备100在播放一个音频数据的过程中,可以获取另一个待播放的音频数据。其中,第一音频数据可以是电子设备100所播放的音乐数据,第二音频数据可以是电子设备100所需播放的导航的数据。
S502、电子设备100根据第二音频数据,从第一音频数据中提取出待播放的第三音频数据,其中,第二音频数据和第三音频数据对应的播放时间段相同。
本实施例中,电子设备100可以根据第二音频数据的初始播放时间和数据长度,从第一音频数据中提取出待播放的第三音频数据,其中,该第三音频数据的初始播放时间与第二音频数据的初始播放时间相同,该第三音频数据的数据长度与第二音频数据的数据长度相等。也即是说,第二音频数据和第三音频数据对应的播放时间段相同。
S503、电子设备100对第三音频数据进行人声消除或人声降低处理,得到第四音频数据。
本实施例中,当需要进行人声消除处理时,电子设备100可以将第三音频数据输入至预先训练完毕的人声消除模型,对第三音频数据进行人声消除处理,以得到第四音频数据。当需要进行人声降低处理时,电子设备100可以将第三音频数据输入至预先训练完毕的人声降低模型,对第三音频数据进行人声降低处理,以得到第四音频数据。对于选择人声消除处理,还是选择人声降低处理,可以但不限于预先设定。其中,由于第四音频数据是通过对第三音频数据处理得到,而第二音频数据和第三音频数据对应的播放时间段相同,所以第二音频数据和第四音频数据对应的播放时间段也相同。
作为一种可能的实现方式,在进行人声消除处理或者人声降低处理时,电子设备100还可以先将第三音频数据输入至高通滤波器,以过滤掉特定频率的数据。然后,电子设备100 可以经高通滤波器输出的数据进行声道混合,以消除人声。最后,电子设备100可以将进行声道混合后的数据输入至低通滤波器,以过滤掉特定频率的数据,从而得到第四音频数据。
示例性的,在进行声道混合时,以左声道和右声道两个声道为例,可以在一个声道中设定两个声道对应的音频信号的比例。例如:新左声道里原左声道所占的百分数a1;新左声道里原右声道所占的百分数a2;新右声道里原左声道所占的百分数b1;新右声道里原右声道所占的百分数b2。a1、a2、a3、a4这四个数的数值在-100到100之间,则新左声道采样值newLeft=a1*Left+a2*Right,新右声道采样值newRight=b1*Left+b2*Right。
当选择进行人声消除处理时,为了实现左右声道的相减,声道混合的四个数值分别为:100,-100,-100,100,这样生成了一个左右声道波形相反的立体声波形。当一个声道中的两个波形相加后,即相互抵消,此时即完成对人声的消除。
当选择进行人声降低处理时,可以根据预先设定的降低比例,更改声道混合的四个数值。例如,当选择将人声的音量降低一半时,声道混合的四个数值可以分别为:100,-50,-50,100。这样,当一个声道中的两个波形相加后,即抵消一半,此时即完成对人声的降低。
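示例性的，上述声道混合的过程可以用下述代码示意（仅为示意；其中左右声道为等长的采样序列，a1、a2、b1、b2为-100至100之间的百分数）：

```python
import numpy as np

def channel_mix(left, right, a1, a2, b1, b2):
    """按照百分数系数对左右声道进行混合（示意实现）。
    a1/a2：新左声道里原左/右声道所占的百分数；
    b1/b2：新右声道里原左/右声道所占的百分数。
    """
    left = np.asarray(left, dtype=np.float32)
    right = np.asarray(right, dtype=np.float32)
    new_left = (a1 * left + a2 * right) / 100.0
    new_right = (b1 * left + b2 * right) / 100.0
    return new_left, new_right

# 示意数据
left = np.random.uniform(-1, 1, 48000)
right = np.random.uniform(-1, 1, 48000)
# 人声消除：左右声道相减，使左右声道中相同的人声成分相互抵消
removed_l, removed_r = channel_mix(left, right, 100, -100, -100, 100)
# 人声降低一半
halved_l, halved_r = channel_mix(left, right, 100, -50, -50, 100)
```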
S504、电子设备100根据第二音频数据,确定第二音频数据所需调整的第一增益,以及,基于第一增益,对第二音频数据中各个声道的增益进行调整,得到第五音频数据。
本实施例中,电子设备100可以先提取第二音频数据的音频特征,比如时域特征等。然后,再根据确定出的音频特征,确定第二音频数据所需调整的第一增益。其中,时域特征可以包括响度,包络能量,或者,短时能量等。
当需要提取第二音频数据的音频特征是响度时,可以由第二音频数据在时域上的波形图,确定出各个时刻的波形的幅值,进而确定出各个时刻的响度。其中,一个幅值为一个时刻的响度。另外,也可以根据需求选择特定的响度,比如最大的响度等。
当需要提取第二音频数据的音频特征是包络能量时,可以基于第二音频数据在时域上的波形图,构建第二音频数据对应的包络;然后通过积分计算该包络所围成的图形的面积,得到第二音频数据在时域上的平均包络能量,该平均包络能量即为所需的包络能量。示例性的,可以将时域波形图上各个时刻对应的幅值做比较,当后一时刻的幅值大于前一时刻的幅值时,基于两个幅值之间的差值和预先设定的控制因子控制两个时刻间的幅值的峰值的连线上升;当后一时刻的幅值小于前一时刻的幅值时,基于两个幅值之间的差值和预先设定的控制因子控制两个时刻间的幅值的峰值的连线下降;最后构成的曲线即为第二音频数据对应的包络。在一些实施例中,包络可以理解为在时域波形图上第二音频数据的幅值随时间的变化曲线。示例性的,如图6a所示,该图为第二音频数据的时域波形图,此时,图6a中的第二音频数据对应的包络的曲线可以为图6b所示,其中,图6b中包络的曲线与横轴之间的面积即为第二音频数据对应的平均包络能量。
当需要提取第二音频数据的音频特征是短时能量时,可以由第二音频数据在时域上的波形图,确定出各个时刻的波形的幅值,并对各个时刻的波形的幅值进行平方求和,以得到第二音频数据的短时能量。
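示例性的，下述代码给出了提取最大响度、短时能量和平均包络能量等时域特征的一种示意性实现（仅为示意；其中以分帧后各帧幅值的峰值近似包络，与上文所述的包络构建方式相比为简化近似，帧长取值为示例性假设）：

```python
import numpy as np

def time_domain_features(x, frame_len=1024):
    """提取音频数据的时域特征（示意实现）。"""
    x = np.asarray(x, dtype=np.float32)
    loudness_max = float(np.max(np.abs(x)))                       # 最大响度（以幅值表示）
    frames = [x[i:i + frame_len] for i in range(0, len(x), frame_len)]
    short_time_energy = [float(np.sum(f ** 2)) for f in frames]   # 各帧幅值平方求和
    envelope = [float(np.max(np.abs(f))) for f in frames]         # 以各帧峰值近似包络
    envelope_energy = float(np.mean(envelope))                    # 平均包络能量（近似）
    return loudness_max, short_time_energy, envelope_energy
```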
在获取到第二音频数据的音频特征后,电子设备100可以基于确定出的音频特征和预先设定的第一增益计算公式,确定出第一增益。示例性的,第一增益计算公式可以为:
g=w1*(K1-x1)+w2*(K2-x2)+…+wn*(Kn-xn)          (公式1)
其中,g为增益;wn为预先设定的第n个权重值;Kn为预设的第n个门槛值;xn为第n个音频特征的最大值,比如,响度的最大值等。
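示例性的，“公式1”对应的增益计算可以用下述代码示意（其中的权重wn、门槛值Kn与特征取值xn均为示例性假设）：

```python
def first_gain(features, weights, thresholds):
    """按“公式1”计算增益：g = w1*(K1-x1) + w2*(K2-x2) + … + wn*(Kn-xn)（示意实现）。"""
    return sum(w * (k - x) for w, k, x in zip(weights, thresholds, features))

# 用法示例（数值仅为示意）：以最大响度0.8、最大短时能量12.5两个音频特征计算第一增益
g = first_gain(features=[0.8, 12.5], weights=[0.6, 0.4], thresholds=[1.0, 20.0])
```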
在一些实施例中,电子设备100在确定第一增益时,还可以先对第二音频数据进行分帧处理,得到至少一个音频帧。然后,电子设备100可以前述的方式获取到各个音频帧对应的响度和/或短时能量等。
进一步地,当音频特征为响度时,可以从各个音频帧对应的响度中选取一个最大的响度,并将其代入上述的“公式1”,即可以得到第一增益。
当音频特征为包络能量时,可以从各个音频帧对应的包络能量中选取一个最大的包络能量,并将其代入上述的“公式1”,即可以得到第一增益。
当音频特征为短时能量时,可以从各个音频帧对应的短时能量中选取一个最大的短时能量,并将其代入上述的“公式1”,即可以得到第一增益。
当音频特征为响度和包络能量时,可以从各个音频帧对应的响度中选取一个最大的响度,以及从各个音频帧对应的包络能量中选取一个最大的包络能量,并将两者代入上述的“公式1”,即可以得到第一增益。
当音频特征为响度和短时能量时,可以从各个音频帧对应的响度中选取一个最大的响度,以及从各个音频帧对应的短时能量中选取一个最大的短时能量,并将两者代入上述的“公式1”,即可以得到第一增益。
当音频特征为响度、包络能量和短时能量时,可以从各个音频帧对应的响度中选取一个最大的响度,从各个音频帧对应的包络能量中选取一个最大的包络能量,以及从各个音频帧对应的短时能量中选取一个最大的短时能量,并将两者代入上述的“公式1”,即可以得到第一增益。
在确定出第二音频数据所需调整的第一增益后,电子设备100可以基于第一增益,对第二音频数据中各个声道的增益进行调整,得到第五音频数据。
在一些实施例中,当第二音频数据对应的最大的响度值超过一定值时,表明第二音频数据的响度能够满足要求。此时,在根据第二音频数据确定第一增益,且第一增益的单位使用分贝表示时,可以将第一增益的值置为0,以降低后续的计算量。这样,后续得到的第五音频数据即为第二音频数据。
S505、电子设备100根据第四音频数据,确定第四音频数据所需调整的第二增益,以及,基于第二增益,对第四音频数据中各个声道的增益进行调整,得到第六音频数据。
本实施例中,电子设备100可以先提取第四音频数据的音频特征,比如时域特征,乐理特征,或者,频域特征等。然后,再根据确定出的音频特征,确定第四音频数据所需调整的第二增益。其中,时域特征可以包括响度和/或短时能量等。乐理特征可以包括节拍,调式,和弦,音高,音色,旋律,情感等等。频域特征可以包括预先设定的多个频段的频谱能量等。
对于确定时域特征,可以参见S504中的描述,此处不再一一赘述。
对于确定乐理特征,电子设备100可以将第四音频数据输入至预先训练出的乐理特征确定模型,得到第四音频数据的乐理特征。示例性的,乐理特征确定模型可以使用高斯过程模型、神经网络模型、支持向量机等,对用于训练的音频数据进行训练得到。另外,还可以基于Krumhansl-Schmuckler调性分析算法确定出第四音频数据所包含的调式。此外,也可以基于Thayer情感模型确定出第四音频数据所包含的情感等。
对于确定频域特征,电子设备100可以对第四音频数据进行短时傅里叶变换(short time fourier transform,STFT),将该帧音频数据从时域转换至频域,得到第四音频数据对应的频谱图。由第四音频数据对应的频谱图,即可以得到第四音频数据对应的频谱能量。示例性 的,可以将第四音频数据划分成n个频段,每个频段中的各个频率均对应存在一个频谱能量,将每个频段中各个频率对应的频谱能量进行求和或均值计算即可以得到该频段对应的频谱能量。举例来说,如图7所示,该图为对第四音频数据进行短时傅里叶变换后得到的频谱图,横轴为频率,纵轴为频谱能量值;将第四音频数据划分成了3个频段,每个频段中的各个频率均对应有一个频谱能量,将这些频谱能量进行求和或均值计算就可以得到相应的频段(如频段1)对应的频谱能量。
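示例性的，上述按频段统计频谱能量的过程可以用下述代码示意（仅为示意，假设使用scipy库提供的短时傅里叶变换，采样率与频段划分均为示例性假设）：

```python
import numpy as np
from scipy.signal import stft

def band_energies(x, fs=48000, bands=((0, 300), (300, 3000), (3000, 8000))):
    """对音频数据做短时傅里叶变换，并统计各频段的频谱能量（示意实现）。"""
    f, t, zxx = stft(x, fs=fs, nperseg=1024)
    power = np.abs(zxx) ** 2                          # 各频率、各时刻的频谱能量
    energies = []
    for lo, hi in bands:
        mask = (f >= lo) & (f < hi)
        energies.append(float(np.sum(power[mask])))   # 频段内求和得到该频段的频谱能量
    return energies

# 用法示例
x = np.random.randn(48000)
e1, e2, e3 = band_energies(x)
```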
在确定出第四音频数据的乐理特征和/或频域特征后,可以基于预先设定的第二增益计算公式,确定出第四音频数据所需调整的第二增益。示例性的,第二增益计算公式可以为:
g=w1*x1+w2*x2+…+wn*xn     (公式2)
其中,g为增益,wn为预先设定的第n个权重值,xn为第n个音频特征的值。
在确定出第四音频数据所需调整的第二增益后,电子设备100可以基于第二增益,对第四音频数据中各个声道的增益进行调整,得到第六音频数据。
在一些实施例中,电子设备100在确定第二增益时,还可以先对第四音频数据进行分帧处理,得到至少一个音频帧。然后,电子设备100可以对各个音频帧进行短时傅里叶变换(short time fourier transform,STFT),将该帧音频数据从时域转换至频域,得到各个音频帧对应的频谱图。由各个音频帧对应的频谱图,即可以得到各个音频帧对应的频谱能量。接着,可以选取频谱能量最大的一个音频帧作为所需的音频帧,以及采用前述的确定时域特在,乐理特征,或者,频域特征的方式,对该音频帧进行处理,以得到第四音频数据所需调整的第二增益。
在一些实施例中,在基于前述的方式,确定出第二增益后,为了使得在后续播放第五音频数据产生的声音更容易被感知,还可以基于预先设定的第一增益和第二增益之间的线性关系,对第二增益进行修正,以得到所需的第二增益。示例性的,第一增益和第二增益之间的线性关系可以为:
g=g1*K+g2
其中,g为修正后的第二增益,g1为第一增益,g2为修正前的第二增益,K为常数。
S506、电子设备100同时播放第五音频数据和第六音频数据。
本实施例中,电子设备100在获取到第五音频数据和第六音频数据后,可以同时播放第五音频数据和第六音频数据。这样,用户在能够清楚感知到原有的第二音频数据中所包含的信息的同时,也可以清楚的感知到原有的第一音频数据的曲调、背景声等,从而更加有效的满足了用户听感,提升了用户体验。
在一些实施例中,在确定第四音频数据所需调整的第二增益时,除了前述S505中所描述的方式外,还可以根据第五音频数据(即基于第一增益对第二音频数据进行调整后得到的数据),确定第二增益。
示例性的,可以根据第五音频数据的最大响度值,和,实时计算的第二音频数据的最大响度值和第四音频数据的最大响度值间的比例,确定第二增益。或者,根据第五音频数据的最大响度值,和,预先设定的第二音频数据的最大响度值和第四音频数据的最大响度值间的比例,确定第二增益。
举例来说,若第二音频数据的最大响度值和第四音频数据的最大响度值间的比例为f,第二音频数据当前的最大响度值为A,第五音频数据的最大响度值为B,则由f和B可以确定出第六音频数据(即基于第二增益对第四音频数据进行调整后得到的数据)的最大响度值为 fB。由fB和A的差值,可以确定出第四音频数据需要调整的响度值。根据响度值与增益之间的映射关系,可以确定出第四音频数据所需调整的第二增益。
在一些实施例中,在S505中,在确定出第四音频数据所需调整的第二增益后,可以将该第二增益与预先设定的增益值(比如0、0.1等)进行比较。当第二增益大于预先设定的增益值时,表明播放第四音频数据产生的声音较小,其对播放S504中得到的第五音频数据产生的声音造成影响较小,因此可以将确定出的第二增益的值更新为预先设定的增益值。例如,若第二增益的单位使用标准化值(比如放大倍数等)表示时,若确定出的第二增益的值为0.2,预先设定的增益值为0.1,此时则可以将第二增益的值由0.2调整为0.1。
在一些实施例中,在S505中,当基于第二增益,对第四音频数据中各个声道的增益进行调整,得到第六音频数据时,可以在开始播放之后且与开始播放的时刻相距预设的时长内,以一定步长将需调整的增益由预设值(比如0、1等)逐渐调整至第二增益,以及,在结束播放之前且与结束播放的时刻相距预设的时长内,以一定步长将需调整的增益由第二增益调整至预设值(比如0、1等)。由此以在过渡到播放第六音频数据时,或者,由播放第六音频数据过渡到播放第一音频数据中的其他数据时,避免出现音量突变的情况,提升用户体验。
另外,还可以在开始播放之前且与开始播放的时刻相距预设的时长内,以一定步长将需调整的增益由预设值(比如0、1等)逐渐调整至第二增益,以及,在结束播放之后且与结束播放的时刻相距预设的时长内,以一定步长将需调整的增益由第二增益调整至预设值(比如0、1等)。由此以在过渡到播放第六音频数据时,或者,由播放第六音频数据过渡到播放第一音频数据中的其他数据时,避免出现音量突变的情况,提升用户体验。
也可以在开始播放之前且与开始播放的时刻相距预设的时长内,以一定步长将需调整的增益由预设值(比如0、1等)逐渐调整至第二增益,以及,在结束播放之前且与结束播放的时刻相距预设的时长内,以一定步长将需调整的增益由第二增益调整至预设值(比如0、1等)。由此以在过渡到播放第六音频数据时,或者,由播放第六音频数据过渡到播放第一音频数据中的其他数据时,避免出现音量突变的情况,提升用户体验。
亦可以在开始播放之后且与开始播放的时刻相距预设的时长内,以一定步长将需调整的增益由预设值(比如0、1等)逐渐调整至第二增益,以及,在结束播放之后且与结束播放的时刻相距预设的时长内,以一定步长将需调整的增益由第二增益调整至预设值(比如0、1等)。由此以在过渡到播放第六音频数据时,或者,由播放第六音频数据过渡到播放第一音频数据中的其他数据时,避免出现音量突变的情况,提升用户体验。
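示例性的，上述按照一定步长将增益逐渐调整至目标值的过程可以用下述代码示意（仅为示意，步长与增益取值均为示例性假设）：

```python
import numpy as np

def ramp_gain(num_samples, start_gain, end_gain, step):
    """在num_samples个采样点内，按固定步长将增益由start_gain逐渐调整至end_gain（示意实现）。"""
    gains = [float(start_gain)]
    while len(gains) < num_samples and abs(gains[-1] - end_gain) > 1e-9:
        direction = 1.0 if end_gain > gains[-1] else -1.0
        gains.append(gains[-1] + direction * min(step, abs(end_gain - gains[-1])))
    gains.extend([float(end_gain)] * (num_samples - len(gains)))  # 其余采样点保持目标增益
    return np.asarray(gains[:num_samples])

# 用法示例：在与播放开始时刻相距的预设时长（按4800个采样点计）内，将增益由预设值1.0逐步调整至第二增益0.4
fade_in = ramp_gain(4800, start_gain=1.0, end_gain=0.4, step=0.001)
# 播放结束前，再由第二增益0.4逐步调整回预设值1.0
fade_out = ramp_gain(4800, start_gain=0.4, end_gain=1.0, step=0.001)
```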
2.2、持续性播放的音频数据和偶发性播放的音频数据,是通过不同的电子设备播放。
示例性的,图8示出了本申请一些实施例中的另一种声音处理方法。在图8中,第一设备与第二设备为分离的设备,且第一设备与第二设备之间可以但不限于通过蓝牙等短距通信方式建立连接。在图8中,第一设备上配置有能够持续播放音频数据相关的软件(比如Apple等),或者,第一设备可以为能够持续播放音频数据的设备,比如智能电视、智能音箱等,且第一设备正在使用其自身所拥有的扬声器播放音频数据。第二设备上配置有能够偶发性播放音频数据相关的软件(比如通话、Google等);其中,第二设备上产生的声音是通过其自身所拥有的扬声器播放。示例性的,第一设备可以为智能电视、智能音箱、车载终端等;第二设备可以为手机、平板电脑等。如图8所示,该方法可以包括以下步骤:
S801、当第二设备需要播报音频数据时,第二设备向第一设备发送第一消息,第一消息用于指示第一设备执行人声消除或人声降低操作。
本实施例中,当第二设备需要播报音频数据时,第二设备可以向第一设备发送第一消息,以指示第一设备执行人声消除或人声降低操作。
示例性的,在家居场景中,第二设备可以为手机,第一设备可以为智能音箱、智能电视等。在该场景下,第一设备可以正在播放音乐、电视剧或者电影等,第二设备需播报的音频数据可以是用户使用第二设备进行通话时第二设备需播放的音频数据。也即是说,在家居场景中,当用户需要使用第二设备进行通话时(例如,当第二设备收到来电时,或者用户接通第二设备上的来电时),第二设备可以向第一设备发送指示第一设备进行人声消除或人声降低操作的消息。
在驾车场景中,第二设备可以为手机(例如图5中所示的电子设备100),第一设备可以为车载终端(例如图4中所示的车辆200)。在该场景下,第一设备可以正在播放音乐等,第二设备需播报的音频数据可以是用户使用第二设备进行导航或通话时第二设备需播放的音频数据。也即是说,在驾车场景中,当第二设备需要播放导航音频数据,或者,用户需要使用第二设备进行通话时,第二设备可以向第一设备发送指示第一设备进行人声消除或人声降低操作的消息。
S802、第一设备响应于第一消息,对其待播放的音频数据进行人声消除或人声降低操作。
本实施例中,当选择进行人声消除操作时,第一设备可以通过前述的人声消除方式对其待播放的音频数据进行人声消除操作。当选择进行人声降低操作时,第一设备可以通过前述的人声降低方式对其待播放的音频数据进行人声降低操作。
在一些实施例中,当第二设备待播放的音频数据为导航音频数据时,第一消息中可以包括导航音频数据的初始播放时间和数据长度。第一设备可以在获取到第一消息后,从其待播放的音频数据中提取出与该初始播放时间和数据长度均相等的子数据,并对该子数据进行人声消除操作,其中,该子数据的初始播放时间与导航音频数据的初始播放时间相同,该子数据的数据长度与导航音频数据的长度相等。
S803、第二设备播报音频数据,以及,第一设备播放进行人声消除或人声降低后的音频数据。
S804、当第二设备结束播报音频数据时,第二设备向第一设备发送第二消息,第二消息用于指示第一设备停止执行人声消除或人声降低操作。
本实施例中,当第二设备结束播报音频数据时,第二设备向第一设备发送第二消息,第二消息用于指示第一设备停止执行人声消除或人声降低操作。
示例性的,当用户使用第二设备进行通话时,在用户结束通话时(例如,用户挂断电话时),第二设备可以将其结束通话的状态告知第一设备,从而使得第一设备可以停止执行人声消除或人声降低操作。当用户使用第二设备进行导航时,在第二设备结束导航播报时,第二设备可以将其结束导航播报的状态告知第一设备,从而使得第一设备可以停止执行人声消除或人声降低操作。
S805、第一设备响应于第二消息,停止对其待播放的音频数据进行人声消除或人声降低操作,以及播放未进行人声消除或人声降低的音频数据。
这样,在第二设备播报音频数据的过程中,可以降低第一设备所播放的音频数据的干扰,使得用户能够清楚的感知到第二设备播放的音频数据。
3、利用空间中设置的扬声器播放音频数据的场景。
3.1、在空间中配置有多个扬声器,且至少有一部分扬声器是按照一定的要求(比如:5.1.X,或,7.1.X等)布置,以及,电子设备或者其他的设备正在使用扬声器播放音频数据。
示例性的,图9的(A)示出了本申请一些实施例中的一种应用场景。如图9的(A)所示,可以按照5.1.X的要求在房间内的固定位置配置扬声器,以使用户享受到极致的影院级的声音。其中,5.1.X中,5代表构建空间环绕声的扬声器的数量,1代表低音炮,X代表在房间的顶部需要设置的扬声器的数量。在图9的(A)中,扬声器201布置在用户A所处位置的正前方;扬声器202布置在用户A的右前方,比如,扬声器202可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向右偏30度的位置处;扬声器203布置在用户A的右后方,比如,扬声器203可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向右偏120度的位置处;扬声器204布置在用户A的左后方,比如,扬声器204可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向左偏120度的位置处;扬声器205布置在用户A的左前方,比如,扬声器205可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向左偏30度的位置处。通过调整扬声器201、202、203、204和205输出的音频信号的增益,可以使得用户A在当前所处的位置处享受到空间环绕声。
图9的(B)示出了本申请一些实施例中的另一种应用场景。如图9的(B)所示,可以按照7.1.X的要求在房间内的固定位置配置扬声器,以使用户享受到极致的影院级的声音。在图9的(B)中,扬声器201布置在用户A所处位置的正前方;扬声器202布置在用户A的右前方,比如,扬声器202可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向右偏30度的位置处;扬声器203布置在用户A的正右方,比如,扬声器203可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向右偏90度的位置处;扬声器204布置在用户A的右后方,比如,扬声器204可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向右偏150度的位置处;扬声器205布置在用户A的左后方,比如,扬声器205可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向左偏150度的位置处;扬声器206布置在用户A的正左方,比如,扬声器206可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向左偏90度的位置处;扬声器207布置在用户A的左前方,比如,扬声器207可以布置在以用户A所处的位置和扬声器201间的连线为基准线,并以用户A所处的位置为圆心,向左偏30度的位置处。通过调整扬声器201、202、203、204、205、206和207输出的音频信号的增益,可以使得用户A在当前期所处的位置处享受到空间环绕声。
但在图9中当用户A离开其当前所处的位置时,用户A在其他的位置处将不能享受到空间环绕声。
为使得用户能够随时随地享受到空间环绕声,本申请实施例中提供了一种声音处理方法,可以基于用户与各个扬声器之间的距离,调整各个扬声器输出的音频信号的增益,从而使得用户可以随时随地享受到空间环绕声。
示例性的,图10的(A)示出了本申请一些实施例中的又一种应用场景。图10的(A) 中所示的场景与图9中所示的场景主要的不同之处在于:图10的(A)中所示的空间中配置有图像采集装置,比如摄像头300,和/或,用户A携带有电子设备100。
在图10的(A)所示的场景中,摄像头300可以采集用户A在空间中的图像,以由采集到的图像确定出用户A与各个扬声器之间的距离。
在一些实施例中,摄像头300可以与用于控制各个扬声器的控制器(图中未示出)通过有线网络或无线网络(比如蓝牙等)建立连接,这样,摄像头300可以将其采集到的图像传输至控制器,以由控制器对图像进行处理,比如将图像输入至预先训练好的图像处理模型,以由控制器根据该模型输出用户A与各个扬声器之间的距离。示例性的,图像处理模型可以但不限于是基于卷积神经网络(convolutional neural network,CNN)训练得到。在另一些实施例中,摄像头300可以与电子设备100通过无线网络(比如蓝牙等)建立连接,这样,摄像头300可以将其采集到的图像传输至电子设备100,以由电子设备100对图像进行处理,比如将图像输入至预先训练好的图像处理模型,以由电子设备100根据该模型输出用户A与各个扬声器之间的距离。
在一些实施例中,电子设备100可以与各个扬声器通过无线网络(比如蓝牙等)建立连接。此时,除了可以通过摄像头300采集到的图像确定用户A与各个扬声器之间的距离外,还可以基于电子设备100与各个扬声器之间的无线通信信号确定,例如:可以通过基于接收信号的强度指示(received signal strength indication,RSSI)测距方法确定电子设备100与各个扬声器之间的距离。由于电子设备100被用户A携带,因此,确定出电子设备100与各个扬声器之间的距离,即确定出用户A与各个扬声器之间的距离。应理解的,确定用户A与各个扬声器间的距离的执行主体可以是电子设备100,也可以是用于控制各个扬声器的控制器(图中未示出),此处不做限定。示例性的,当电子设备100为执行主体确定电子设备100与扬声器的距离时,可以通过下述“公式一”确定电子设备100与某个扬声器之间的距离,该公式为:
其中,d为电子设备100与扬声器间的距离;abs为绝对值函数;RSSI为电子设备100获取到的扬声器发送的消息对应的RSSI;A为电子设备100与扬声器相隔1米时,电子设备100获取到的扬声器发送的消息对应的RSSI,该值可以预先标定;n为环境衰减因子,其可以为经验值。当用于控制各个扬声器的控制器为执行主体确定电子设备100与扬声器的距离时,可以参考电子设备100为执行主体确定电子设备100与扬声器的距离的方式,此处不再赘述。
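示例性的，上述基于RSSI的测距过程可以用下述代码示意（仅为示意；其中假设采用对数距离路径损耗模型，A与n的取值需按实际环境标定）：

```python
def rssi_to_distance(rssi, a=50.0, n=2.5):
    """基于RSSI估算电子设备100与某个扬声器之间的距离（示意实现，单位：米）。
    rssi：电子设备100获取到的扬声器发送的消息对应的RSSI；
    a：相距1米时该RSSI的绝对值（预先标定）；n：环境衰减因子（经验值）。
    """
    return 10 ** ((abs(rssi) - a) / (10 * n))

# 用法示例：RSSI为-65 dBm时，估算距离约为10**((65-50)/25)≈3.98米
d = rssi_to_distance(-65)
```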
在一些实施例中,在确定出电子设备100与扬声器间的距离后,可以利用三点定位法对电子设备100与至少三个扬声器间的距离进行处理,以得到电子设备100的位置。另外,由电子设备100在不同时刻的位置可以获取到电子设备100的移动距离。其中,由于电子设备100是由用户携带,因此,电子设备100的移动距离即为用户的移动距离。
示例性的,图10的(B)示出了本申请一些实施例中的又一种应用场景。图10的(B)中所示的场景与图10的(A)中所示的场景主要的不同之处在于:图10的(B)中所示的空间中在由扬声器201至205围成的区域之外还配置有其他的扬声器,比如,扬声器208、209等。在图10的(B)中,当用户A移动至扬声器202、208、209、210和203所围成的区域时,可以在该区域内控制产生空间环绕声。应当理解的是,在图10的(B)中所示的空间中在由扬声器201至205围成的区域之外配置的扬声器,除了可以与扬声器201至205处于一个空间内之外,还可以处于与扬声器201至205所处的空间相邻的某个空间中,此处不做限定。 另外,图10中示出的是按照5.1.X的要求配置扬声器的场景,对于按照其他要求配置扬声器的场景,可以参考图10中的描述,此处不再赘述。
在一些实施例中,在图10所示的场景中,用户A可以在电子设备100上配置摄像头和/或各个扬声器在空间中的位置,和/或,配置摄像头和/或各个扬声器的标识等,以便于后续确定用户A与各个扬声器之间的距离,以及便于后续选择所需的扬声器。示例性的,电子设备100上可以安装有用于配置摄像头和/或扬声器的应用程序(application,APP),用户A可以登录该APP进行配置。在另一些实施例中,在图10所示的场景中,各个扬声器可以根据与电子设备之间的距离,自动识别其在空间中的位置,并显示在电子设备100安装的APP界面中。用户A还可以在该APP中对各个扬声器在空间中的位置进行调整。
接下来,基于上文所描述的内容,对本申请实施例提供的一种声音处理方法进行详细介绍。
示例性的,图11示出了本申请一些实施例中的一种声音处理方法的流程。在图11中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。在图11中,扬声器中所播放的音频信号可以是电子设备100中的音频信号,也可以是其他设备中的音频信号,此处不做限定。在图11中,用户的移动区域可以是构建空间环绕声的扬声器所围成的区域,比如:图10的(A)中扬声器201至205所围成的区域等,也可以是其他的区域,比如:图10的(A)中扬声器201至205所围成的区域之外的区域,此处不做限定。另外,图11中所示的方法可以实时的执行,也可以在满足一定条件时再执行,比如,当检测到用户移动的距离大于一定阈值时再执行,此处不做限定。如图11所示,该声音处理方法包括以下步骤:
S1101、电子设备100确定其与N个扬声器之间的距离,以得到N个第一距离,N为正整数。
本实施例中,电子设备100可以基于用户所处的空间中配置有图像采集装置采集到图像,确定其与各个扬声器之间的距离,以得到N个第一距离。另外,电子设备100也可以基于其与各个扬声器之间的无线通信信号,确定其与各个扬声器之间的距离,以得到N个第一距离。其中,N为正整数。可选地,N≥5。
在一些实施例中,N个扬声器可以为按照某个要求(比如5.1.X或7.1.X等)配置的构建空间环绕声的扬声器。例如,N个扬声器可以为图10的(A)中所示的扬声器201至205。在另一些实施例中,N个扬声器可以为空间中所有的扬声器,例如图10的(B)所示的扬声器201至205、以及扬声器208至210。
S1102、电子设备100基于N个第一距离,从N个扬声器中筛选出目标扬声器,目标扬声器与电子设备100间的距离最短。
本实施例中,电子设备100可以对N个第一距离进行排序,比如由大到小或者由小到大排序等,并从中挑选出最小的一个第一距离,以及将该最小的第一距离对应的扬声器作为目标扬声器。
在一些实施例中,目标扬声器也可以是其他的扬声器,比如,与电子设备100距离最远的扬声器等,具体可根据实际情况而定,此处不做限定。
S1103、电子设备100以其与目标扬声器间的距离为基准,确定除目标扬声器之外的各个扬声器对应的音频信号所需调整的增益,以构建出第一扬声器组,其中,第一扬声器组为将N个扬声器均虚拟至以电子设备100为中心,且以电子设备100与目标扬声器间的距离为半 径的圆上得到的扬声器的组合。在一些实施例中,一个音频数据中可以但不限于包括各个相应的扬声器所需播放的音频信号。示例性的,一个音频数据中所包含的每个音频信号均可以与一个声道相对应。
本实施例中,电子设备100可以选择以其与目标扬声器间的距离为基准,并根据该基准距离和其他的扬声器与电子设备100间的距离,确定出其他的扬声器对应的音频信号所需调整的增益,以将其他的扬声器均虚拟至以电子设备100与目标扬声器间的距离为半径的圆上,从而构建出第一扬声器组。
在一些实施例中,若电子设备100与目标扬声器间的距离为d1,电子设备100与除目标扬声器之外的其中一个扬声器间的距离为d2,则该其中一个扬声器对应的音频信号所需调整的增益gi=d2/d1。另外,在确定其他扬声器对应的音频信号所需调整的增益时,电子设备100也可以选用其他的线性模型进行确定,例如,上述的其中一个扬声器对应的音频信号所需调整的增益可以为gi=Q(d2/d1)+P,其中,Q和P均为常数,具体可根据实际情况而定,此处不做限定。
此外,在构建第一扬声器组过程中,电子设备100可以记录每个真实的扬声器对应的音频信号所需调整的增益,以得到第一增益集合。
S1104、电子设备100以其当前的朝向为基准,并基于第一扬声器组,构建虚拟扬声器组,虚拟扬声器组由M个虚拟扬声器组成,M的值与构建空间环绕声所需的扬声器的数量相等,虚拟扬声器组中各个虚拟扬声器的布置方式与构建空间环绕声所需的扬声器的布置方式相同。
本实施例中,电子设备100可以先基于第一扬声器组,在其朝向上确定出一个虚拟扬声器,然后,再基于预先设定的构建空间环绕声所需的扬声器的布置方式(比如5.1.X或7.1.X的布置方式等)确定出剩余的虚拟扬声器,从而构建出虚拟扬声器组。其中,虚拟扬声器可以理解为是虚拟的扬声器。
在一些实施例中,当在第一扬声器组中位于电子设备100的朝向上存在一个扬声器,或者,在其朝向上的预设角度范围内存在一个扬声器时,可以将该扬声器定为虚拟扬声器组中的中置扬声器。中置扬声器可以理解为在电子设备100的朝向上,且处于0度方向上的扬声器,比如图10的(A)中所示的扬声器201。示例性的,电子设备100的朝向可以理解为是由电子设备100的底部朝向其顶部的方向。其中,对于电子设备100的顶部和底部,以电子设备100为手机为例,如图12所示,手机的听筒1201所在的位置可以为手机的顶部,手机上与听筒1201相对的位置1202可以为手机的底部,箭头1203所指的方向即为手机的朝向。可选地,当电子设备100的显示屏与水平面不平行时,电子设备100的朝向可以由电子设备100在水平面上的投影确定,此时电子设备100的朝向可以为电子设备100的底部在水平面上的投影朝向其顶部在水平面上的投影的方向。
当在第一扬声器组中位于电子设备100的朝向上不存在扬声器,或者,在其朝向上的预设角度范围内不存在扬声器时,可以由在第一扬声器组中位于该朝向上左右相邻的两个扬声器虚拟出一个扬声器,并将该虚拟出的扬声器作为虚拟扬声器组中的中置扬声器。其中,在由第一扬声器组中的两个扬声器虚拟出一个虚拟的扬声器时,可以通过调整第一扬声器组中的两个扬声器对应的音频信号的增益,以虚拟出一个虚拟的扬声器。示例性的,对于电子设备100的朝向上的预设角度范围,继续参阅图12,当预设角度为α时,电子设备100的朝向上的预设角度范围为由角度α所构建出的区域。示例性的,可以利用向量基础振幅平移 (vector base amplitude panning,VBAP)算法由两个扬声器虚拟出一个扬声器。应理解的是,当在电子设备100的朝向上存在一个扬声器时,也可以理解为是虚拟出一个扬声器,只是这个虚拟的扬声器本质上是第一扬声器组中的一个扬声器。
举例来说,如图13所示,若第一扬声器组中包括扬声器SP1和SP2,扬声器SP1和SP2在以用户U11为中心的圆上,且用户U11的当前的朝向为矢量P所指的方向。在该情况下,可以利用扬声器SP1和SP2的位置将声音固定在虚拟扬声器VSP1的位置处。例如,在将用户U11的位置作为原点O的情况下,其具有垂直方向和水平方向分别作为x轴方向和y轴方向的二维坐标系。在该二维坐标系中,虚拟扬声器VSP1的位置可以由矢量P表示。由于矢量P是二维矢量,因此矢量P可以由以原点O作为起始点、分别在扬声器SP1的方向和扬声器SP2的方向上延伸的矢量L1和L2的线性和来表示,即,P=g1L1+g2L2。其中,计算出g1和g2后,以系数g1作为扬声器SP1对应的音频信号的增益,以系数g2作为扬声器SP2对应的音频信号的增益,即可以将声音固定在虚拟扬声器VSP1的位置处。在图13中,通过调整g1和g2的值,可以使虚拟扬声器VSP1位于将扬声器SP1和SP2连接的弧线AR11上的任意位置处。
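示例性的，上述由矢量L1和L2的线性和求解系数g1与g2（即P=g1L1+g2L2）的过程可以用下述代码示意（仅为示意，各方向以二维矢量表示）：

```python
import numpy as np

def vbap_gains(l1, l2, p):
    """求解 P = g1*L1 + g2*L2 中的系数g1、g2，作为两个扬声器对应音频信号的增益（示意实现）。"""
    basis = np.column_stack([l1, l2])     # 以L1、L2为列构成2x2矩阵
    g1, g2 = np.linalg.solve(basis, np.asarray(p, dtype=float))
    return float(g1), float(g2)

# 用法示例：扬声器SP1、SP2分别位于30度与-30度方向，虚拟扬声器VSP1位于0度方向
rad = np.deg2rad
l1 = np.array([np.cos(rad(30)), np.sin(rad(30))])
l2 = np.array([np.cos(rad(-30)), np.sin(rad(-30))])
g1, g2 = vbap_gains(l1, l2, p=np.array([1.0, 0.0]))   # g1 = g2 ≈ 0.577
```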
在确定出虚拟扬声器组中的虚拟的中置扬声器后,可以按照预先设定的构建空间环绕声所需的扬声器的布置方式确定出剩余的虚拟扬声器,从而构建出虚拟扬声器组。例如,以构建5.1.X要求的虚拟扬声器组为例,在确定出中置扬声器后,可以确定用户U11右前方、右后方、左后方和左前方的虚拟扬声器。可选的,如前所述,用户U11右前方的虚拟扬声器位于以用户U11所处的位置和扬声器VSP1间的连线为基准线,并以用户U11所处的位置为圆心,向右偏30度的位置处;用户U11右后方的虚拟扬声器位于以用户U11所处的位置和扬声器VSP1间的连线为基准线,并以用户U11所处的位置为圆心,向右偏120度的位置处;用户U11左后方的扬声器位于以用户U11所处的位置和扬声器VSP1间的连线为基准线,并以用户U11所处的位置为圆心,向左偏120度的位置处;用户U11左前方的扬声器位于以用户U11所处的位置和扬声器VSP1间的连线为基准线,并以用户U11所处的位置为圆心,向左偏30度的位置处。其中,在确定剩余的虚拟扬声器时,当特定角度或特定角度范围不存在扬声器时,通过左右两个扬声器虚拟出虚拟扬声器,具体可以参考前述的确定中置扬声器的方式,此处不再一一赘述。
在根据第一扬声器组,构建虚拟扬声器组的过程中,可以记录第一扬声器组中每个扬声器对应的音频信号所需调整的增益,从而得到第二增益集合。
在一些实施例中,在构建虚拟扬声器组完毕后,当得到虚拟扬声器的数量未满足构建空间环绕声所需的扬声器的数量时,还可以由已得到的虚拟扬声器虚拟出所需的扬声器。其中,由已得到的虚拟扬声器虚拟出所需的扬声器的方式,可以参考前述的确定中置扬声器的方式,此处不再一一赘述。
S1105、电子设备100控制虚拟扬声器组播放音频数据。
本实施例中,电子设备100构建出虚拟扬声器组后,可以控制该虚拟扬声器组播放音频数据。其中,虚拟扬声器组所播放的音频数据可以根据前述确定出的第一增益集合和第二增益集合,对音频数据中的各个声道的增益进行调整得到。
举例来说,如图14的(A)所示,以空间中布置的两个扬声器SP1和SP2,且扬声器SP1与用户U11(即电子设备100)间的距离为d1,扬声器SP2与用户U11间的距离为d2为例。如图14的(B)所示,在构建第一扬声器组时,若以d1为基准,可以将扬声器SP2虚拟至圆C1上,即得到扬声器SP2’。接着,如图14的(C)所示,在构建虚拟扬声器组时,可以由 扬声器SP1和SP2’虚拟出一个虚拟扬声器VSP1。
在图14的(B)中，假设确定出的扬声器SP2对应的音频信号所需调整的增益为g1，由于是以d1为基准，因此可以不用调整扬声器SP1对应的音频信号的增益，此时可以将扬声器SP1对应的音频信号所需调整的增益设定为g0。其中，当g0的单位为分贝(decibel，dB)时，其取值可以为0；当g0的单位是标准化值(比如放大倍数等)时，其取值可以为1。因此，在图14的(B)中得到的第一增益集合为：扬声器SP1对应的音频信号所需调整的增益为g0，扬声器SP2对应的音频信号所需调整的增益为g1。
在图14的(C)中,假设确定出的扬声器SP1所需调整的增益为g2,扬声器SP2’所需调整的增益为g3。因此,在图14的(C)中得到的第二增益集合为:扬声器SP1对应的音频信号所需调整的增益为g2,扬声器SP2’对应的声道所需调整的增益为g3。
由图14的(B)中确定出的第一增益集合和图14的(C)中确定出的第二增益集合,可以确定出扬声器SP1对应的音频信号最终所需调整的增益为g2,扬声器SP2对应的音频信号最终所需调整的增益为gi=g1*g3,或者,gi=g1+g3等。其中,当所需调整的增益的单位为分贝时,可以采用相加的方式,当所需调整的增益的单位为标准化值时,可以采用相乘的方式。
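示例性的，上述距离增益的确定及多个增益的合并过程可以用下述代码示意（仅为示意，数值均为示例性假设，增益以标准化值为单位）：

```python
def distance_gain(d_ref, d_speaker):
    """将扬声器虚拟至以基准距离d_ref为半径的圆上时，该扬声器对应音频信号所需调整的增益，即gi=d2/d1（示意实现）。"""
    return d_speaker / d_ref

def combine_gains(gains, unit="normalized"):
    """合并同一扬声器在多个步骤中得到的增益：标准化值采用相乘的方式，分贝采用相加的方式（示意实现）。"""
    if unit == "normalized":
        result = 1.0
        for g in gains:
            result *= g
    else:
        result = sum(gains)
    return result

# 用法示例：扬声器SP2先被虚拟至圆C1上（增益g1），再参与虚拟出扬声器VSP1（增益g3）
g1 = distance_gain(d_ref=2.0, d_speaker=3.0)   # g1 = 1.5
g_total = combine_gains([g1, 0.577])           # 标准化值相乘得到最终增益
```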
最后,电子设备100可以基于确定出各个真实扬声器对应的音频信号所需调整的增益,对音频数据中相应的声道的增益进行调整,以得到所需的音频数据,并将相应的声道对应的信号发送至相应的扬声器,从而在听感上使得声音就像是通过虚拟扬声器组播放产生的一样。这样,用户感知到的声音近似是在其身边产生,从而使得用户可以随时随地的享受到空间环绕声。
在一些实施例中,当存在用户与扬声器间的距离大于预设距离阈值时,可以分别确定各个扬声器对应的时延,以便各个扬声器可以同步播放同一个音频数据。示例性的,可以选择最大的一个第一距离为基准,并由该距离确定出其他的各个扬声器的时延。例如,若确定出的基准距离为d1,电子设备100与其中一个虚拟扬声器间的距离为d2,则该其中一个扬声器的时延delay=(d1-d2)/v,其中,v为声音在空气中的传播速度。
在确定出各个扬声器对应的时延后,电子设备100可以控制各个扬声器按照对应的时延播放音频数据。
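示例性的，上述时延的计算可以用下述代码示意（仅为示意，以最大的第一距离为基准，声速取约340米/秒）：

```python
SPEED_OF_SOUND = 340.0   # 米/秒，近似取值

def playback_delays(distances):
    """以最远的扬声器为基准，计算各扬声器播放音频数据时的时延，即delay=(d1-d2)/v（示意实现，单位：秒）。"""
    d_ref = max(distances)
    return [(d_ref - d) / SPEED_OF_SOUND for d in distances]

# 用法示例：三个扬声器与电子设备100的距离分别为5米、3米、2米
delays = playback_delays([5.0, 3.0, 2.0])   # [0.0, 约0.0059, 约0.0088]
```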
这样,用户在移动过程中,跟随用户的移动随时随地调整各个扬声器的增益,从而使得用户可以随时随地的享受到空间环绕声。
为了便于理解上述方案,下面举例进行说明。
示例性的,如图15的(A)所示,在空间中布置于5个扬声器,即扬声器SP1、SP2、SP3、SP4和SP5,用户所使用的电子设备100处于位置a1处,且用户在5个扬声器所围成的区域中移动。
在图15的(B)中,电子设备100的位置由位置a1切换至位置a2处,此时触发执行上述图11中的方法。其中,可以假设扬声器SP2在电子设备100的朝向上。
在图15的(C)中,由于电子设备100与扬声器SP2之间的距离最短,因此可以选用该距离为基准距离,以及将扬声器SP1、SP3、SP4和SP5均虚拟至以该基准距离为半径且以位置a2为圆心的圆C1上。在图15的(C)中,扬声器SP1对应的虚拟扬声器为SP1’,扬声器SP3对应的虚拟扬声器为SP3’,扬声器SP4对应的虚拟扬声器为SP4’,扬声器SP5对应的虚拟扬声器为SP5’。
在图15的(D)中,电子设备100可以按照5.1.X的要求构建出虚拟扬声器组。该虚拟 扬声器组由扬声器SP2、VSP1、VSP2、SP4’和SP1’组成。其中,扬声器VSP1由扬声器SP2和SP3’虚拟得到,扬声器VSP2由扬声器SP2和SP1’虚拟得到。可以理解,扬声器SP2、SP4’和SP1’位于满足条件的角度或角度范围内。
最后,电子设备100可以控制该虚拟扬声器组播放音频数据。
示例性的,如图16的(A)所示,在空间中布置有7个扬声器,即扬声器SP1、SP2、SP3、SP4、SP5、SP6和SP8,用户所使用的电子设备100处于位置a1处。
在图16的(B)中,电子设备100的位置由位置a1切换至位置a2处,此时触发执行上述图11中的方法。其中,可以假设扬声器SP5在电子设备100的朝向上。
在图16的(C)中,由于电子设备100与扬声器SP5之间的距离最短,因此可以选用该距离为基准距离,以及将扬声器SP1、SP2、SP3、SP4、SP6和SP8均虚拟至以基准距离为半径且以位置a2为圆心的圆C1上。在图16的(C)中,扬声器SP1对应的虚拟扬声器为SP1’,扬声器SP2对应的虚拟扬声器为SP2’,扬声器SP3对应的虚拟扬声器为SP3’,扬声器SP4对应的虚拟扬声器为SP4’,扬声器SP6对应的虚拟扬声器为SP6’,扬声器SP8对应的虚拟扬声器为SP8’。
在图16的(D)中,电子设备100可以按照5.1.X的要求构建出虚拟扬声器组。该虚拟扬声器组由扬声器SP5、VSP1、SP6’、VSP2、和SP3’组成。其中,扬声器VSP1由扬声器SP1’和SP6’虚拟得到,扬声器VSP2由扬声器SP8’和SP4’虚拟得到。可以理解,扬声器SP5、SP6’和SP3’位于满足条件的角度或角度范围内。
最后,电子设备100可以控制该虚拟扬声器组播放音频数据。
示例性的,图17示出了本申请一些实施例中的一种声音处理方法的流程。在图17中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。在图17中,扬声器中所播放的音频信号可以是电子设备100中的音频信号,也可以是其他设备中的音频信号,此处不做限定。在图17中,用户的移动区域可以是构建空间环绕声的扬声器所围成的区域,比如:图10的(A)中扬声器201至205所围成的区域等,也可以是其他的区域,比如:图10的(A)中扬声器201至205所围成的区域之外的区域,此处不做限定。另外,图17中所示的方法可以实时的执行,也可以在满足一定条件时再执行,比如,当检测到用户移动的距离大于一定阈值时再执行,此处不做限定。如图17所示,该声音处理方法包括以下步骤:
S1701、电子设备100确定其与N个扬声器之间的距离,以得到N个第一距离,N为正整数。
S1702、电子设备100以其朝向为基准,构建第一虚拟扬声器组,第一虚拟扬声器组由M个虚拟扬声器组成,且M的值与构建空间环绕声所需的扬声器的数量相同。
本实施例中,电子设备100可以先在其朝向上确定出一个虚拟扬声器,然后,再基于预先设定的构建空间环绕声所需的扬声器的布置方式(比如5.1.X或7.1.X的布置方式等)依次确定出剩余的虚拟扬声器,从而构建出第一虚拟扬声器组。
在一些实施例中,当电子设备100的朝向上存在一个扬声器,或者,在其朝向上的预设角度范围内存在一个扬声器时,可以将该扬声器定为虚拟扬声器组中的中置扬声器。
当电子设备100的朝向上不存在扬声器,或者,在其朝向上的预设角度范围内不存在扬声器时,可以由该朝向上左右相邻的两个扬声器虚拟出一个扬声器,并将该虚拟出的扬声器作为虚拟扬声器组中的中置扬声器。其中,在由两个真实的扬声器虚拟出一个虚拟的扬声器 时,可以通过调整两个真实的扬声器的增益,以虚拟出一个虚拟的扬声器。详见前述图13中的描述,此处不再赘述。
另外,当电子设备100的朝向上不存在扬声器(或者,在其朝向上的预设角度范围内不存在扬声器),且该朝向上左右相邻的两个扬声器与电子设备100间的距离不相等时,可以先通过调整两者中的至少一个对应的声道所需调整的增益,将两者虚拟到以电子设备100为中心的圆上;然后,再通过图13中的方式虚拟出一个扬声器VSP1。例如,继续参阅图14,如图14的(A)所示,扬声器SP1和SP2未同时在以用户U11(即电子设备100)为中心的圆上,扬声器SP1和SP2与用户U11间的距离分别为d1和d2,且d1<d2。此时,一种可能的实现方式中,可以将d1作为所需的圆C1的半径。接着,如图14的(B)所示,可以通过前述S1103中描述的方式调整扬声器SP2对应的音频信号所需调整的增益,例如,扬声器SP2对应的音频信号所需调整的增益可以为gi=d2/d1,以将扬声器SP2虚拟至以用户U11为中心,以d1为半径的圆C1上,图中扬声器SP2’为由扬声器SP2虚拟出的扬声器。在图14的(B)中,d2’=d1。之后,在图14的(C)中,可以通过图14中所描述的方式由扬声器SP1和SP2’虚拟出一个虚拟的扬声器VSP1。例如,扬声器SP1对应的增益为g1,扬声器SP2’对应的增益为g2。示例性的,若由扬声器SP2虚拟出扬声器SP2’时,扬声器SP2对应的音频信号所需调整的增益可以为gi,由扬声器SP1和SP2’虚拟出VSP1时,扬声器SP1对应的音频信号所需调整的增益为g1,扬声器SP2’对应的声道所需调整的增益为g2,则在本实现方式中,扬声器SP1的增益为g1,扬声器SP2的总增益为gi与g2的乘积,或者gi与g2相加的和等。其中,当所需调整的增益的单位为分贝时,可以采用相加的方式,当所需调整的增益的单位为标准化值(比如放大倍数等)时,可以采用相乘的方式。
另一种可能的实现方式中,也可以将d2作为所需的圆C1的半径,进而,可以通过前述S1103中描述的方式调整SP1对应的音频信号所需调整的增益,并进一步虚拟出扬声器VSP1。再一种可能的实现方式中,也可以选取d1和d2范围内的任一值作为所需的圆C1的半径,进而,可以通过前述类似方式调整扬声器SP1和SP2对应的音频信号所需调整的增益,并进一步最终虚拟出扬声器VSP1。其具体实现方式可以参考图13及其描述,本申请在此不再赘述。
进一步地,在确定剩余的虚拟扬声器时,可以参考确定中置扬声器的过程,此处不再赘述。
此外,在构建第一虚拟扬声器组的过程中,电子设备100可以记录每个真实的扬声器对应的音频信号所需调整的增益,以得到第一增益集合。
S1703、电子设备100确定其与第一虚拟扬声器组中各个虚拟扬声器间的距离,以得到M个第二距离。
本实施例中,电子设备100在构建第一虚拟扬声器组时,均是以其与某个扬声器间的距离为基准构建,且均是在以其自身为中心,且以该距离为半径的圆上虚拟出一个扬声器。因此,某个虚拟扬声器与电子设备100间的距离,为构建该虚拟扬声器所选用的基准对应的距离。举例来说,继续参阅图14,最终确定出的虚拟扬声器为VSP1’,且是以扬声器SP1与用户U11(即电子设备100)间的距离d1为基准,因此,虚拟扬声器VSP1’与用户U11(即电子设备100)间的距离为d1。
S1704、电子设备100基于M个第二距离,从M个虚拟扬声器中筛选出目标扬声器,目标扬声器与电子设备100间的距离最短。
本实施例中,电子设备100可以对M个第二距离进行排序,比如由大到小或者由小到大 排序等,并从中挑选出最小的一个第二距离,以及将该最小的第二距离对应的虚拟扬声器作为目标扬声器。
在一些实施例中,目标扬声器也可以是其他的虚拟扬声器,比如,与电子设备100距离最远的虚拟扬声器等,具体可根据实际情况而定,此处不做限定。
S1705、电子设备100以其与目标扬声器间的距离为基准,并基于第一虚拟扬声器组,构建第二虚拟扬声器组,其中,第二虚拟扬声器组为将第一虚拟扬声器组中的M个虚拟扬声器均虚拟至以电子设备100为中心,且以电子设备100与目标扬声器间的距离为半径的圆上得到的虚拟扬声器的组合。
本实施例中,电子设备100可以选择以其与目标扬声器间的距离为基准,并根据该基准距离和其他的虚拟扬声器与电子设备100间的距离,确定出其他的虚拟扬声器对应的音频信号所需调整的增益,以将其他的虚拟扬声器均调整至以电子设备100与目标扬声器间的距离为半径的圆上,从而构建出第二虚拟扬声器组。在一些实施例中,若电子设备100与目标扬声器间的距离为d1,电子设备100与除目标扬声器之外的其中一个虚拟扬声器间的距离为d2,则该其中一个虚拟扬声器对应的音频信号所需调整的增益gi=d2/d1。另外,在确定其他扬声器对应的音频信号所需调整的增益时,电子设备100也可以选用其他的线性模型进行确定,例如,上述的其中一个扬声器对应的音频信号所需调整的增益可以为gi=Q(d1/d2)+P,其中,Q和P均为常数,具体可根据实际情况而定,此处不做限定。
在根据第一虚拟扬声器组,构建第二虚拟扬声器组的过程中,可以记录第一虚拟扬声器组中每个虚拟扬声器对应的音频信号所需调整的增益,以得到第二增益集合。
S1706、电子设备100控制第二虚拟扬声器组播放音频数据。
本实施例中,电子设备100构建出第二虚拟扬声器组后,可以控制该第二虚拟扬声器组播放音频数据。其中,第二虚拟扬声器组所播放的音频数据可以根据前述确定出的第一增益集合和第二增益集合,对音频数据中的各个声道的增益进行调整得到。
举例来说,如图18的(A)所示,以空间中布置的三个扬声器SP1、SP2和SP3,且扬声器SP3是用户U11(即电子设备100)朝向上的一个扬声器,以及需要由扬声器SP1和SP2虚拟出另一个所需的扬声器。如图18的(B)所示,在构建第一虚拟扬声器组时,若以d1为基准,可以将扬声器SP2虚拟至圆C1上,即得到扬声器SP2’。接着,如图18的(C)所示,可以由扬声器SP1和SP2’虚拟出一个虚拟扬声器VSP1,此时即构建出第一虚拟扬声器组中的两个扬声器,即扬声器SP3和虚拟扬声器VSP1。接着,如图18的(D)所示,在构建第二虚拟扬声器组时,可以以d3为基准,将虚拟扬声器VSP1’虚拟至圆C2上,从而构建出第二扬声器组中的两个扬声器,即扬声器SP3和虚拟扬声器VSP1’。
在图18的(B)中,假设确定出的扬声器SP2对应的音频信号所需调整的增益为g1,由于是以d1为基准,因此可以不用调整扬声器SP1对应的音频信号的增益,此时可以将扬声器SP1对应的音频信号所需调整的增益设定为g0。其中,当g0的单位为分贝(decibel,DB)时,其取值可以为0;当g0的单位是标准化值(比如放大倍数等)时,其取值可以为1。在图18的(C)中,假设确定出的扬声器SP1所需调整的增益为g2,扬声器SP2’所需调整的增益为g3。因此,在构建第一虚拟扬声器组时,由图18的(B)和(C)得到的第一增益集合为:扬声器SP1对应的音频信号所需调整的增益为(g0*g2),或者,(g0+g2);扬声器SP2对应的音频信号所需调整的增益为gi=g1*g3,或者,gi=g1+g3等。其中,当所需调整的增益的单位为分贝时,可以采用相加的方式,当所需调整的增益的单位为标准化值时,可以采用 相乘的方式。
在图18的(D)中,假设确定出的虚拟扬声器VSP1对应的音频信号所需调整的增益为g4,由于是以d3为基准,因此可以不用调整扬声器SP3对应的音频信号的增益,此时可以将扬声器SP3对应的音频信号所需调整的增益设定为g0。因此,在构建第二虚拟扬声器组时,由图18的(D)得到的第二增益集合为:扬声器SP3对应的音频信号所需调整的增益为g0,虚拟扬声器VSP1对应的音频信号所需调整的增益为g4。
在图18的(D)中,虚拟扬声器VSP1’等效于是先将扬声器SP1和SP2虚拟至圆C2上,然后再由这两个扬声器虚拟出虚拟扬声器VSP1’。由于扬声器SP1、SP2’、VSP1是在同一个圆C1上,基于将三者虚拟至圆C2上时,三者对应的声道所需调整的增益相等。因此,由图18的(D)得到的第二增益集合,可以确定出在构建第二虚拟扬声器组时,虚拟扬声器VSP1对应的真实的扬声器对应的音频信号所需调整的增益,且两个真实的扬声器(即扬声器SP1和SP2)对应的声道所需调整的增益也为g4。
进一步地,由第一增益集合和第二增益集合,可以确定出扬声器SP1对应的音频信号最终所需调整的增益为(g0+g2+g4)或者(g0*g2*g4),扬声器SP2对应的音频信号最终所需调整的增益为(gi+g4)或者(gi*g4),扬声器SP3对应的音频信号所需调整的增益为g0。其中,当所需调整的增益的单位为分贝时,可以通过将各个增益相加得到最终所需调整的增益,当所需调整的增益的单位为标准化值时,可以通过将各个增益相乘得到最终所需调整的增益。
最后,电子设备100可以基于确定出各个真实扬声器对应的音频信号所需调整的增益,对音频数据中相应的声道的增益进行调整,以得到所需的音频数据,并将相应的声道对应的信号发送至相应的扬声器,从而在听感上使得声音就像是通过第二虚拟扬声器组播放产生的一样。
这样,用户感知到的声音近似是在其身边产生,从而使得用户可以随时随地的享受到空间环绕声。
示例性的,图19示出了本申请一些实施例中的一种声音处理方法的流程。在图19中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。在图19中,扬声器中所播放的音频信号可以是电子设备100中的音频信号,也可以是其他设备中的音频信号,此处不做限定。在图19中,用户的移动区域可以是构建空间环绕声的扬声器所围成的区域,比如:图10的(A)中扬声器201至205所围成的区域等,也可以是其他的区域,比如:图10的(A)中扬声器201至205所围成的区域之外的区域,此处不做限定。另外,图19中所示的方法可以实时的执行,也可以在满足一定条件时再执行,比如,当检测到用户移动的距离大于一定阈值时再执行,此处不做限定。如图19所示,该声音处理方法包括以下步骤:
S1901、电子设备100以其朝向为基准,从N个扬声器中筛选出K个扬声器,K个扬声器用于构建空间环绕声。
本实施例中,电子设备100可以先在其朝向上确定出一个扬声器,然后,再基于预先设定的构建空间环绕声所需的扬声器的布置方式(比如5.1.X或7.1.X的布置方式等)依次确定出剩余的所需的扬声器,从而得到K个扬声器。
在一些实施例中,当电子设备100的朝向上存在一个扬声器,或者,在其朝向上的预设角度范围内存在一个扬声器时,可以将该扬声器作为所需的扬声器。
当电子设备100的朝向上不存在扬声器,或者,在其朝向上的预设角度范围内不存在扬 声器时,可以由该朝向上左右相邻的两个扬声器作为所需的扬声器。
进一步地,在确定剩余的所需的扬声器时,可以参考在电子设备100的朝向上确定所需的扬声器的过程,此处不再赘述。
S1902、电子设备100基于K个扬声器,构建虚拟扬声器组,其中,虚拟扬声器组为将K个扬声器均虚拟至以电子设备100为中心的圆上得到的虚拟扬声器的组合。可选的,可以以电子设备100与K个扬声器中的一个扬声器间的距离为半径。其中,构建虚拟扬声器组的过程可以参见前述图11或图17中的描述,此处不再赘述。
S1903、电子设备100控制虚拟扬声器组播放音频数据。其中,电子设备100控制播放音频数据的过程详见前述图11或图17中的描述,此处不再赘述。
这样,用户感知到的声音近似是在其身边产生,从而使得用户可以随时随地的享受到空间环绕声。
在一些实施例中,电子设备100可以基于确定出的各个扬声器对应的音频信号所需调整的增益,分别向各个扬声器发送用于调整音量的指示信息。示例性的,可以预先设定扬声器对应的音频信号所需调整的增益与音量的调整值间映射关系,当确定出扬声器对应的音频信号所需调整的增益后,电子设备100可以查询该映射关系确定出该扬声器的音量的调整值,进而向该扬声器发送指示信息,该指示信息中可以包括音量的调整值。
在一些实施例中,电子设备100还可以控制与虚拟扬声器组无关的各个真实的扬声器所播放的音频信号的响度降至低于预设响度值,以便降低这些扬声器的干扰,且使得后续在需要使用到这些扬声器时,不会出现卡顿的情况。例如,电子设备100可以控制与虚拟扬声器组无关的各个真实的扬声器将音量调整至最低,或者,将这些扬声器对应的音频信号所需调整的增益调整至最低等。当然,电子设备100也可以控制与虚拟扬声器组无关的各个真实的扬声器暂停工作。
需要说明的是,上述实施例中所描述的方法,除了可以对位于空间中水平方向上的扬声器进行处理外,还可以对其他方向上的扬声器进行处理,以构造出相应的环绕声。比如可以对布置在空间顶部的扬声器进行处理,处理方式可以参考前述的方式,此处就不再一一赘述。
3.2、在空间中配置有多个扬声器,且电子设备可以产生画面(比如:用户使用电子设备观看影片等),以及,电子设备通过空间中布置的扬声器播放其上的音频数据。
示例性的,图20的(A)示出了本申请一些实施例中的一种应用场景。如图20的(A)所示,在车辆200中配置有6个扬声器,即扬声器SP1、SP2、SP3、SP4、SP5和SP6。用户U11在车辆200的右后座上使用电子设备100观看影片,且电子设备100与车辆200通过蓝牙等短距通信方式建立连接。电子设备100中的音频数据可以通过车辆200中的扬声器播放,从而获得更好的听感。
图20的(B)示出了本申请一些实施例中的另一种应用场景。如图20的(B)所示,按照一定的要求(比如,5.1.X等)在房间内的固定位置配置扬声器,即扬声器SP1、SP2、SP3、SP4和SP5。用户U11在房间中的座椅上使用电子设备100观看影片,且电子设备100与房间中的扬声器间通过蓝牙等短距通信方式建立连接。电子设备100中的音频数据可以通过房间中的扬声器播放,从而获得更好的听感。
图20的(C)示出了本申请一些实施例中的又一种应用场景。如图20的(C)所示,在房间中配置有扬声器SP1、SP2、SP3和SP4,以及配置有投影设备400。用户U11在房间中的 座椅上,可以使用投影设备400将电子设备100中的影片等内容投影到墙体500上。电子设备100可以与房间中的扬声器间通过蓝牙等短距通信方式建立连接。电子设备100中的音频数据可以通过房间中的扬声器播放,从而获得更好的听感。
在图20的(A)和(B)所示的场景下,用户U11利用电子设备100外部的扬声器播放电子设备100上的音频数据时,当电子设备100与其外部的扬声器的位置不协调时,常会出现电子设备100所显示的画面与扬声器所播放的音频数据不同步的情况。在图20的(C)所示的场景下,用户U11观看的画面是在墙体500上显示,而声音是通过房间中的扬声器播放,且墙体500上所显示的画面的位置与扬声器的位置往往不协调,因此墙体500上所显示的画面与扬声器所播放的音频数据常会存在不同步的情况。
为解决这一问题,本申请实施例中提供了一种声音处理方法,可以基于空间中布置的扬声器,在电子设备100(或者基于电子设备100产生的画面)的周围构建出一个包含有至少一个虚拟扬声器的虚拟扬声器组,使得电子设备100中的音频数据可以由该虚拟扬声器组播放,进而解决音画不同步的问题,提升用户的听感和视感一致性体验。
可以理解的是,在图20的(A)和(B)所示的场景下,还可以配置有摄像头300。摄像头300可以采集用户U11和电子设备100在空间中的图像,以由采集到的图像确定出用户U11的头部与电子设备100在空间中的位置,和/或,电子设备100与各个扬声器间的距离等。另外,在图20的(C)所示的场景下,也可以配置有摄像头300。摄像头300可以采集用户U11、电子设备100和基于电子设备100产生的画面在空间中的图像,以由采集到的图像确定出用户U11的头部与电子设备100在空间中的位置,电子设备100与各个扬声器间的距离,基于电子设备100产生的画面与各个扬声器间的距离,或者,基于电子设备100产生的画面的位置等。
在一些实施例中,摄像头300可以与用于控制各个扬声器的控制器(图中未示出)通过有线网络或无线网络(比如蓝牙等)建立连接,这样,摄像头300可以将其采集到的图像传输至控制器,以由控制器对图像进行处理,比如将图像输入至预先训练好的图像处理模型,以由该模型输出用户U11的头部与电子设备100在空间中的位置,和/或,电子设备100与各个扬声器间的距离等。示例性的,图像处理模型可以但不限于是基于卷积神经网络(convolutional neural network,CNN)训练得到。在另一些实施例中,摄像头300可以与电子设备100通过无线网络(比如蓝牙等)建立连接,这样,摄像头300可以将其采集到的图像传输至电子设备100,以由电子设备100对图像进行处理,比如将图像输入至预先训练好的图像处理模型,以由该模型输出用户U11的头部与电子设备100在空间中的位置,电子设备100与各个扬声器间的距离,基于电子设备100产生的画面与各个扬声器间的距离,或者,基于电子设备100产生的画面的位置等。
在一些实施例中,电子设备100可以与各个扬声器通过无线网络(比如蓝牙等)建立连接。此时,除了可以通过摄像头300采集到的图像确定出电子设备100在空间中的位置,和/或,电子设备100与各个扬声器间的距离等外,还可以基于电子设备100与各个扬声器之间的无线通信信号确定,例如:可以通过基于接收信号的强度指示(received signal strength indication,RSSI)测距方法,确定电子设备100在空间中的位置,和/或,电子设备100与各个扬声器间的距离。应理解的,确定用户A与各个扬声器间的距离的执行主体可以是电子设备100,也可以是用于控制各个扬声器的控制器(图中未示出),也可以是位于图1所示场景中的其他设备,此处不做限定。示例性的,当电子设备100为执行主体确定电子设备100 与扬声器的距离时,可以通过前述3.1的场景中所描述的“公式一”确定电子设备100与某个扬声器之间的距离。另外,当用于控制各个扬声器的控制器为执行主体确定电子设备100与扬声器的距离时,可以参考电子设备100为执行主体确定电子设备100与扬声器的距离的方式,此处不再赘述。
此外,在确定出电子设备100与各个扬声器间的距离后,可以基于电子设备100与至少三个扬声器间的距离,确定出电子设备100所在的位置。举例来说,如图21所示,若电子设备100与扬声器SP1的距离为d1,与扬声器SP2的距离为d2,与扬声器SP3的距离为d3,由于扬声器SP1、SP2和SP3的位置是已知且固定的,因此可以分别以各个扬声器所在的位置为中心,并以相应的扬声器与电子设备100间的距离为半径画圆,这三个圆的交点(即图中的位置E)即为电子设备100的位置。
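示例性的,由电子设备100与三个扬声器间的距离求解其位置的过程,可以将三个圆方程两两相减化为线性方程组求解,下面给出一段仅作示意的Python代码(未考虑测距误差,坐标与距离均为假设值):

```python
import numpy as np

def locate_by_distances(p1, p2, p3, d1, d2, d3):
    """由到三个已知位置扬声器的距离d1、d2、d3求解平面位置:
    将圆1与圆2、圆1与圆3的方程相减,得到关于(x, y)的线性方程组。"""
    x1, y1 = p1; x2, y2 = p2; x3, y3 = p3
    a = np.array([[2 * (x2 - x1), 2 * (y2 - y1)],
                  [2 * (x3 - x1), 2 * (y3 - y1)]])
    b = np.array([d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2,
                  d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2])
    return np.linalg.solve(a, b)

# 三个扬声器位于(0,0)、(4,0)、(0,3),距离均为2.5时,解得位置约为(2, 1.5)
print(locate_by_distances((0, 0), (4, 0), (0, 3), 2.5, 2.5, 2.5))
```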
接下来,基于上文所描述的内容,对本申请实施例提供的一种声音处理方法进行详细介绍。
示例性的,图22示出了本申请一些实施例中的一种声音处理方法的流程。在图22中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。图22中所示的方法可以但不限于应用在图20的(A)或(B)所示的场景下。图22中所示的方法的执行主体可以是电子设备100。如图22所示,该声音处理方法包括以下步骤:
S2201、电子设备100确定其在目标空间中的目标位置,目标空间中配置有至少一个扬声器。
本实施例中,电子设备100可以基于用户所处的空间中摄像头采集到的图像,确定其在目标空间中的位置,也可以由其与各个扬声器间的无线通信信号,确定其在目标空间中的位置。
S2202、电子设备100根据目标位置,构建与目标空间匹配的虚拟空间,虚拟空间的体积小于目标空间的体积。
本实施例中,电子设备100可以将目标位置置于预先设定的空间模型中,并在空间模型中将目标位置与目标空间中某个部件或区域相关联,即在空间模型中将目标位置作为目标空间中某个部件或区域的位置,以构建出与目标空间匹配的虚拟空间。其中,虚拟空间可以理解为是一个小型化的目标空间。示例性的,该虚拟空间可以将目标空间按一定比例缩小形成。该虚拟空间可以是预先设定,且能够将用户围绕在其中的空间。例如,在图20的(A)所示的场景中,空间模型可以为一个小型的虚拟车辆,在该虚拟车辆中可以将目标位置置于车辆200中车机的显示屏的位置。在图20的(B)所示的场景中,空间模型可以为一个小型的虚拟房间,在该虚拟房间中可以将目标位置置于房间中用户U11正前方的墙体的位置。
S2203、电子设备100根据目标空间中各个扬声器的位置,在虚拟空间中构建虚拟扬声器组,虚拟扬声器组中包括与目标空间中各个扬声器对应的虚拟扬声器。
本实施例中,电子设备100可以基于虚拟空间与目标空间的比例,在虚拟空间中确定出与目标空间中的各个扬声器对应的虚拟扬声器的位置。
举例来说,以图20的(A)所示的场景为例,如图23所示,此时虚拟空间为虚拟车辆2301,电子设备100所处的位置为虚拟车辆2301中车机的显示屏的位置。在车辆200中车机的显示屏与各个扬声器间的距离和角度均是固定的。若虚拟车辆2301与车辆200间的比例为1:10,在车辆200中扬声器SP1与车机的显示屏210间的距离为d1,角度为α,则可以在虚拟车辆2301中,在与电子设备100相距d1/10,且角度为α的位置处,布置一个虚拟扬声器VSP1。对于在虚拟车辆2301中布置其他的虚拟扬声器的方式,可以参考布置虚拟扬声器VSP1的方式,此处不再赘述。
在虚拟空间中确定出各个虚拟扬声器的位置后,可以由虚拟扬声器与目标空间中扬声器间的距离,确定出目标空间中扬声器对应的音频信号所需调整的增益,并可以对目标空间中各个扬声器对应的音频信号的增益进行调整,以构建出虚拟扬声器组,从而将目标空间中的各个扬声器映射到虚拟空间中。其中,虚拟扬声器组中包括与目标空间中各个扬声器对应的虚拟扬声器。示例性的,虚拟扬声器可以理解为是虚拟的扬声器,车辆200中配置的扬声器可以理解为是真实的扬声器。
在一些实施例中,电子设备100可以通过目标空间中扬声器与虚拟空间中虚拟扬声器间的距离和预先设定的距离模型,确定出各个扬声器对应的音频信号所需调整的增益。
举例来说,以图20的(A)所示的场景为例,继续参阅图23,若预先设定的距离模型为g=k*d+b,g为扬声器对应的音频信号所需调整的增益,k和b为常数,d为虚拟扬声器与真实的扬声器间的距离。若虚拟扬声器VSP1和扬声器SP1间的距离为d2,则扬声器SP1对应的音频信号所需调整的增益为g1=k*d2+b,此时,电子设备100对其待播放的音频数据中扬声器SP1对应的音频信号的增益进行调整,且调整值为g1,即可以将扬声器SP1映射至虚拟车辆2301中,从而在虚拟车辆2301中构建出与扬声器SP1对应的虚拟扬声器。利用剩余的各个扬声器与相应的虚拟扬声器间的距离和距离模型,可以在虚拟车辆2301中构建出与剩余的各个扬声器对应的虚拟扬声器。另外,电子设备100也可以记录下每个扬声器对应的音频信号所需调整的增益的值,并在后续再对待播放的音频数据进行调整。
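示例性的,上述由距离模型g=k*d+b确定真实扬声器所需调整的增益的过程可以示意如下,其中k、b的取值仅为假设:

```python
def mapping_gain(d_virtual_to_real, k=-0.05, b=1.0):
    """距离模型 g = k*d + b:d 为虚拟扬声器与对应真实扬声器之间的距离,
    返回该真实扬声器对应音频信号所需调整的增益(k、b 为假设的常数)。"""
    return k * d_virtual_to_real + b

# 假设虚拟扬声器VSP1与扬声器SP1相距4米,则SP1对应声道的增益调整值约为0.8
print(mapping_gain(4.0))
```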
S2204、电子设备100利用虚拟扬声器组播放目标音频数据,目标音频数据中各个声道的增益由在构建虚拟扬声器组过程中、基于目标空间中各个扬声器对应的音频信号所需调整的增益得到。
本实施例中,在构建出虚拟扬声器组后,电子设备100可以利用该虚拟扬声器组播放目标音频数据。例如,电子设备100可以将目标音频数据所包含的不同声道对应的音频信号传输至相应的扬声器进行播放。其中,目标音频数据中各个声道的增益由在构建虚拟扬声器组过程中、基于目标空间中各个扬声器对应的音频信号所需调整的增益得到。
这样,在用户使用外部的扬声器播放器所使用的电子设备上的音频数据时,使得用户听到的声音近似于是从电子设备上产生,且是围绕在用户周围,从而使得电子设备所播放的画面与声音同步,提升了用户的听感和视感一致性体验。
示例性的,图24示出了本申请一些实施例中的另一种声音处理方法的流程。在图24中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。图24中所示的方法可以但不限于应用在图20的(A)或(B)所示的场景下。图24中所示的方法的执行主体可以是电子设备100。如图24所示,该声音处理方法包括以下步骤:
S2401、电子设备100确定其与用户的头部间的第一距离,以及确定用户的头部在目标空间中的第一位置,目标空间中配置有至少一个扬声器。
本实施例中,电子设备100可以基于用户所处的空间中摄像头采集到图像,确定其与用户的头部间的第一距离,以及确定用户的头部在目标空间中的第一位置。
S2402、电子设备100根据第一距离、第一位置和目标空间中各个扬声器的位置,构建虚拟扬声器组,虚拟扬声器组中包括与目标空间中各个扬声器对应的虚拟扬声器,各个虚拟扬 声器均处于以第一位置为圆心且以第一距离为半径的圆上。
本实施例中,电子设备100可以以第一距离为半径,并以第一位置为圆心构建一个圆,以及将目标空间中的各个扬声器均虚拟至该圆上。在一些实施例中,电子设备100可以基于第一位置与各个扬声器的位置间的距离,将目标空间中的各个扬声器均虚拟至其构建的圆上。示例性的,可以预先设定有距离模型,将第一位置与各个扬声器的位置间的距离输入至该距离模型中,可以得到各个扬声器对应的音频信号所需调整的增益,并可以对目标空间中各个扬声器对应的音频信号的增益进行调整,以构建出虚拟扬声器组。
举例来说,以图20的(A)所示的场景为例,请参阅图25,若预先设定的距离模型为g=k*d+b,g为扬声器对应的音频信号所需调整的增益,k和b为常数,d为用户的头部的位置与真实的扬声器间的距离。若用户U11的头部的位置与车辆200中的扬声器SP1间的距离为d1,则该扬声器SP1对应的音频信号所需调整的增益为g1=k*d1+b,此时,电子设备100对其待播放的音频数据中该扬声器SP1对应的音频信号的增益进行调整,且调整值为g1,即可以将该扬声器SP1虚拟至其构建的圆上,即得到虚拟扬声器VSP1。基于同样的实现方式,电子设备100可以将车辆200中的其他的扬声器虚拟到其构建的圆上,即构建出虚拟扬声器组。
S2403、电子设备100利用虚拟扬声器组播放目标音频数据,目标音频数据中各个声道的增益由在构建虚拟扬声器组过程中基于目标空间中各个扬声器对应的音频信号所需调整的增益得到。
本实施例中,在构建出虚拟扬声器组后,电子设备100可以利用该虚拟扬声器组播放目标音频数据。例如,电子设备100可以将目标音频数据所包含的不同声道对应的音频信号传输至相应的扬声器进行播放。其中,目标音频数据中各个声道的增益由在构建虚拟扬声器组过程中基于目标空间中各个扬声器对应的音频信号所需调整的增益得到。
在一些实施例中,在确定出虚拟扬声器组后,还可以利用向量基础振幅平移(vector base amplitude panning,VBAP)算法,从虚拟扬声器组中构建出另一个虚拟扬声器组。其中,构建虚拟扬声器组的方式可以参见前述3.1的场景中的相关描述,此处不再赘述。最新构建出的虚拟扬声器组中可以由M个虚拟扬声器组成,M的值与构建空间环绕声所需的扬声器的数量相等,且虚拟扬声器组中各个虚拟扬声器的布置方式与构建空间环绕声所需的扬声器的布置方式相同。在构建出最新的虚拟扬声器组后,电子设备100可以利用该虚拟扬声器组播放目标音频数据。这样,用户可以享受到空间环绕声,提升了用户体验。
这样,在用户使用外部的扬声器播放器所使用的电子设备上的音频数据时,使得用户听到的声音近似于是从电子设备上产生,且是围绕在用户周围,从而使得电子设备所播放的画面与声音同步,提升了用户的听感和视感一致性体验。
示例性的,图26示出了本申请一些实施例中的又一种声音处理方法的流程。在图26中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。图26中所示的方法可以但不限于应用在图20的(A)或(B)所示的场景下。图26中所示的方法的执行主体可以是音响设备控制系统,该系统可以用于控制各个扬声器。在图26中,S2602和S2603可以参见图22中S2202和S2203中的描述;另外,在S2603中音响设备控制系统是记录每个扬声器对应的音频信号所需调整的增益,而在S2203中电子设备100既可以记录每个扬声器对应的音频信号所需调整的增益,也可以直接对待播放的音频数据中相应的声道的增益进行调整。如图26所示,该声音处理方法包括以下步骤:
S2601、音响设备控制系统确定电子设备100在目标空间中的目标位置,目标空间中配置有至少一个扬声器。
本实施例中,音响设备控制系统可以基于用户所处的空间中摄像头采集到图像,确定电子设备100在目标空间中的位置,也可以由电子设备100与各个扬声器间的无线通信信号,确定电子设备100在目标空间中的位置。
S2602、音响设备控制系统根据目标位置,构建与目标空间匹配的虚拟空间,虚拟空间的体积小于目标空间的体积。
S2603、音响设备控制系统根据目标空间中各个扬声器的位置,在虚拟空间中构建虚拟扬声器组,虚拟扬声器组中包括与目标空间中各个扬声器对应的虚拟扬声器。
S2604、音响设备控制系统获取电子设备100发送的目标音频数据,以及利用各个扬声器对应的音频信号所需调整的增益,对目标音频数据中的各个声道的增益进行调整,并播放调整后的目标音频数据。
本实施例中,音响设备控制系统获取到电子设备100发送的目标音频数据后,可以根据其构建虚拟扬声器组过程中记录的各个扬声器对应的音频信号所需调整的增益,对目标音频数据中各个声道的增益进行调整,并播放调整后的目标音频数据。
这样,在用户使用外部的扬声器播放器所使用的电子设备上的音频数据时,使得用户听到的声音近似于是从电子设备上产生,且是围绕在用户周围,从而使得电子设备所播放的画面与声音同步,提升了用户的听感和视感一致性体验。
示例性的,图27示出了本申请一些实施例中的再一种声音处理方法的流程。在图27中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。图27中所示的方法可以但不限于应用在图20的(A)或(B)所示的场景下。图27中所示的方法的执行主体可以是音响设备控制系统,该系统可以用于控制各个扬声器。在图27中,S2702可以参见图24中S2402的描述,S2703可以参见图26中S2604的描述。如图27所示,该声音处理方法包括以下步骤:
S2701、音响设备控制系统确定电子设备100与用户的头部间的第一距离,以及确定用户的头部在目标空间中的第一位置,目标空间中配置有至少一个扬声器。
本实施例中,音响设备控制系统可以基于用户所处的空间中摄像头采集到图像,确定电子设备100与用户的头部间的第一距离,以及确定用户的头部在目标空间中的第一位置。
S2702、音响设备控制系统根据第一距离、第一位置和目标空间中各个扬声器的位置,构建虚拟扬声器组,虚拟扬声器组中包括与目标空间中各个扬声器对应的虚拟扬声器,各个虚拟扬声器均处于以第一位置为圆心且以第一距离为半径的圆上。
S2703、音响设备控制系统获取电子设备100发送的目标音频数据,以及利用各个扬声器对应的音频信号所需调整的增益,对目标音频数据中的各个声道的增益进行调整,并播放调整后的目标音频数据。
在一些实施例中,在确定出虚拟扬声器组后,还可以利用向量基础振幅平移(vector base amplitude panning,VBAP)算法,从虚拟扬声器组中构建出另一个虚拟扬声器组。其中,构建虚拟扬声器组的方式可以参见前述3.1的场景中的相关描述,此处不再赘述。最新构建出的虚拟扬声器组中可以由M个虚拟扬声器组成,M的值与构建空间环绕声所需的扬声器的数 量相等,且虚拟扬声器组中各个虚拟扬声器的布置方式与构建空间环绕声所需的扬声器的布置方式相同。在构建出最新的虚拟扬声器组后,音响设备控制系统可以利用该虚拟扬声器组播放目标音频数据。这样,用户可以享受到空间环绕声,提升了用户体验。
这样,在用户使用外部的扬声器播放器所使用的电子设备上的音频数据时,使得用户听到的声音近似于是从电子设备上产生,且是围绕在用户周围,从而使得电子设备所播放的画面与声音同步,提升了用户的听感和视感一致性体验。
示例性的,图28示出了本申请一些实施例中的再一种声音处理方法的流程。在图28中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。图28中所示的方法可以但不限于应用在图20的(C)所示的场景下。图28中所示的方法的执行主体可以是电子设备100。在图28中,S2803至S2804可以参考前述图22中S2203至S2204中的描述,此处不再赘述。如图28所示,该声音处理方法包括以下步骤:
S2801、电子设备100确定基于其产生的画面在目标空间中所处的目标位置,目标空间中配置有至少一个扬声器。
本实施例中,电子设备100可以通过摄像头拍摄的图像获取其产生的画面在目标空间中所处的目标位置。另外,用户也可以预先在电子设备100中出配置目标位置,具体可根据实际情况而定,此处不做限定。
S2802、电子设备100根据目标位置,构建与目标空间匹配的虚拟空间,虚拟空间的体积小于目标空间的体积。
本实施例中,电子设备100可以将目标位置置于预先设定的空间模型中,并在空间模型中将目标位置与目标空间中某个部件或区域相关联,即在空间模型中将目标位置作为目标空间中某个部件或区域的位置,以构建出与目标空间匹配的虚拟空间。其中,虚拟空间可以理解为是一个小型化的目标空间。示例性的,该虚拟空间可以将目标空间按一定比例缩小形成。该虚拟空间可以是预先设定的空间。例如,在图20的(C)所示的场景中,空间模型可以为一个小型的虚拟房间,在该虚拟房间中可以将目标位置置于房间中用户U11正前方的墙体500上的某个位置。
S2803、电子设备100根据目标空间中各个扬声器的位置,在虚拟空间中构建虚拟扬声器组,虚拟扬声器组中包括与目标空间中各个扬声器对应的虚拟扬声器。
S2804、电子设备100利用虚拟扬声器组播放目标音频数据,目标音频数据中各个声道的增益由在构建虚拟扬声器组过程中、基于目标空间中各个扬声器对应的音频信号所需调整的增益得到。
这样,在用户使用投影设备观看电子设备上的画面,且使用外部的扬声器播放器所使用的电子设备上的音频数据时,使得用户听到的声音近似于是从投影设备投影出的画面上产生,从而使得基于电子设备产生的画面与声音同步,提升了用户的听感和视感一致性体验。
示例性的,图29示出了本申请一些实施例中的再一种声音处理方法的流程。在图29中,电子设备100与各个扬声器之间可以但不限于通过蓝牙建立连接。图29中所示的方法可以但不限于应用在图20的(C)所示的场景下。图29中所示的方法的执行主体可以是音响设备控制系统,该系统可以用于控制各个扬声器。在图29中,S2901至S2903可以参考前述图28中S2801至S2803中的描述,图29中的S2904可以参考前述图26中的S2604中的描述,此 处不再赘述。如图29所示,该声音处理方法包括以下步骤:
S2901、音响设备控制系统确定基于电子设备100产生的画面在目标空间中所处的目标位置,目标空间中配置有至少一个扬声器。
S2902、音响设备控制系统根据目标位置,构建与目标空间匹配的虚拟空间,虚拟空间的体积小于目标空间的体积。
S2903、音响设备控制系统根据目标空间中各个扬声器的位置,在虚拟空间中构建虚拟扬声器组,虚拟扬声器组中包括与目标空间中各个扬声器对应的虚拟扬声器。
S2904、音响设备控制系统获取电子设备100发送的目标音频数据,以及利用各个扬声器对应的音频信号所需调整的增益,对目标音频数据中的各个声道的增益进行调整,并播放调整后的目标音频数据。
这样,在用户使用投影设备观看电子设备上的画面,且使用外部的扬声器播放器所使用的电子设备上的音频数据时,使得用户听到的声音近似于是从投影设备投影出的画面上产生,从而使得基于电子设备产生的画面与声音同步,提升了用户的听感和视感一致性体验。
在一些实施例中,当存在用户与扬声器间的距离大于预设距离阈值,和/或,用户与基于电子设备100所产生的画面间的距离大于预设距离阈值时,还可以分别确定目标空间中各个扬声器对应的时延,以便用户看到的画面和听到的声音相匹配,从而提升用户体验。示例性的,可以以用户与基于电子设备100所产生的画面间的目标距离为基准,并由该目标距离确定出目标空间中各个扬声器的时延。例如,若目标距离为d1,电子设备100与目标空间中一个扬声器间的距离为d2,则该扬声器的时延delay=(d2-d1)/v,其中,v为声音在空气中的传播速度。示例性的,在图20的(A)或(B)所示的场景中,用户与基于电子设备100所产生的画面间的距离可以为:用户U11和电子设备100间的距离;在图20的(C)所示的场景中,用户与基于电子设备100所产生的画面间的距离可以为:用户U11与墙体500间的距离,该距离可以但不限于通过房间中的摄像头300获得。示例性的,当计算得到的某个扬声器的时延delay为正数时,表明该扬声器离用户更远,所以此时可以控制该扬声器提前播放,比如:提前的时间可以为确定出的时延delay;当计算得到的某个扬声器的时延delay为负数时,表明该扬声器离用户更近,所以此时可以控制该扬声器延迟播放,比如:延迟的时间可以为确定出的时延delay的绝对值。
在确定出各个扬声器对应的时延后,电子设备100或者音响设备控制系统可以控制各个扬声器按照对应的时延提前或延迟播放音频数据。由此以使得用户看到的画面和听到的声音相匹配,从而提升用户体验。
进一步地,还可以从确定出的目标设备产生的画面与目标空间中的各个扬声器间的距离中,选取一个距离作为基准距离;并根据该基准距离,确定目标设备产生的画面的出现时间。由此以提升音画同步的效果。示例性的,该基准距离可以为确定出的目标设备产生的画面与目标空间中的各个扬声器间的距离中的最大的一个距离。示例性的,可以基于该基准距离和声音的传播速度,确定出产生的画面相对于该基准距离对应的扬声器产生的声音出现的延时时间;然后,在控制目标设备在该基准距离对应的扬声器播放相应的音频数据的时刻之后,且达到该延时时间时,在显示出相应的画面。例如,若确定出的延时时间为3s,该基准距离对应的扬声器播放相应的音频数据的时刻为t,则目标设备产生的画面出现的时刻为(t+3)。
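示例性的,以画面与各扬声器间的最大距离为基准距离计算画面出现时刻的过程可以示意如下(声速与各距离取值均为假设):

```python
SOUND_SPEED = 343.0  # 假设的声速,单位m/s

def picture_show_time(play_time, picture_to_speaker_dists):
    """以画面与各扬声器间的最大距离为基准距离,由声速计算画面相对声音的延时,
    返回在基准距离对应的扬声器于play_time时刻播放音频时,画面应出现的时刻。"""
    base = max(picture_to_speaker_dists)
    delay = base / SOUND_SPEED
    return play_time + delay

print(picture_show_time(10.0, [3.0, 5.0, 6.86]))  # 约10.02秒
```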
4、控制新能源车辆加速行驶的场景。
一般地,用户在使用新能源车辆(以下简称“车辆”)时,车辆可以根据其自身的行驶状态循环播放声浪声音。比如:车辆在加速时可以将其扬声器的音量逐步增加至最大值,在减速时可以将其扬声器的音量逐步降低至最小值。但这种方式,声浪声音仅有音量变化,而没有在空间上的变化,即没有形成空间化声浪,这使得车辆所播放的声浪声音与真实驾驶状态相差较大。
另外,还可以通过将声浪音频切分成以毫秒为单位的极短片段,并根据车辆的速度等,选取对应的片段。以及,将选取的片段进行叠加合成,并播放合成后的数据,以还原真实的声浪效果。但这种方式,声浪声音仍然是仅有音量变化,而没有在空间上的变化,即没有形成空间化声浪,用户体验较差。
为解决上述问题,本申请实施例提供了一种声音处理方法,该方法可以在用户使用车辆过程中,使声浪声音产生空间上的变化,以使车辆的内部出现多普勒效应,从而使得车辆所播放的声浪声音与真实驾驶状态相符,进而使得听感更真实,提升了用户体验。
示例性的,图30示出了一种车辆的硬件结构。如图30所示,该车辆200中可以配置有电子设备100和扬声器210。其中,电子设备100可以将声浪声音传输至扬声器210,以通过扬声器210播放。示例性的,电子设备100可以但不限于为车载终端。扬声器210的数量和位置可以根据需求配置,此处不做限定。
另外,车辆200中还可以配置有其正常运行所必须的部件,比如各类传感器等,此处不做限定。在一些实施例中,车辆200中可以配置有用于感知车辆运动状态的传感器,比如:速度传感器、加速度传感器等。
示例性的,图31示出了一种声音处理方法。可以理解的是,该方法可以但不限于由车辆中配置的电子设备(比如车载终端等)执行。如图31所示,该声音处理方法可以包括以下步骤:
S3101、电子设备100确定车辆200当前的行驶参数,行驶参数包括行驶速度、转速和加速踏板的开度中的一项或多项。
本实施例中,车辆200中的传感器感知到车辆200的行驶参数后,可以将该行驶参数传输至电子设备100。
S3102、电子设备100根据行驶参数,确定与行驶速度对应的第一音频数据。
本实施例中,电子设备100可以根据行驶参数和预配置的原始音频数据,确定出与行驶参数对应的第一音频数据。
示例性的,电子设备100可以先获取到由原始音频数据得到的音频粒子。其中,每个音频粒子均可以与车辆的一个行驶速度相对应。示例性的,音频粒子可以理解为是将原始音频数据分成极短片段(比如以毫秒为单位的片段等)所形成的数据。示例性的,原始音频数据可以默认的音频数据,也可以是由用户自行选择的音频数据,此处不做限定。当原始音频数据是由用户自行选择的音频数据时,在电子设备100上可以配置有选择入口,以供用户进行选择。
然后,电子设备100可以由行驶参数和音频粒子间的映射关系,确定出在当前的行驶参数对应的音频粒子。最后,再利用车辆200当前的加速度,对确定出的音频粒子进行伸缩变换,以调整音频粒子的数据长度,从而使得音频粒子的播放速度与当前的驾驶状态相匹配。 其中,第一音频数据为进行伸缩变换后的音频粒子。
举例来说,当行驶参数为行驶速度时,若行驶速度和音频粒子间的映射关系为:速度为a1时,音频粒子为音频粒子b1;速度为a2时,音频粒子为音频粒子b2。当在t1时刻确定出的车辆的行驶速度为a2时,由行驶速度和音频粒子间的映射关系可以确定出当前所需的音频粒子为音频粒子b2。若t0时刻确定出车辆200的行驶速度为a0,在t1时刻车辆200的加速度则为(a2-a0)/(t1-t0)。接着,可以利用确定出的加速度,查询预设的加速度和伸缩变化值之间的映射关系,确定出与当前的加速度对应的伸缩变换值。最后,可以基于该伸缩变化值,并通过时域压扩(time-scale modification,TSM)算法对音频粒子b2进行处理,以完成对音频粒子b2的伸缩变换,进而得到第一音频数据。
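示例性的,按行驶速度选取音频粒子、再按加速度对应的伸缩变换值对粒子做伸缩变换的过程,可以用如下简化代码示意。其中借用开源库librosa的time_stretch作为时域压扩的一种示意实现,速度区间、映射关系与常数均为假设:

```python
import librosa

def pick_particle(speed, particle_map):
    """按速度区间选取音频粒子:particle_map 为 {(v_min, v_max): 音频粒子} 形式的假设映射。"""
    for (v_min, v_max), particle in particle_map.items():
        if v_min <= speed < v_max:
            return particle
    return None

def stretch_particle(particle, accel, accel_to_rate):
    """由加速度查询伸缩变换值,并对选出的音频粒子做伸缩变换,得到第一音频数据。"""
    rate = accel_to_rate(accel)
    return librosa.effects.time_stretch(particle, rate=rate)

# 假设的加速度-伸缩变换值映射:加速度越大,粒子播放越快
accel_to_rate = lambda a: 1.0 + 0.1 * a
```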
作为一种可能的实现方式,可以先利用不同的伸缩变换值,对原始音频数据进行伸缩变换。然后,再分别对伸缩变换后的音频数据进行切分。这样切分得到的音频粒子均可以与原始音频数据中的某个音频粒子对应,且,原始音频数据中的每个音频粒子均对应有一个粒子组,该粒子组中包括至少一个进行伸缩变换后的音频粒子,且在该粒子组中不同的音频粒子与不同的伸缩变化值对应。由于每个行驶速度均可以与原始音频数据中的一个音频粒子对应,因此,每个行驶速度均可以与前述的一个粒子组对应。示例性的,一个音频粒子可以与一个速度区间对应,即在该速度区间中的速度均对应同一个粒子。例如,音频粒子a对应的速度区间可以为(20km/h,25km/h)。
另外,除了先对原始音频数据进行伸缩变换,再进行切分外,还可以先对原始音频数据进行切分,然后再利用不同的伸缩变换值,对切分后得到的音频粒子进行伸缩变换。具体可根据实际情况而定,此处不做限定。
举例来说,若利用伸缩变化值x1和x2分别对原始音频数据进行伸缩变换,并对伸缩变换后的音频数据进行切分,则对于原始音频数据中的音频粒子b0,可以与利用伸缩变化值x1进行伸缩变换后的音频粒子b1和利用伸缩变化值x2进行伸缩变换后的音频粒子b2对应,此时音频粒子b0对应的粒子组由音频粒子b1和b2组成。其中,音频粒子b0、b1和b2对应的时间点相同;另外,也可以理解为,音频粒子b1是利用伸缩变化值x1对音频粒子b0进行伸缩变换得到,音频粒子b2是利用伸缩变化值x2对音频粒子b0进行伸缩变换得到。
进一步地,在确定出车辆200当前的行驶速度后,可以根据该行驶速度,确定出一个粒子组。然后,再根据当前的加速度,查询预设的加速度和伸缩变化值之间的映射关系,确定出与当前的加速度对应的伸缩变换值。最后,可以基于该伸缩变化值,查询粒子组中各个音频粒子与伸缩变化值之间的关系,可以从粒子组中确定出所需的音频粒子,该音频粒子即为第一音频数据。
当行驶参数为转速或加速踏板的开度的情形,可以参考行驶参数是行驶速度的情形,此处不再赘述。
S3103、电子设备100根据行驶参数,调整第一音频数据中各个声道的增益,以得到第二音频数据。
本实施例中,电子设备100可以根据行驶速度和预先设定的增益调整模型,确定出所需调整的增益,并对第一音频数据中各个声道的增益进行调整,以得到第二音频数据。示例性的,增益调整模型可以为线性模型,比如:y=kx+b,y为需调整的增益,k和b为常数,x为加速度。其中,线性模型中的加速度可以由行驶速度、时间和加速度之间的关系确定。此时,可以理解为是先根据行驶速度确定出车辆的加速度,然后,再根据加速度,调整第一音频数 据中各个声道的增益。
在一些实施例中,为防止出现音量突变的情况,可以设定每次增益调整的范围。当确定出的所需调整的增益超过预设的范围时,可以将预设的范围的最大值作为此次的调整的增益。
在一些实施例中,为防止出现音量忽大忽小的情况,可以设定一个调整增益的条件,比如当行驶速度的变化值超过预设速度值(比如3km/h等)时,可以调整增益,否则不调整增益。换言之,当车辆200的行驶速度的变化值超过一定速度值时,可以调整增益。
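示例性的,结合上述增益调整模型、单次调整范围与调整条件,声浪增益的计算过程可以示意如下(其中的k、b与各阈值均为假设值):

```python
def wave_gain(prev_speed, cur_speed, dt, k=0.2, b=0.0,
              max_step=3.0, min_speed_change=3.0):
    """按 y = k*x + b(x 为加速度)计算声浪所需调整的增益:
    速度变化未超过阈值时不调整;单次调整超出预设范围时取范围的最大值。"""
    if abs(cur_speed - prev_speed) <= min_speed_change:
        return 0.0
    accel = (cur_speed - prev_speed) / dt
    gain = k * accel + b
    return max(-max_step, min(max_step, gain))

print(wave_gain(50.0, 58.0, 1.0))  # 1.6
```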
S3104、电子设备100根据行驶参数,确定声场向目标方向移动的目标速度。
本实施例中,电子设备100可以利用车辆200的行驶速度,确定出车辆200的加速度。然后,再利用确定出的加速度,查询预先设定的加速度和声场向目标方向移动的速度间的映射关系,确定出声场向目标方向移动的目标速度。在一些实施例中,目标方向可以为车辆200的前部朝向后部的方向。
S3105、电子设备100利用目标扬声器组中的扬声器播放第二音频数据,以及,根据目标速度,调整第二音频数据中各个声道的增益,目标扬声器组中包括至少两个扬声器,目标扬声器组用于控制声场在目标方向上以目标速度移动。
在一些实施例中,可以预先设定声场的初始位置,比如车辆200初始位置可以为在车辆200中位于驾驶员前方的某个位置。在播放第二音频数据时,可以按照目标速度逐渐控制声场的位置由初始位置向车辆200的后部移动。示例性的,声场的位置可以理解为用户感知到的声源的位置。
举例来说,如图32所示,以车辆200中位于驾驶员前方布置有两个扬声器,且车辆200加速行驶为例,扬声器SP1布置在驾驶员的左前方,扬声器SP2布置在驾驶员的右前方。在图32的(A)中,位置3201所处的区域可以是声场的初始位置,该初始位置可以为扬声器SP1和SP2对应的音频信号的增益取默认值,且两者播放声音时声场所处的位置。在车辆200加速行驶过程中,可以由声场移动的目标速度确定出下一时刻声场的位置,比如在图32的(B)中的位置3202所处的区域。此时,通过调整扬声器SP1和SP2对应的音频信号的增益可以在位置3202处虚拟出一个虚拟的扬声器VSP1。同时,利用扬声器SP1对应的音频信号所需调整的增益对第二音频数据中相应的声道的增益进行调整;以及利用扬声器SP2对应的音频信号所需调整的增益对第二音频数据中相应的声道的增益进行调整,从而完成对第二音频数据中各个声道的增益的调整。接着,电子设备100可以利用扬声器SP1和SP2播放该第二音频数据。这样,驾驶员听到的声音等效于是在位置3202处播放。由此即实现了声场在空间中的移动。其中,扬声器SP1和扬声器SP2即为目标扬声器组。另外,电子设备100除了基于各个扬声器对应的音频信号所需调整的增益,对第二音频数据进行处理外,还可以按照各个扬声器对应的音频信号所需调整的增益,分别调整相应的扬声器的音量,并播放第二音频数据,由此以实现声场的移动。在一些实施例中,在由多个真实的扬声器虚拟出一个扬声器时,可以但不限于利用向量基础振幅平移(vector base amplitude panning,VBAP)算法进行操作。其中,基于VBAP算法构建虚拟的扬声器的过程可以参见前述3.1的场景中的描述,此处不再赘述。另外,也可以由预先设定的距离增益模型,确定出目标扬声器组中各个扬声器对应的音频信号的增益,并基于该增益对第二音频数据进行调整,从而虚拟出一个扬声器。例如,继续参阅图32的(B),若用户U11与位置3201间的距离为L1,用户U11与位置3202间的距离为L2,则在位置3202处虚拟出一个扬声器时,扬声器SP1和SP2对应的音频信号所需调整的增益可以为g=L2/L1。其中,距离增益模型为gi=x2/x1,x1为声场的初始位置与基准点之间的距离,x2为声场的当前位置与基准点之间的距离,在图32的(B)中用户U11所处的位置为基准点。
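示例性的,基于距离增益模型gi=x2/x1控制声场沿目标方向移动的过程可以示意如下(目标速度与各距离均为假设值):

```python
def soundstage_gain(x1, x2):
    """距离增益模型 gi = x2/x1:x1 为声场初始位置与基准点(用户)间的距离,
    x2 为声场当前位置与基准点间的距离,返回SP1、SP2对应声道所需调整的增益。"""
    return x2 / x1

def stage_offset(target_speed, elapsed):
    """声场沿目标方向(车头朝向车尾)按目标速度移动,返回经过elapsed秒后的位移。"""
    return target_speed * elapsed

offset = stage_offset(0.5, 1.0)            # 1秒后声场后移0.5米(假设值)
print(soundstage_gain(1.2, 1.2 + offset))  # 对SP1、SP2声道应用的增益
```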
在控制声场移动时,除了图32中所描述的方式外,还可以通过其他的方式进行移动,此处不做限定。例如,当车辆200的两侧均布置有多个扬声器时,可以分别在每侧均虚拟出一个虚拟扬声器,再利用该虚拟扬声器播放第二音频数据。
举例来说,如图33所示,以车辆200的两侧均布置有两个扬声器,且车辆200加速行驶为例,扬声器SP1布置在驾驶员的左前方,扬声器SP2布置在驾驶员的右前方,扬声器SP3布置在驾驶员的正左方,扬声器SP4布置在驾驶员的正右方。在图33的(A)中,位置3301所处的区域可以是声场的初始位置,该初始位置可以为默认的扬声器SP1、SP2、SP3和SP4对应的音频信号的增益,且两者播放声音时声场的位置。在车辆200加速行驶过程中,可以由声场移动的目标速度确定出下一时刻声场的位置,比如在图33的(B)中的位置3302所处的区域。此时,通过调整扬声器SP1和SP3对应的音频信号的增益可以在车辆200的左侧处虚拟出一个虚拟的扬声器VSP1;通过调整扬声器SP2和SP4对应的音频信号的增益可以在车辆200的右侧虚拟出一个虚拟的扬声器VSP2。其中,确定扬声器SP1、SP2、SP3和SP4对应的音频信号所需调整的增益的方式,可以参见图32中所描述的确定方式,比如,基于距离增益模型确定等,详见前述描述,此处不再赘述。进一步地,可以利用扬声器SP1对应的音频信号所需调整的增益对第二音频数据中相应的声道的增益进行调整;利用扬声器SP2对应的音频信号所需调整的增益对第二音频数据中相应的声道的增益进行调整;利用扬声器SP3对应的音频信号所需调整的增益对第二音频数据中相应的声道的增益进行调整;利用扬声器SP4对应的音频信号所需调整的增益对第二音频数据中相应的声道的增益进行调整。接着,电子设备100可以通过扬声器SP1、SP2、SP3和SP4播放第二音频数据。这样,驾驶员听到声音等效于是由虚拟的扬声器VSP1和VSP2播放。由此即实现了声场在空间中的移动。其中,扬声器SP1、扬声器SP2、扬声器SP3和扬声器SP4即为目标扬声器组。另外,电子设备100除了基于各个扬声器对应的音频信号所需调整的增益,对第二音频数据进行处理外,还可以按照各个扬声器对应的音频信号所需调整的增益,分别调整相应的扬声器的音量,并,由此以实现声场的移动。
在一些实施例中,可以先根据目标速度,确定目标音频数据的声源的虚拟位置。然后,在根据虚拟位置,从车辆中筛选出控制声场移动的扬声器。接着,可以根据该虚拟位置,确定筛选出的多个扬声器对应的音频信号的所需调整的目标增益,得到F个目标增益,F≥2。接着,可以根据F个目标增益,调整第二音频数据中各个声道的增益,以得到目标音频数据。最后,可以利用筛选出的扬声器播放该目标音频数据。其中,筛选出的扬声器即为目标扬声器组。
在一些实施例中,在控制声场移动过程中,还可以根据目标速度,用户的位置、声场的初始位置等,对第二音频数据进行多普勒处理,从而使得用户听到的声音存在声调变化的过程,提升用户体验。
由此,在用户驾驶车辆过程中,根据车辆中的扬声器控制车辆中声场的移动,使得声浪声音可以产生空间上的变化,从而使得车辆的内部可以出现多普勒效应,进而使得车辆所播放的声浪声音与真实驾驶状态相符,使得听感更真实,提升了用户体验。
在一些实施例中,为了能够带来视觉上的体验,还可以控制车辆200中氛围灯的颜色跟随车辆200的加速时长逐渐变化。例如,如图34所示,可以随着加速时长的增加,控制氛围 灯的颜色逐渐由浅色渐变为深色,比如:加速时氛围灯的颜色由淡黄逐渐变为深黄,最后变成变红等。在一些实施例中,可以控制氛围灯颜色颜色变化的速度与声场移动的目标速度相同,以使得车辆200中的空间听感和空间视感相对应,提升用户体验。在一些实施例中,车辆200中的氛围灯可以是可以呈现渐变色的灯带。
在一些实施例中,为了使得由目标音频数据产生的声音的听感更优美,还可以在不同的速度区间段添加不同的底噪(即背景噪音)。示例性的,可以在不同速度范围内选取不同的音频作为底噪混合播放。例如:在车辆的行驶速度小于50km/h时,将由音频1提取到的音频粒子作为底噪,并与目标音频数据混音播放;在车辆的行驶速度小于100km/h且大于50km/h时,将由音频2提取到的音频粒子作为底噪,并与目标音频数据混音播放。其中,音频1和音频2可以为预先设定的音频,不同的速度区间可以对应有不同的音频粒子,这些音频粒子主要是用于作为底噪使用。
在一些实施例中,前述的方法,除了可以由车辆中配置的电子设备(比如车载终端等)执行外,还可以由位于车辆中,且与车辆分离的电子设备(比如手机等)执行。当由与车辆分离的电子设备执行时,可以预先在该电子设备中配置好车辆中扬声器的布置位置,以使得该电子设备可以确定出各个扬声器对应的音频信号所需调整的增益。在这种实现方式中,车辆的行驶速度可以由车辆传输至电子设备,也可以通过电子设备自己去感知,此处不做限定。另外,在这种实现方式中,电子设备可以先调整第二音频数据中各个声道的增益,再将调整后的音频数据发送至车辆进行播放。
此外,前述的方法,也可以根据实际情况,选择部分由车辆或者集成在车辆中的电子设备(比如车载终端等)执行,另一部分由与车辆分离的电子设备(比如手机等)执行,即前述方法中各个步骤的执行主体可以根据需求进行适应性调整,且调整后的方案仍在本申请的保护范围之内。对于调整执行主体后的方案,可以参考前述方法中的描述,此处就不在一一赘述。
5、驾车,并利用车辆中的电子设备进行导航,且驾驶员出现驾驶疲劳的场景。
示例性的,图35示出了本申请一些实施例中的一种应用场景。如图35所示,在驾驶员A驾驶车辆200前往目的地的过程中,驾驶员A可以利用位于车辆200中的电子设备100导航至目的地。其中,当驾驶员A出现驾驶疲劳时,可以改变电子设备100导航播报的音频数据的特征参数(比如音调、增益等),使得驾驶员在听觉的冲击下提高注意力,实现安全驾驶。
在图35中,电子设备100位于车辆200中,其可以为集成在车辆200中的设备,比如车载终端,也可以为与车辆200分离的设备,比如驾驶员A的手机等,此处不做限定。当电子设备100集成在车辆200中时,电子设备100可以直接利用车辆200中的扬声器播报其所需播报的音频数据。当电子设备100与车辆200分离布置时,电子设备100与车辆200间可以但不限于通过短距通信(比如蓝牙等)的方式建立连接。其中,当电子设备100与车辆200间分离布置时,电子设备100可以将其所需播报的音频数据传输至车辆200,并通过车辆200上的扬声器进行播报,或者,电子设备100可以通过其内置的扬声器播报其所需播报的音频数据。
另外,车辆200的内部可以设置有摄像头等图像采集装置,以采集驾驶员的面部数据。此外,车辆200的内部还可以设置有扬声器,电子设备100中所需播报的导航音可以通过车辆200上的扬声器播报。在一些实施例中,在车辆200的外部可以设置有用于采集路况信息 的传感器(比如雷达、摄像头等)。
示例性的,图36示出了本申请一些实施例中的一种声音处理方法。在图36中,电子设备100可以为集成在车辆200中的设备,比如车载终端,也可以为与车辆200分离的设备,比如驾驶员A的手机等。如图36所示,该方法可以包括以下步骤:
S3601、电子设备100确定驾驶员的疲劳等级。
本实施例中,在车辆200中可以配置有图像采集装置,比如摄像头等。通过该图像采集装置可以实时采集或者周期性采集(比如每隔2秒,3秒或5秒采集一次等)驾驶员A的面部数据,比如:眼睛、嘴巴等。其中,车辆200可以将其上图像采集装置采集到的一定时长(比如5秒等)的驾驶员A的面部数据传输至电子设备100。示例性的,车辆200可以基于动态滑窗的方式缓存短时间(5s或者10s)采集到的面部数据。比如,车辆200可以将其采集到的视频中的某个时间段(比如:1s至5s,2s至6s,或者3s至7s等)的数据,作为所需的驾驶员A的面部数据。
电子设备100获取到驾驶员A的面部数据后,可以将其获取到的面部数据输入至预先训练的疲劳监测模型中,以由疲劳监测模型输出驾驶员A的疲劳等级。在一些实施例中,疲劳监测模型可以但不限于是基于卷积神经网络(convolutional neural network,CNN)训练得到。
作为一种可能的实现方式,可以基于预设时长内驾驶员A的眨眼次数、打哈欠次数或者点头次数等与疲劳等级间的映射关系,确定出驾驶员A的疲劳等级。
举例来说,以眨眼次数为例,如表1所示,表1中示出的是眨眼次数与疲劳等级之间的映射关系。其中,当基于驾驶员A的面部数据检测出10秒内驾驶员A的眨眼次数为10次时,由表1中可以确定出驾驶员A的疲劳等级为3级。可以理解,疲劳等级越高,表示驾驶员A在预设时间段内越疲劳。
表1
S3602、电子设备100根据疲劳等级,确定第一特征参数的目标调整值,第一特征参数为当前所需播放的音频数据的特征参数。
本实施例中,电子设备100确定出驾驶员A的疲劳等级后,可以查询预先设定的疲劳等级与特征参数的调整值之间的映射关系,确定出当前所需播放的音频数据的特征参数的目标调整值。在一些实施例中,特征参数可以包括:音调和/或响度等。
作为一种可能的实现方式,电子设备100可以根据疲劳等级和预先设定的特征参数对应的关系表达式,确定出目标调整值。
示例性的,疲劳等级越高,音调的目标调整值越高,带给用户的听感刺激性越强。疲劳等级越高,响度的目标调整值越高,听感响度越大。举例来说,若音调对应的关系表达式可以为S=0.2*x²+1,响度对应的关系表达式可以为G=0.5*x+1,其中,x为疲劳等级,则当疲劳等级为1级时,音调的目标调整值为1.2,响度的目标调整值为1.5。
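示例性的,按上述关系表达式由疲劳等级计算目标调整值的过程可以示意如下(表达式沿用前述举例,函数名为假设):

```python
def target_adjust_values(fatigue_level):
    """音调目标调整值 S = 0.2*x^2 + 1,响度目标调整值 G = 0.5*x + 1,x 为疲劳等级。"""
    pitch = 0.2 * fatigue_level ** 2 + 1
    loudness = 0.5 * fatigue_level + 1
    return pitch, loudness

print(target_adjust_values(1))  # (1.2, 1.5)
print(target_adjust_values(3))  # (2.8, 2.5)
```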
S3603、电子设备100根据目标调整值,对当前所需播放的音频数据进行处理,得到目标音频数据,其中,目标音频数据的特征参数的值高于第一特征参数的值。
本实施例中,电子设备100可以调整当前所需播报的导航音的音频数据的音调和/或响度等,以得到目标音频数据。其中,目标音频数据的特征参数的值高于第一特征参数的值。例如,当特征参数为响度,且响度的单位用标准化值(比如放大倍数等)表示时,若目标调整值为1.5,电子设备100可以对当前所需播报的导航音的音频数据的响度进行调整,且调整后的响度为原始的增益的1.5倍;若目标调整值为10,且响度的单位用分贝表示时,电子设备100可以对当前所需播报的导航音的音频数据的响度进行调整,且调整后的响度为原始的音量响度和目标调整值之和。当特征参数为音调时,若目标调整值为1.2,电子设备100可以基于变调算法将当前所需播报的导航音的音频数据的音调升高至原始的音调的1.2倍。示例性的,变调算法可以为同步波形叠加法(synchronized overlap-add,SOLA)、固定同步波形叠加法(synchronized overlap-add and fixed synthesis,SOLAFS)、时域基音同步叠加法(time-domain pitch synchronized overlap-add,TD-PSOLA)、波形相似叠加法(waveform similarity overlap-and-add,WSOLA)等时域法,也可以为基音同步波形叠加算法(pitch-synchronized overlap-add,PSOLA)等频域法。
在一些实施例中,为了保证当前所需播报的导航音的清晰度,可以采用变调不变速的方式对当前所需播报的导航音的音频数据的音调进行处理。
为便于理解,下面以采用时域法实现变调不变速为例进行说明。在采用时域法调整时,一般可以采用“变速不变调+重采样”的方式来达到变调不变速的效果。其中,可以先对当前所需播报的导航音的音频数据进行变速不变调处理,然后再进行重采样处理。
对于变速不变调处理,如图37的(a)所示,可以先在原始时域x上对当前所需播报的导航音的音频数据进行分帧处理。接着,如图37的(b)所示,可以取出一帧数据(即xm),以及将该帧数据添加至时域y上。然后,如图37的(c)所示,可以间隔固定的采样点数Ha取出另一帧数据(即xm+1)。最后,如图37的(d)所示,可以将图37的(b)中取出的一帧数据(即xm)和图37的(c)中取出的另一帧数据(即xm+1)进行波形叠加,即可以得到由xm和xm+1重建的语音,即时域y上的音频数据。应理解的是,在重建语音过程中,可以每间隔固定的采样点数Ha均取一帧数据,并将取出的数据进行叠加,从而得到重建后的所需播报的导航音的音频数据。其中,Ha的值可以预先设定。另外,通过上述方式重建的音频数据时,重建后的音频数据所包含的帧数会减少,且采样点减少,但采样率与原始的音频数据的采样率一样,因此,在播放时声音的速度会变快,从而达到了变速不变调的目的。
对于重采样,可以选定相应的重采样因子P/Q,实现P/Q倍的重采样,从而使得重采样后的语速和音调变为原来的Q/P倍。其中,P为上采样因子,Q为下采样因子。重采样的过程可以包括上采样过程和下采样过程。其中,上采样的过程是:向原始信号中各个相邻的两个采样点间均内插(P-1)个采样点,从而使得原始信号的基音周期变为原来的P倍,时长变为原来的P倍,即基频变为原来的1/P倍,音调降为原来的1/P倍,语速变为原来的1/P倍。下采样的过程是:在原始信号中,每间隔(Q-1)个采样点抽取一个采样点,从而使得基音周期长度变为原来的1/Q倍,时长变为原来的1/Q倍,即基频变为原来的Q倍,音调升为原来的Q倍,语速变为原来的Q倍。通过按照重采样因子P/Q对变速不变调后的音频数据进行重采样,即可以将音频数据的语速和音调均调制为原始的Q/P倍。其中,重采样因子P/Q可以由音调对应的调整值得到,例如,当音调对应的目标调整值为1.5时,则重采样因子P/Q=1/1.5=2/3。
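示例性的,由音调的目标调整值得到重采样因子P/Q的过程可以示意如下(仅为假设性示例):

```python
from fractions import Fraction

def resample_factor(pitch_adjust):
    """重采样后音调变为原来的Q/P倍,因此 P/Q = 1/音调目标调整值,返回(P, Q)。"""
    frac = Fraction(1, 1) / Fraction(pitch_adjust).limit_denominator(100)
    return frac.numerator, frac.denominator

print(resample_factor(1.5))  # (2, 3),即 P/Q = 2/3
```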
S3604、电子设备100播放目标音频数据。
本实施例中,电子设备100在由当前所需播报的导航音的音频数据得到目标音频数据后,即可以播放目标音频数据。由于目标音频数据的特征参数的值高于第一特征参数(即当前所需播报的导航音的音频数据的特征参数)的值,因此可以达到提醒驾驶员的目的。例如,当播放的音频数据的音调较高和/或声音响度较高时,驾驶员听到的声音将比较刺耳,这样即可以达到刺激驾驶员的目的,从而提高驾驶员的注意力。在一些实施例中,当电子设备100未集成在车辆200中时,电子设备100可以通过其自身的扬声器播放该目标音频数据,也可以将该目标音频数据传输至车辆200,并由车辆200的扬声器进行播放。当电子设备100集成在车辆200中时,电子设备100可以通过车辆200的扬声器播放目标音频数据。
由此,当检测到驾驶员出现驾驶疲劳时,可以根据驾驶员的疲劳等级改变电子设备100导航播报的音频数据的特征参数(比如音调、响度等),从而使得播放的音频数据能够在听觉上对驾驶员产生冲击,进而提高驾驶员的注意力,实现安全驾驶。
在一些实施例中,电子设备100还可以根据疲劳等级确定出相应的提示语音,以及基于预先设定的播报顺序播报目标音频数据和提示语音,从而使得播报方式和语言更具生活化和人性化,提升用户体验。另外,若当前不需要播报导航音,但已根据疲劳等级确定出相应的提示语音,电子设备100则可以直接播放提示语音。示例性的,电子设备100可以根据疲劳等级查询预先设定的疲劳等级和提示语音之间的映射关系,确定出当前所需的提示语音。可选的,各个疲劳等级对应的提示语音可以是用户预先设定的,也可以是电子设备100中预设的模板语句。
举例来说,如表2所示,当疲劳等级为2级时,可以确定出提示语音为“注意!驾驶员已中度疲劳,请开窗通风”。此时,若目标音频数据为“前方50米请左转”,电子设备100所需播报的音频数据可以为“前方50米请左转,注意!驾驶员已中度疲劳,请开窗通风”。
表2
另外,若当前没有所需播报的导航音,电子设备100可以基于疲劳等级,确定出相应的提示语音,并播报该提示语音。
此外,电子设备100还可以根据疲劳等级和导航中的地图信息,确定出所需播报的提示语音。例如,继续参阅表2,当疲劳等级为3级时,所需播报的提示语音为“注意!注意!驾驶人员已极度疲劳,可于xxx米远的xxx路口/超市/中转站停车休息”。当电子设备100根据导航中的地图信息确定出在500米远的位置存在服务区时,电子设备100可以确定所需播报的提示语音为“注意!注意!驾驶人员已极度疲劳,可于500米远的服务区停车休息”。其中,电子设备100可以将由导航中的地图信息确定出的“500米远的服务区”,与根据疲劳等级确定出的提示语音“注意!注意!驾驶人员已极度疲劳,可于xxx米远的xxx路口/超市/中转站停车休息”进行拼接,以得到其最终所需播报的提示语音。
示例性的,对于音频拼接过程,将一段音频数据的脉冲编码调制(pulse code modulation,PCM)数据,插入到另一段音频数据的PCM数据中的某个时间点上,即完成两个音频数据的拼接。举例来说,假设一段音频数据A为[1,2,3,4,5],另一段音频数据B为[7,8,9],若需要将音频数据B插入到音频数据A中的“3”和“4”之间,则可以将“7”、“8”、“9”插入到“3”和“4”之间,从而将音频数据A和B拼接在一起。
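示例性的,上述PCM数据的拼接过程可以用如下示意代码表示:

```python
def splice_pcm(pcm_a, pcm_b, insert_index):
    """把音频数据B的PCM数据插入到音频数据A的PCM数据中的指定位置,完成两段音频的拼接。"""
    return pcm_a[:insert_index] + pcm_b + pcm_a[insert_index:]

print(splice_pcm([1, 2, 3, 4, 5], [7, 8, 9], 3))  # [1, 2, 3, 7, 8, 9, 4, 5]
```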
在一些实施例中,为进一步达到提高驾驶员的注意力的目的,电子设备100还可以根据疲劳等级,确定出设置于车辆200中的信号灯的颜色和/或闪烁频率等;以及,控制车辆200中的信号灯以确定出的颜色和/或闪烁频率工作,从而在视觉上对驾驶员产生冲击,进而提高驾驶员的注意力,实现安全驾驶,并与导航声音实现视觉和听觉的同步告警。示例性的,电子设备100可以基于确定出的疲劳等级,查询预先设定的疲劳等级与信号灯间的映射关系,确定出信号灯的颜色和/或闪烁频率等。示例性的,疲劳等级越高,信号灯的色彩可以愈发明亮鲜艳,信号灯的闪烁频率可以越高。例如,如表3所示,表3示出了疲劳等级与信号灯的颜色和闪烁频率之间的映射关系,当确定出的疲劳等级为2级时,可以确定出信号灯的颜色为黄色,闪烁频率为每分钟闪烁60次。
表3
在一些实施例中,当车辆200处于自动驾驶状态时,此时一般不需要驾驶员集中注意力。但当路况较差(比如事故多发路段)、或者处于需要提醒用户的关键路段(比如需转弯的路口等)时,往往需要驾驶员操控车辆。因此,为了提升车辆在自动驾驶时的安全性,电子设备100在车辆200处于自动驾驶状态时,可以结合车辆200外部的路况信息,播报目标音频数据。另外,当车辆200未处于自动驾驶状态时,若车辆200当前行驶的路况较差或者处于需要提醒用户的关键路段时,电子设备100也可以播放目标音频数据,以提醒驾驶员集中注意力。
作为一种可能的实现方式,当驾驶员在车辆200上触发自动驾驶功能后,车辆200可以将其处于自动驾驶状态的信息通知电子设备100。这样,电子设备100即可以获知到车辆200处于自动驾驶状态。
另外,车辆200可以利用其外部的传感器(比如雷达、摄像头等)采集其外部的路况信息,以及将采集到的信息传输至电子设备100。电子设备100获取到车辆200外部的路况信息后,可以在路况较差时再播报目标音频数据。
示例性的,图38示出了本申请一些实施例中的一种声音处理方法。在图38中,电子设备100为与车辆200分离的设备,比如手机等,电子设备100和车辆200之间通过蓝牙等短距通信方式建立连接。在图38中,驾驶员使用电子设备100进行导航。在图38中,S3801、S3802、S3804、S3805可以参见图36中的相关描述,此处不再赘述。如图38所示,该方法可以包括以下步骤:
S3801、车辆200获取驾驶员的面部数据。
S3802、车辆200根据驾驶员的面部数据,确定驾驶员的疲劳等级。
S3803、车辆200将驾驶员的疲劳等级发送至电子设备100。
本实施例中,车辆200确定出驾驶员的疲劳等级后,可以将该疲劳等级发送至电子设备100。
另一些实施例中,车辆200也可以直接将步骤S3801中获取的驾驶员的面部数据发送给电子设备100。进一步地,电子设备100可以根据驾驶员的面部数据,确定驾驶员的疲劳等级。
S3804、电子设备100根据疲劳等级,确定第一特征参数的目标调整值,第一特征参数为当前所需播放的音频数据的特征参数。
S3805、电子设备100根据目标调整值,对当前所需播放的音频数据进行处理,得到目标音频数据,其中,目标音频数据的特征参数的值高于第一特征参数的值。
S3806、电子设备100将目标音频数据发送至车辆200。
本实施例中,电子设备100确定出目标音频数据后,可以将该目标音频数据发送至车辆200。S3807、车辆200播放目标音频数据。本实施例中,车辆200获取到目标音频数据后,可以播放该目标音频数据。
另一些实施例中,步骤S3806中,电子设备100也可以通过其自身的扬声器播放该目标音频数据,即电子设备100不需要将目标音频数据发送至车辆200。
由此,当检测到驾驶员出现驾驶疲劳时,可以根据驾驶员的疲劳等级改变电子设备100导航播报的音频数据的特征参数(比如音调、响度等),从而使得播放的音频数据能够在听觉上对驾驶员产生冲击,进而提高驾驶员的注意力,实现安全驾驶。
示例性的,图39示出了本申请一些实施例中的一种声音处理方法。在图39中,电子设备100为与车辆200分离的设备,比如手机等,电子设备100和车辆200之间通过蓝牙等短距通信方式建立连接。在图39中,驾驶员使用电子设备100进行导航。在图39中,S3901至S3906,可以参见前述的相关描述,此处不再赘述。如图39所示,该方法可以包括以下步骤:
S3901、车辆200获取驾驶员的面部数据。
S3902、车辆200根据驾驶员的面部数据,确定驾驶员的疲劳等级。
S3903、车辆200根据疲劳等级,确定第一特征参数的目标调整值,第一特征参数为当前所需播放的音频数据的特征参数。
S3904、电子设备100将待播放的音频数据发送至车辆200。
S3905、车辆200根据目标调整值,对待播放的音频数据进行处理,得到目标音频数据,其中,目标音频数据的特征参数的值高于第一特征参数的值。
S3906、车辆200播放目标音频数据。
另一些实施例中,步骤S3906中,车辆200也可以将目标音频数据发送至电子设备100,使得电子设备100播放该目标音频数据。
由此,当检测到驾驶员出现驾驶疲劳时,可以根据驾驶员的疲劳等级改变电子设备100导航播报的音频数据的特征参数(比如音调、响度等),从而使得播放的音频数据能够在听觉上对驾驶员产生冲击,进而提高驾驶员的注意力,实现安全驾驶。
示例性的,图40示出了本申请一些实施例中的一种声音处理方法。在图40中,电子设备100为与车辆200分离的设备,比如手机等,电子设备100和车辆200之间通过蓝牙等短距通信方式建立连接。在图40中,驾驶员使用电子设备100进行导航。在图40中,S4001至S4007,可以参见前述的相关描述,此处不再赘述。如图40所示,该方法可以包括以下步骤:
S4001、车辆200获取驾驶员的面部数据。
S4002、车辆200根据驾驶员的面部数据,确定驾驶员的疲劳等级。
S4003、车辆200根据疲劳等级,确定第一特征参数的目标调整值,第一特征参数为当前所需播放的音频数据的特征参数。
S4004、车辆200将目标调整值发送至电子设备100。
S4005、电子设备100根据目标调整值,对待播放的音频数据进行处理,得到目标音频数据,其中,目标音频数据的特征参数的值高于第一特征参数的值。
S4006、电子设备100将目标音频数据发送至车辆200。
S4007、车辆200播放目标音频数据。
另一些实施例中,步骤S4006中,电子设备100也可以通过其自身的扬声器播放该目标音频数据,即电子设备100不需要将目标音频数据发送至车辆200。
由此,当检测到驾驶员出现驾驶疲劳时,可以根据驾驶员的疲劳等级改变电子设备100导航播报的音频数据的特征参数(比如音调、响度等),从而使得播放的音频数据能够在听觉上对驾驶员产生冲击,进而提高驾驶员的注意力,实现安全驾驶。
可以理解的是,上述图38至图40所示的实施例中,电子设备100和车辆200之间可以进行交互的数据包括但不限于驾驶员的面部数据、驾驶员的疲劳等级、第一特征参数的目标调整值、待播放的音频数据、目标音频数据等。也可以理解为,上述确定驾驶员的疲劳等级、确定第一特征参数的目标调整值、对待播放的音频数据进行处理等过程可以在电子设备100上完成,也可以在车辆在200上完成。例如,车辆200获取驾驶员的面部数据后,可以由车辆200确定驾驶员的疲劳等级,车辆200也可以将驾驶员的面部数据发送给电子设备100,由电子设备确定驾驶员的疲劳等级。又例如,车辆200可以根据驾驶员的疲劳等级,确定第一特征参数的目标调整值,并将该目标调整值发送给电子设备100,电子设备100也可以自己根据驾驶员的疲劳等级确定第一特征参数的目标调整值。本申请不一一列举。在一些可能的实现方式中,上述实施例中的各步骤可以根据实际情况适应性调整执行主体,调整后的方案仍在本申请的保护范围之内。
6、用户选择多种音频数据叠加播放的场景。
一般地,人们在休息时,可以通过播放白噪音的方式,以达到助眠的效果。但单一的播放白噪音,给用户带来的听觉体验较差。因此,在播放白噪音时,可以同时播放一些其他的声音,比如,同时播放用户喜欢的歌曲等。但目前,在同时播放白噪音和其他的声音时,均是简单的将两者进行混音,这使得两者的融合效果较差,进而给用户带来的听觉体验也相对较差。
有鉴于此,本申请实施例提供了一种声音处理方法。该方法可以基于用户选择的背景音(即前述的其他的声音),对用户所选择的白噪音进行改造,从而使得两者可以更自然的融合在一起,进而给用户带来更好的听觉体验。
示例性的,图41示出了一种声音处理方法。可以理解,该方法可以通过任何具有计算、处理能力的装置、设备、平台、设备集群来执行,比如,可以但不限于通过音箱、手机等执行。在一些实施例中,该方法可以是在用户开启目标功能(比如:白噪音功能等),且用户有播放音频数据的需求的情况下执行。例如,当该方法通过手机等具有显示屏的设备执行时,用户可以在设备的系统或者设备上的某个应用程序(application,APP)中开启目标功能,且用户可以使用该设备播放歌曲。当该方法通过音箱等不具有显示屏的设备执行时,用户可 以通过与音箱等设备相连的其他的设备,对音箱等设备进行控制,以开启音箱等设备上的目标功能,且用户可以使用音箱等设备播放歌曲。
如图41所示,该声音处理方法可以包括以下步骤:
S4101、获取第一音频数据和第二音频数据。
本实施例中,第一音频数据可以为背景音,第二音频数据可以为白噪音。示例性的,背景音可以但不限于为某首歌曲。
在用户选定背景音和白噪音时,可以基于用户的选择,从网络上或者本地数据库中获取到第一音频数据和第二音频数据。示例性的,在电子设备(比如手机等)上可以配置有与播放音频数据相关的应用程序(application,APP),用户可以在该APP上选择背景音和白噪音。
另外,在用户选定背景音时,可以基于用户的选择,从网络上或者本地数据库中获取到该背景音。同时,还可以基于该背景音,查询预先设定的背景音与白噪声之间的映射关系,从网络上或者本地数据库中获取到与该背景音适配的白噪音。
在一些实施例中,第一音频数据的第一时长可以与第二音频数据的第二时长相等,这样两者可以同步播放。
其中,当第二音频数据的第二时长大于第一音频数据的第一时长时,可以从第二音频数据中截取出与第一时长相等的时长的数据,并将截取到的数据作为所需的第二音频数据。例如,当第一时长为10秒,第二时长为20秒时,可以将第二音频数据中前10秒的数据作为所需的数据,或者,将第二音频数据中第5秒至第15秒的数据作为所需的数据。
当第二音频数据的第二时长小于第一音频数据的第一时长时,可将多个第二音频数据进行拼接,并从拼接后的数据中截取出与第一时长相等的时长的数据,并将截取到的数据作为所需的第二音频数据。
S4102、获取第一音频数据的目标音频特征,目标音频特征包括:各个时刻的响度和各个节拍的位置点。
本实施例中,对于各个时刻的响度,可以由第一音频数据在时域上的波形图,确定出各个时刻的波形的幅值,进而确定出各个时刻的响度。其中,一个幅值为一个时刻的响度。
对于各个节拍的位置点,可以将第一音频数据输入至预先训练得到的机器学习模型,以得到各个节拍的位置点;其中,机器学习模型可以基于深度学习神经网络训练得到。另外,还可以基于节拍检测算法(比如librosa等),对第一音频数据进行处理,以得到第一音频数据中各个节拍的位置点。
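示例性的,提取第一音频数据各时刻的响度与各节拍的位置点,可以借助开源库librosa示意如下(以波形幅值近似各时刻的响度,文件路径为假设):

```python
import librosa
import numpy as np

def target_audio_features(path):
    """提取第一音频数据的目标音频特征:各时刻的响度与各节拍的位置点(单位:秒)。"""
    y, sr = librosa.load(path, sr=None)
    loudness = np.abs(y)                                      # 各采样点的幅值,近似各时刻的响度
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)  # 节拍检测
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)   # 各节拍的位置点
    return loudness, beat_times
```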
S4103、根据目标音频特征,对第二音频数据进行处理,以得到第三音频数据。
本实施例中,可以基于第一音频数据中各个时刻的响度,并结合预先设定的噪音响度与音乐响度之间的比例关系,确定出第二音频数据中各个时刻对应的目标响度。进一步地,可以对第二音频数据中各个时刻的响度进行调整,以将其各个时刻的响度均调整至确定出的各个时刻对应的目标响度。例如,若第一音频数据中第一时刻的响度为10分贝,预先设定的噪音响度与音乐响度之间的比值为1/2,由此可以确定出第二音频数据中第一时刻的目标响度为5分贝。进一步地,可以将第二音频数据在第一时刻的响度调整至5分贝。
另外,可以基于各个节拍的位置点,对第二音频数据的音调进行调整,以使得第二音频数据的音调与第一音频数据的节奏相匹配。例如,当第一音频数据在某一时间段舒缓的时候,可以降低第二音频数据在该时间段的音调,从而使得第二音频数据也逐渐舒缓。
作为一种可能的实现方式,可以基于第一音频数据中相邻的两个节拍间的时间间隔,和, 预先设定的基准节奏,确定出是否调整第二音频数据的音调,以及,在需要调整第二音频数据的音调时,确定是升高音调还是降低音调。
举例来说,假设预先设定的基准节奏为:每分钟的节拍数为30下。当第一音频数据中相邻的两个节拍间的时间间隔为1秒时,可以确定出这两个相邻的节拍对应的节奏为每分钟的节拍数为60下。此时,确定出的节奏大于基准节奏,表明在这两个相邻的节拍间第一音频数据的节奏较快。因此,可以第二音频数据中相同的时间段内的音调升高,从而使得在该时间段内第一音频数据和第二音频数据所表达的情感相同。
进一步地,在确定出需要调整第二音频数据的音调后,可以由相邻的两个节拍确定出的节奏,和,预先设定的节奏与音调调整间的映射关系,确定出第二音频数据中在这两个节拍对应的位置点内的音调所需调整的目标音调调整值。接着,可以基于该目标音调调整值,并利用变调算法,对第二音频数据中在这两个节拍对应的位置点内的数据的音调进行调整。示例性的,当目标音调调整值为0.8时,可以基于变调算法,将第二音频数据中在这两个节拍对应的位置点内的数据的音调降低至原始的音调的0.8倍。在一些实施例中,可以通过采用下采样的方式从所需调整的数据中抽取一定数量的采样点,以完成升高音调的目的。另外,也可以通过采用上采样的方式从所需调整的数据中插入一定数量的采样点,以完成降低音调的目的。
作为又一种可能的实现方式,可以基于第一音频数据中相邻的两个节拍间的时间间隔,和,预先设定的基准节奏,确定出是否调整第二音频数据的音速(即音频播放速度),以及,在需要调整第二音频数据的音速时,确定是升高音速还是降低音速。对于确定升高音速还是降低音速的方式,可以参见前述的确定是升高音调还是降低音调的方式,此处不再赘述。
进一步地,在确定出需要调整第二音频数据的音速后,可以由相邻的两个节拍确定出的节奏,和,预先设定的节奏与音速调整间的映射关系,确定出第二音频数据中在这两个节拍对应的位置点内的音速所需调整的目标音速调整值。接着,可以基于该目标音速调整值,对第二音频数据中在这两个节拍对应的位置点内的数据的音速进行调整。示例性的,当目标音速调整值为0.8时,可以将第二音频数据中在这两个节拍对应的位置点内的数据的音速降低至原始的音速的0.8倍。在一些实施例中,可以通过采用下采样的方式从所需调整的数据中抽取一定数量的采样点,以完成升高音速的目的。另外,也可以通过采用上采样的方式从所需调整的数据中插入一定数量的采样点,以完成降低音速的目的。
可以理解的是,对于第二音频数据的音调和音速,可以同时调整,也可以择一调整,此处不做限定。
在基于目标音频特征,对第二音频数据进行处理后,即可以得到第三音频数据,并可以执行S4104。
S4104、播放目标音频数据,目标音频数据基于第一音频数据和第三音频数据得到。
本实施例中,可以通过混音算法对第一音频数据和第三音频数据进行混音处理,以得到目标音频数据。示例性的,当第一音频数据和第三音频数据的类型均为浮点(float)型时,可以直接将第一音频数据和第三音频数据叠加混合,以得到目标音频数据。当第一音频数据和第三音频数据的类型不是float型时,可以采用自适应加权混音算法、线性叠加求平均等混音算法,对第一音频数据和第三音频数据进行处理,以得到目标音频数据。
由此,基于第一音频数据的音频特征,对第二音频数据进行改造,从而使得两者可以更自然的融合在一起,进而给用户带来更好的听觉体验。
7、制作视频或动态图片的场景。
一般地,在制作视频或者动态图片的过程中,可以为视频或者动态图片中的对象增加空间音效,以使得用户在后续观看视频或动态图片时可以沉浸式地体验近似真实世界中的声音,从而带来更好的观看体验。在一些实施例中,制作视频可以是对原始的视频进行编辑,也可以是由多张图片生成一段视频,此处不做限定。动态图片可以理解为是图形交换格式(graphics interchange format,GIF)的文件。
有鉴于此,本申请实施例中还提供了一种声音处理方法,当用户在电子设备上制作视频或者动态图片时,可以根据自身需求为视频或动态图片中的目标对象添加空间音频,从而使得在视频或动态图片中目标对象的声音可以随着目标对象的运动而移动,进而使得用户听感更加真实,提升了观看体验。该声音处理方法对环境和信息采集设备无要求,且视频或动态图片中对象的音频位置与该对象的音频的实际位置相符,使得后续用户观看视频时不会出现听感与观感割裂的情况,提升了用户体验。
示例性的,图42示出了一种声音处理方法。可以理解,该方法可以通过任何具有计算、处理能力的装置、设备、平台、设备集群来执行。如图42所示,该方法可以包括以下步骤:
S4201、确定N张图片,N≥2。
本实施例中,N张图片可以是用户选择的图片。例如:用户可以从手机等电子设备中选取N张图片,以由这N张图片制作视频。
N张图片也可以是一段时间内用户拍摄的图片。例如,当用户利用手机等电子设备拍摄图片后,可以将一周、一个月或者一年的图片确定为所需的N张图片。
N张图片亦可以是按照预设的采样频率从用户选择的目标视频中抽取到的图片。示例性的,在从目标视频中抽取N张图片的过程中,可以记录抽取到的每张图片对应的时刻。例如,若采样频率是每1秒(second,S)采集一张图片,且采集到的第一张图片的时刻为0s,则采集到的第二张图片的时刻为1s,采集到的第三张图片的时刻为2s,等等。
N张图片还可以是从动态图片中提取到的图片。在一些实施例中,动态图片可以理解为是由多张图片拼接形成,因此,N张图片可以是组成动态图片的多张图片。
S4202、确定N张图片中各张图片在目标视频中出现的时刻,目标视频基于N张图片得到。
本实施例中,可以基于默认的播放N张图片所需的播放时长,或者,基于用户设定的播放时长,并按照预设顺序,将N张图片在该播放时长上均匀布置,以等间隔播放各张图片。例如,若N=10,且用户设定的播放时长为9s,则可以在0s、1s、2s,…,9s处分别放置一张图片播放。示例性的,预设顺序可以是基于拍摄图片或者提取到图片的时间顺序,也可以是用户指定的顺序,等等。可选地,目标视频的时长可以为默认的播放N张图片所需的播放时长,或者,用户设定的播放时长。
在一些实施例中,当N张图片是从用户选取的视频中抽取得到时,可以将每张图片在视频中对应的时刻,作为各张图片在目标视频中出现的时刻。此时,目标视频可以与用户选取的视频相同。
在一些实施例中,当N张图片是从动态图片中提取得到时,可以将每张图片在动态图片中对应出现的时刻,作为各张图片在目标视频中出现的时刻。此时,目标视频可以理解为是该动态图片。另外,还可以单独为各个图片设定一个时刻,并在后续将这些图片重新制作为视频或动态图片。可选地,目标视频的时长可以为播放动态图片所需的时长。
在一些实施例中,可以基于N张图片,筛选出一个与这些图片适配的音频数据。以及,根据确定出的音频数据,确定N张图片中各张图片在目标视频中出现的时刻。示例性的,可以将N张图片输入至人工智能(artificial intelligence,AI)模型(比如机器学习模型、神经网络模型等),以由AI模型对N张图片进行处理,从而得到与这些图片适配的音频数据。其中,该音频数据可以是本地数据库中存储的数据,也可以是网络上的音频数据,此处不做限定。可选地,目标视频的时长可以为筛选出的音频数据的时长。
当获取到的音频数据的时长较长时,可以从中截取一段数据作为所需的音频数据。其中,可以但不限于将音频数据中的高潮部分作为所需的音频数据。
在获取到所需的音频数据后,可以对该音频数据进行分析,以确定出该音频数据中各个节拍的位置点,和/或,每个小节的位置点。其中,各个节拍的位置点可以理解为是各个节拍的起始位置的时间点,每个小节的位置点可以理解为是每个小节的起始位置的时间点。示例性的,可以通过AI模型、节拍提取算法等,提取到该音频数据中各个节拍的位置点,和/或,每个小节的位置点。
接着,可以获取确定出的音频数据的播放时长,并将N张图片等间隔均匀布置在该播放时长上。以及,基于确定出的各个节拍的位置点,和/或,每个小节的位置点,对N张图片中至少一部分图片的出现时刻进行调整,从而使得N张图片中的至少一部分图片出现的时刻可以与某些节拍的位置点或者某些小节的位置点一致,使得在听感的关键点处呈现视觉的冲击变化,即在听感的关键点处用户可以观看到图片,从而在视听上产生一致的冲击感,进而提升用户体验。
其中,当采用各个节拍的位置点对N张图片中至少一部分图片的出现时刻进行调整时,对于任意一张图片,当与该图片出现的时刻距离最近的一个节拍的位置点上未设置图片时,可以将该图片出现的时刻调整至该位置点上。
举例来说,如图43所示,假设总共确定出5个节拍的位置点,且有4张图片。如图43的(A)所示,在等间隔对4张图片进行布置后,图片1出现的时刻位于节拍0的位置点,图片4出现的时刻位于节拍5的位置点,图片2和图片3均未出现在相应的节拍的位置点上,且图片2距离节拍2的位置点最近,图片3距离节拍3的位置点最近。因此,如图43的(B)所示,可以将图片2出现的时刻调整至节拍2的位置点,以及,将图片3出现的时刻调整至节拍3的位置点。
另外,当相邻的两个节拍的位置点上均布置有图片,且这两个节拍的位置点之间仍存在其他的图片时,可以不调整这些图片出现的时刻,也可以在这两个节拍的位置点之间均匀布置这些图片,具体可根据实际情况而定,此处不做限定。
当采用每个小节的位置点对N张图片中至少一部分图片的出现时刻进行调整时,对于任意一张图片,当与该图片出现的时刻距离最近的一个小节开始的位置点上未设置图片时,可以将该图片出现的时刻调整至该位置点上。另外,当该图片出现的时刻与该小节结束的位置点间的距离,小于,该图片出现的时刻与该小节开始的位置点间的距离时,还可以将该图片出现的时刻调整至该小节结束时的位置点上。具体调整方式可以参考采用各个节拍的位置点时的调整方式,此处不再赘述。
在一些实施例中,除了可以由N张图片筛选出所需的音频数据外,还可以将用户指定的某个音频数据作为所需的音频数据。以及,按照前述描述的方式,确定N张图片中各张图片在目标视频中出现的时刻。
S4203、确定N张图片中各张图片内包含的目标对象,以得到M个目标对象。
本实施例中,可以将N张图片中的各张图片分别输入到预先训练得到的目标检测模型中,以通过目标检测模型对各张图片所包含的目标对象进行检测,从而获取到各张图片中包含的目标对象。示例性的,目标检测模型可以但不限于基于卷积神经网络(convolutional neural networks,CNN)训练得到。示例性的,目标对象可以理解为是图片中能够产生声音的对象,比如,当图片中包括飞机时,该图片中的目标对象可以为飞机。
作为一种可能的实现方式,还可以基于目标检测算法(比如YOLOv4等),对各张图片进行处理,从而获取到各张图片中包含的目标对象。
作为另一种可能的实现方式,当基于目标检测模型或目标检测算法,获取到目标对象后,还可以向用户展示目标对象的选择界面,以供用户选择出其所需的目标对象。此时,目标对象为用户选择出的其所需的目标对象。
作为又一种可能的实现方式,还可以基于用户在图片上的选取操作,获取到各张图片中包含的目标对象。示例性的,在确定出N张图片后,可以向用户展示各张图片。用户在观看到某张图片时,其可以通过手动标记的方式在该图片中标记出目标对象。
S4204、确定M个目标对象在每张图片中的空间位置,以得到(M*N)个空间位置,以及,确定各个目标对象在目标视频中出现的时长,以得到M个第一时长。
本实施例中,对于确定M个目标对象在每张图片中的空间位置,可以以拍摄图片的设备的位置为中心构建一个三维坐标系。N张图片中的每张图片的中心位置即为三维坐标系的原点。在三维坐标系中,x轴和y轴组成的平面可以是图片所在的平面。在三维坐标系中,z轴可以表示深度,其描述的是目标对象到拍摄图片的设备的实际距离。目标对象在三维坐标系中的位置可以为(xi,yi,zi)表示。其中,在坐标系确定后,xi和yi的值即可以确定出。对于zi可以通过拍摄图片的设备上的飞行时间(time of flight,ToF)摄像头获取,或者,通过预先训练的深度检测模型获取。示例性的,深度检测模型可以但不限于基于卷积神经网络训练得到。应理解的是,本实施例中,目标对象的空间位置可以是指目标对象在三维坐标系中的位置。
在一些实施例中,当N张图片可以是用户选择的图片,或者,是一段时间内用户拍摄的图片时,可以按照每张图片的拍摄时间,对N张图片进行排序。然后,可以按照时间由远及近的方式,依次确定每张图片中所包含的目标对象,在各张图片中的空间位置。
其中,对于第i张图片中的第k个目标对象。当在第i张图片之前的图片中均不存在该第k个目标对象时,可以认为该第k个目标对象在第i张图片之前的每张图片中的空间位置均处于无穷远的位置处。应理解的是,在本实施例中,第i张图片可以是N张图片中的任意一张图片,第k个目标对象可以是第i张图片中的任意一个目标对象。
举例来说,参阅图44,图44的(A)所示的为第(i-1)张图片,图44的(B)所示的为第i张图片,图44的(C)所示的为第(i+1)张图片,同时,确定出的目标对象为图44的(B)中所示的小鸟4301。在图44的(A)所示的图片中不存在小鸟4301,且该图片的拍摄时间在图44的(B)所示的图片的拍摄时间之前,图44的(A)所示的图片之前不存在其他图片,因此,可以将小鸟4301在图44的(A)所示的图片的空间位置置于无穷远的位置处。
当在第(i+1)张图片中不存在该第k个目标对象时,可以将第(i+1)张图片上的某个边界上的位置作为该第k个目标对象在第(i+1)张图片中的空间位置。示例性的,边界上的位置可以是指定的某个边界上的某个位置,也可以是由第i张图片中确定出的目标对象的 朝向上的边界上的某个位置。
举例来说,继续参阅图44,在图44的(C)所示的图片(即第(i+1)张图片)中不存在小鸟4301,且该图片的拍摄时间在图44的(B)所示的图片的拍摄时间之后,因此,可以将小鸟4301在图44的(C)所示的图片的空间位置置于该图片的某个边界位置处。由于在图44的(B)中,小鸟4301是朝向图片的左上方移动,因此,可以将图44的(C)所示的图片的左上方的某个边界处的位置(比如区域4302所示的位置)作为小鸟4301的空间位置。
当在第i张图片和第(i+1)张图片中均存在该第k个目标对象,且在第(i+2)张图片中不存在该第k个目标对象时,可以由第k个目标对象在第i张图片和第(i+1)张图片中的空间位置,确定出其移动方向,并将第(i+2)张图片中在该移动方向上的边界处的位置,作为该第k个目标对象在第(i+2)张图片中的空间位置。
举例来说,参阅图45,图45的(A)所示的为第i张图片,图45的(B)所示的为第(i+1)张图片,图45的(C)所示的为第(i+2)张图片,同时,确定出的目标对象为图45的(A)和(B)中所示的小鸟4501。在图45的(A)和(B)中均存在小鸟4501,但在图45的(C)中不存在小鸟4501。由图45的(A)和(B)可以确定出小鸟4501的移动方向是图4中箭头所指的方向。在图45的(C)中箭头所指的方向上边界的位置为区域42,因此可以将区域42作为该小鸟4501在第(i+2)张图片中的空间位置。
进一步地,当在第(i+3)张图片中也不存在该第k个目标对象时,可以根据目标对象的移动方向,移动速度,以及第(i+2)张图片和第(i+3)张图片间的时间间隔,在第(i+3)张图片之外确定出一个位置,并将该位置作为该第k个目标对象在第(i+3)张图片处的空间位置。
举例来说,继续参阅图45,图45的(D)所示的为第(i+3)张图片。在图45的(D)中也不存在小鸟4501。由图45的(A)和(B)可以确定出小鸟4501的移动方向(即图中箭头所指的方向)和移动速度。然后,由移动方向和移动速度,以及图45的(C)和(D)所示的图片间的时间间隔,可以确定出在图45的(D)中,小鸟4501能够移动到区域43所示的位置。因此,可以将区域43所示的位置作为小鸟4501在第(i+3)张图片处的空间位置。其中,对于相邻的两张图片间的时间间隔,详见下文描述。
在一些实施例中,对于第i张图片中的第k个目标对象。当在第i张图片之前的图片中均不存在该第k个目标对象时,除了可以将该第k个目标对象在第i张图片之前的每张图片中的空间位置均处于无穷远的位置处,还可以采用前述的确定第k个目标对象在第(i+j)张图片处的空间位置的方式确定,j≥1。具体地,对于第k个目标对象在第(i-1)张图片处的空间位置,可以将其置于第(i-1)张图片的边界的某个位置处,比如,可以将在第(i-1)张图片中,且位于第k个目标对象在第i张图片中朝向的反方向上的某个边界处的位置,作为第k个目标对象在第(i-1)张图片处的空间位置,详见前述的确定第k个目标对象在第(i+1)张图片处的空间位置的方式,此处不再赘述。
在一些实施例中,对于第i张图片中的第k个目标对象。当第(i+1)张图片至第(i+j)张图片中不存在第k个目标对象,j≥1,且第(i+j+1)张图片中存在第k个目标对象时,可以以第i张图片为基准,以及通过前述的确定第k个目标对象在第(i+j)张图片处的空间位置的方式,确定出第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置,并得到一个空间位置集合{Pi+1,...,Pi+j}。其中,Pi+j为第k个目标对象在第(i+j)张图片中的空间位置。
同时,可以以第(i+j+1)张图片为基准,以及通过前述的确定第k个目标对象在第(i+j)张图片处的空间位置的方式,确定出第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置,并得到一个空间位置集合{P′i+1,...,P′i+j}。其中,P′i+j为第k个目标对象在第(i+j)张图片中的空间位置。
然后,可以根据空间位置集合{Pi+1,...,Pi+j}和空间位置集合{P′i+1,...,P′i+j},确定出第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置。
作为一种可能的实现方式,可以对第k个目标对象在同一张图片中的两个空间位置进行加权平均,并将得到的结果作为第k个目标对象在该图片中的空间位置。例如:对于第k个目标对象在第(i+1)张图片中的空间位置,该位置可以为(Pi+1+P′i+1)/2。
作为另一种可能的实现方式,第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中均具有两个空间位置。因此,针对每张图片,均可以确定第k个目标对象在该张图片中的两个空间位置之间的距离,从而可以得到j个距离。
然后,可以从j个距离中选取出一个最短的距离(当然,也可以选用其他的距离,但最好是选用最短的距离,因为这样表明通过两种方式得到的空间位置相近,所以确定出的距离较为准确),并将该距离对应的图片作为目标图片。
接着,可以对第k个目标对象在目标图片中的两个空间位置进行加权平均,并将得到的结果作为第k个目标对象在该目标图片中的空间位置。
接着,可以将第k个目标对象在第i张图片中的空间位置和在目标图片中的空间位置相连,得到目标连线,并在该目标连线上确定出第k个目标对象在第i张图片和目标图片之间的各张图片中的空间位置。例如,由第k个目标对象在第i张图片中的空间位置和在目标图片中的空间位置,以及,第i张图片和目标图片间的时间间隔,可以确定出第k个目标对象的移动速度。由该移动速度,以及,第i张图片和任意一张图片(即第i张图片和目标图片间的某张图片)间的时间间隔,可以确定出在该时间间隔内第k个目标对象的移动距离。以第k个目标对象在第i张图片中的空间位置为起点,并在目标连线上找到与该起点相距第k个目标对象的移动距离的位置点,该位置点即为第k个目标对象在该任意一张图片(即第i张图片和目标图片间的某张图片)上的空间位置。对于确定第k个目标对象在目标图片和第(i+j+1)张图片之间的各张图片中的空间位置,可以参考确定第k个目标对象在第i张图片和目标图片之间的各张图片中的空间位置的方式,此处不再赘述。
应理解的是,当N张图片是按照预设的采样频率从用户选择的目标视频中抽取到的图片,或者,是从动态图片中提取到的图片时,确定每张图片中所包含的目标对象在各张图片中的空间位置的方式,可以参考前述的N张图片是用户选择的图片的方式,此处不再赘述。
对于确定各个目标对象在目标视频中出现的时长,针对任意一个目标对象,可以将其首次出现的时刻与最后一张图片结束播放的时刻间的时长,作为其在目标视频中出现的时长。
此外,在确定各个目标对象在目标视频中出现的时长时,针对任意一个目标对象,还可以将目标视频的时长,作为其在目标视频中出现的时长。
S4205、根据M个目标对象在各张图片中的目标位置,以及N张图片中各个相邻的图片出现的时间间隔,确定各个目标对象在各个相邻的图片间的移动速度。
本实施例中,在确定出各个目标对象在每张图片中的空间位置,以及确定出N张图片中各张图片出现的时刻后,可以基于速度计算公式,由M个目标对象在各张图片中的目标位置,和N张图片中各个相邻的图片出现的时间间隔,确定出各个目标对象在各个相邻的图片间的 移动速度。
举例来说,若目标对象p在第i个图像中的位置为Pi(xi,yi,zi),在第(i+1)个图像中的位置为Pi+1(xi+1,yi+1,zi+1),第i个图像出现的时刻为ti,第(i+1)个图像出现的时刻为ti+1,则目标对象p在第i个图像和第(i+1)个图像间的移动速度可以为Vi=(pi+1-pi)/(ti+1-ti)。
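示例性的,上述移动速度的计算可以用如下示意代码表示(坐标与时刻均为假设值):

```python
import numpy as np

def object_velocity(p_i, p_next, t_i, t_next):
    """目标对象在相邻两张图片间的移动速度:V = (P_{i+1} - P_i) / (t_{i+1} - t_i)。"""
    return (np.asarray(p_next) - np.asarray(p_i)) / (t_next - t_i)

print(object_velocity((0.0, 0.0, 5.0), (1.0, 0.5, 4.0), 2.0, 3.0))  # [ 1.   0.5 -1. ]
```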
S4206、根据M个目标对象,得到Q个第一音频数据,1≤Q≤M,其中,一个第一音频数据至少与一个目标对象相关联。
本实施例中,可以基于各个目标对象,查询预先设定的目标对象与音频数据间的映射关系,确定出各个目标对象对应的第一音频数据的标识;以及,基于确定出的各个第一音频数据的标识,从预先设定的音频库中筛选出各个目标对象对应的第一音频数据,以得到Q个第一音频数据;此时,Q=M。示例性的,音频库中可以包括至少一个音频数据。
作为一种可能的实现方式,用户还可以从M个目标对象中选择出Q个目标对象,以及,为这Q个目标对象添加它们各自相关联的第一音频数据。其中,用户添加的第一音频数据是用户基于自身需求所选择的音频数据,比如,用户可以为飞机添加火车发出的声音,也可以为飞机添加飞机发出的声音,等。另外,用户添加的第一音频数据可以是本地音频库中的数据,也可以是网络上的数据,此处不做限定。
S4207、将各个第一音频数据的第二时长均调整至与相应的目标对象对应的第一时长相等,以得到Q个第二音频数据。
本实施例中,针对任意一个第一音频数据,当该第一音频数据的第二时长大于该第一音频数据对应的目标对象在目标视频中出现的第一时长时,可以从该第一音频数据中截取出与第一时长相等的时长的数据,从而得到第二音频数据。例如,当第一时长为10秒,第二时长为20秒时,可以将第一音频数据中前10秒的数据作为第二音频数据,或者,将第一音频数据中第5秒至第15秒的数据作为第二音频数据。
当该第一音频数据的第二时长小于该第一音频数据对应的目标对象在目标视频中出现的第一时长时,可以将多个该第一音频数据进行拼接,并从拼接后得到的音频数据中截取出与第一时长相等的时长的数据,从而得到第二音频数据。
S4208、根据各个目标对象对应的空间位置,以及各个目标对象在各个相邻的图片间的移动速度,分别对各个目标对象对应的第二音频数据进行处理,以得到Q个第三音频数据。
本实施例中,针对任意一个目标对象,可以基于该目标对象对应的各个空间位置,其在各个相邻的图片间的移动速度,以及其对应的第二音频数据的音频参数,并通过头相关传递函数(head related transfer function,HRTF)和多普勒算法,对其对应的第二音频数据进行处理,从而得到该目标对象对应的第三音频数据。其中,该第三音频数据是具有空间音效的音频数据。第二音频数据的音频参数可以包括采样率、声道数、比特率等。
举例来说,以第i张图片中的第k个目标对象为例,假设该第k个目标对象在第i张图片之前的均未出现,且在第i张图片之后的是朝向远离三维坐标系中原点的方向移动。当将第k个目标对象在第i张图片之前的图片处的位置均被置为无穷远时,可以在第i张图片出现之前,不播放第k个目标对象对应的音频数据,而从第i张图片开始播放第k个目标对象对应的音频数据,且在第i张图片之后,控制该目标对象的声音按照一定的速度逐渐远去。
当将第k个目标对象在第i张图片之前的图片处的位置为无穷远时,在第i张图片出现之前,可以控制该目标对象的声音按照一定的速度逐渐向用户身边移动,且在第i张图片之后,控制该目标对象的声音按照一定的速度逐渐远去。
在一些实施例中,当第k个目标对象是首次出现时,对于该目标对象对应的音频数据的声音大小,可以预先设定,也可以基于该目标对象所在的图片中的空间位置确定。比如,可以基于该目标对象在图片中的空间位置和三维坐标系的原点之间的距离,查询预先设定的距离和声音大小间的映射关系,确定出该目标对象在该图片中对应的音频数据的声音大小。
S4209、根据Q个第三音频数据和N张图片,得到目标视频。
本实施例中,可以基于混音算法对Q个第三音频数据进行混音处理,从而得到与N张图片相关的空间环境音频。另外,当在前述的S4202中,需要筛选出与N张图片适配的音频数据,或者,用户指定了某个音频数据时,还可以将这些音频数据与Q个第三音频数据进行混音处理,以得到与N张图片相关的空间环境音频。
在得到与N张图片相关的空间环境音频后,可以通过ffmpeg技术或者javaCV技术将空间环境音频与N张图片结合,从而生成带有空间音频的视频,即得到目标视频。
在一些实施例中,当N张图片是从某个视频中抽取得到的时,还可以将得到的空间环境音频与N张图片对应的视频合成,以生成带有空间音效的视频,即得到目标视频。
这样,由于最终获取到的目标视频是具有空间音效的视频,因此,在播放过程中,用户听到的与目标对象相关的声音是跟随目标对象运动而移动,从而做到了音随画动,让人感觉身临其境。
接下来,基于前述所描述的内容,本申请实施例还提供了一种声音处理方法。
示例性的,图46示出了一种声音处理方法。可以理解,该方法可以通过任何具有计算、处理能力的装置、设备、平台、设备集群来执行。如图46所示,该方法可以包括以下步骤:
S4601、获取目标参数,目标参数包括与目标设备关联的环境信息和/或用户的状态信息。
本实施例中,与目标设备关联的环境信息可以包括以下一项或多项:
目标设备所处区域的环境数据;目标设备所处的环境中需同时播放第一音频数据和第二音频数据,且第一音频数据和第二音频数据均通过同一设备播放,其中,第一音频数据为第一时间段内持续性播放的音频数据,第二音频数据为第一时间段内偶发性播放的音频数据;目标设备在目标空间中的目标位置,目标空间中配置有至少一个扬声器;目标设备产生的画面在目标空间中的目标位置,目标空间中配置有至少一个扬声器;或者,搭载有目标设备的车辆的行驶速度。
与目标设备关联的用户的状态信息可以包括以下一项或多项:
目标设备与目标用户的头部间的目标距离,目标用户的头部在目标空间中的目标位置,其中,目标空间中配置有至少一个扬声器;用户的疲劳等级;用户选择的第一音频数据和第二音频数据;或者,用户选择的图片,视频,或者,用户为目标对象所添加的音频数据。
对于获取目标参数的方式可以参见前述的实施例中的描述,此处不再赘述。
S4602、根据目标参数,对原始音频数据进行处理,得到目标音频数据,目标音频数据与环境信息和/或状态信息相匹配。
本实施例中,获取到目标参数后,可以根据目标参数对原始音频数据进行处理,以使得原始音频数据能够与目标参数相匹配,由此以构建出与当前环境或当前用户的状态适配的待播放的音频数据,从而使得待播放的音频数据能够与当前环境或当前用户的状态相融合,提升了用户体验。其中,在对原始音频数据处理后,可以得到目标音频数据,该目标音频数据可以与环境信息和/或状态信息相匹配。
S4603、输出目标音频数据。
本实施例中,在获取到目标音频数据后,可以输出该目标音频数据。
这样,由于目标音频数据是与当前环境或当前用户的状态相适配,所以,目标音频数据能够与当前环境或当前用户的状态相融合,提升了用户体验。
可以理解的是,上述各个实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。此外,在一些可能的实现方式中,上述实施例中的各步骤可以根据实际情况选择性执行,可以部分执行,也可以全部执行,此处不做限定。本申请的任意实施例的任意特征的全部或部分在不矛盾的前提下,可以自由地、任何地组合。组合后的技术方案也在本申请的范围之内。
可以理解的是,本申请实施例中涉及的电子设备100可以是手机、平板电脑、桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本,以及蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、人工智能(artificial intelligence,AI)设备、可穿戴式设备、车载设备、智能家居设备和/或智慧城市设备等。电子设备的示例性实施例包括但不限于搭载iOS、android、Windows、鸿蒙系统(Harmony OS)或者其他操作系统的电子设备,其中,本申请实施例中对该电子设备的具体类型不作特殊限制。
下面介绍本申请实施例涉及的电子设备100。图47示出了电子设备100的结构示意图。请参阅图47,电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本发明实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
I2C接口是一种双向同步串行总线,包括一根串行数据线(serial data line,SDA)和一根串行时钟线(serial clock line,SCL)。在一些实施例中,处理器110可以包含多组I2C总线。处理器110可以通过不同的I2C总线接口分别耦合触摸传感器180K,充电器,闪光灯,摄像头193等。例如:处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。在一些实施例中,音频模块170可以用于对音频信号编码和解码。在一些实施例中,音频模块170还可以用于对音频信号进行音频处理,比如,调整音频信号的增益等。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency  modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long termevolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,颜色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行电子设备100的各种功能应用以及数据处理。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有 导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备100对电池142加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100对电池142 的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
在一些实施例中,电子设备100可以基于步行者航位推算(pedestrian dead reckoning,PDR)算法对至少一个传感器采集到的数据进行处理,以得到用户的运动状态,比如移动方向、移动速度等等。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备100中,不能和电子设备100分离。
电子设备100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明电子设备100的软件结构。
图48是本申请实施例的电子设备100的软件结构框图。分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。应用程序层可以包括一系列应用程序包。
如图48所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图48所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(media libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
应理解,本申请实施例可以适用于Android、IOS或者鸿蒙等等系统中。
可以理解的是,本申请的实施例中的处理器可以是中央处理单元(central processing unit,CPU),还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。通用处理器可以是微处理器,也可以是任何常规的处理器。
本申请的实施例中的方法步骤可以通过硬件的方式来实现,也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(random access memory,RAM)、闪存、只读存储器(read-only memory,ROM)、可编程只读存储器(programmable rom,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者通过所述计算机可读存储介质进行传输。所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
可以理解的是,在本申请的实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。

Claims (48)

  1. 一种声音处理方法,其特征在于,所述方法包括:
    获取目标参数,所述目标参数包括与目标设备关联的环境信息和/或用户的状态信息;
    根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,所述目标音频数据与所述环境信息和/或所述状态信息相匹配;
    输出所述目标音频数据。
  2. 根据权利要求1所述的方法,其特征在于,所述目标参数包括所述环境信息,所述环境信息包括所述目标设备所处区域的环境数据;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    根据所述环境数据,确定与所述环境数据相关联的N个声音对象,N≥1;
    获取各个所述声音对象对应的白噪音,得到N个音频数据,每个所述音频数据均与一个所述声音对象关联;
    将所述N个音频数据合成,得到所述目标音频数据。
  3. 根据权利要求2所述的方法,其特征在于,所述获取各个所述声音对象对应的白噪音,得到N个音频数据,具体包括:
    基于所述N个声音对象,查询原子数据库,得到所述N个音频数据,其中,所述原子数据库中配置有各个单一对象在特定的一段时间内的音频数据。
  4. 根据权利要求2所述的方法,其特征在于,所述环境数据中包括环境声音;
    所述获取各个所述声音对象对应的白噪音,得到N个音频数据,具体包括:
    从所述环境声音中提取出M个所述声音对象的音频数据,以得到M个音频数据,0≤M≤N;
    其中,当M<N时,基于所述N个声音对象中剩余的声音对象,查询原子数据库,得到(N-M)个音频数据,其中,所述原子数据库中配置有各个单一对象在特定的一段时间内的音频数据。
  5. 根据权利要求4所述的方法,其特征在于,在得到所述M个音频数据之后,还包括:
    将所述M个音频数据中各个音频数据所包含的声道的增益均调整至目标值。
  6. 根据权利要求2-5任一所述的方法,其特征在于,每个所述音频数据所表达的情感均与所述环境数据所表达的情感相同。
  7. 根据权利要求1所述的方法,其特征在于,所述目标参数包括所述环境信息,所述环境信息包括所述目标设备所处的环境中需同时播放第一音频数据和第二音频数据,且所述第一音频数据和所述第二音频数据均通过同一设备播放,其中,所述第一音频数据为第一时间段内持续性播放的音频数据,所述第二音频数据为所述第一时间段内偶发性播放的音频数据;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    获取待播放的所述第二音频数据;
    根据所述第二音频数据,从所述第一音频数据中提取出待播放的第三音频数据,以及,对所述第三音频数据进行目标处理,得到第四音频数据,其中,所述第二音频数据和所述第四音频数据对应的播放时间段相同,所述目标处理包括人声消除或人声降低;
    根据所述第二音频数据,确定所述第二音频数据所需调整的第一增益,以及,基于所述第一增益,对所述第二音频数据中各个声道的增益进行调整,得到第五音频数据;
    根据所述第四音频数据或者所述第五音频数据,确定所述第四音频数据所需调整的第二增益,以及,基于所述第二增益,对所述第四音频数据中各个声道的增益进行调整,得到第六音频数据;
    基于所述第五音频数据和所述第六音频数据,得到所述目标音频数据。
  8. 根据权利要求7所述的方法,其特征在于,所述第二音频数据为第一数据,或者,所述第四音频数据为第一数据;
    其中,根据所述第一数据,确定所述第一数据所需调整的增益,具体包括:
    获取所述第一数据的音频特征,所述音频特征包括以下一项或多项:时域特征,频域特征,或者,乐理特征;
    根据所述音频特征,确定所述第一数据所需调整的增益。
  9. 根据权利要求7所述的方法,其特征在于,所述根据所述第五音频数据,确定所述第四音频数据所需调整的第二增益,具体包括:
    获取所述第五音频数据的最大响度值;
    根据所述第五音频数据的最大响度值和第一比例,确定所述第二增益,其中,所述第一比例为所述第二音频数据的最大响度值和所述第四音频数据的最大响度值间的比例。
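作为对上述第二增益计算的一种理解,下面给出一段示意性的Python代码,假设在线性响度域中按第一比例进行换算(该换算方式仅为编者的示意性假设):

def second_gain(max_loudness_5, max_loudness_2, max_loudness_4):
    """一种可能的第二增益计算方式(线性响度域示意):
    第一比例 = 第二音频最大响度 / 第四音频最大响度;
    使调整后的第四音频最大响度与第五音频最大响度之间仍大致保持该比例。"""
    first_ratio = max_loudness_2 / max_loudness_4
    target_max_4 = max_loudness_5 / first_ratio   # 调整后第四音频应达到的最大响度
    return target_max_4 / max_loudness_4          # 施加在第四音频各声道上的线性增益

# 用法示意:第五音频最大响度0.8,第二音频0.9,第四音频0.6
g2 = second_gain(0.8, 0.9, 0.6)
print(round(g2, 3))  # 约0.889,即第四音频各声道整体乘以该增益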
  10. 根据权利要求7-9任一所述的方法,其特征在于,在确定出所述第二增益之后,所述方法还包括:
    基于所述第一增益,对所述第二增益进行修正。
  11. 根据权利要求7-10任一所述的方法,其特征在于,在确定出所述第二增益之后,所述方法还包括:
    确定所述第二增益大于预设增益值;
    将所述第二增益更新为所述预设增益值。
  12. 根据权利要求7-11任一所述的方法,其特征在于,所述基于所述第二增益,对所述第四音频数据中各个声道的增益进行调整,具体包括:
    在所述第四音频数据播放开始之后,且与所述第四音频数据播放开始的时刻相距第一预设时间的第一时长内,按照第一预设步长将所述第四音频数据中各个声道的增益逐渐调整至所述第二增益;
    以及,在所述第四音频数据播放结束之前,且与所述第四音频数据播放结束的时刻相距第二预设时间的第二时长内,按照第二预设步长将所述第四音频数据中各个声道的增益逐渐由所述第二增益调整至预设增益值。
  13. 根据权利要求7-11任一所述的方法,其特征在于,所述基于所述第二增益,对所述第四音频数据中各个声道的增益进行调整,具体包括:
    在所述第四音频数据播放开始之前,且与所述第四音频数据播放开始的时刻相距第一预设时间的第一时长内,按照第一预设步长将所述第四音频数据中各个声道的增益逐渐调整至所述第二增益;
    以及,在所述第四音频数据播放结束之后,且与所述第四音频数据播放结束的时刻相距第二预设时间的第二时长内,按照第二预设步长将所述第四音频数据中各个声道的增益逐渐由所述第二增益调整至预设增益值。
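对于按预设步长逐渐调整增益(即淡入、淡出)的处理,下面给出一段示意性的Python代码(帧数与步长均为示意性假设):

def ramp_gain(num_frames, target_gain, step, direction="up", base_gain=1.0):
    """按预设步长逐帧调整增益的示意:淡入时从base_gain逐步逼近target_gain,
    淡出时从target_gain逐步回到base_gain(预设增益值)。返回每帧的增益序列。"""
    gains = []
    g = base_gain if direction == "up" else target_gain
    end = target_gain if direction == "up" else base_gain
    for _ in range(num_frames):
        if g < end:
            g = min(g + step, end)
        elif g > end:
            g = max(g - step, end)
        gains.append(g)
    return gains

# 用法示意:在第一时长对应的10帧内按0.1的步长把增益从1.0提升到1.5(淡入)
print(ramp_gain(10, 1.5, 0.1, direction="up"))
# 在第二时长对应的10帧内按相同步长从1.5回落到1.0(淡出)
print(ramp_gain(10, 1.5, 0.1, direction="down"))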
  14. 根据权利要求1所述的方法,其特征在于,所述目标参数包括所述环境信息,所述环境信息包括所述目标设备在目标空间中的目标位置,所述目标空间中配置有至少一个扬声器;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    确定所述目标设备与N个扬声器间的距离,以得到N个第一距离,N为正整数,其中,所述N个扬声器与所述目标设备处于同一空间中;
    根据所述N个第一距离和所述N个扬声器,构建目标虚拟扬声器组,所述目标虚拟扬声器组由M个目标虚拟扬声器组成,所述M个目标虚拟扬声器位于以所述目标设备所处的位置为中心,且以所述N个第一距离中的目标距离为半径的圆上,M的值与构建空间环绕声所需的扬声器的数量相等,所述M个目标虚拟扬声器的布置方式与构建空间环绕声所需的扬声器的布置方式相同,每个所述目标虚拟扬声器均通过调整所述N个扬声器中的至少一个扬声器对应的音频信号的增益得到;
    根据在所述N个扬声器中且与所述目标虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对所述原始音频数据中各个声道的增益进行调整,得到所述目标音频数据。
  15. 根据权利要求14所述的方法,其特征在于,所述目标距离为所述N个第一距离中的最小值。
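对于把距离不同的实际扬声器虚拟到以目标设备为圆心、以目标距离为半径的圆上,一种常见思路是按点声源的距离衰减规律做增益补偿。下面给出一段示意性的Python代码(假设声压按1/距离衰减,且未考虑延迟与频响差异):

import math

def virtualize_gains(distances):
    """把N个实际扬声器虚拟到半径为min(distances)的圆上所需的增益(dB)示意。
    假设声压随距离按1/d衰减:距离越远的扬声器需要越大的正增益。"""
    target = min(distances)                # 目标距离取N个第一距离中的最小值
    return [20.0 * math.log10(d / target) for d in distances]

# 用法示意:设备到三个扬声器的距离分别为2m、3m、4m
print([round(g, 2) for g in virtualize_gains([2.0, 3.0, 4.0])])
# [0.0, 3.52, 6.02] —— 最近的扬声器不变,较远的扬声器分别补偿约3.5dB和6dB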
  16. 根据权利要求14或15所述的方法,其特征在于,所述根据所述N个第一距离和所述N个扬声器,构建目标虚拟扬声器组,具体包括:
    以所述目标距离为基准,确定所述N个扬声器中除目标扬声器之外的各个扬声器对应的音频信号所需调整的增益,以构建出第一虚拟扬声器组,所述第一虚拟扬声器组为将所述N个扬声器均虚拟至以所述目标设备为中心,且以所述目标距离为半径的圆上得到的扬声器的组合,所述目标扬声器为所述目标距离对应的扬声器;
    根据所述第一虚拟扬声器组和构建空间环绕声所需的扬声器的布置方式,确定所述目标虚拟扬声器组,其中,所述目标虚拟扬声器组中的中置扬声器位于所述目标设备当前的朝向上的预设角度范围内。
  17. 根据权利要求14或15所述的方法,其特征在于,所述根据所述N个第一距离和所述N个扬声器,构建目标虚拟扬声器组,具体包括:
    根据所述N个扬声器,所述N个第一距离,构建空间环绕声所需的扬声器的布置方式,所述目标设备的朝向,以及所述目标设备所处的位置,构建第一虚拟扬声器组,所述第一虚拟扬声器组中包括M个第一虚拟扬声器,每个所述第一虚拟扬声器均通过调整所述N个扬声器中的至少一个扬声器对应的音频信号的增益得到;
    确定所述目标设备与各个所述第一虚拟扬声器间的第二距离,以得到M个第二距离;
    将所述M个第一虚拟扬声器均虚拟至以所述目标设备所处的位置为中心,且以所述第二距离中的一个距离为半径的圆上,以得到所述目标虚拟扬声器组。
  18. 根据权利要求14-17任一所述的方法,其特征在于,在所述确定所述目标设备与N个扬声器间的距离之前,所述方法还包括:
    根据所述目标设备所处空间中配置的扬声器,所述目标设备的朝向,所述目标设备所处的位置,以及构建空间环绕声所需的扬声器的布置方式,从所述目标设备所处空间中配置的扬声器中筛选出所述N个扬声器,所述N个扬声器用于构建空间环绕声。
  19. 根据权利要求14-18任一所述的方法,其特征在于,所述方法还包括:
    确定所述目标设备与所述目标空间中的各个扬声器间的距离;
    根据所述目标设备与所述目标空间中的各个扬声器间的距离,确定所述目标空间中的各个扬声器在播放音频数据时的延迟时间;
    控制所述目标空间中的各个扬声器按照相应的所述延迟时间播放音频数据。
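对于根据距离确定各扬声器播放延迟的步骤,下面给出一段示意性的Python代码(假设声速约为343m/s,以最远的扬声器为基准对齐各路声音的到达时间):

SPEED_OF_SOUND = 343.0  # m/s,常温下的近似声速

def playback_delays(distances):
    """根据目标设备(或听音位置)到各扬声器的距离计算播放延迟的示意:
    让较近的扬声器晚一些播放,使各路声音大致同时到达听音位置。"""
    farthest = max(distances)
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]

# 用法示意:三个扬声器距离分别为2m、3m、4m
print([round(t * 1000, 2) for t in playback_delays([2.0, 3.0, 4.0])])
# [5.83, 2.92, 0.0] —— 单位毫秒,最远的扬声器不延迟,最近的延迟约5.8ms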
  20. 根据权利要求1所述的方法,其特征在于,所述目标参数包括所述环境信息,所述环境信息包括所述目标设备产生的画面在目标空间中的目标位置,所述目标空间中配置有至少一个扬声器;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    根据所述目标位置,构建与所述目标空间匹配的虚拟空间,所述虚拟空间的体积小于所述目标空间的体积;
    根据所述目标空间中各个扬声器的位置,在所述虚拟空间中构建出目标虚拟扬声器组,所述目标虚拟扬声器组中包括至少一个目标虚拟扬声器,且每个所述目标虚拟扬声器均通过调整所述目标空间中的一个扬声器对应的音频信号的增益得到;
    根据在所述目标空间中且与所述目标虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对所述原始音频数据中各个声道的增益进行调整,得到所述目标音频数据。
  21. 根据权利要求20所述的方法,其特征在于,所述根据所述目标空间中各个扬声器的位置,在所述虚拟空间中构建出目标虚拟扬声器组,具体包括:
    根据所述虚拟空间和所述目标空间间的比例,在所述虚拟空间中确定出所述目标虚拟扬声器组中各个目标虚拟扬声器的位置;
    根据各个所述目标虚拟扬声器和与各个所述目标虚拟扬声器对应的目标扬声器间的距离,确定出各个所述目标扬声器对应的音频信号所需调整的增益,以得到所述目标虚拟扬声器组,所述目标扬声器为所述目标空间中的扬声器。
  22. 根据权利要求20或21所述的方法,其特征在于,所述方法还包括:
    确定所述目标设备产生的画面与所述目标空间中的各个扬声器间的距离;
    根据所述目标设备产生的画面与所述目标空间中的各个扬声器间的距离,确定所述目标空间中的各个扬声器在播放音频数据时的延迟时间;
    控制所述目标空间中的各个扬声器按照相应的所述延迟时间播放音频数据。
  23. 根据权利要求1所述的方法,其特征在于,所述目标参数包括所述用户的状态信息,所述用户的状态信息包括所述目标设备与目标用户的头部间的目标距离,所述目标用户的头部在目标空间中的目标位置,所述目标空间中配置有至少一个扬声器;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    根据所述目标距离、所述目标位置和所述目标空间中各个扬声器的位置,构建目标虚拟扬声器组,所述目标虚拟扬声器组中包括至少一个目标虚拟扬声器,每个所述目标虚拟扬声器均通过调整所述目标空间中的一个扬声器对应的音频信号的增益得到,每个所述目标虚拟扬声器均处于以所述目标位置为圆心且以所述目标距离为半径的圆上;
    根据在所述目标空间中且与所述目标虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对所述原始音频数据中各个声道的增益进行调整,得到所述目标音频数据。
  24. 根据权利要求23所述的方法,其特征在于,所述根据所述目标距离、所述目标位置和所述目标空间中各个扬声器的位置,构建目标虚拟扬声器组之后,还包括:
    根据所述目标虚拟扬声器组,构建第一虚拟扬声器组,所述第一虚拟扬声器组由M个虚拟扬声器组成,所述M个虚拟扬声器位于以所述目标位置为中心,且以所述目标距离为半径的圆上,M的值与构建空间环绕声所需的扬声器的数量相等,所述M个虚拟扬声器的布置方式与构建空间环绕声所需的扬声器的布置方式相同,所述M个虚拟扬声器中每个虚拟扬声器均通过调整所述目标空间中的至少一个扬声器对应的音频信号的增益得到;
    所述根据在所述目标空间中且与所述目标虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对所述原始音频数据中各个声道的增益进行调整,得到所述目标音频数据,具体包括:
    根据在所述目标空间中且与所述M个虚拟扬声器关联的扬声器对应的音频信号所需调整的增益,对所述原始音频数据中各个声道的增益进行调整,得到所述目标音频数据。
  25. 根据权利要求1所述的方法,其特征在于,所述目标设备位于车辆中,所述目标参数包括所述环境信息,所述环境信息包括所述车辆的行驶速度、转速和加速踏板的开度中的一项或多项;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    根据所述行驶速度、所述转速和所述加速踏板的开度中的至少一个,从原始音频数据中,确定出第一音频数据,其中,所述第一音频数据为基于所述行驶速度对所述原始音频数据中的目标音频粒子进行伸缩变换得到;
    根据所述行驶速度,确定所述车辆的加速度,并根据所述加速度,调整所述第一音频数据中各个声道的增益,以得到第二音频数据,以及,确定所述车辆中的声场向目标方向移动的目标速度;
    根据所述目标速度,确定所述目标音频数据的声源的虚拟位置;
    根据所述虚拟位置,确定所述车辆中多个扬声器对应的音频信号的所需调整的目标增益,得到F个目标增益,F≥2;
    根据所述F个目标增益,调整所述第二音频数据中各个声道的增益,以得到所述目标音频数据。
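对于由行驶速度估计加速度并据此调整声道增益的步骤,下面给出一段示意性的Python代码(加速度到增益的映射系数与调整上限均为示意性假设):

def accel_to_gain_db(speed_samples, dt, gain_per_mps2=0.5, max_step_db=3.0):
    """由车速序列估计加速度,并将加速度映射为声道增益调整量(dB)的示意。
    gain_per_mps2与max_step_db均为示意性参数:加速度越大,增益提升越多,
    但单次调整量被限制在预设调整值以内。"""
    v0, v1 = speed_samples[-2], speed_samples[-1]
    accel = (v1 - v0) / dt                      # 加速度,m/s^2
    step = accel * gain_per_mps2                # 映射为增益调整量(dB)
    return max(-max_step_db, min(step, max_step_db)), accel

# 用法示意:0.5s内车速由10m/s升到14m/s
gain_db, accel = accel_to_gain_db([10.0, 14.0], dt=0.5)
print(accel, gain_db)  # 加速度8m/s^2,映射值4dB被钳位到示意上限3dB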
  26. 根据权利要求25所述的方法,其特征在于,在根据所述行驶速度,调整所述第一音频数据中各个声道的增益之前,还包括:
    确定所述行驶速度的变化值超过预设速度阈值;和/或
    确定所述第一音频数据中每个声道的增益对应的调整值均小于或等于预设调整值,其中,当所述第一音频数据中目标声道的增益对应的目标调整值大于所述预设调整值时,将所述目标调整值更新为所述预设调整值。
  27. 根据权利要求25或26所述的方法,其特征在于,所述目标参数还包括所述车辆的加速时长,所述方法还包括:
    根据所述加速时长,控制所述车辆中的氛围灯工作。
  28. 根据权利要求1所述的方法,其特征在于,所述目标参数包括所述用户的状态信息,所述状态信息包括用户的疲劳等级;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    根据所述疲劳等级,确定第一特征参数的目标调整值,第一特征参数为当前所需播放的原始音频数据的特征参数,所述第一特征参数包括音调和/或响度;
    根据所述目标调整值,对所述原始音频数据进行处理,得到所述目标音频数据,其中,所述目标音频数据的特征参数的值高于所述第一特征参数的值。
  29. 根据权利要求28所述的方法,其特征在于,所述输出所述目标音频数据,具体包括:
    根据所述疲劳等级,确定第一目标提示音;
    根据预先设定的播报顺序,输出所述目标音频数据和所述第一目标提示音。
  30. 根据权利要求28或29所述的方法,其特征在于,所述方法还包括:
    根据所述疲劳等级和地图信息,确定第二目标提示音;
    输出所述第二目标提示音。
  31. 根据权利要求28-30任一所述的方法,其特征在于,所述目标设备位于车辆中;
    所述输出所述目标音频数据之前,所述方法还包括:
    确定所述车辆处于自动驾驶状态,且所述车辆所处的路段的路况低于预设路况阈值,和/或,确定所述车辆所处的路段为预设路段。
  32. 根据权利要求1所述的方法,其特征在于,所述目标参数包括所述用户的状态信息,所述状态信息包括用户选择的第一音频数据和第二音频数据;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    确定所述第一音频数据的第一音频特征,第一音频特征包括:各个时刻的响度和/或各个节拍的位置点;
    根据所述第一音频特征,调整所述第二音频数据的第二音频特征,以得到第三音频数据,所述第二音频特征包括响度、音调和音速中的至少一项;
    根据所述第一音频数据和所述第三音频数据,得到所述目标音频数据。
  33. 根据权利要求32所述的方法,其特征在于,所述第一音频特征包括:所述第一音频数据的各个时刻的响度,所述第二音频特征包括响度;
    所述根据所述第一音频特征,调整所述第二音频数据的第二音频特征,具体包括:
    根据所述各个时刻的响度和预设响度比例,确定所述第二音频数据中各个时刻对应的目标响度;
    将所述第二音频数据中各个时刻的响度,调整至所述第二音频数据中各个时刻对应的目标响度。
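对于按预设响度比例将第二音频数据逐时刻对齐到第一音频数据响度的处理,下面给出一段示意性的Python代码(以逐帧RMS作为响度的近似):

def match_loudness(first_rms_per_frame, second_frames, second_rms_per_frame, ratio=0.5):
    """按预设响度比例把第二音频数据逐帧对齐到第一音频数据响度的示意。
    first_rms_per_frame/second_rms_per_frame: 两段音频各帧的RMS响度;
    second_frames: 第二音频各帧的采样数据; ratio: 预设响度比例(第二相对第一)。"""
    out = []
    for frame, rms1, rms2 in zip(second_frames, first_rms_per_frame, second_rms_per_frame):
        target = rms1 * ratio                        # 该时刻第二音频应达到的目标响度
        g = target / rms2 if rms2 > 0 else 0.0
        out.append([s * g for s in frame])
    return out

# 用法示意:第一音频两帧RMS为0.4和0.2,第二音频两帧RMS均为0.1
adjusted = match_loudness([0.4, 0.2], [[0.1, -0.1], [0.05, -0.05]], [0.1, 0.1], ratio=0.5)
print(adjusted)  # 第一帧放大2倍,第二帧保持不变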
  34. 根据权利要求32或33所述的方法,其特征在于,所述第一音频特征包括:各个节拍的位置点,所述第二音频特征包括音调和/或音速;
    所述根据所述第一音频特征,调整所述第二音频数据的第二音频特征,具体包括:
    针对所述第一音频数据中任意相邻的两个节拍,根据所述任意相邻的两个节拍,确定所述任意相邻的两个节拍对应的目标节奏;
    根据所述目标节奏,确定所述第二音频数据在所述任意相邻的两个节拍对应的位置点内的第二音频特征的目标调整值;
    根据所述目标调整值,对所述第二音频数据在所述任意相邻的两个节拍对应的位置点内的第二音频特征进行调整。
  35. 根据权利要求1所述的方法,其特征在于,所述目标参数包括所述用户的状态信息,所述状态信息包括以下一项或多项:用户选择的图片,视频,或者,用户为目标对象所添加的音频数据;
    所述根据所述目标参数,对原始音频数据进行处理,得到目标音频数据,具体包括:
    确定N张图片,N≥2;
    确定所述N张图片中各张图片内包含的目标对象,以得到M个目标对象,M≥1;
    确定各个所述目标对象在所述N张图片中每张图片中的空间位置,以及,确定各个所述目标对象在目标视频中出现的时长,以得到M个第一时长,所述目标视频基于所述N张图片得到;
    根据各个所述目标对象的空间位置,以及所述N张图片中各个相邻的图片在所述目标视频中出现的时刻,确定各个所述目标对象在各个相邻的图片间的移动速度;
    根据所述M个目标对象,得到Q个第一音频数据,1≤Q≤M,其中,一个所述第一音频数据至少与一个所述目标对象相关联;
    将各个所述第一音频数据的第二时长均调整至与相应的所述目标对象对应的第一时长相等,以得到Q个第二音频数据;
    根据各个所述目标对象的空间位置,以及各个所述目标对象在各个相邻的图片间的移动速度,分别对各个所述目标对象对应的第二音频数据进行处理,以得到Q个第三音频数据;
    根据所述Q个第三音频数据和所述N张图片,得到目标视频,其中,所述目标视频中包括所述目标音频数据,所述目标音频数据基于所述Q个第三音频数据得到。
  36. 根据权利要求35所述的方法,其特征在于,所述方法还包括:
    根据所述N张图片,确定出与所述N张图片匹配的第四音频数据;
    将所述第四音频数据中至少一部分节拍的位置点作为所述N张图片中至少一部分图片出现的时刻,和/或,将所述第四音频数据中至少一部分小节的开始或结束的位置点作为所述N张图片中至少一部分图片出现的时刻。
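对于把图片出现的时刻对齐到第四音频数据的节拍位置点的处理,下面给出一段示意性的Python代码(节拍时刻与图片数量均为示意):

def picture_switch_times(beat_times, num_pictures):
    """把图片出现的时刻对齐到节拍位置点的示意:
    从节拍序列中近似等间隔地选出num_pictures个节拍作为图片切换时刻。"""
    if num_pictures >= len(beat_times):
        return list(beat_times)
    stride = len(beat_times) / num_pictures
    return [beat_times[int(k * stride)] for k in range(num_pictures)]

# 用法示意:背景音乐的节拍出现在0.5s、1.0s、1.5s……,共8拍,需要展示4张图片
beats = [0.5 * (k + 1) for k in range(8)]
print(picture_switch_times(beats, 4))  # [0.5, 1.5, 2.5, 3.5],即每隔两拍切换一张图片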
  37. 根据权利要求35或36所述的方法,其特征在于,所述确定各个所述目标对象在所述N张图中每张图片中的空间位置,具体包括:
    针对第i张图片内的第k个目标对象,基于预先设定的三维坐标系,确定所述第k个目标对象在所述第i张图片中的第一空间位置,其中,所述三维坐标系的中心点为第i张图片的中心位置,所述第i张图片为所述N张图片中的任意一张图片,所述第k个目标对象为所述第i张图片中的任意一个目标对象。
  38. 根据权利要求37所述的方法,其特征在于,所述方法还包括:
    确定所述第(i+1)张图片中不存在所述第k个目标对象;
    将所述第(i+1)张图片的第一边界上的第一位置,作为所述第k个目标对象在所述第(i+1)张图片中的第二空间位置。
  39. 根据权利要求38所述的方法,其特征在于,所述第一边界为所述第k个目标对象在所述第i张图片中的目标朝向上的边界,所述第一位置为在所述第(i+1)张图片中以所述第一空间位置为起点,且在所述目标朝向上延伸的直线与所述第一边界的交点。
  40. 根据权利要求38或39所述的方法,其特征在于,所述方法还包括:
    确定所述第(i+2)张图片中不存在所述第k个目标对象;
    根据所述第一空间位置,所述第二空间位置,以及所述第i张图片和所述第(i+1)张图片间的时间间隔,确定所述第k个目标对象的第一移动速度和第一移动方向;
    将所述第(i+2)张图片之外的第二位置,作为所述第k个目标对象在所述第(i+2)张图片中的第三空间位置;其中,所述第二位置为在所述第一移动方向上,且与在所述第(i+2)张图片中的所述第二空间位置相距第一目标距离的位置点,所述第一目标距离根据所述第一移动速度,以及所述第(i+1)张图片和所述第(i+2)张图片间的时间间隔得到。
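对于根据第一移动速度和第一移动方向向图片之外外推目标对象空间位置的步骤,下面给出一段示意性的Python代码:

def extrapolate_position(p_i, p_i1, dt_prev, dt_next):
    """目标对象在后续图片中消失时,按其移动速度与方向向图片外侧外推空间位置的示意。
    p_i、p_i1: 对象在第i张与第(i+1)张图片中的空间位置(x, y, z);
    dt_prev: 第i张与第(i+1)张图片间的时间间隔; dt_next: 第(i+1)张与第(i+2)张图片间的时间间隔。"""
    velocity = tuple((b - a) / dt_prev for a, b in zip(p_i, p_i1))   # 第一移动速度与方向
    return tuple(b + v * dt_next for b, v in zip(p_i1, velocity))    # 外推得到的空间位置

# 用法示意:对象从(0,0,0)在1s内移动到图片右边界(1,0,0),则再过1s其虚拟位置为(2,0,0)
print(extrapolate_position((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), dt_prev=1.0, dt_next=1.0))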
  41. 根据权利要求37-40任一所述的方法,其特征在于,所述方法还包括:
    确定所述第(i-1)张图片中不存在所述第k个目标对象,其中,i≥2;
    将所述第(i-1)张图片的第二边界上的第三位置,作为所述第k个目标对象在所述第(i-1)张图片中的第四空间位置。
  42. 根据权利要求41所述的方法,其特征在于,所述第二边界为所述第k个目标对象在所述第i张图片中的目标朝向的反方向上的边界,所述第三位置为在所述第(i-1)张图片中以所述第一空间位置为起点,且在所述目标朝向的反方向上延伸的直线与所述第二边界的交点。
  43. 根据权利要求41或42所述的方法,其特征在于,所述方法还包括:
    确定所述第(i-2)张图片中不存在所述第k个目标对象,其中,i≥3;
    根据所述第一空间位置,所述第四空间位置,以及所述第i张图片和所述第(i-1)张图片间的时间间隔,确定所述第k个目标对象的第二移动速度和第二移动方向;
    将所述第(i-2)张图片之外的第四位置,作为所述第k个目标对象在所述第(i-2)张图片中的第五空间位置;其中,所述第四位置为在所述第二移动方向的反方向上,且与在所述第(i-2)张图片中的所述第四空间位置相距第二目标距离的位置点,所述第二目标距离根据所述第二移动速度,以及所述第(i-1)张图片和所述第(i-2)张图片间的时间间隔得到。
  44. 根据权利要求37-43任一所述的方法,其特征在于,所述方法还包括:
    确定第(i+1)张图片至第(i+j)张图片中均不存在所述第k个目标对象,j≥2,且第(i+j+1)张图片中存在所述第k个目标对象,(i+j+1)≤N;
    以所述第i张图片为基准,分别确定所述第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置,以得到第一空间位置集合{P(i+1),...,P(i+j)},其中,P(i+j)为所述第k个目标对象在所述第(i+j)张图片中的空间位置,以及,以所述第(i+j+1)张图片为基准,分别确定所述第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置,以得到第二空间位置集合{P′(i+1),...,P′(i+j)},其中,P′(i+j)为所述第k个目标对象在所述第(i+j)张图片中的空间位置;
    根据所述第一空间位置集合和所述第二空间位置集合,确定所述第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置。
  45. 根据权利要求44所述的方法,其特征在于,所述根据所述第一空间位置集合和所述第二空间位置集合,确定所述第k个目标对象在第(i+1)张图片至第(i+j)张图片中各张图片中的空间位置,具体包括:
    根据所述第一空间位置集合和所述第二空间位置集合,分别确定所述第k个目标对象在所述第(i+1)张图片至所述第(i+j)张图片中每张图片内的两个空间位置之间的距离,以得到j个距离;
    根据所述第一空间位置集合和所述第二空间位置集合,确定所述第k个目标对象在第(i+c)张图片中的空间位置,所述第(i+c)张图片为所述j个距离中的一个距离所对应的图片,1≤c≤j;
    根据所述第k个目标对象在所述第i张图片中的空间位置,所述第k个目标对象在所述第(i+j+1)张图片中的空间位置,所述第k个目标对象在所述第(i+c)张图片中的空间位置,以及,所述第i张图片至所述第(i+j+1)张图片中各张图片在所述目标视频中出现的时刻,确定所述第k个目标对象在所述第i张图片至所述第(i+c)张图片间的各张图片中的空间位置,以及确定所述第k个目标对象在所述第(i+c)张图片至所述第(i+j+1)张图片间的各张图片中的空间位置。
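对于在两个已知空间位置之间按图片出现时刻估计中间图片中空间位置的处理,下面给出一段按时间线性插值的示意性Python代码(仅为一种简化的可能实现):

def interpolate_positions(p_start, p_end, t_start, t_end, times):
    """在两个已知空间位置之间,按各张图片在目标视频中出现的时刻做线性插值的示意,
    用于估计目标对象在中间图片中的空间位置。"""
    out = []
    for t in times:
        r = (t - t_start) / (t_end - t_start)
        out.append(tuple(a + (b - a) * r for a, b in zip(p_start, p_end)))
    return out

# 用法示意:对象在第i张图片(t=0s)位于(0,0,0),在第(i+j+1)张图片(t=3s)位于(3,0,0),
# 估计其在t=1s与t=2s两张中间图片中的空间位置
print(interpolate_positions((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), 0.0, 3.0, [1.0, 2.0]))
# [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]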
  46. 一种电子设备,其特征在于,包括:
    至少一个存储器,用于存储程序;
    至少一个处理器,用于执行所述存储器存储的程序;
    其中,当所述存储器存储的程序被执行时,所述处理器用于执行如权利要求1-45任一所述的方法。
  47. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,当所述计算机程序在电子设备上运行时,使得所述电子设备执行如权利要求1-45任一所述的方法。
  48. 一种计算机程序产品,其特征在于,当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行如权利要求1-45任一所述的方法。
PCT/CN2023/099912 2022-06-24 2023-06-13 一种声音处理方法及电子设备 WO2023246563A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210727150.7 2022-06-24
CN202210727150.7A CN117334207A (zh) 2022-06-24 2022-06-24 一种声音处理方法及电子设备

Publications (2)

Publication Number Publication Date
WO2023246563A1 true WO2023246563A1 (zh) 2023-12-28
WO2023246563A9 WO2023246563A9 (zh) 2024-02-22

Family

ID=89277929

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/099912 WO2023246563A1 (zh) 2022-06-24 2023-06-13 一种声音处理方法及电子设备

Country Status (2)

Country Link
CN (1) CN117334207A (zh)
WO (1) WO2023246563A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105392102A (zh) * 2015-11-30 2016-03-09 武汉大学 用于非球面扬声器阵列的三维音频信号生成方法及系统
CN106657648A (zh) * 2016-12-28 2017-05-10 上海斐讯数据通信技术有限公司 预防疲劳驾驶的移动终端及其实现方法
CN107277736A (zh) * 2016-03-31 2017-10-20 株式会社万代南梦宫娱乐 模拟系统、声音处理方法及信息存储介质
CN107705778A (zh) * 2017-08-23 2018-02-16 腾讯音乐娱乐(深圳)有限公司 音频处理方法、装置、存储介质以及终端
CN113050913A (zh) * 2019-12-27 2021-06-29 哈曼国际工业有限公司 用于提供自然声音的系统和方法
CN113271380A (zh) * 2020-02-14 2021-08-17 斑马智行网络(香港)有限公司 音频的处理方法和装置
CN114040318A (zh) * 2021-11-02 2022-02-11 海信视像科技股份有限公司 一种空间音频的播放方法及设备
CN114140986A (zh) * 2021-11-23 2022-03-04 奇瑞汽车股份有限公司 疲劳驾驶预警方法、系统、存储介质

Also Published As

Publication number Publication date
CN117334207A (zh) 2024-01-02
WO2023246563A9 (zh) 2024-02-22

Similar Documents

Publication Publication Date Title
WO2020151387A1 (zh) 一种基于用户运动状态的推荐方法及电子设备
CN112397062A (zh) 语音交互方法、装置、终端及存储介质
CN114089933B (zh) 显示参数的调整方法、电子设备、芯片及可读存储介质
WO2021258814A1 (zh) 视频合成方法、装置、电子设备及存储介质
WO2022017261A1 (zh) 图像合成方法和电子设备
CN112214636A (zh) 音频文件的推荐方法、装置、电子设备以及可读存储介质
EP4203447A1 (en) Sound processing method and apparatus thereof
CN111835904A (zh) 一种基于情景感知和用户画像开启应用的方法及电子设备
CN113420177A (zh) 音频数据处理方法、装置、计算机设备及存储介质
WO2022143258A1 (zh) 一种语音交互处理方法及相关装置
WO2022022335A1 (zh) 天气信息的展示方法、装置和电子设备
WO2021238371A1 (zh) 生成虚拟角色的方法及装置
WO2023169448A1 (zh) 一种感知目标的方法和装置
CN114979457A (zh) 一种图像处理方法及相关装置
WO2023246563A1 (zh) 一种声音处理方法及电子设备
CN113554932A (zh) 轨迹回放方法及相关装置
WO2022222688A1 (zh) 一种窗口控制方法及其设备
CN115641867A (zh) 语音处理方法和终端设备
WO2022127211A1 (zh) 震动方法、装置、电子设备和可读存储介质
CN111916105B (zh) 语音信号处理方法、装置、电子设备及存储介质
CN111722896B (zh) 动画播放方法、装置、终端以及计算机可读存储介质
CN115964231A (zh) 基于负载模型的评估方法和装置
CN114445522A (zh) 笔刷效果图生成方法、图像编辑方法、设备和存储介质
WO2024114785A1 (zh) 一种图像处理方法、电子设备及系统
WO2024060968A1 (zh) 管理服务卡片的方法和电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23826212

Country of ref document: EP

Kind code of ref document: A1