WO2024087727A1 - Voice data processing method based on in-vehicle voice AI and related equipment - Google Patents

Voice data processing method based on in-vehicle voice AI and related equipment

Info

Publication number
WO2024087727A1
WO2024087727A1 (PCT/CN2023/105292)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
vehicle
singing
data
features
Prior art date
Application number
PCT/CN2023/105292
Other languages
English (en)
French (fr)
Inventor
张贵海
卢放
周冰
李平
唐马政
杨锦
苗宇栋
Original Assignee
岚图汽车科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 岚图汽车科技有限公司
Publication of WO2024087727A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033: Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques using neural networks

Definitions

  • the present disclosure belongs to the field of intelligent voice technology and relates to a voice data processing method based on vehicle-mounted voice AI and related equipment.
  • the present disclosure proposes a voice data processing method and related equipment based on in-vehicle voice AI, which improves the interactivity between users and AI assistants as well as the entertainment and product competitiveness of the in-vehicle smart cockpit.
  • a voice data processing method based on in-vehicle voice AI comprising: obtaining audio features and lyrics features of a target song; generating in-vehicle voice AI singing data of the target song based on the audio features and lyrics features; playing the target song and the in-vehicle voice AI singing data at the same time; and collecting and playing audio data of the target user in real time.
  • a voice data processing device based on in-vehicle voice AI
  • a computer-readable storage medium includes a stored program, and when the program is executed by a processor, the processor is caused to implement the above-mentioned voice data processing method based on in-vehicle voice AI.
  • an electronic device comprising at least one processor and at least one memory connected to the processor; the processor is used to call program instructions in the memory and execute the voice data processing method based on in-vehicle voice AI.
  • FIG1 is a flow chart showing a method for processing voice data based on vehicle-mounted voice AI according to some embodiments of the present disclosure
  • FIG2 shows a structural block diagram of a voice data processing device based on vehicle-mounted voice AI according to some embodiments of the present disclosure
  • FIG3 shows a structural block diagram of an electronic device according to some embodiments of the present disclosure.
  • the embodiments of the present disclosure provide a voice data processing method based on in-vehicle voice AI, as shown in Figure 1, the method may include steps S101 to S104.
  • step S101 the audio features and lyrics features of the target song are obtained.
  • in a practical application scenario, when the vehicle is started, the user operates the karaoke application installed in the vehicle's infotainment system and first selects a chorus mode, under which multiple categorized songs are available.
  • the chorus mode can be a duet mode or a sing-along mode; the user then selects the song to be sung in chorus.
  • the target song may be a chorus song selected by the user, and the in-vehicle infotainment system may obtain the audio features and lyrics features of the selected chorus song.
  • the audio features may be the phoneme information, tone information, rhythmic boundary text information, note information, beat information, legato score information, etc. of the selected chorus song.
  • the lyrics features may be the text data corresponding to the selected chorus song.
  • the lyrics features may be the lyrics information stored for the selected chorus song, or the lyrics information obtained by analyzing the audio data corresponding to the selected chorus song.
  • step S102 in-vehicle voice AI singing data of the target song is generated based on the audio features and lyrics features.
  • the AI assistant is an in-vehicle voice AI application configured in the in-vehicle infotainment system.
  • the in-vehicle voice AI singing data of the target song can be generated based on audio features, lyrics features and the in-vehicle voice AI application.
  • in-vehicle voice AI singing data can be generated based on audio features, lyrics features and the AI singing synthesis scheme model configured in the AI assistant.
  • the AI assistant can generate the in-vehicle voice AI singing data of the target song by feeding the phoneme information, tone information, rhythmic boundary text information, note information, beat information, legato score information and lyrics information obtained in step S101 into the configured AI singing synthesis model, which is built on a deep-learning neural network algorithm.
  • the AI singing synthesis model is deployed after being trained on a large amount of data.
  • for example, the AI assistant can generate the in-vehicle voice AI singing data of the song "Good Luck Comes" based on the lyrics information of the song, the phoneme information, tone information and rhythmic boundary text information corresponding to the text of the lyrics, the note information, beat information and legato score information of the song's accompaniment audio, and the AI singing synthesis model.
  • phonemes are the smallest units of speech, divided according to the natural properties of speech. Analyzed by the articulatory actions within a syllable, each articulatory action constitutes one phoneme. Phonemes fall into two categories: vowels and consonants. For example, the Mandarin syllable ā has only one phoneme, ài has two phonemes, and dài has three phonemes.
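The phoneme decomposition described above can be sketched as a lookup from pinyin syllables to phoneme lists; the table below is a tiny hypothetical excerpt, not the disclosure's actual phoneme inventory.

```python
# Minimal sketch of phoneme decomposition used among the audio features;
# the syllable table is a small illustrative excerpt written in pinyin.
PHONEME_TABLE = {
    "a": ["a"],               # one phoneme
    "ai": ["a", "i"],         # two phonemes
    "dai": ["d", "a", "i"],   # three phonemes
}

def phoneme_count(syllable: str) -> int:
    """Return the number of phonemes in a known pinyin syllable."""
    return len(PHONEME_TABLE[syllable])
```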
  • Tone refers to the variation of pitch in a language. In Chinese, tone is an inherent pitch contour of a syllable and distinguishes meaning.
  • the pitch of a tone is relative, not absolute, and a tone changes by sliding rather than leaping from one pitch level to another.
  • the pitch of a tone is usually marked using the five-level notation.
  • Rhythmic boundaries play an important role in the naturalness and accuracy of language expression. In everyday communication, the pauses between sentences serve as rhythmic boundaries.
  • step S103 the target song and the in-vehicle voice AI singing data are played simultaneously.
  • the selected target song and the in-vehicle voice AI singing data can be processed by the karaoke application in the infotainment system and then output through the speaker device of the infotainment system.
  • by toggling the original-vocal switch on or off, the target song can be output either with the original vocals or with the accompaniment only.
  • step S104 the audio data of the target user is collected and played in real time.
  • the in-vehicle infotainment system can be connected to an external or built-in sound source acquisition device. While the target song and the in-vehicle voice AI singing data are playing, the sound information input by the user currently using the karaoke application through the sound source acquisition device is collected in real time, and the sound information is used as the audio data of the target user and played through the speaker device. Of course, the collected sound information can also be processed through echo cancellation technology, and after obtaining the audio data of the target user, the audio data is output through the speaker device.
  • the user when using the karaoke application, can connect an external microphone to the USB port of the in-vehicle infotainment system, select the song corresponding to the chorus mode of their preference, and sing at a reasonable angle and within the sound source collection range to input the human voice source.
  • the speaker of the in-vehicle infotainment system will play the song accompaniment and the in-vehicle voice AI singing data in step S103, and the in-vehicle infotainment system will filter the sound source output by the above speaker using sound echo cancellation technology, and output the human voice source input by the user and the filtered sound source with low latency.
  • echo cancellation is a processing method that prevents the sound from the far end from returning by eliminating or removing the far-end audio signal picked up by the local microphone.
  • the removal of this audio signal is completed through digital signal processing.
  • the basic principle of echo cancellation is to establish a voice model of the far-end signal based on the correlation between the speaker signal and the multipath echo generated by it, use it to estimate the echo, and continuously modify the coefficient of the filter to make the estimated value closer to the real echo. Then, the echo estimate is subtracted from the input signal of the microphone to achieve the purpose of echo cancellation.
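The adaptive-filter principle described above can be illustrated with a small normalized-LMS (NLMS) sketch; the tap count, step size and simulated echo path are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def nlms_echo_cancel(far_end, mic, taps=32, mu=0.5, eps=1e-8):
    """Estimate the echo of the far-end (speaker) signal picked up by the
    microphone with a normalized LMS adaptive filter, and subtract it."""
    w = np.zeros(taps)                 # adaptive filter coefficients
    out = np.zeros(len(mic))           # echo-cancelled output
    for n in range(taps - 1, len(mic)):
        x = far_end[n - taps + 1:n + 1][::-1]  # x[k] = far_end[n - k]
        echo_est = w @ x                       # current echo estimate
        e = mic[n] - echo_est                  # error = mic minus echo estimate
        w += mu * e * x / (x @ x + eps)        # push estimate toward the real echo
        out[n] = e
    return out
```

Feeding a simulated echo (the far-end signal passed through a short impulse response) as the microphone input, the residual after adaptation is far smaller than the echo itself, which is the behavior the bullet above describes.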
  • the above scheme addresses a limitation of current in-vehicle karaoke software. Most vehicle cockpits are already configured with mature karaoke applications through which users can sing by themselves or with other users via a microphone.
  • this type of application can process and mix the user's voice into a human voice with a reverberation effect, then mix it with the song accompaniment to produce the singing sound. However, the user cannot use such an application to sing along or in duet with the AI assistant; the user can only switch to the original vocals for a chorus. In other words, the user cannot sing a favorite song together with the AI assistant, the interactivity with the AI assistant in entertainment singing is low, and the product competitiveness of the smart cockpit suffers as a result.
  • the user can select the chorus mode on the karaoke application, and then select the song to be chorused.
  • the in-vehicle infotainment system can extract audio features and lyrics features for the target song selected by the user, generate the in-vehicle voice AI singing data of the target song, play the target song and the in-vehicle voice AI singing data, and collect the audio data of the target user in real time, and finally mix and play the above three audios, so as to achieve the effect of improving the interactivity between the user and the AI assistant and the entertainment and product competitiveness of the in-vehicle smart cockpit.
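The flow of steps S101 to S104 summarized above can be sketched as a minimal pipeline; every function and field name here is a hypothetical stand-in for the infotainment system's actual components, not the disclosure's implementation.

```python
# Hypothetical sketch of steps S101 to S104.
def extract_features(song):
    # S101: obtain audio features (notes, beats, ...) and lyrics features
    audio_feats = {"notes": song["notes"], "beats": song["beats"]}
    return audio_feats, song["lyrics"]

def synthesize_ai_singing(audio_feats, lyrics_feats):
    # S102: a trained singing-synthesis model would run here
    return {"voice_line": "current AI voice", "frames": len(lyrics_feats)}

def mix_and_play(*tracks):
    # S103 + S104: mix the song, the AI singing data and the user's mic audio
    return list(tracks)

song = {"notes": [60, 62, 64], "beats": [1, 2, 3], "lyrics": ["hao", "yun", "lai"]}
audio_feats, lyrics_feats = extract_features(song)
ai_singing = synthesize_ai_singing(audio_feats, lyrics_feats)
mix = mix_and_play(song, ai_singing, {"mic": "target user audio"})
```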
  • step S101: obtaining audio features and lyrics features of the target song may include step S201-A or S201-B.
  • step S201-A the audio features and lyrics features of the target song are obtained based on the karaoke application.
  • the audio features and lyrics features of the target song mentioned in step S101 can be directly analyzed by the karaoke application.
  • the karaoke application can directly call the karaoke audio file of the target song that is cached internally or has been downloaded, and analyze the file to obtain its audio features and lyrics features.
  • step S201-B the audio data of the target song is obtained based on the karaoke application; and the audio features and lyrics features of the target song are determined based on the in-vehicle voice AI application and the audio data of the target song.
  • step S201-B first transfers the karaoke audio file of the target song that has been cached internally or downloaded to the in-vehicle voice AI application based on the karaoke application, and then the in-vehicle voice AI application (i.e., AI assistant) analyzes the above karaoke audio file to obtain the audio features and lyrics features of the target song.
  • AI assistant: the in-vehicle voice AI application.
  • the above embodiment provides two methods for parsing karaoke audio files. During implementation, whether the karaoke application or the in-vehicle voice AI application parses the karaoke audio file of the target song can be decided according to how busy each process is, which keeps the in-vehicle infotainment system running smoothly.
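The load-based dispatch described in this embodiment could look like the following sketch; the load metric and the application names are assumptions for illustration.

```python
# Illustrative dispatch: decide which process parses the karaoke audio file
# of the target song according to how busy each process is (hypothetical metric).
def choose_parser(karaoke_busy: float, voice_ai_busy: float) -> str:
    """Return the application that should parse the target song's audio file."""
    return "karaoke_app" if karaoke_busy <= voice_ai_busy else "voice_ai_app"
```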
  • step S301 the voice line of the AI singing voice in the in-vehicle voice AI singing data is made the same as the voice line currently set for the in-vehicle voice AI application.
  • the voice lines of the in-vehicle voice AI application configured by the in-vehicle infotainment system are diverse, which can be male, female, dialect, etc., and the dialect can be Cantonese, Sichuanese, Northeastern dialect, etc.
  • users are often familiar with the voice of the in-vehicle voice AI application they set, and have a familiar sense of companionship.
  • by synchronizing voice line features, the in-vehicle infotainment system can make the voice line of the AI singing voice identical to the voice line currently set for the in-vehicle voice AI application. Users can thus experience a chorus with the voice that accompanies them every day, which narrows the distance between the user and the assistant, keeps the AI singing from feeling like cold companionship, increases the coordination of the overall chorus, and enhances the user's singing experience. It also avoids breaking the overall entertainment atmosphere in the car when the infotainment system outputs certain prompt sounds, since two different voice lines would otherwise coexist.
  • the voice line set by the in-vehicle voice AI application can be set to a gentle female voice, and the voice line of the above AI singing voice can be set to be synchronized with the voice line set by the current in-vehicle voice AI application.
  • the in-vehicle voice AI singing voice data played by the speaker according to the voice line characteristics of the AI singing voice is audio based on the gentle female voice line.
  • step S401 the singing preference of the target user is determined based on the historical karaoke data of the target user.
  • the in-vehicle infotainment system records the user's singing habits and emotional fluctuations from the sound source input through the sound source acquisition device, analyzes them with the neural-network-based AI singing synthesis model to obtain the user's singing preference, and saves the singing preference to a database.
  • singing preferences can be stored in personalized categories under different storage names.
  • the in-vehicle infotainment system can remind the user whether to perform such categorized storage. After categorized storage, when the karaoke application is started again and the user sings with the AI assistant, the infotainment system can automatically match the stored singing preferences according to the user's voice characteristics to improve the singing experience.
  • step S402 the in-vehicle voice AI singing data is adjusted according to the singing preference.
  • the in-vehicle infotainment system will adapt to the user's singing preference during the chorus according to the singing preference determined in step S401, so that the entire chorus is more harmonious and pleasant to the ear.
  • the singing preference can be the emotional expression of the user's singing, which can be high-pitched, excited, lost, sad, etc., and the emotional expression and volume of the AI singing of the in-vehicle voice AI singing data are adaptively adjusted according to the singing preference.
  • the historical karaoke data of user A recorded in the database are mostly sad songs, and the singing preferences are mostly low volume and little fluctuation in tone.
  • the AI assistant will adapt to the singing preferences of user A and adjust parameters such as tone and rhythmic boundaries to be similar to user A, so as to better complete the collaborative singing of the song.
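Matching stored singing preferences to the current singer's voice characteristics, as described above, might be sketched as a nearest-profile lookup; the feature vectors and preference fields below are invented for illustration, not the disclosure's data model.

```python
import math

# Hypothetical stored profiles: voice-feature vectors and singing preferences.
PROFILES = {
    "user_a": {"features": [0.9, 0.1, 0.3],
               "preference": {"volume": "low", "tone_variance": "small"}},
    "user_b": {"features": [0.2, 0.8, 0.7],
               "preference": {"volume": "high", "tone_variance": "large"}},
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_preference(voice_features):
    """Return the stored preference of the closest-matching voice profile."""
    best = max(PROFILES.values(),
               key=lambda p: cosine(voice_features, p["features"]))
    return best["preference"]
```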
  • step S102: generating in-vehicle voice AI singing data of the target song based on audio features and lyrics features may include step S501: transmitting the in-vehicle voice AI singing data to the karaoke application based on the Android interface definition language AIDL.
  • step S102 after the AI assistant generates the in-vehicle voice AI singing data of the target song from the audio features and lyrics features using the AI singing synthesis model built on a deep-learning neural network, the singing data needs to be transmitted to the karaoke application before it can be mixed and played together with the target song.
  • the karaoke application and the AI assistant belong to two independent processes in the in-vehicle infotainment system, so the in-vehicle voice AI singing data must be transmitted across processes.
  • Android Interface Definition Language (AIDL) is an interface definition language for the Android platform.
  • different applications run in their own independent processes, and applications cannot access each other's memory space.
  • Android supports an inter-process communication (IPC) mechanism, but it requires serialized data that Android can read.
  • AIDL is used to describe such data.
  • when the above method is executed, it may also include step S601: adjusting the voice line currently set for the in-vehicle voice AI application based on the voice line preference of the target user.
  • the voice line of the AI singing voice in the in-vehicle voice AI singing data can follow the voice line currently set for the in-vehicle voice AI application. Accordingly, the voice line setting of the in-vehicle voice AI application can be adjusted according to the user's voice line preference, so that the voice line of the AI singing voice is synchronized to the one the user prefers.
  • the voice line can be a gentle female voice, a deep male voice, etc.
  • the in-vehicle infotainment system will recognize whether user A has a male or female voice and adaptively switch the gender voice characteristics of the AI singing voice. If the voice line of the AI singing voice has been preset to match the voice line currently set for the in-vehicle voice AI application, but that voice line differs from the adaptively switched gender voice characteristics, then this step takes priority over step S301 by default and the voice line setting is overwritten, improving the chorus experience.
  • step S701 the environmental atmosphere inside the vehicle is adaptively adjusted based on the voice line of the AI singing voice in the in-vehicle voice AI singing data and/or the audio features of the target song.
  • the lighting and shading devices in the car can be adaptively adjusted according to the voice of the AI singing and/or the audio characteristics of the target song, so as to better integrate the scene with the atmosphere of the selected target song, so that the smart cockpit in the car is no longer just a carrier of the chorus song, but a part of the overall singing atmosphere.
  • the in-vehicle infotainment system automatically closes the window sunshade, adaptively adjusts the grayscale of the vehicle glass, reduces the light transmittance and saturation of the vehicle glass, and adjusts the interior ambient light to ice blue to simulate a snowy atmosphere. It also adjusts the low-frequency, mid-frequency, and high-frequency parameters of the speaker equipment to create a good singing environment.
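The atmosphere adaptation above, in the spirit of the snowy-atmosphere example, might be sketched as a mood-to-settings lookup; the preset names and values are purely illustrative assumptions.

```python
# Hypothetical mapping from a song's mood to cabin-ambience settings.
AMBIENCE_PRESETS = {
    "winter": {"sunshade": "closed", "glass_transmittance": 0.3,
               "ambient_light": "ice blue"},
    "upbeat": {"sunshade": "open", "glass_transmittance": 0.8,
               "ambient_light": "warm orange"},
}

DEFAULT_PRESET = {"sunshade": "open", "glass_transmittance": 1.0,
                  "ambient_light": "white"}

def apply_ambience(mood: str) -> dict:
    """Return the cabin settings for a song mood, with a neutral fallback."""
    return AMBIENCE_PRESETS.get(mood, DEFAULT_PRESET)
```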
  • the embodiment of the present disclosure also provides a voice data processing device based on vehicle-mounted voice AI, which is used to implement the method shown in the above-mentioned Figure 1 and the above-mentioned multiple embodiments.
  • This device embodiment corresponds to the aforementioned method embodiment, and can correspondingly implement all the contents of the aforementioned method embodiment.
  • the device may include: an acquisition unit 21, which is used to acquire the audio features and lyrics features of the target song; a generation unit 22, which is used to generate the vehicle-mounted voice AI singing data of the target song based on the audio features and lyrics features; a speaker unit 23, which is used to play the target song and the vehicle-mounted voice AI singing data at the same time; a collection and broadcasting unit 24, which is used to collect and play the audio data of the target user in real time.
  • the acquisition unit 21 is also used to acquire the audio features and lyrics features of the target song based on a Karaoke application; or, to acquire the audio data of the target song based on a Karaoke application; and to determine the audio features and lyrics features of the target song based on the in-vehicle voice AI application and the audio data of the target song.
  • the voice line of the AI singing voice of the in-vehicle voice AI singing data is the same as the voice line currently set by the in-vehicle voice AI application.
  • the device may also include a voice line adjustment unit (not shown) for adjusting the voice line currently set for the in-vehicle voice AI application based on the voice line preference of the target user.
  • the generation unit 22 is also used to determine the target user's singing preference based on the target user's historical karaoke data; and adjust the in-vehicle voice AI singing data according to the singing preference.
  • the device may also include a transmission unit (not shown) for transmitting the in-vehicle voice AI singing data to the karaoke application based on the Android Interface Definition Language (AIDL).
  • the device may further include an atmosphere adjustment unit (not shown) for adaptively adjusting the ambient atmosphere inside the vehicle based on the voice line of the AI singing voice in the in-vehicle voice AI singing data and/or the audio features of the target song.
  • the present disclosure provides a voice data processing method based on in-vehicle voice AI, which addresses a limitation of the mature karaoke software currently configured in most vehicle cockpits: such software can process and mix the user's microphone voice into a human voice with a reverberation effect and mix it with the song accompaniment to produce the singing sound, but the user cannot sing along or in duet with the AI assistant and can only switch to the original vocals for a chorus. The user therefore cannot sing a favorite song together with the AI assistant, the interactivity with the AI assistant in entertainment singing is low, and the product competitiveness of the smart cockpit is not high.
  • the present disclosure extracts the audio features and lyrics features of the target song selected by the user, generates the in-vehicle voice AI singing data of the target song, plays the target song and the in-vehicle voice AI singing data, collects the audio data of the target user in real time, and finally mixes and plays the three audio streams, thereby improving the interactivity between the user and the AI assistant as well as the entertainment value and product competitiveness of the in-vehicle smart cockpit.
  • the processor includes one or more cores, and a core retrieves the corresponding program unit from the memory.
  • by adjusting core parameters, the voice data processing method based on in-vehicle voice AI is implemented, solving the prior-art problem that users cannot sing along or in duet with the AI assistant, can only switch to the original vocals, cannot sing their favorite songs together with the AI assistant, and have low interactivity with the AI assistant in entertainment singing.
  • An embodiment of the present disclosure provides a storage medium on which a program is stored.
  • when the program runs, the voice data processing method based on in-vehicle voice AI is implemented.
  • An embodiment of the present disclosure provides a processor, which is used to run a program, and when the program is running, a voice data processing method based on vehicle-mounted voice AI is executed.
  • An embodiment of the present disclosure provides an electronic device 30, as shown in FIG3 , the electronic device includes at least one processor 31, and at least one memory 32 and a bus 33 connected to the processor; wherein the processor 31 and the memory 32 communicate with each other through the bus 33; the processor 31 is used to call program instructions in the memory to execute the above-mentioned voice data processing method based on vehicle-mounted voice AI.
  • the electronic device in the present disclosure may be a server, a PC, a PAD, a mobile phone, etc.
  • the present disclosure also provides a computer program product, which, when executed on a data processing device, is suitable for executing a program that initializes the following method steps: obtaining audio features and lyrics features of a target song; generating in-vehicle voice AI singing data of the target song based on the audio features and lyrics features; playing the target song and the in-vehicle voice AI singing data at the same time; and collecting and playing the audio data of the target user in real time.
  • obtaining the audio features and lyrics features of a target song includes: obtaining the audio features and lyrics features of the target song based on a karaoke application; or obtaining the audio data of the target song based on a karaoke application, and determining the audio features and lyrics features of the target song based on the in-vehicle voice AI application and the audio data of the target song.
  • the above method also includes: the voice line of the AI singing voice in the in-vehicle voice AI singing data is the same as the voice line currently set for the in-vehicle voice AI application.
  • the method further includes: determining the singing preferences of the target user based on the historical karaoke data of the target user; and adjusting the in-vehicle voice AI singing data based on the singing preferences.
  • the above method also includes: transmitting the in-vehicle voice AI singing data to a karaoke application based on the Android interface definition language AIDL.
  • the above method also includes: adjusting the voice line currently set for the in-vehicle voice AI application based on the voice line preference of the target user.
  • the above method also includes: adaptively adjusting the environmental atmosphere inside the vehicle based on the voice line characteristics of the AI singing voice in the in-vehicle voice AI singing data and/or the audio features of the target song.
  • the device includes one or more processors (CPU), memory and bus.
  • the device may also include input/output interface, network interface and the like.
  • the memory may include non-persistent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash RAM, and includes at least one memory chip.
  • the memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media that can be implemented by any method or technology to store information.
  • the information can be computer-readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device.
  • as defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
  • the embodiments of the present disclosure may be provided as methods, systems or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)

Abstract

本文公开了一种基于车载语音AI的语音数据处理方法及相关设备,上述方法包括:获取目标歌曲的音频特征和歌词特征;基于所述音频特征和所述歌词特征生成所述目标歌曲的车载语音AI歌声数据;同时播放所述目标歌曲和所述车载语音AI歌声数据;实时采集并播放目标用户的音频数据。

Description

基于车载语音AI的语音数据处理方法及相关设备
相关申请的交叉引用
本申请要求于2022年10月28日提交、申请号为202211335986.9的中国专利申请的优先权,其全部内容通过引用合并于此。
技术领域
本公开属于智能语音技术领域,涉及一种基于车载语音AI的语音数据处理方法及相关设备。
背景技术
随着人们生活水平的提高和对美好生活的向往,开车自驾的娱乐需求与日俱增。目前在车辆内部,大部分车载座舱已配置成熟的K歌软件,用户可以利用K歌软件,通过麦克风自己演唱或与其他用户一起演唱。这类K歌软件可以将用户的声音经过效果处理和混合后生成带混响的效果人声,再和歌曲伴奏混合,进而发出歌唱的声音。但用户利用这类K歌软件无法实现和AI助手合唱或对唱,只能切原唱进行演唱,即用户并不能和AI助手同时合唱用户喜欢的歌曲,在娱乐演唱方面与AI助手的交互性较低,从而导致配置该类K歌软件的智能座舱的产品竞争力不高。
发明内容
鉴于上述问题,本公开提出了一种基于车载语音AI的语音数据处理方法及相关设备,提升了用户与AI助手的交互性以及车内智能座舱的娱乐性和产品竞争力。
依据本公开的第一方面,提供了一种基于车载语音AI的语音数据处理方法,该方法包括:获取目标歌曲的音频特征和歌词特征;基于音频特征和歌词特征生成目标歌曲的车载语音AI歌声数据;同时播放目标歌曲和车载语音AI歌声数据;以及实时采集并播放目标用户的音频数据。
依据本公开的第二方面,提供了一种基于车载语音AI的语音数据处理装置,包括:获取单元,用于获取目标歌曲的音频特征和歌词特征;生成单元,用于基于歌曲音频特征和歌词特征生成目标歌曲的车载语音AI歌声数据;扬声单元,用于同时播放目标歌曲和车载语音AI歌声数据;以及采播单元,用于实时采集并播放目标用户的音频数据。
依据本公开的第三方面,提供了一种计算机可读存储介质,上述计算机可读存储介质包括存储的程序,上述程序在被处理器执行时促使处理器实现上述的基于车载语音AI的语音数据处理方法。
依据本公开的第四方面,提供了一种电子设备,包括至少一个处理器、以及与上述处理器连接的至少一个存储器;上述处理器用于调用上述存储器中的程序指令,并执行上述的基于车载语音AI的语音数据处理方法。
上述说明仅是本公开技术方案的概述,为了能够更清楚了解本公开的技术手段,而可依照说明书的内容予以实施,并且为了让本公开的上述和其它目的、特征和优点能够更明显易懂,以下特举本公开的具体实施方式。
附图说明
通过阅读下文实施方式的详细描述,各种其他的优点和益处对于本领域普通技术人员将变得清楚明了。附图仅用于示出本公开的一些实施方式,而并不认为是对本公开的限制。而且在整个附图中,用相同的参考符号表示相同的部件。在附图中:
图1示出了依据本公开一些实施例的一种基于车载语音AI的语音数据处理方法的流程示意图;
图2示出了依据本公开一些实施例的一种基于车载语音AI的语音数据处理装置的结构框图;
图3示出了依据本公开一些实施例的一种电子设备的结构框图。
具体实施方式
下面将参照附图更详细地描述本公开的示例性实施例。虽然附图中显示了本公开的示例性实施例,然而应当理解,可以以各种形式实现本公开而不应被这里阐述的实施例所限制。相反,提供这些实施例是为了能够更透彻地理解本公开,并且能够将本公开的范围完整地传达给本领域的技术人员。
为了解决目前用户利用K歌软件无法实现和AI助手合唱或对唱,只能切原唱进行合唱,进而用户不能和AI助手同时合唱喜欢的歌曲,在娱乐演唱方面与AI助手的交互性较低,从而导致配置该类K歌软件的智能座舱的产品竞争力不高的问题,本公开实施例提供了一种基于车载语音AI的语音数据处理方法,如图1所示,该方法可以包括步骤S101至步骤S104。
在步骤S101,获取目标歌曲的音频特征和歌词特征。
在一些实施例中,上述实际应用场景可以是,当车辆处于启动状态时,用户通过操作安装于车辆的车载信息娱乐系统中的K歌应用,先选择合唱模式,合唱模式下有多首已分类的歌曲,合唱模式可以是对唱模式或跟唱模式;再选定要合唱的歌曲。
可以理解的是,目标歌曲可以是用户选定的合唱歌曲,车载信息娱乐系统可以获取上述选定的合唱歌曲的音频特征和歌词特征。音频特征可以是选定的合唱歌曲的音素信息、声调信息、韵律边界文本信息、音符信息、节拍信息、连音符乐谱信息等。歌词特征可以是选定的合唱歌曲对应的文本数据中存储的歌词信息,也可以是对选定的合唱歌曲对应的音频数据进行分析后得到的歌词信息。
在步骤S102,基于音频特征和歌词特征生成目标歌曲的车载语音AI歌声数据。
可以理解的是,AI助手为车载信息娱乐系统中配置的一种车载语音AI应用,在实现过程中,可以基于音频特征、歌词特征和车载语音AI应用生成目标歌曲的车载语音AI歌声数据,例如基于音频特征、歌词特征以及AI助手中配置的AI歌声合成方案模型生成车载语音AI歌声数据。
需要说明的是,AI助手可以根据步骤S101所获取的目标歌曲的音素信息、声调信息、韵律边界文本信息,音符信息、节拍信息、连音符乐谱信息和歌词信息,通过配置深度学习神经网络算法的AI歌声合成方案模型生成目标歌曲的车载语音AI歌声数据。在一些实施例中,该AI歌声合成方案模型是使用海量数据进行模型训练后配置的。
以用户在车载信息娱乐系统中选定的合唱歌曲是《好运来》为例,AI助手可以基于《好运来》歌曲的歌词信息,歌词信息中文本对应的音素信息、声调信息、韵律边界文本信息,《好运来》歌曲的伴奏音频的音符信息、节拍信息、连音符乐谱信息,以及AI歌声合成方案模型,生成《好运来》歌曲的车载语音AI歌声数据。
应当理解的是,音素是根据语音的自然属性划分出来的最小语音单位,如果依据音节里的发音动作来分析,则一个发音动作构成一个音素。音素分为元音与辅音两大类,如汉语音节啊(ā)只有一个音素,爱(ài)有两个音素,代(dài)有三个音素。
声调是指语言的音调的变化,是汉语音节中所固有的,有区别意义的声音的高低。声调的音高是相对的,不是绝对;声调的变化是滑动的,而不是像从一个音阶到另一个音阶那样跳跃式地移动。声调的高低通常用五度标记法。
韵律边界对语言表达的自然度以及准确度这两个指标起着重要作用。在人们交流中,语句间停顿的部分即为韵律边界。
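作为对上述音符信息与节拍信息的直观说明,下面给出一个极简的Python草图:它仅把音符号和拍数映射为正弦波旋律,并非本公开所述AI歌声合成方案模型的实现,函数名与参数均为示意性假设。

```python
import numpy as np

def midi_to_freq(note):
    """MIDI音符号转频率(Hz),A4(音符号69)对应440Hz。"""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def render_melody(notes, beats, bpm=120, sr=16000):
    """按音符信息与节拍信息渲染简单正弦波旋律(仅作示意)。
    notes: MIDI音符号列表;beats: 每个音符持续的拍数列表。"""
    sec_per_beat = 60.0 / bpm
    segments = []
    for note, beat in zip(notes, beats):
        n = int(beat * sec_per_beat * sr)      # 该音符对应的采样点数
        t = np.arange(n) / sr
        segments.append(0.3 * np.sin(2 * np.pi * midi_to_freq(note) * t))
    return np.concatenate(segments)
```

例如 render_melody([60, 62, 64], [1, 1, 2]) 会依次生成三个音的波形;真实的歌声合成还需结合音素、声调、韵律边界等特征,由深度学习模型完成。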
在步骤S103,同时播放目标歌曲和车载语音AI歌声数据。
在一些实施例中,可以基于车载信息娱乐系统中的K歌应用将选定的目标歌曲与车载语音AI歌声数据进行音频处理,然后通过车载信息娱乐系统中的扬声设备输出。在实现过程中,可以通过开启原唱或者关闭原唱的方式,对应实现输出带有原唱的目标歌曲或者输出只有伴奏的目标歌曲。
通过同时播放车载语音AI歌声数据和带有原唱的目标歌曲,或者同时播放车载语音AI歌声数据和只有伴奏的目标歌曲,可以为后续用户合唱奠定良好的基础,使整体音频输出体验更佳,并且通过自由开启原唱的方式也可以在后续用户输入音源时实现三种声线同时合唱,提升了演唱方式的丰富性。
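同时播放多路音频在实现上可理解为线性混音后再输出。下面的Python草图演示将伴奏、AI歌声等等长音轨按增益相加并限幅;函数名与增益取值均为示意性假设,并非本公开的实际实现。

```python
import numpy as np

def mix_tracks(tracks, gains=None):
    """将若干等长音轨按增益线性混合,并限幅到[-1, 1]以防削波失真。"""
    if gains is None:
        gains = [1.0 / len(tracks)] * len(tracks)   # 缺省时等权混合
    mixed = sum(g * np.asarray(t, dtype=float) for g, t in zip(gains, tracks))
    return np.clip(mixed, -1.0, 1.0)
```

开启原唱时可混入第三路原唱音轨;关闭原唱则只混合伴奏与AI歌声两路。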
在步骤S104,实时采集并播放目标用户的音频数据。
需要说明的是,车载信息娱乐系统可以外接或内置声源采集设备。在目标歌曲和车载语音AI歌声数据播放的同时,实时采集当前使用K歌应用的用户通过声源采集设备输入的声音信息,将声音信息作为目标用户的音频数据,并通过扬声设备播放。当然,还可以将采集到的声音信息通过回声消除技术处理,得到目标用户的音频数据后,通过扬声设备输出该音频数据。
在一些实施例中,用户在使用K歌应用时可以将外置麦克风连接至车载信息娱乐系统的USB接口,选择自己偏好的合唱模式对应的歌曲后,在合理角度和声源采集范围内进行演唱,以输入人声声源。此时,车载信息娱乐系统的扬声器会播放步骤S103中的歌曲伴奏和车载语音AI歌声数据,车载信息娱乐系统对上述扬声器输出的音源,使用声音回声消除技术进行过滤处理,并将用户输入的人声声源与经过滤处理后的音源进行低延迟的二次输出。
可以理解的是,回声消除是通过消除或者移除本地话筒中拾取到的远端的音频信号来阻止远端的声音返回去的一种处理方法,这种音频信号的移除都是通过数字信号处理来完成的。回声消除的基本原理是以扬声器信号与由它产生的多路径回声的相关性为基础,建立远端信号的语音模型,利用它对回声进行估计,并不断修改滤波器的系数,使得估计值更加逼近真实的回声。然后,将回声估计值从话筒的输入信号中减去,从而达到消除回声的目的。
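上述"建立远端信号模型、估计回声并不断修改滤波器系数"的过程,可用归一化最小均方(NLMS)自适应滤波的Python草图示意如下;这只是该原理在假设条件下的最小演示,并非本公开实际采用的回声消除实现。

```python
import numpy as np

def nlms_echo_cancel(far, mic, taps=64, mu=0.5, eps=1e-8):
    """NLMS自适应滤波回声消除示意。
    far: 扬声器输出的远端信号;mic: 麦克风拾取的信号(近端人声+回声)。
    返回误差信号,即减去回声估计后的近端信号。"""
    w = np.zeros(taps)                  # 自适应滤波器系数
    buf = np.zeros(taps)                # 最近 taps 个远端样本
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        y = w @ buf                     # 回声估计值
        e = mic[n] - y                  # 从麦克风信号中减去回声估计
        w += mu * e * buf / (buf @ buf + eps)   # NLMS系数更新
        out[n] = e
    return out
```

滤波器收敛后,误差信号中的回声分量被大幅削弱,剩下的主要是用户人声。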
上述方案可以解决目前在车辆内部,基于大部分车载座舱已配置的成熟的K歌软件,用户可以通过麦克风自己演唱或与其他用户一起演唱,这类应用软件可以将用户的声音经过效果处理和混合后生成带混响效果的人声,再和歌曲伴奏混合,进而发出歌唱的声音,但用户利用这类应用软件无法实现和AI助手合唱或对唱,只能切原唱进行合唱,即用户并不能和AI助手同时合唱用户喜欢的歌曲,在娱乐演唱方面与AI助手的交互性较低,从而导致智能座舱的产品竞争力不高的问题。根据本公开的上述方法,用户可以通过在K歌应用上选择合唱的模式,进而选择要合唱的歌曲。车载信息娱乐系统可以对用户选择的目标歌曲提取音频特征和歌词特征,生成目标歌曲的车载语音AI歌声数据,播放目标歌曲和车载语音AI歌声数据,并实时采集目标用户的音频数据,最终混合播放上述三种音频,实现了提升用户与AI助手的交互性以及车内智能座舱的娱乐性和产品竞争力的效果。
在一些实施例中,上述方法在执行时候,步骤S101:获取目标歌曲的音频特征和歌词特征,可以包括步骤S201-A或者S201-B。
在步骤S201-A,基于K歌应用获取目标歌曲的音频特征和歌词特征。
需要说明的是,步骤S101里所提及的目标歌曲的音频特征和歌词特征可以通过K歌应用直接分析得出,在一些实施例中,K歌应用可以直接调用内部已缓存或根据下载得到的目标歌曲的K歌音频文件,并对K歌音频文件进行分析,得到上述音频文件的音频特征和歌词特征。
在步骤S201-B,基于K歌应用获取目标歌曲的音频数据;以及基于车载语音AI应用和目标歌曲的音频数据,确定目标歌曲的音频特征和歌词特征。
需要说明的是,步骤S201-B相对于步骤S201-A的区别在于,步骤S201-B先基于K歌应用将内部已缓存或根据下载得到的目标歌曲的K歌音频文件传输至车载语音AI应用,再由车载语音AI应用(即AI助手)对上述K歌音频文件进行分析,得到目标歌曲的音频特征和歌词特征。
上述实施例设计了两种方式对K歌音频文件进行分析,在实现过程中,可以根据进程的忙碌状态决定由K歌应用对目标歌曲的K歌音频文件进行解析,还是由车载语音AI应用对目标歌曲的K歌音频文件进行解析,如此使得车载信息娱乐系统在使用过程中更为流畅。
可以理解的是,上述方法分为A与B两种,在执行时执行其中任意一种,均可达到获取目标歌曲的歌曲音频特征和歌词特征的目的。
在一些实施例中,上述方法在执行时,还可以包括步骤S301:车载语音AI歌声数据的AI歌声的声线与车载语音AI应用当前设定的语音声线相同。
需要说明的是,车载信息娱乐系统所配置的车载语音AI应用的声线多种多样,可以是男声、女声、方言等,而方言可以是粤语,四川话,东北话等。在日常使用中,用户往往对其设定的车载语音AI应用的声音较为熟悉,有一种熟悉的陪伴感。车载信息娱乐系统可以通过同步声线特征,将车载语音AI歌声数据的AI歌声的声线与车载语音AI应用当前设定的语音声线同步至同一种声线,如此能给用户带来自己在与朝夕相处陪伴的声音合唱的体验感,拉近与用户之间的距离,让AI歌声之于用户而言不再是冰冷的陪伴,增添了整体合唱的协调性,提升了用户演唱体验,同时避免了车载信息娱乐系统输出某些提示音时,由于存在两种不同声线,破坏整体车内娱乐演唱的氛围。
在一些实施例中,可以将车载语音AI应用设定的语音声线设置为温柔女声,并将上述AI歌声的声线设置为同步至车载语音AI当前应用设定的语音声线。设置完成后,扬声器根据AI歌声的声线特征播放的车载语音AI歌声数据是基于温柔女声声线的音频。
在一些实施例中,上述方法在执行时,还可以包括步骤S401至步骤S402。
在步骤S401,基于目标用户的历史K歌数据,确定目标用户的演唱偏好。
需要说明的是,在用户每次演唱的过程中,车载信息娱乐系统根据用户通过声源采集设备输入的音源,记录用户的演唱习惯和情绪起伏,并经由神经网络算法的AI歌声合成方案模型分析后,得到用户的演唱偏好,并保存该演唱偏好至数据库。可以通过设置不同的储存名称对演唱偏好进行个性化分类储存。
例如,用户为驾驶员A、乘客B和宝宝C。车载信息娱乐系统可以提醒用户是否进行分类存储,在执行分类存储后,K歌应用再次启动,并且用户通过K歌应用与AI助手合唱时,车载信息娱乐系统可以通过用户的声线特征自动匹配已储存的演唱偏好,以提高演唱体验。
在步骤S402,根据演唱偏好调整车载语音AI歌声数据。
需要说明的是,车载信息娱乐系统会根据步骤S401所确定的演唱偏好,在合唱过程中适配用户的演唱偏好,使整个合唱更为和谐以及动听。上述演唱偏好可以是用户演唱的情绪表达,可以是高亢的、兴奋的、失落的、伤感的等,并根据上述演唱偏好自适应调整车载语音AI歌声数据的AI歌声的情绪表达和音量。
在一些实施例中,数据库中记录的用户A的历史K歌数据多为伤感类型歌曲,演唱偏好多为声音音量较低,声调起伏不大,当用户A使用K歌应用选择合唱模式与AI助手合唱《珊瑚海》时,AI助手则会适应用户A的演唱偏好,将声调、韵律边界等参数调整与用户A类似,以便于更好的完成歌曲的合作演唱。
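演唱偏好对AI歌声的适配,可抽象为用已存储的偏好字段覆盖歌声合成的默认参数。下面的Python草图仅作示意,其中 volume、emotion 等字段名均为假设,并非本公开定义的数据结构。

```python
def adapt_ai_vocal(defaults, preference):
    """用目标用户的演唱偏好覆盖AI歌声的默认参数(示意)。
    defaults: 默认合成参数;preference: 数据库中存储的该用户演唱偏好。"""
    adapted = dict(defaults)            # 复制一份,不修改原默认参数
    for key in ("volume", "emotion", "pitch_range"):
        if key in preference:
            adapted[key] = preference[key]
    return adapted
```

例如伤感型偏好可映射为较低的 volume 与"sad"的 emotion,使AI歌声的情绪表达和音量贴近用户。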
在一些实施例中,上述方法在执行时,步骤S102:基于音频特征和歌词特征生成目标歌曲的车载语音AI歌声数据,可以包括步骤S501:基于安卓接口定义语言AIDL将车载语音AI歌声数据传输至K歌应用。
可以理解的是,根据步骤S102,在AI助手根据音频特征和歌词特征通过配置深度学习神经网络算法的AI歌声合成方案模型生成目标歌曲的车载语音AI歌声数据后,需要将上述车载语音AI歌声数据传输至K歌应用,进而能与目标歌曲一同混合播出,K歌应用和AI助手在车载信息娱乐系统内分属两个独立的进程程序,故需要将车载语音AI歌声数据进行跨进程传输。
安卓接口定义语言(Android Interface Definition Language,AIDL)是Android平台上用于定义进程间通信接口的语言。在Android中,不同的应用程序运行在各自独立的进程里,应用程序之间并不能直接访问对方的内存空间。为了实现进程间的通信,需要用到Android提供的Binder进程间通信(Inter-Process Communication,IPC)机制;通过Binder传输的数据必须是Android可读取的序列化数据,AIDL则是为了描述上述接口及其序列化数据而使用的。
在一些实施例中,上述方法在执行时,还可以包括步骤S601:基于目标用户的声线偏好,调整车载语音AI应用当前设定的语音声线。
需要说明的是,不同的用户对车载语音AI应用设定的语音声线的偏好大概率不同,由步骤S301可知,车载语音AI歌声数据的AI歌声的声线可以伴随车载语音AI应用当前设定的语音声线而改变,那么根据用户设定的声线偏好,调整车载语音AI应用的语音声线设置,可以实现同步AI歌声的声线至用户偏好的声线。该声线可以是温柔女声、低沉男声等。
在一些实施例中,用户A在使用K歌应用合唱的过程中,车载信息娱乐系统会识别用户A为男声或者女声,并且自适应切换AI歌声的性别声线特征。若AI歌声的声线已预先被设置为与车载语音AI应用当前设定的语音声线相同,但车载语音AI应用当前设定的语音声线与自适应切换的性别声线特征不同,此时本步骤的优先级默认大于步骤S301,进行声线设置覆盖,以达到提高合唱体验的目的。
在一些实施例中,上述方法在执行时,还可以包括步骤S701:基于车载语音AI歌声数据的AI歌声的声线和/或目标歌曲的音频特征自适应调整车辆内部的环境氛围。
需要说明的是,车内的灯光和遮光设备可以根据AI歌声的声线和/或目标歌曲的音频特征自适应调整,将场景与选定的目标歌曲所属的氛围感进行更好的融合,使得车内智能座舱不再只是合唱歌曲的载体,而是整体演唱氛围的一部分。
例如,用户通过K歌应用选择歌曲《发如雪》进行与AI助手的合唱时,车载信息娱乐系统控制自动关闭车窗的遮阳装置,自适应调整车辆玻璃的灰度,降低车辆玻璃的透光率以及饱和度,将车内氛围灯伴随调整为冰蓝色,模拟雪天氛围场景,调整扬声设备的低频、中频、高频参数,以达到良好的演唱环境氛围。
需要说明的是,作为对上述图1及相关的多种实施例所示方法的实现,本公开实施例还提供了一种基于车载语音AI的语音数据处理装置,用于对上述图1以及上述多个实施例所示的方法进行实现。该装置实施例与前述方法实施例对应,并且能够对应实现前述方法实施例中的全部内容。如图2所示,该装置可以包括:获取单元21,用于获取目标歌曲的音频特征和歌词特征;生成单元22,用于基于音频特征和歌词特征生成目标歌曲的车载语音AI歌声数据;扬声单元23,用于同时播放目标歌曲和车载语音AI歌声数据;采播单元24,用于实时采集并播放目标用户的音频数据。
在一些实施例中,获取单元21,还用于基于K歌应用获取目标歌曲的音频特征和歌词特征;或,基于K歌应用获取目标歌曲的音频数据;基于车载语音AI应用和目标歌曲的音频数据,确定目标歌曲的音频特征和歌词特征。
在一些实施例中,车载语音AI歌声数据的AI歌声的声线与车载语音AI应用当前设定的语音声线相同。
在一些实施例中,该装置还可以包括声线调整单元(图未示),用于基于目标用户的声线偏好,调整车载语音AI应用当前设定的语音声线。
在一些实施例中,生成单元22,还用于基于目标用户的历史K歌数据,确定目标用户的演唱偏好;根据演唱偏好调整车载语音AI歌声数据。
在一些实施例中,该装置还可以包括传输单元(图未示),用于基于安卓接口定义语言AIDL将车载语音AI歌声数据传输至K歌应用。
在一些实施例中,该装置还可以包括氛围调整单元(图未示),用于基于车载语音AI歌声数据的AI歌声的声线和/或目标歌曲的音频特征自适应调整车辆内部的环境氛围。
借由上述技术方案,本公开提供了一种基于车载语音AI的语音数据处理方法,解决了目前在车辆内部,大部分车载座舱已配置的成熟的K歌软件,可以使用户人声通过麦克风经过效果处理和混合后生成带混响的效果人声,再和歌曲伴奏混合进而发出歌唱的声音,但无法和AI助手合唱或对唱,只能切原唱进行合唱,即用户并不能和AI助手同时合唱用户喜欢的歌曲,在娱乐演唱方面与AI助手的交互性较低,从而导致智能座舱的产品竞争力不高的问题。本公开通过对用户选择的目标歌曲提取音频特征和歌词特征,生成目标歌曲的车载语音AI歌声数据,播放目标歌曲和车载语音AI歌声数据,并实时采集目标用户的音频数据,最终混合播放上述三种音频,实现了提升用户与AI助手的交互性以及车内智能座舱的娱乐性和产品竞争力的效果。
处理器中包含内核,由内核去存储器中调取相应的程序单元。内核可以设置一个或多个,通过调整内核参数来实现一种基于车载语音AI的语音数据处理方法,以解决现有技术中用户无法和AI助手合唱或对唱,只能切原唱,用户并不能和AI助手同时合唱用户喜欢的歌曲,在娱乐演唱方面与AI助手的交互性较低的问题。
本公开实施例提供了一种存储介质,其上存储有程序,该程序被处理器执行时实现基于车载语音AI的语音数据处理方法。
本公开实施例提供了一种处理器,处理器用于运行程序,程序运行时执行基于车载语音AI的语音数据处理方法。
本公开实施例提供了一种电子设备30,如图3所示,电子设备包括至少一个处理器31、以及与处理器连接的至少一个存储器32、总线33;其中,处理器31、存储器32通过总线33完成相互间的通信;处理器31用于调用存储器中的程序指令,以执行上述的基于车载语音AI的语音数据处理方法。
本公开中的电子设备可以是服务器、PC、PAD、手机等。
本公开还提供了一种计算机程序产品,当在数据处理设备上执行时,适于执行初始化有如下方法步骤的程序:获取目标歌曲的音频特征和歌词特征;基于音频特征和歌词特征生成目标歌曲的车载语音AI歌声数据;同时播放目标歌曲和车载语音AI歌声数据;实时采集并播放目标用户的音频数据。
在一些实施例中,获取目标歌曲的音频特征和歌词特征,包括:基于K歌应用获取目标歌曲的歌曲音频特征和歌词特征;或,基于K歌应用获取目标歌曲的音频数据;基于车载语音AI应用和目标歌曲的音频数据,确定目标歌曲的音频特征和歌词特征。
在一些实施例中,上述方法还包括:车载语音AI歌声数据的AI歌声的声线与车载语音AI应用当前设定的语音声线相同。
在一些实施例中,上述方法还包括:基于目标用户的历史K歌数据,确定目标用户的演唱偏好;根据演唱偏好调整车载语音AI歌声数据。
在一些实施例中,上述方法还包括:基于安卓接口定义语言AIDL将车载语音AI歌声数据传输至K歌应用。
在一些实施例中,上述方法还包括:基于目标用户的声线偏好,调整车载语音AI应用当前设定的语音声线。
在一些实施例中,上述方法还包括:基于车载语音AI歌声数据的AI歌声的声线特征和/或目标歌曲的歌曲音频特征自适应调整车辆内部的环境氛围。
借由上述技术方案,本公开提供了一种基于车载语音AI的语音数据处理方法,解决了目前在车辆内部,大部分车载座舱已配置的成熟的K歌软件,可以使用户人声通过麦克风经过效果处理和混合后生成带混响的效果人声,再和歌曲伴奏混合进而发出歌唱的声音,但无法和AI助手合唱或对唱,只能切原唱进行合唱,即用户并不能和AI助手同时合唱用户喜欢的歌曲,在娱乐演唱方面与AI助手的交互性较低,从而导致智能座舱的产品竞争力不高的问题。本公开通过对用户选择的目标歌曲提取音频特征和歌词特征,生成目标歌曲的车载语音AI歌声数据,播放目标歌曲和车载语音AI歌声数据,并实时采集目标用户的音频数据,最终混合播放上述三种音频,实现了提升用户与AI助手的交互性以及车内智能座舱的娱乐性和产品竞争力的效果。
本公开是参照根据本公开实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
在一个典型的配置中,设备包括一个或多个处理器(CPU)、存储器和总线。设备还可以包括输入/输出接口、网络接口等。
存储器可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM),存储器包括至少一个存储芯片。存储器是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体,可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带、磁带或磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本公开中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、商品或者设备中还存在另外的相同要素。
本领域技术人员应明白,本公开的实施例可提供为方法、系统或计算机程序产品。因此,本公开可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本公开可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
以上仅为本公开的实施例而已,并不用于限制本公开。对于本领域技术人员来说,本公开可以有各种更改和变化。凡在本公开的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本公开的权利要求范围之内。

Claims (10)

  1. 一种基于车载语音AI的语音数据处理方法,包括:
    获取目标歌曲的音频特征和歌词特征;
    基于所述音频特征和所述歌词特征生成所述目标歌曲的车载语音AI歌声数据;
    同时播放所述目标歌曲和所述车载语音AI歌声数据;以及
    实时采集并播放目标用户的音频数据。
  2. 根据权利要求1所述的方法,其中,所述获取目标歌曲的音频特征和歌词特征,包括:
    基于K歌应用获取所述目标歌曲的所述音频特征和所述歌词特征;或,
    基于所述K歌应用获取所述目标歌曲的音频数据;
    基于车载语音AI应用和所述目标歌曲的音频数据,确定所述目标歌曲的所述音频特征和所述歌词特征。
  3. 根据权利要求2所述的方法,其中,所述车载语音AI歌声数据的AI歌声的声线与所述车载语音AI应用当前设定的语音声线相同。
  4. 根据权利要求2所述的方法,还包括:
    基于所述目标用户的声线偏好,调整所述车载语音AI应用当前设定的语音声线。
  5. 根据权利要求1所述的方法,还包括:
    基于所述目标用户的历史K歌数据,确定所述目标用户的演唱偏好;
    根据所述演唱偏好调整所述车载语音AI歌声数据。
  6. 根据权利要求1所述的方法,还包括:
    基于安卓接口定义语言AIDL将所述车载语音AI歌声数据传输至K歌应用。
  7. 根据权利要求1所述的方法,还包括:
    基于所述车载语音AI歌声数据的AI歌声的声线和/或所述目标歌曲的所述音频特征自适应调整车辆内部的环境氛围。
  8. 一种基于车载语音AI的语音数据处理装置,包括:
    获取单元,用于获取目标歌曲的音频特征和歌词特征;
    生成单元,用于基于所述音频特征和所述歌词特征生成所述目标歌曲的车载语音AI歌声数据;
    扬声单元,用于同时播放所述目标歌曲和所述车载语音AI歌声数据;以及
    采播单元,用于实时采集并播放目标用户的音频数据。
  9. 一种计算机可读存储介质,包括存储的程序,所述程序在被处理器执行时促使所述处理器实现如权利要求1至权利要求7中任一项所述的基于车载语音AI的语音数据处理方法。
  10. 一种电子设备,包括至少一个处理器、以及与所述处理器连接的至少一个存储器;所述处理器用于调用所述存储器中的程序指令,并执行如权利要求1至权利要求7中任一项所述的基于车载语音AI的语音数据处理方法。
PCT/CN2023/105292 2022-10-28 2023-06-30 基于车载语音ai的语音数据处理方法及相关设备 WO2024087727A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211335986.9A CN115938340A (zh) 2022-10-28 2022-10-28 基于车载语音ai的语音数据处理方法及相关设备
CN202211335986.9 2022-10-28

Publications (1)

Publication Number Publication Date
WO2024087727A1 true WO2024087727A1 (zh) 2024-05-02

Family

ID=86698290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/105292 WO2024087727A1 (zh) 2022-10-28 2023-06-30 基于车载语音ai的语音数据处理方法及相关设备

Country Status (2)

Country Link
CN (1) CN115938340A (zh)
WO (1) WO2024087727A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115938340A (zh) * 2022-10-28 2023-04-07 岚图汽车科技有限公司 基于车载语音ai的语音数据处理方法及相关设备

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5235124A (en) * 1991-04-19 1993-08-10 Pioneer Electronic Corporation Musical accompaniment playing apparatus having phoneme memory for chorus voices
US5518408A (en) * 1993-04-06 1996-05-21 Yamaha Corporation Karaoke apparatus sounding instrumental accompaniment and back chorus
CN107957908A (zh) * 2017-11-20 2018-04-24 深圳创维数字技术有限公司 一种麦克风共享方法、装置、计算机设备及存储介质
CN109003623A (zh) * 2018-08-08 2018-12-14 爱驰汽车有限公司 车载唱歌评分系统、方法、设备及存储介质
CN111754965A (zh) * 2019-03-29 2020-10-09 比亚迪股份有限公司 车载k歌装置、方法和车辆
CN113225716A (zh) * 2021-04-19 2021-08-06 北京塞宾科技有限公司 一种车载k歌实现方法、系统、设备及存储介质
CN115938340A (zh) * 2022-10-28 2023-04-07 岚图汽车科技有限公司 基于车载语音ai的语音数据处理方法及相关设备

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5235124A (en) * 1991-04-19 1993-08-10 Pioneer Electronic Corporation Musical accompaniment playing apparatus having phoneme memory for chorus voices
US5518408A (en) * 1993-04-06 1996-05-21 Yamaha Corporation Karaoke apparatus sounding instrumental accompaniment and back chorus
CN107957908A (zh) * 2017-11-20 2018-04-24 深圳创维数字技术有限公司 一种麦克风共享方法、装置、计算机设备及存储介质
CN109003623A (zh) * 2018-08-08 2018-12-14 爱驰汽车有限公司 车载唱歌评分系统、方法、设备及存储介质
CN111754965A (zh) * 2019-03-29 2020-10-09 比亚迪股份有限公司 车载k歌装置、方法和车辆
CN113225716A (zh) * 2021-04-19 2021-08-06 北京塞宾科技有限公司 一种车载k歌实现方法、系统、设备及存储介质
CN115938340A (zh) * 2022-10-28 2023-04-07 岚图汽车科技有限公司 基于车载语音ai的语音数据处理方法及相关设备

Also Published As

Publication number Publication date
CN115938340A (zh) 2023-04-07

Similar Documents

Publication Publication Date Title
CN106898340B (zh) 一种歌曲的合成方法及终端
CN101308652B (zh) 一种个性化歌唱语音的合成方法
KR101274961B1 (ko) 클라이언트단말기를 이용한 음악 컨텐츠 제작시스템
ES2561534T3 (es) Mezclador de pistas de audio semántico
JPH06102877A (ja) 音響構成装置
CN110211556B (zh) 音乐文件的处理方法、装置、终端及存储介质
CN108053814B (zh) 一种模拟用户歌声的语音合成系统及方法
CN108766409A (zh) 一种戏曲合成方法、装置和计算机可读存储介质
WO2024087727A1 (zh) 基于车载语音ai的语音数据处理方法及相关设备
Feugère et al. Cantor Digitalis: chironomic parametric synthesis of singing
JP5598516B2 (ja) カラオケ用音声合成システム,及びパラメータ抽出装置
US20200105244A1 (en) Singing voice synthesis method and singing voice synthesis system
WO2007091475A1 (ja) 音声合成装置、音声合成方法及びプログラム
US20230402047A1 (en) Audio processing method and apparatus, electronic device, and computer-readable storage medium
Bonada et al. Singing voice synthesis combining excitation plus resonance and sinusoidal plus residual models
US20070137465A1 (en) Sound synthesis incorporating delay for expression
Dong et al. Loudness and pitch of Kunqu Opera
CN107393556A (zh) 一种实现音频处理的方法及装置
Janer Singing-driven interfaces for sound synthesizers
Loscos Spectral processing of the singing voice.
JP2022065554A (ja) 音声合成方法およびプログラム
Shiliang The research on the singing voice Timbre of the eastern Yugur traditional folk songs
CN113035181A (zh) 语音数据处理方法、设备和系统
CN108922505A (zh) 信息处理方法及装置
Leech-Wilkinson Expressive gestures in Schubert singing on record