WO2024082389A1 - Haptic feedback method, system and related device for matching split-track music to vibration - Google Patents

Haptic feedback method, system and related device for matching split-track music to vibration

Info

Publication number
WO2024082389A1
Authority
WO
WIPO (PCT)
Prior art keywords
track
audio data
audio
energy proportion
time
Prior art date
Application number
PCT/CN2022/136291
Other languages
English (en)
French (fr)
Inventor
孟增铀
曹梦雅
裴诗雨
郑亚军
Original Assignee
瑞声开泰声学科技(上海)有限公司
Priority date
Filing date
Publication date
Application filed by 瑞声开泰声学科技(上海)有限公司
Publication of WO2024082389A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/016 - Input arrangements with force or tactile feedback as computer generated output to the user
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/0008 - Associated control or indicating means
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 - Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Definitions

  • The present invention relates to the field of deep learning applications, and in particular to a tactile feedback method, system and related device for matching music tracks to vibration.
  • Music can express the author's joy, sorrow, anger, strength and other emotions through different rhythms, cadences and tempos.
  • Tactile feedback technology that matches vibration to the tempo, accents and dynamics of the music gives the audience a more realistic and intense immersive sensory experience.
  • Different styles of music contain different instrument components, and different instrument components play different roles in analyzing the rhythm and cadence of a piece of music. For example, percussion, with its regular strikes, makes it easier to capture the rhythm and groove of the music and can therefore be matched with more precise vibration feedback.
  • In the related art, methods that generate vibration from the characteristics of the music itself are usually based on strongly rhythmic instruments such as drum beats.
  • However, this approach is not suitable for music with a slower tempo.
  • In addition, the existing technology cannot generate vibrations at corresponding intensity levels by analyzing the strength of different rhythms in the music, so the vibration feedback experience provided to users is rather limited.
  • The technical problem to be solved by the present invention is to provide a method for generating a vibration output that more accurately matches the rhythm and groove of music.
  • The present invention provides a tactile feedback method for matching music tracks to vibration; the tactile feedback method is based on a deep learning model and comprises the following steps:
  • the matching vibration signal is output as a driving signal of a driver to achieve a tactile feedback effect.
  • the step of calculating the energy proportion corresponding to each of the sub-track audio data in the original audio data is specifically as follows:
  • the energy proportion of the transformed sub-track audio data in the original audio data is calculated.
  • the step of generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum is specifically:
  • the time-frequency curve containing the vibration information is output as the matching vibration signal.
  • the sub-track audio data includes at least a first audio track, a second audio track, a third audio track and a fourth audio track having different audio track characteristics.
  • the preset weighting rule is specifically:
  • If the energy proportion of the second audio track is the largest, determine whether the energy proportion of the first audio track is the second largest: if it is, output a weighted combination of the time-frequency spectra of the first and second audio tracks; if it is not, take only the time-frequency spectrum of the second audio track as the output;
  • If the energy proportion of the second audio track is not the largest, determine whether the energy proportion of the third audio track is the largest: if it is not, take the time-frequency spectrum of the fourth audio track as the output; if it is, take the time-frequency spectrum of the third audio track as the output.
  • the first audio track is a percussion track
  • the second audio track is a track for other musical instruments
  • the third audio track is a vocal track
  • the fourth audio track is a bass track.
  • the present invention further provides a tactile feedback system for music track matching vibration, comprising:
  • the original audio acquisition module is used to acquire the original audio data
  • a track splitting module used to split the original audio data into tracks using a preset deep learning model to obtain a plurality of split-track audio data
  • a proportion calculation module used to calculate the energy proportion corresponding to each of the sub-track audio data in the original audio data
  • a weight calculation module used to determine the weight of each corresponding track audio data according to the energy proportion
  • a weighted calculation module used to perform weighted calculation on all the sub-track audio data according to a preset weighting rule, obtain a time-frequency spectrum and output it;
  • a matching vibration module used for generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum
  • the tactile feedback module is used to output the matching vibration signal as a driving signal of a driver to achieve a tactile feedback effect.
  • the present invention further provides a computer device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps in the tactile feedback method for matching vibrations to music tracks as described in any one of the above items.
  • the present invention further provides a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the steps in the tactile feedback method for matching vibrations to music tracks as described in any one of the above are implemented.
  • In the tactile feedback method of the present invention, music is split into tracks by a preset deep learning model, tracks with markedly different characteristics are distinguished from one another, and the importance of each track in the original audio is then determined according to its energy proportion.
  • With weights of different sizes, the different tracks are flexibly weighted and combined, and the audio data is matched with vibration.
  • A vibration output that more accurately matches the rhythm and groove of the audio data is produced, so that users get a better tactile feedback experience.
  • FIG. 1 is a schematic flow chart of the steps of a tactile feedback method for music track matching vibration provided by an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of the structure of a deep learning model provided by an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of a preset weighting rule provided by an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of an audio track after being divided into tracks by a deep learning model provided by an embodiment of the present invention;
  • FIG. 5 is a time-frequency spectrum comparison diagram of each audio track provided by an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of a matching vibration signal provided in an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of the structure of a system 200 for generating a tactile feedback effect according to an embodiment of the present invention;
  • FIG. 8 is a schematic diagram of the structure of a computer device provided in an embodiment of the present invention.
  • FIG. 1 is a schematic flow chart of the steps of a tactile feedback method for music track matching vibration provided by an embodiment of the present invention.
  • the tactile feedback method comprises the following steps:
  • The embodiment of the present invention places no specific limitation on the musical form of the acquired original audio data, such as pop music, rock music or symphonic music.
  • the method for obtaining the original audio data includes but is not limited to: obtaining the data from existing audio data, or extracting the data in real time through a recorder, video shooting, etc., and converting the data into a separate audio data file.
  • the deep learning model is a neural network model for separating audios of various characteristics in audio data.
  • a structure of the deep learning model for separating the original audio data into tracks is shown in FIG2 .
  • the deep learning model includes an encoding layer composed of multiple encoders, a neural network recursive layer including an LSTM (Long short-term memory) structure, and a decoding layer composed of multiple decoders.
  • the sub-track audio data includes at least a first audio track, a second audio track, a third audio track and a fourth audio track having different audio track characteristics.
  • the step of calculating the energy proportion corresponding to each of the sub-track audio data in the original audio data is specifically as follows:
  • the energy proportion of the transformed sub-track audio data in the original audio data is calculated.
  • S5. Perform weighted calculation on all the sub-track audio data according to a preset weighting rule to obtain a time-frequency spectrum and output it.
  • the preset weighting rule is specifically:
  • One of the audio tracks among all the divided-track audio data is used as the track from which the time-frequency spectrum is generated.
  • the preset weighting rules for the four types of track audio data are as follows:
  • If the energy proportion of the second audio track is the largest, determine whether the energy proportion of the first audio track is the second largest: if it is, output a weighted combination of the time-frequency spectra of the first and second audio tracks; if it is not, take only the time-frequency spectrum of the second audio track as the output;
  • If the energy proportion of the second audio track is not the largest, determine whether the energy proportion of the third audio track is the largest: if it is not, take the time-frequency spectrum of the fourth audio track as the output; if it is, take the time-frequency spectrum of the third audio track as the output.
  • the first track is a percussion instrument
  • the second track is a track for other musical instruments
  • the third track is a vocal track
  • the fourth track is a bass track.
  • FIG3 is a schematic diagram of a preset weighting rule provided in an embodiment of the present invention.
  • The bass track is the lower-frequency part of the audio.
  • The audio also includes mid-range and high-range parts; for users, the listening impact of changes in the bass is stronger than that of the mid-range and high-range parts. Percussion and the other instruments are the parts of the audio that most strongly express the pace of the rhythm.
  • Percussion appears as regular frequency fluctuations, while instruments other than percussion are usually combined with percussion to express the type of the music; the vocal track is a special case in the audio because the human voice is not regular, but when the voice in the music is rendered as vibration it also has a great influence on the user experience.
  • the number of tracks divided specifically can be flexibly changed.
  • The embodiment of the present invention can use at least the sub-track audio data with the largest energy proportion in the audio data as the basic data of the time-frequency spectrum, so that the time-frequency spectrum focuses on the characteristics of the audio data for which matching vibration feedback needs to be generated.
  • the step of generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum is specifically:
  • the time-frequency curve containing the vibration information is output as the matching vibration signal.
  • the tactile feedback effect needs to be realized by a vibration feedback system having a driver mainly composed of a motor.
  • FIG. 4 is a schematic diagram of the audio tracks after the deep learning model is used to divide the tracks according to the embodiment of the present invention.
  • the audio tracks in FIG. 4 are, from top to bottom, original audio data, bass track, percussion track, other instrument tracks, and vocal track.
  • For comparison, please refer to the time-frequency spectrum comparison diagram of each audio track shown in FIG. 5. It can be seen that the energy proportions of the multiple audio data split from the original audio data differ considerably because their underlying track characteristics differ.
  • The matching vibration signal generated after weighting the different energy proportions according to the preset weighting rule of the embodiment of the present invention is shown in FIG. 6, in which the first row is an unprocessed general vibration signal and the third row is the matching vibration signal generated after weighting according to the embodiment of the present invention.
  • In the tactile feedback method of the present invention, music is split into tracks by a preset deep learning model, tracks with markedly different characteristics are distinguished from one another, and the importance of each track in the original audio is then determined according to its energy proportion.
  • With weights of different sizes, the different tracks are flexibly weighted and combined, and the audio data is matched with vibration.
  • A vibration output that more accurately matches the rhythm and groove of the audio data is produced, so that users get a better tactile feedback experience.
  • the embodiment of the present invention further provides a tactile feedback system for music track matching vibration.
  • FIG. 7 is a structural schematic diagram of a tactile feedback system 200 for music track matching vibration provided by the embodiment of the present invention, which includes:
  • the original audio acquisition module 201 is used to acquire original audio data
  • a track division module 202 is used to divide the original audio data into tracks using a preset deep learning model to obtain a plurality of divided-track audio data;
  • a proportion calculation module 203 is used to calculate the energy proportion of each of the sub-track audio data in the original audio data
  • a weight calculation module 204 is used to determine the weight of each corresponding track audio data according to the energy proportion
  • a weighted calculation module 205 is used to perform weighted calculation on all the sub-track audio data according to a preset weighted rule to obtain a time-frequency spectrum and output it;
  • a matching vibration module 206 configured to generate a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum
  • the tactile feedback module 207 is used to output the matching vibration signal as a driving signal of the driver to achieve a tactile feedback effect.
  • the tactile feedback system 200 for matching vibrations to music tracks provided in an embodiment of the present invention can implement the steps in the tactile feedback method for matching vibrations to music tracks in the above embodiment, and can achieve the same technical effects. Please refer to the description in the above embodiment, which will not be repeated here.
  • the embodiment of the present invention further provides a computer device, as shown in Figure 8, which is a schematic diagram of the structure of the computer device provided by the embodiment of the present invention.
  • the computer device 300 includes: a processor 301, a memory 302, and a computer program stored in the memory 302 and executable on the processor 301.
  • the processor 301 calls the computer program stored in the memory 302, and when executing the computer program, the steps in the tactile feedback method of music track matching vibration in the above embodiment are implemented, including:
  • the matching vibration signal is output as a driving signal of a driver to achieve a tactile feedback effect.
  • the step of calculating the energy proportion corresponding to each of the sub-track audio data in the original audio data is specifically as follows:
  • the energy proportion of the transformed sub-track audio data in the original audio data is calculated.
  • the step of generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum is specifically:
  • the time-frequency curve containing the vibration information is output as the matching vibration signal.
  • the sub-track audio data includes at least a first audio track, a second audio track, a third audio track and a fourth audio track having different audio track characteristics.
  • the preset weighting rule is specifically:
  • If the energy proportion of the second audio track is the largest, determine whether the energy proportion of the first audio track is the second largest: if it is, output a weighted combination of the time-frequency spectra of the first and second audio tracks; if it is not, take only the time-frequency spectrum of the second audio track as the output;
  • If the energy proportion of the second audio track is not the largest, determine whether the energy proportion of the third audio track is the largest: if it is not, take the time-frequency spectrum of the fourth audio track as the output; if it is, take the time-frequency spectrum of the third audio track as the output.
  • the first audio track is a percussion track
  • the second audio track is a track for other musical instruments
  • the third audio track is a vocal track
  • the fourth audio track is a bass track.
  • the computer device 300 provided in the embodiment of the present invention can implement the steps in the tactile feedback method of music track matching vibration in the above embodiment, and can achieve the same technical effect. Please refer to the description in the above embodiment and will not be repeated here.
  • An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the processes and steps of the tactile feedback method for matching vibrations with music tracks provided in the embodiments of the present invention are implemented, and the same technical effects can be achieved; to avoid repetition, they are not described again here.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Auxiliary Devices For Music (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention relates to the field of deep learning applications, and in particular to a haptic feedback method, system and related device for matching split-track music to vibration. The method includes: acquiring original audio data; splitting the original audio data into tracks using a preset deep learning model to obtain a plurality of split-track audio data; calculating the energy proportion of each of the split-track audio data in the original audio data; determining the weight of each corresponding split-track audio data according to the energy proportion; performing a weighted calculation on all the split-track audio data according to a preset weighting rule to obtain a time-frequency spectrum; generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum; and outputting a haptic feedback effect according to the matching vibration signal. Compared with the related art, the present invention matches audio data with vibration and outputs vibration feedback that more accurately matches the rhythm and groove of the audio data, giving the user a better haptic feedback experience.

Description

Haptic feedback method, system and related device for matching split-track music to vibration
Technical Field
The present invention relates to the field of deep learning applications, and in particular to a haptic feedback method, system and related device for matching split-track music to vibration.
Background Art
Music can express the author's different emotions such as joy, sorrow, anger and strength through different rhythms, cadences and tempos, and haptic feedback technology that matches vibration to the tempo, accents and dynamics of the music gives the listener a more realistic and intense immersive sensory experience. Different styles of music contain different instrument components, and different instrument components play different roles in analyzing the rhythm and cadence of a piece of music. For example, percussion, with its regular strikes, makes it easier to capture the rhythm and groove of the music, so it can be matched with more precise vibration feedback.
In the related art, methods that generate vibration from the characteristics of the music itself are usually based on strongly rhythmic instruments such as drum beats, but such methods are not suitable for music with a slower tempo. At the same time, the existing technology cannot generate vibrations at corresponding intensity levels by analyzing the strength of different rhythms in the music, so the vibration feedback experience provided to users is rather limited.
Therefore, it is necessary to provide a new haptic feedback method in order to obtain a vibration output that more accurately matches the rhythm and groove of the music.
Technical Problem
The technical problem to be solved by the present invention is to provide a method capable of generating a vibration output that more accurately matches the rhythm and groove of music.
Technical Solution
To solve the above technical problem, in a first aspect, the present invention provides a haptic feedback method for matching split-track music to vibration. The haptic feedback method is based on a deep learning model and comprises the following steps:
acquiring original audio data;
splitting the original audio data into tracks using a preset deep learning model to obtain a plurality of split-track audio data;
calculating the energy proportion of each of the split-track audio data in the original audio data;
determining the weight of each corresponding split-track audio data according to the energy proportion;
performing a weighted calculation on all the split-track audio data according to a preset weighting rule to obtain and output a time-frequency spectrum;
generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum;
outputting the matching vibration signal as a driving signal of a driver to achieve a haptic feedback effect.
Preferably, the step of calculating the energy proportion of each of the split-track audio data in the original audio data is specifically:
performing short-time Fourier transform processing on each of the split-track audio data to obtain corresponding transformed split-track audio data;
calculating the energy proportion of the transformed split-track audio data in the original audio data.
Preferably, the step of generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum is specifically:
normalizing the time-frequency spectrum to obtain a time-frequency curve;
setting vibration information for the portions of the time-frequency curve that exceed a preset frequency threshold;
outputting the time-frequency curve containing the vibration information as the matching vibration signal.
Preferably, the split-track audio data includes at least a first audio track, a second audio track, a third audio track and a fourth audio track having different track characteristics.
Preferably, the preset weighting rule is specifically:
determining whether the energy proportion of the first audio track is the largest:
if so:
determining whether the energy proportion of the second audio track is the second largest: if the energy proportion of the second audio track is the second largest, outputting a weighted combination of the time-frequency spectra of the first audio track and the second audio track; if the energy proportion of the second audio track is not the second largest, taking only the time-frequency spectrum of the first audio track as the output;
if not:
determining whether the energy proportion of the second audio track is the largest:
if the energy proportion of the second audio track is the largest, determining whether the energy proportion of the first audio track is the second largest: if the energy proportion of the first audio track is the second largest, outputting a weighted combination of the time-frequency spectra of the first audio track and the second audio track; if the energy proportion of the first audio track is not the second largest, taking only the time-frequency spectrum of the second audio track as the output;
if the energy proportion of the second audio track is not the largest, determining whether the energy proportion of the third audio track is the largest: if the energy proportion of the third audio track is not the largest, taking the time-frequency spectrum of the fourth audio track as the output; if the energy proportion of the third audio track is the largest, taking the time-frequency spectrum of the third audio track as the output.
Preferably, the first audio track is a percussion track, the second audio track is a track for other musical instruments, the third audio track is a vocal track, and the fourth audio track is a bass track.
In a second aspect, the present invention further provides a haptic feedback system for matching split-track music to vibration, comprising:
an original audio acquisition module, used to acquire original audio data;
a track splitting module, used to split the original audio data into tracks using a preset deep learning model to obtain a plurality of split-track audio data;
a proportion calculation module, used to calculate the energy proportion of each of the split-track audio data in the original audio data;
a weight calculation module, used to determine the weight of each corresponding split-track audio data according to the energy proportion;
a weighted calculation module, used to perform a weighted calculation on all the split-track audio data according to a preset weighting rule to obtain and output a time-frequency spectrum;
a matching vibration module, used to generate a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum;
a haptic feedback module, used to output the matching vibration signal as a driving signal of a driver to achieve a haptic feedback effect.
In a third aspect, the present invention further provides a computer device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the haptic feedback method for matching split-track music to vibration described in any one of the above.
In a fourth aspect, the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the haptic feedback method for matching split-track music to vibration described in any one of the above.
Beneficial Effects
Compared with the related art, in the haptic feedback method of the present invention, music is split into tracks by a preset deep learning model so that tracks with markedly different characteristics are distinguished from one another; the importance of each track in the original audio is then determined according to its energy proportion, weights of different sizes are set accordingly, and the different tracks are flexibly weighted and combined so that the audio data is matched with vibration. Finally, a vibration output that more accurately matches the rhythm and groove of the audio data is produced, giving the user a better haptic feedback experience.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort, wherein:
FIG. 1 is a schematic flow chart of the steps of a haptic feedback method for matching split-track music to vibration provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of a deep learning model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a preset weighting rule provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the audio tracks obtained after track splitting by the deep learning model provided by an embodiment of the present invention;
FIG. 5 is a time-frequency spectrum comparison diagram of the audio tracks provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a matching vibration signal provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the structure of a system 200 for generating a haptic feedback effect provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of the structure of a computer device provided by an embodiment of the present invention.
Embodiments of the Present Invention
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Please refer to FIG. 1, which is a schematic flow chart of the steps of the haptic feedback method for matching split-track music to vibration provided by an embodiment of the present invention. The haptic feedback method comprises the following steps:
S1. Acquire original audio data.
Specifically, the embodiment of the present invention places no specific limitation on the musical form of the acquired original audio data, which may be pop music, rock music, symphonic music and so on. Methods of obtaining the original audio data include, but are not limited to, obtaining it from existing audio data, or extracting it in real time by means of a recorder, video shooting or the like and converting it into a separate audio data file.
S2. Split the original audio data into tracks using the preset deep learning model to obtain a plurality of split-track audio data.
Specifically, the deep learning model is a neural network model for separating audio with different characteristics from audio data. In the embodiment of the present invention, the structure of a deep learning model for splitting the original audio data into tracks is shown in FIG. 2. The deep learning model includes an encoding layer composed of a plurality of encoders, a recurrent neural network layer containing LSTM (Long Short-Term Memory) structures, and a decoding layer composed of a plurality of decoders. In the recurrent neural network layer, different LSTM modules can be provided as needed in order to extract audio tracks with different characteristics.
Preferably, the split-track audio data includes at least a first audio track, a second audio track, a third audio track and a fourth audio track having different track characteristics.
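For illustration only, the following is a minimal sketch of a track-separation network with the general shape described above (a convolutional encoding layer, an LSTM recurrent layer, and a convolutional decoding layer), written in PyTorch. The channel sizes, depth, kernel parameters and the four output stems are assumptions made for this example; they are not taken from the publication, which does not disclose these details.

```python
# Minimal sketch (not the patented model): an encoder / LSTM / decoder separator.
# Channel sizes, depth and the four output stems are illustrative assumptions.
import torch
import torch.nn as nn

class TrackSplitter(nn.Module):
    def __init__(self, stems=4, channels=48, depth=4):
        super().__init__()
        self.encoder, self.decoder = nn.ModuleList(), nn.ModuleList()
        in_ch = 1                                   # mono waveform input
        for i in range(depth):                      # encoding layer: stacked 1-D convs
            out_ch = channels * (2 ** i)
            self.encoder.append(nn.Sequential(
                nn.Conv1d(in_ch, out_ch, kernel_size=8, stride=4, padding=2),
                nn.ReLU()))
            in_ch = out_ch
        # Recurrent layer: a bidirectional LSTM over the downsampled time axis.
        self.lstm = nn.LSTM(in_ch, in_ch, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.lstm_proj = nn.Linear(2 * in_ch, in_ch)
        for i in reversed(range(depth)):            # decoding layer: transposed convs
            out_ch = stems if i == 0 else channels * (2 ** (i - 1))
            self.decoder.append(nn.Sequential(
                nn.ConvTranspose1d(in_ch, out_ch, kernel_size=8, stride=4, padding=2),
                nn.ReLU() if i != 0 else nn.Identity()))
            in_ch = out_ch

    def forward(self, wav):                         # wav: (batch, 1, samples)
        x, skips = wav, []
        for enc in self.encoder:
            x = enc(x)
            skips.append(x)
        x, _ = self.lstm(x.transpose(1, 2))         # (batch, time, features)
        x = self.lstm_proj(x).transpose(1, 2)
        for dec in self.decoder:
            x = dec(x + skips.pop())                # U-Net style skip connections
        return x                                    # (batch, stems, samples)
```

With `stems=4`, the four output waveforms would correspond to the percussion, other-instrument, vocal and bass tracks of the preferred embodiment (the number of input samples is assumed to be divisible by 4 at every level).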
S3. Calculate the energy proportion of each of the split-track audio data in the original audio data.
Preferably, the step of calculating the energy proportion of each of the split-track audio data in the original audio data is specifically:
performing short-time Fourier transform processing on each of the split-track audio data to obtain corresponding transformed split-track audio data;
calculating the energy proportion of the transformed split-track audio data in the original audio data.
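As an illustration of step S3 and the short-time Fourier transform preference above, the sketch below computes each separated track's share of the overall energy from STFT magnitudes. The frame and hop sizes, the use of librosa, and normalizing by the sum of the stem energies (which approximates the energy of the original audio when the stems sum back to the mix) are assumptions made for the example.

```python
# Minimal sketch of the energy-proportion step under assumed STFT parameters.
import numpy as np
import librosa

def energy_proportions(track_waveforms, n_fft=2048, hop_length=512):
    """track_waveforms: dict mapping track name -> 1-D waveform (same length).
    Returns a dict mapping track name -> fraction of the total track energy."""
    energies = {}
    for name, wav in track_waveforms.items():
        spec = librosa.stft(wav, n_fft=n_fft, hop_length=hop_length)
        energies[name] = float(np.sum(np.abs(spec) ** 2))   # squared-magnitude energy
    total = sum(energies.values())
    return {name: e / total for name, e in energies.items()}

# Example with the four assumed stems:
# props = energy_proportions({"drums": drums, "other": other,
#                             "vocals": vocals, "bass": bass})
```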
S4. Determine the weight of each corresponding split-track audio data according to the energy proportion.
S5. Perform a weighted calculation on all the split-track audio data according to the preset weighting rule to obtain and output a time-frequency spectrum.
Preferably, the preset weighting rule is specifically:
using one of the tracks among all the split-track audio data as the track used when generating the time-frequency spectrum.
Specifically, in one possible embodiment with four kinds of split-track audio data, the preset weighting rule is as follows:
determining whether the energy proportion of the first audio track is the largest:
if so:
determining whether the energy proportion of the second audio track is the second largest: if the energy proportion of the second audio track is the second largest, outputting a weighted combination of the time-frequency spectra of the first audio track and the second audio track; if the energy proportion of the second audio track is not the second largest, taking only the time-frequency spectrum of the first audio track as the output;
if not:
determining whether the energy proportion of the second audio track is the largest:
if the energy proportion of the second audio track is the largest, determining whether the energy proportion of the first audio track is the second largest: if the energy proportion of the first audio track is the second largest, outputting a weighted combination of the time-frequency spectra of the first audio track and the second audio track; if the energy proportion of the first audio track is not the second largest, taking only the time-frequency spectrum of the second audio track as the output;
if the energy proportion of the second audio track is not the largest, determining whether the energy proportion of the third audio track is the largest: if the energy proportion of the third audio track is not the largest, taking the time-frequency spectrum of the fourth audio track as the output; if the energy proportion of the third audio track is the largest, taking the time-frequency spectrum of the third audio track as the output.
Preferably, the first audio track is a percussion track, the second audio track is a track for other musical instruments, the third audio track is a vocal track, and the fourth audio track is a bass track. Please refer to FIG. 3, which is a schematic diagram of the preset weighting rule provided by an embodiment of the present invention. The bass track is the lower-frequency part of the audio; correspondingly, the audio also contains mid-range and high-range parts, and for the user the listening impact of changes in the bass is stronger than that of the mid-range and high-range parts. Percussion and the other instruments are the parts of the audio that most strongly express the pace of the rhythm: percussion appears as regular frequency fluctuations, while instruments other than percussion are usually combined with percussion to express the type of the music. The vocal track is a special case in the audio, because the human voice is not regular, but when the voice in the music is rendered as vibration it also has a great influence on the user experience. It should be noted that, in the embodiment of the present invention, the number of tracks into which the audio is split can be changed flexibly.
According to the above preset weighting rule, the embodiment of the present invention can use at least the split-track audio data with the largest energy proportion in the audio data as the basic data of the time-frequency spectrum, so that the time-frequency spectrum focuses on the characteristics of the audio data for which matching vibration feedback needs to be generated.
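The decision logic above can be written compactly. In the sketch below, the track names and the 50/50 blend weights used when two spectra are combined are assumptions; the publication states only that the two time-frequency spectra are weighted and combined.

```python
# Minimal sketch of the preset weighting rule for the four assumed tracks:
# "drums" = first (percussion), "other" = second (other instruments),
# "vocals" = third, "bass" = fourth.  `specs` maps track name -> magnitude
# spectrogram; `props` maps track name -> energy proportion.
import numpy as np

def weighted_spectrum(specs, props, w_first=0.5, w_second=0.5):
    order = sorted(props, key=props.get, reverse=True)   # tracks by energy share
    largest, runner_up = order[0], order[1]

    if largest == "drums":                               # first track largest
        if runner_up == "other":
            return w_first * specs["drums"] + w_second * specs["other"]
        return specs["drums"]
    if largest == "other":                               # second track largest
        if runner_up == "drums":
            return w_first * specs["drums"] + w_second * specs["other"]
        return specs["other"]
    if largest == "vocals":                              # third track largest
        return specs["vocals"]
    return specs["bass"]                                 # otherwise use the bass track
```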
S6. Generate a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum.
Preferably, the step of generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum is specifically:
normalizing the time-frequency spectrum to obtain a time-frequency curve;
setting vibration information for the portions of the time-frequency curve that exceed a preset frequency threshold;
outputting the time-frequency curve containing the vibration information as the matching vibration signal.
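As an illustration of step S6, the sketch below normalizes the selected time-frequency spectrum into a time-frequency curve and attaches vibration information only where a preset frequency threshold is exceeded. Reducing the curve to a per-frame pair of dominant frequency and normalized amplitude, and the particular threshold value, are interpretive assumptions; the publication does not specify the exact form of the vibration information.

```python
# Minimal sketch of generating the matching vibration signal (assumed conventions).
import numpy as np

def matching_vibration_signal(spectrum, freqs, freq_threshold_hz=60.0):
    """spectrum: (n_bins, n_frames) magnitude array from the weighting step;
    freqs: bin centre frequencies in Hz.  Returns a (2, n_frames) array holding
    per-frame (frequency, amplitude) vibration information."""
    frame_energy = spectrum.sum(axis=0)
    curve = frame_energy / (frame_energy.max() + 1e-12)      # normalized time curve
    dominant = freqs[np.argmax(spectrum, axis=0)]             # dominant frequency per frame
    # Keep vibration information only where the curve exceeds the frequency threshold.
    amplitude = np.where(dominant > freq_threshold_hz, curve, 0.0)
    return np.stack([dominant, amplitude], axis=0)
```

The resulting curve can then be resampled to the actuator's update rate and used as the driving signal of the driver in step S7.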
S7. Output the matching vibration signal as a driving signal of a driver to achieve a haptic feedback effect.
In the embodiment of the present invention, the haptic feedback effect is realized by a vibration feedback system having a driver mainly composed of a motor.
By way of example, please refer to FIG. 4, which is a schematic diagram of the audio tracks obtained after track splitting by the deep learning model in the embodiment of the present invention. From top to bottom, the tracks in FIG. 4 are: the original audio data, the bass track, the percussion track, the other-instruments track and the vocal track. For comparison, please refer to the time-frequency spectrum comparison diagram of the tracks shown in FIG. 5; it can be seen that the multiple split-track audio data obtained from the original audio data differ considerably in energy proportion because their underlying track characteristics differ. The matching vibration signal generated after weighting the different energy proportions according to the preset weighting rule of the embodiment of the present invention is shown in FIG. 6, in which the first row is an unprocessed general vibration signal and the third row is the matching vibration signal generated after weighting according to the embodiment of the present invention.
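Putting the steps together, a hypothetical end-to-end pipeline built from the illustrative helpers sketched above might look as follows; `separate_tracks` stands in for the deep learning model of FIG. 2 and is assumed to return the four named stems.

```python
# Hypothetical end-to-end sketch combining the illustrative helpers above.
import numpy as np
import librosa

def haptic_signal_for(path, separate_tracks, sr=44100, n_fft=2048, hop_length=512):
    wav, _ = librosa.load(path, sr=sr, mono=True)              # S1: original audio
    stems = separate_tracks(wav)                                # S2: name -> waveform
    props = energy_proportions(stems, n_fft, hop_length)        # S3: energy proportions
    specs = {name: np.abs(librosa.stft(w, n_fft=n_fft, hop_length=hop_length))
             for name, w in stems.items()}
    spec = weighted_spectrum(specs, props)                      # S4-S5: weighting rule
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    return matching_vibration_signal(spec, freqs)               # S6: vibration signal
```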
Compared with the related art, in the haptic feedback method of the present invention, music is split into tracks by a preset deep learning model so that tracks with markedly different characteristics are distinguished from one another; the importance of each track in the original audio is then determined according to its energy proportion, weights of different sizes are set accordingly, and the different tracks are flexibly weighted and combined so that the audio data is matched with vibration. Finally, a vibration output that more accurately matches the rhythm and groove of the audio data is produced, giving the user a better haptic feedback experience.
An embodiment of the present invention further provides a haptic feedback system for matching split-track music to vibration. Please refer to FIG. 7, which is a schematic diagram of the structure of the haptic feedback system 200 for matching split-track music to vibration provided by an embodiment of the present invention, and which includes:
an original audio acquisition module 201, used to acquire original audio data;
a track splitting module 202, used to split the original audio data into tracks using a preset deep learning model to obtain a plurality of split-track audio data;
a proportion calculation module 203, used to calculate the energy proportion of each of the split-track audio data in the original audio data;
a weight calculation module 204, used to determine the weight of each corresponding split-track audio data according to the energy proportion;
a weighted calculation module 205, used to perform a weighted calculation on all the split-track audio data according to a preset weighting rule to obtain and output a time-frequency spectrum;
a matching vibration module 206, used to generate a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum;
a haptic feedback module 207, used to output the matching vibration signal as a driving signal of a driver to achieve a haptic feedback effect.
The haptic feedback system 200 for matching split-track music to vibration provided by the embodiment of the present invention can implement the steps of the haptic feedback method for matching split-track music to vibration in the above embodiment and can achieve the same technical effects; refer to the description in the above embodiment, which is not repeated here.
An embodiment of the present invention further provides a computer device. Please refer to FIG. 8, which is a schematic diagram of the structure of the computer device provided by an embodiment of the present invention. The computer device 300 includes: a processor 301, a memory 302, and a computer program stored in the memory 302 and executable on the processor 301.
With reference to FIG. 1, the processor 301 calls the computer program stored in the memory 302 and, when executing the computer program, implements the steps of the haptic feedback method for matching split-track music to vibration in the above embodiment, including:
acquiring original audio data;
splitting the original audio data into tracks using a preset deep learning model to obtain a plurality of split-track audio data;
calculating the energy proportion of each of the split-track audio data in the original audio data;
determining the weight of each corresponding split-track audio data according to the energy proportion;
performing a weighted calculation on all the split-track audio data according to a preset weighting rule to obtain and output a time-frequency spectrum;
generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum;
outputting the matching vibration signal as a driving signal of a driver to achieve a haptic feedback effect.
Preferably, the step of calculating the energy proportion of each of the split-track audio data in the original audio data is specifically:
performing short-time Fourier transform processing on each of the split-track audio data to obtain corresponding transformed split-track audio data;
calculating the energy proportion of the transformed split-track audio data in the original audio data.
Preferably, the step of generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum is specifically:
normalizing the time-frequency spectrum to obtain a time-frequency curve;
setting vibration information for the portions of the time-frequency curve that exceed a preset frequency threshold;
outputting the time-frequency curve containing the vibration information as the matching vibration signal.
Preferably, the split-track audio data includes at least a first audio track, a second audio track, a third audio track and a fourth audio track having different track characteristics.
Preferably, the preset weighting rule is specifically:
determining whether the energy proportion of the first audio track is the largest:
if so:
determining whether the energy proportion of the second audio track is the second largest: if the energy proportion of the second audio track is the second largest, outputting a weighted combination of the time-frequency spectra of the first audio track and the second audio track; if the energy proportion of the second audio track is not the second largest, taking only the time-frequency spectrum of the first audio track as the output;
if not:
determining whether the energy proportion of the second audio track is the largest:
if the energy proportion of the second audio track is the largest, determining whether the energy proportion of the first audio track is the second largest: if the energy proportion of the first audio track is the second largest, outputting a weighted combination of the time-frequency spectra of the first audio track and the second audio track; if the energy proportion of the first audio track is not the second largest, taking only the time-frequency spectrum of the second audio track as the output;
if the energy proportion of the second audio track is not the largest, determining whether the energy proportion of the third audio track is the largest: if the energy proportion of the third audio track is not the largest, taking the time-frequency spectrum of the fourth audio track as the output; if the energy proportion of the third audio track is the largest, taking the time-frequency spectrum of the third audio track as the output.
Preferably, the first audio track is a percussion track, the second audio track is a track for other musical instruments, the third audio track is a vocal track, and the fourth audio track is a bass track.
The computer device 300 provided by the embodiment of the present invention can implement the steps of the haptic feedback method for matching split-track music to vibration in the above embodiment and can achieve the same technical effects; refer to the description in the above embodiment, which is not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the processes and steps of the haptic feedback method for matching split-track music to vibration provided by the embodiments of the present invention are implemented, and the same technical effects can be achieved; to avoid repetition, they are not described again here.
The above are only embodiments of the present invention. It should be pointed out that a person of ordinary skill in the art can make improvements without departing from the inventive concept of the present invention, and such improvements all fall within the scope of protection of the present invention.

Claims (9)

  1. A haptic feedback method for matching split-track music to vibration, characterized in that the haptic feedback method is based on a deep learning model and comprises the following steps:
    acquiring original audio data;
    splitting the original audio data into tracks using a preset deep learning model to obtain a plurality of split-track audio data;
    calculating the energy proportion of each of the split-track audio data in the original audio data;
    determining the weight of each corresponding split-track audio data according to the energy proportion;
    performing a weighted calculation on all the split-track audio data according to a preset weighting rule to obtain and output a time-frequency spectrum;
    generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum;
    outputting the matching vibration signal as a driving signal of a vibrator to achieve a haptic feedback effect.
  2. The haptic feedback method for matching split-track music to vibration according to claim 1, characterized in that the step of calculating the energy proportion of each of the split-track audio data in the original audio data is specifically:
    performing short-time Fourier transform processing on each of the split-track audio data to obtain corresponding transformed split-track audio data;
    calculating the energy proportion of the transformed split-track audio data in the original audio data.
  3. The haptic feedback method for matching split-track music to vibration according to claim 1, characterized in that the step of generating a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum is specifically:
    normalizing the time-frequency spectrum to obtain a time-frequency curve;
    setting vibration information for the portions of the time-frequency curve that exceed a preset frequency threshold;
    outputting the time-frequency curve containing the vibration information as the matching vibration signal.
  4. The haptic feedback method for matching split-track music to vibration according to claim 1, characterized in that the split-track audio data includes at least a first audio track, a second audio track, a third audio track and a fourth audio track having different track characteristics.
  5. The haptic feedback method for matching split-track music to vibration according to claim 4, characterized in that the preset weighting rule is specifically:
    determining whether the energy proportion of the first audio track is the largest:
    if so:
    determining whether the energy proportion of the second audio track is the second largest: if the energy proportion of the second audio track is the second largest, outputting a weighted combination of the time-frequency spectra of the first audio track and the second audio track; if the energy proportion of the second audio track is not the second largest, taking only the time-frequency spectrum of the first audio track as the output;
    if not:
    determining whether the energy proportion of the second audio track is the largest:
    if the energy proportion of the second audio track is the largest, determining whether the energy proportion of the first audio track is the second largest: if the energy proportion of the first audio track is the second largest, outputting a weighted combination of the time-frequency spectra of the first audio track and the second audio track; if the energy proportion of the first audio track is not the second largest, taking only the time-frequency spectrum of the second audio track as the output;
    if the energy proportion of the second audio track is not the largest, determining whether the energy proportion of the third audio track is the largest: if the energy proportion of the third audio track is not the largest, taking the time-frequency spectrum of the fourth audio track as the output; if the energy proportion of the third audio track is the largest, taking the time-frequency spectrum of the third audio track as the output.
  6. The haptic feedback method for matching split-track music to vibration according to claim 4, characterized in that the first audio track is a percussion track, the second audio track is a track for other musical instruments, the third audio track is a vocal track, and the fourth audio track is a bass track.
  7. A haptic feedback system for matching split-track music to vibration, characterized by comprising:
    an original audio acquisition module, used to acquire original audio data;
    a track splitting module, used to split the original audio data into tracks using a preset deep learning model to obtain a plurality of split-track audio data;
    a proportion calculation module, used to calculate the energy proportion of each of the split-track audio data in the original audio data;
    a weight calculation module, used to determine the weight of each corresponding split-track audio data according to the energy proportion;
    a weighted calculation module, used to perform a weighted calculation on all the split-track audio data according to a preset weighting rule to obtain and output a time-frequency spectrum;
    a matching vibration module, used to generate a matching vibration signal corresponding to the original audio data according to the time-frequency spectrum;
    a haptic feedback module, used to output the matching vibration signal as a driving signal of a driver to achieve a haptic feedback effect.
  8. A computer device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the haptic feedback method for matching split-track music to vibration according to any one of claims 1 to 6.
  9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the haptic feedback method for matching split-track music to vibration according to any one of claims 1 to 6 are implemented.
PCT/CN2022/136291 2022-10-20 2022-12-02 Haptic feedback method, system and related device for matching split-track music to vibration WO2024082389A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211283874.3 2022-10-19
CN202211283874.3A CN116185167A (zh) 2022-10-20 2022-10-20 Haptic feedback method, system and related device for matching split-track music to vibration

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/334,340 Continuation US20240134459A1 (en) 2022-10-19 2023-06-12 Haptic feedback method, system and related device for matching split-track music to vibration

Publications (1)

Publication Number Publication Date
WO2024082389A1 (zh)

Family

ID=86444849

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136291 WO2024082389A1 (zh) 2022-10-20 2022-12-02 Haptic feedback method, system and related device for matching split-track music to vibration

Country Status (2)

Country Link
CN (1) CN116185167A (zh)
WO (1) WO2024082389A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144257A (zh) * 2018-08-22 2019-01-04 音曼(北京)科技有限公司 Method for extracting features from a song and converting them into tactile sensations
CN109871120A (zh) * 2018-12-31 2019-06-11 瑞声科技(新加坡)有限公司 Haptic feedback method
US20200258357A1 (en) * 2017-08-07 2020-08-13 Sony Corporation Phase computing device, phase computing method, haptic presentation system, and program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200258357A1 (en) * 2017-08-07 2020-08-13 Sony Corporation Phase computing device, phase computing method, haptic presentation system, and program
CN109144257A (zh) * 2018-08-22 2019-01-04 音曼(北京)科技有限公司 Method for extracting features from a song and converting them into tactile sensations
CN109871120A (zh) * 2018-12-31 2019-06-11 瑞声科技(新加坡)有限公司 Haptic feedback method

Also Published As

Publication number Publication date
CN116185167A (zh) 2023-05-30

Similar Documents

Publication Publication Date Title
JP7243052B2 (ja) Audio extraction device, audio playback device, audio extraction method, audio playback method, machine learning method, and program
Dean Hyperimprovisation: computer-interactive sound improvisation
US20030014215A1 (en) Method for computing sense data and device for computing sense data
Jensenius An action–sound approach to teaching interactive music
CN109410972B (zh) 生成音效参数的方法、装置及存储介质
JPH09222897A (ja) Karaoke scoring device
Danielsen et al. Shaping rhythm: Timing and sound in five groove-based genres
JPH10247099A (ja) Method for encoding an audio signal and device for recording and reproducing audio
WO2024082389A1 (zh) Haptic feedback method, system and related device for matching split-track music to vibration
Trochidis et al. CAMeL: Carnatic percussion music generation using n-gram models
Sarkar et al. Recognition and prediction in a network music performance system for Indian percussion
US20200410982A1 (en) Information processing apparatus and information processing method and computer-readable storage medium
JP7147384B2 (ja) Information processing method and information processing device
Jaime et al. A new multiformat rhythm game for music tutoring
US20240134459A1 (en) Haptic feedback method, system and related device for matching split-track music to vibration
CN112420006A (zh) Method and apparatus for operating a simulated musical instrument assembly, storage medium, and computer device
Zhang et al. Intelligent music accompaniment system based on discrete wavelet transform
EP1265221A1 (en) Automatic music improvisation method and device
JP4220108B2 (ja) Acoustic signal encoding system
CN116189636B (zh) Accompaniment generation method, apparatus, device and storage medium based on an electronic musical instrument
US11797267B2 (en) Method for playing audio source using user interaction and a music application using the same
US20240112689A1 (en) Synthesizing audio for synchronous communication
Curtz Feature extraction and non-binary bass line classification in a drumbeat generator application
JPH1173199A (ja) Method for encoding an acoustic signal and computer-readable recording medium
Aono et al. Development of a session system with acoustic instruments