CN107481735A - Method for converting audio sound production, server and computer readable storage medium - Google Patents

Info

Publication number
CN107481735A
Authority
CN
China
Prior art keywords
converted
frequency spectrum
voice data
spectrum information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710752085.2A
Other languages
Chinese (zh)
Inventor
冯祖学
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
MIGU Music Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
MIGU Music Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd, MIGU Music Co Ltd
Priority to CN201710752085.2A
Publication of CN107481735A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses a method for converting audio sound production, which comprises the following steps: acquiring audio data to be converted and a conversion target object of the audio data to be converted, parsing the audio data to be converted to obtain a parsing result, and determining audio track information of the audio data to be converted according to the parsing result, wherein the audio track information at least comprises the timbre of the audio data to be converted; and determining the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determining the converted audio data. The invention also discloses a device for converting audio sound production and a computer readable storage medium.

Description

A method, server, and computer-readable storage medium for converting audio sound production
Technical field
The present invention relates to audio processing technology, and in particular to a method, a server, and a computer-readable storage medium for converting audio sound production.
Background
Although existing music apps offer an increasingly rich set of functions, most of these functions are unrelated to music playback itself, such as music social networking and music purchasing. For traditional music playback, the functions a music app provides are still mainly tuning functions, such as adjusting the melody or rhythm. The main purpose of such functions is to give the user a better listening experience, and using these DJ-like functions requires some basic musical knowledge, so their audience is small. Overall, the functions of existing music apps remain somewhat lacking in entertainment value, and this is especially true of the most basic function of a music app, music playback.
In daily life, each user usually has one or several favorite singers. Users not only enjoy the songs these singers perform, but may also wish that a favorite singer could sing other songs they like. However, the prior art offers no method for changing the singer of a song, so the functions of existing music apps cannot meet this demand.
Summary of the invention
In view of this, embodiments of the present invention are intended to provide a method, a server, and a computer-readable storage medium for converting audio sound production, which can change the original singer of a selected audio file to a singer the user likes, improving entertainment value and user experience.
To achieve the above purpose, an embodiment of the present invention provides a method for converting audio sound production, comprising:
acquiring audio data to be converted and a conversion target object for the audio data to be converted, parsing the audio data to be converted to obtain a parsing result, and determining audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; and
determining acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determining the converted audio data.
Before the audio data to be converted and the conversion target object are acquired, the method further comprises:
acquiring acoustic spectrum information of at least one conversion target object, associating the acoustic spectrum information of the conversion target object with identification information of the conversion target object, and generating the acoustic spectrum information database.
Acquiring the acoustic spectrum information of at least one object comprises:
collecting the sound of the object, performing analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and analyzing the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
Converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object comprises:
retuning the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
An embodiment of the present invention provides a device for converting audio sound production, the device comprising:
a parsing module, configured to acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; and
a conversion module, configured to determine acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, convert the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determine the converted audio data.
The device further comprises:
a generation module, configured to acquire acoustic spectrum information of at least one conversion target object, associate the acoustic spectrum information of the conversion target object with identification information of the conversion target object, and generate the acoustic spectrum information database.
The generation module is specifically configured to:
collect the sound of an object, perform analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and analyze the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
The conversion module is specifically configured to:
retune the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
An embodiment of the present invention provides a server, comprising a processor and a memory for storing a computer program executable on the processor,
wherein the processor is configured to perform the following when running the computer program:
acquiring audio data to be converted and a conversion target object for the audio data to be converted, parsing the audio data to be converted to obtain a parsing result, and determining audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; and determining acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determining the converted audio data.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements:
acquiring audio data to be converted and a conversion target object for the audio data to be converted, parsing the audio data to be converted to obtain a parsing result, and determining audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; and determining acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determining the converted audio data.
With the method, server, and computer-readable storage medium for converting audio sound production provided by the embodiments of the present invention, audio data to be converted and a conversion target object for the audio data to be converted are acquired; the audio data to be converted is parsed to obtain a parsing result; audio track information of the audio data to be converted is determined according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; acoustic spectrum information of the conversion target object is determined in a preset acoustic spectrum information database; the audio track information of the audio data to be converted is converted according to the acoustic spectrum information of the conversion target object; and the converted audio data is determined. In this way, the selected audio data is parsed to obtain its audio track information, and the audio track information is converted according to the acoustic spectrum information of the chosen conversion object, yielding audio data that carries the audio features of the conversion object. This improves the entertainment value of the music app and gives users a better experience.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for converting audio sound production according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the device for converting audio sound production according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of the first embodiment of the present invention.
Detailed description
To provide a more detailed understanding of the features and technical content of the embodiments of the present invention, their implementation is described in detail below.
Fig. 1 is a schematic flowchart of the method for converting audio sound production according to an embodiment of the present invention. As shown in Fig. 1, the audio conversion method provided by the embodiment comprises the following steps:
Step 101: Acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine the audio track information of the audio data to be converted according to the parsing result.
The audio track information comprises at least the timbre of the audio data to be converted.
Step 102: Determine the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, convert the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determine the converted audio data.
Here, the timbre of the syllables in the audio track information of the audio data to be converted is retuned according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
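Steps 101 and 102 can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not specify data structures, so the `Syllable` record, the tuple-of-harmonic-magnitudes timbre representation, the database layout, and the name `convert_track` are all hypothetical. The sketch keeps the pitch and loudness of each syllable and swaps in the target's timbre.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# Hypothetical timbre representation: relative harmonic magnitudes.
Timbre = Tuple[float, ...]


@dataclass(frozen=True)
class Syllable:
    name: str        # syllable label, e.g. "ni"
    pitch_hz: float  # pitch of the syllable
    loudness: float  # loudness of the syllable
    timbre: Timbre   # timbre of the syllable


# Hypothetical database layout: target identifier -> syllable name -> timbre.
SpectrumDatabase = Dict[str, Dict[str, Timbre]]


def convert_track(track: List[Syllable], target_id: str,
                  database: SpectrumDatabase) -> List[Syllable]:
    """Retune the timbre of each syllable to the conversion target's timbre."""
    target = database[target_id]  # step 102: look up the target's spectrum info
    converted = []
    for syl in track:
        # Keep pitch and loudness; fall back to the original timbre
        # when the database has no entry for this syllable.
        new_timbre = target.get(syl.name, syl.timbre)
        converted.append(replace(syl, timbre=new_timbre))
    return converted


database = {"singer_b": {"ni": (1.0, 0.2, 0.1), "hao": (1.0, 0.6, 0.3)}}
track = [Syllable("ni", 220.0, 0.8, (1.0, 0.5, 0.4)),
         Syllable("hao", 247.0, 0.7, (1.0, 0.1, 0.0))]
converted = convert_track(track, "singer_b", database)
```

Turning the converted track information back into audible audio data would need a synthesis stage, which is outside the scope of this sketch.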
In practice, converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object and determining the converted audio data can also be implemented as follows:
Acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine, according to the parsing result, each piece of text information in the audio data to be converted and the pronunciation syllable corresponding to that text information.
Determine, in the preset acoustic spectrum information database, the conversion target object's pronunciation information for the text information and the spectrum information of that pronunciation information; arrange the spectrum information of the determined pronunciation information according to the character order of the text information, perform audio conversion, and determine the converted audio data.
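The text-driven alternative can be pictured as a lookup followed by an ordering step. The sketch below is an assumption-laden illustration, not the patent's implementation: the character-to-syllable table `pronunciation`, the per-syllable spectra in `target_spectra`, and the function name `arrange_spectra` are hypothetical, and a spectrum is represented simply as a list of magnitudes.

```python
from typing import Dict, List

# Hypothetical: magnitude spectrum of one syllable as pronounced by the target.
Spectrum = List[float]


def arrange_spectra(text: str, pronunciation: Dict[str, str],
                    target_spectra: Dict[str, Spectrum]) -> List[Spectrum]:
    """Return the target's syllable spectra in the character order of the text."""
    ordered = []
    for char in text:
        syllable = pronunciation[char]            # character -> pronunciation syllable
        ordered.append(target_spectra[syllable])  # syllable -> target's spectrum
    return ordered


pronunciation = {"你": "ni", "好": "hao"}
target_spectra = {"ni": [1.0, 0.2], "hao": [1.0, 0.6]}
sequence = arrange_spectra("你好", pronunciation, target_spectra)
```

The ordered spectra would then feed the audio conversion step that produces the converted audio data.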
Before step 101, the audio conversion method provided by the embodiment of the present invention further comprises the following step:
Acquire acoustic spectrum information of at least one conversion target object, associate the acoustic spectrum information of the conversion target object with identification information of the conversion target object, and generate the acoustic spectrum information database.
Here, the sound of the object is collected, analog-to-digital conversion is performed on the collected sound to obtain digital audio data of the object, and the digital audio data is analyzed to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
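Building the database amounts to analyzing each target's digitized syllables and storing the result under the target's identifier. The following minimal sketch assumes a nested dictionary as the database and takes the analysis routine as a parameter; `build_database` and `peak_and_mean` are hypothetical names, and `peak_and_mean` is only a stand-in, not a real spectrum analysis.

```python
from typing import Callable, Dict, List

Samples = List[float]   # digital audio samples of one pronounced syllable
Spectrum = List[float]  # hypothetical spectrum representation


def build_database(recordings: Dict[str, Dict[str, Samples]],
                   analyze: Callable[[Samples], Spectrum]
                   ) -> Dict[str, Dict[str, Spectrum]]:
    """Associate each target's identifier with its per-syllable spectrum info."""
    database = {}
    for target_id, syllables in recordings.items():
        database[target_id] = {name: analyze(samples)
                               for name, samples in syllables.items()}
    return database


def peak_and_mean(samples: Samples) -> Spectrum:
    """Stand-in analyzer (an assumption): peak magnitude and mean value."""
    return [max(abs(s) for s in samples), sum(samples) / len(samples)]


recordings = {"singer_b": {"ni": [0.0, 0.5, -0.5, 0.0]}}
database = build_database(recordings, peak_and_mean)
```

In a real system the analyzer would be a spectral analysis such as a Fourier transform of each syllable segment.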
Fig. 2 is a schematic structural diagram of the device for converting audio sound production according to an embodiment of the present invention. The audio conversion device comprises:
a parsing module 201, configured to acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; and
a conversion module 202, configured to determine acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, convert the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determine the converted audio data.
The parsing module 201 is specifically configured to:
after parsing the audio data to be converted, determine the audio features of at least one syllable of the audio data to be converted, wherein the audio features include the loudness, pitch, and timbre of the syllable; and
combine the determined audio features of the syllables of the audio data to be converted to obtain the audio track information of the audio data to be converted.
The device further comprises:
a generation module 203, configured to acquire acoustic spectrum information of at least one conversion target object, associate the acoustic spectrum information of the conversion target object with identification information of the conversion target object, and generate the acoustic spectrum information database.
The generation module 203 is specifically configured to:
collect the sound of an object, perform analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and analyze the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
The conversion module 202 is specifically configured to:
retune the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
An embodiment of the present invention provides a server, comprising a processor and a memory for storing a computer program executable on the processor,
wherein the processor is configured to perform the following when running the computer program:
acquiring audio data to be converted and a conversion target object for the audio data to be converted, parsing the audio data to be converted to obtain a parsing result, and determining audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; and determining acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determining the converted audio data.
Determining the audio track information of the audio data to be converted according to the parsing result comprises:
after parsing the audio data to be converted, determining the audio features of at least one syllable of the audio data to be converted, wherein the audio features include the loudness, pitch, and timbre of the syllable; and
combining the determined audio features of the syllables of the audio data to be converted to obtain the audio track information of the audio data to be converted.
Before the audio data to be converted and the conversion object are acquired, the method further comprises:
acquiring acoustic spectrum information of at least one conversion target object, associating the acoustic spectrum information of the conversion target object with identification information of the conversion target object, and generating the acoustic spectrum information database.
Acquiring the acoustic spectrum information of at least one object comprises:
collecting the sound of the object, performing analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and analyzing the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
Converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion object comprises:
retuning the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements:
acquiring audio data to be converted and a conversion target object for the audio data to be converted, parsing the audio data to be converted to obtain a parsing result, and determining audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; and determining acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object, and determining the converted audio data.
Determining the audio track information of the audio data to be converted according to the parsing result comprises:
after parsing the audio data to be converted, determining the audio features of at least one syllable of the audio data to be converted, wherein the audio features include the loudness, pitch, and timbre of the syllable; and
combining the determined audio features of the syllables of the audio data to be converted to obtain the audio track information of the audio data to be converted.
Before the audio data to be converted and the conversion object are acquired, the method further comprises:
acquiring acoustic spectrum information of at least one conversion target object, associating the acoustic spectrum information of the conversion target object with identification information of the conversion target object, and generating the acoustic spectrum information database.
Acquiring the acoustic spectrum information of at least one object comprises:
collecting the sound of the object, performing analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and analyzing the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
Converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion object comprises:
retuning the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
The generation module 203 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. The non-volatile memory may be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, Ferromagnetic Random Access Memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), SyncLink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The generation module 203 described in the embodiments of the present invention is intended to include, but is not limited to, these and any other suitable types of memory.
In an exemplary embodiment, the server may be implemented by one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), digital signal processors (DSP), programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array), general-purpose processors, controllers, microcontrollers (MCU, Micro Controller Unit), microprocessors (Microprocessor), or other electronic components, for performing the foregoing method.
The method for converting audio sound production according to the embodiments of the present invention is further described below, taking as an example a music app that changes the original singer of a song.
Embodiment one
The first embodiment of the present invention provides a specific implementation of the method for converting audio sound production. As shown in Fig. 3, the method comprises the following steps:
Step 301:Acoustical frequency spectrum information is gathered, establishes acoustical frequency spectrum information bank;
In practical applications, sound is a wave with a certain oscillation frequency, and a sound wave has physical parameters or characteristics such as oscillation frequency, amplitude, and waveform. It is these different parameters and characteristics that give sounds their varied auditory effects. If sounds are classified according to the sound characteristics of various musical instruments, there are four forms of expression: pitch, volume, timbre, and figure. It is these different forms that determine the characteristic sound of each instrument. Pitch is related to the oscillation frequency of the sound wave and rises with it: the higher the frequency, the higher the pitch, and the lower the frequency, the lower the pitch. Volume is related to the oscillation amplitude of the sound wave and increases with it: the larger the amplitude, the louder the sound, and the smaller the amplitude, the quieter the sound. In terms of intuitive auditory effect, a high-pitched sound seems sharp and thin, whereas a low-pitched sound seems deep and full.
Timbre refers to the perceived quality of a sound, that is, the auditory effect a person experiences on hearing it. The voices of different people are distinguished precisely by timbre. Li Guyi and Song Zuying are both sopranos, yet even when they sing the same song, a listener can tell them apart as soon as they are heard. This is the effect of timbre. Timbre is determined by the waveform of the wave described above. The standard waveform is a sine wave; the alternating current in everyday use, for example, has a standard sinusoidal waveform. The human voice, the sounds of musical instruments, and the many sounds in nature, however, usually have more complex waveforms, and it is these differently shaped waveforms that determine the timbres of different sounds. Besides being represented by the waveform (the time-domain representation of the sound), the timbre of a sound can also be represented by the sound spectrum (the frequency-domain representation of the sound). Applying the Fourier transform to a short segment of the sound's waveform yields the sound spectrum corresponding to that segment.
Sounds of the same timbre may have many different waveforms, but their spectra are usually the same. The sound spectrum is therefore generally used as the main basis for distinguishing the timbres of different sounds.
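The point that different waveforms can share one spectrum can be checked numerically. The following is a minimal sketch, not part of the patent: it builds two hypothetical test signals from the same two partials with different relative phase and compares their magnitude spectra using a naive discrete Fourier transform. The frame length, bin numbers, and function names are assumptions chosen for illustration.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Magnitude of a naive O(n^2) discrete Fourier transform of one short frame."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

N = 64  # frame length in samples (assumed)

def two_partials(phase):
    # Fundamental in bin 4 plus a half-amplitude partial in bin 8 shifted by `phase`.
    return [
        math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 8 * t / N + phase)
        for t in range(N)
    ]

wave_a = two_partials(0.0)
wave_b = two_partials(math.pi / 2)

# Largest sample-by-sample difference between the two waveforms.
waveform_gap = max(abs(a - b) for a, b in zip(wave_a, wave_b))
# Largest bin-by-bin difference between the two magnitude spectra.
spectrum_gap = max(
    abs(a - b) for a, b in zip(magnitude_spectrum(wave_a), magnitude_spectrum(wave_b))
)
print(f"waveform gap: {waveform_gap:.3f}, spectrum gap: {spectrum_gap:.2e}")
```

The two waveforms differ clearly in the time domain, while their magnitude spectra agree up to floating-point rounding, which is why a spectrum is a more stable descriptor of timbre than the raw waveform.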
In the embodiment of the present invention, to imitate the voices of different people, the voice of each person to be imitated must be collected in advance, and that person's acoustic spectrum information extracted from the collected audio data. Specifically, the music APP may collect the voice information of singers in advance and extract the acoustic spectrum information of those singers from the collected audio information. Alternatively, the current user may record his or her own voice with a voice input device of the terminal, such as a microphone, and upload the recording to the server through the music APP, so that the server extracts the user's acoustic spectrum information.
In practical applications, only 20 basic acoustic spectra need to be collected when gathering acoustic spectrum information. These 20 basic acoustic spectra can be combined into more than 400 acoustic spectrum combinations, and the user's voice can then be simulated through these combinations.
After the acoustic spectrum information of the user or of other singers has been collected, it can be stored in the acoustic spectrum information library of the server in association with the name of the user or singer.
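The association between a name and its acoustic spectrum information can be sketched as a simple keyed store. This is only an illustration under stated assumptions: the class name `SpectrumLibrary`, its methods, and the representation of a spectrum as a list of magnitudes are all hypothetical, and the patent does not specify how the server stores the library.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Spectrum = List[float]  # one basic acoustic spectrum as magnitude values (assumed format)

@dataclass
class SpectrumLibrary:
    """Hypothetical in-memory stand-in for the server's acoustic spectrum information library."""
    entries: Dict[str, List[Spectrum]] = field(default_factory=dict)

    def register(self, name: str, spectra: List[Spectrum]) -> None:
        # Store a copy of the spectra under the singer's or user's name.
        self.entries[name] = [list(s) for s in spectra]

    def lookup(self, name: str) -> List[Spectrum]:
        # Fail loudly when no spectrum information was collected for this name.
        if name not in self.entries:
            raise KeyError(f"no acoustic spectrum information stored for {name!r}")
        return self.entries[name]

library = SpectrumLibrary()
library.register("singer_a", [[1.0, 0.5, 0.25], [0.9, 0.6, 0.1]])
found = library.lookup("singer_a")
```

A real deployment would presumably persist these entries in a database rather than in memory; the sketch only shows the name-to-spectrum association described above.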
Step 302: Perform audio parsing on the song selected by the user whose singer is to be changed;
In practical applications, sound recorded or reproduced by analog equipment is analog audio, which becomes digital audio once it is digitized. The songs we usually listen to through a music APP are digital audio. Audio parsing here refers to the process of extracting a series of time-domain and frequency-domain characteristics of a signal, taking the digital audio signal as the object of analysis and digital signal processing as the means of analysis.
Audio parsing relies mainly on the Fourier transform and signal sampling techniques. The Fourier transform is the basis of spectrum analysis. Spectrum analysis of a signal means studying the frequency structure of the signal, finding how the amplitude, phase, and other properties of its components are distributed over frequency, and building various "spectra" with frequency as the horizontal axis, such as the amplitude spectrum and the phase spectrum.
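As an illustration of building spectra with frequency as the horizontal axis, the sketch below samples a hypothetical 1000 Hz cosine and lists, for each frequency bin, its amplitude and phase. The sample rate, frame length, and the function name `spectra` are assumptions for this example; the patent does not prescribe a particular transform implementation.

```python
import cmath
import math

def spectra(samples, sample_rate):
    """Return (frequency_hz, amplitude, phase_rad) for each bin up to the Nyquist frequency."""
    n = len(samples)
    rows = []
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for bin k; fine for a short analysis frame.
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        rows.append((k * sample_rate / n, abs(coeff), cmath.phase(coeff)))
    return rows

SAMPLE_RATE = 8000  # samples per second (assumed)
N = 80              # frame length, giving 100 Hz per bin

# A sampled 1000 Hz cosine with a phase offset of pi/3.
signal = [math.cos(2 * math.pi * 1000 * t / SAMPLE_RATE + math.pi / 3) for t in range(N)]

rows = spectra(signal, SAMPLE_RATE)
peak_freq, peak_amp, peak_phase = max(rows, key=lambda row: row[1])
print(peak_freq, round(peak_amp, 3), round(peak_phase, 3))
```

The amplitude column is the amplitude spectrum and the phase column is the phase spectrum; the peak falls in the bin whose frequency matches the tone, and its phase matches the offset used to generate the signal.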
In the embodiment of the present invention, audio parsing of the song selected by the user yields the relevant audio parameters of the song, such as tracks, loudness, pitch, and waveform. Each track defines its own attributes, such as the timbre attribute of that track. Because timbre determines how listeners tell voices apart, the tracks obtained by parsing the song can be modified to change how the singer of the song sounds.
Step 303: Determine the acoustic spectrum information corresponding to the singer selected by the user, and modify the track information of the song whose singer is to be changed according to the determined acoustic spectrum information;
In practical applications, the track information obtained in step 302 for the song whose singer is to be changed is modified according to the acoustic spectrum information corresponding to the singer selected by the user. Modifying the track changes the timbre of the song's singer, so that the singing voice in the song is converted from the voice of the original singer into the voice of the singer selected by the user.
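The patent does not state how the track's timbre is rewritten. One simplified way to picture the idea, assuming a track can be summarized by a fundamental frequency and a list of harmonic amplitudes, is to keep the pitch and overall level and replace only the balance between harmonics with that of the target singer. The dictionary layout, the function name `convert_timbre`, and the numbers below are all hypothetical.

```python
import math
from typing import Dict, List

def convert_timbre(track: Dict, target_profile: List[float]) -> Dict:
    """Return a copy of `track` whose harmonic balance follows `target_profile`.

    The fundamental frequency (pitch) is left untouched and the overall level
    is preserved; only the relative harmonic amplitudes change.
    """
    source_level = math.sqrt(sum(a * a for a in track["harmonics"]))
    target_level = math.sqrt(sum(a * a for a in target_profile))
    if target_level == 0:
        raise ValueError("target profile must contain at least one non-zero harmonic")
    gain = source_level / target_level
    converted = dict(track)  # shallow copy so the original track is not modified
    converted["harmonics"] = [a * gain for a in target_profile]
    return converted

# Hypothetical parsed track: a 220 Hz note with the original singer's harmonic balance.
original = {"f0": 220.0, "harmonics": [1.0, 0.2, 0.1]}
# Hypothetical harmonic balance taken from the selected singer's spectrum information.
target_profile = [0.6, 0.6, 0.3]

converted = convert_timbre(original, target_profile)
```

Real voice conversion works on time-varying spectra and is considerably more involved; this sketch only shows the separation the text relies on, namely that timbre can change while pitch and loudness stay the same.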
Embodiment two
The method for converting audio sound production of the embodiment of the present invention is illustrated below, taking the change of the singer of a specific song as an example:
The current user wishes to hear the song 《Love is very simple》, originally sung by Tao Zhe, performed in Sun Yanzi's voice. First, the music APP performs audio parsing on the audio file of 《Love is very simple》 to obtain the tracks of the song. Second, Sun Yanzi's acoustic spectrum information is looked up on the server, and the vocal track of 《Love is very simple》 is modified according to the acoustic spectrum information found for the singer Sun Yanzi, finally yielding the song 《Love is very simple》 sung in Sun Yanzi's voice. In this way, the voice in the song 《Love is very simple》 is converted from Tao Zhe's voice into Sun Yanzi's voice.
In practical applications, modifying a track takes very little time: modifying a song of about 10M takes roughly 15 s to 30 s. The method provided by the present invention can therefore quickly change the voice of a song's singer.
In the method, server, and computer-readable storage medium for converting audio sound production provided by the embodiments of the present invention, audio data to be converted and a conversion object are obtained; the audio data to be converted is parsed to generate a parsing result, and the track information of the audio data to be converted is determined according to the parsing result; the acoustic spectrum information of the conversion object is determined in a preset acoustic spectrum information database according to the conversion object; and the track information of the audio data to be converted is converted according to the acoustic spectrum information of the conversion object to determine the converted audio data. In this way, in response to the needs of current users, a user can pick a song he or she likes, then pick a singer he or she likes, and have the song sung in that singer's voice. Other songs can thus be sung in the voice of a favorite singer, which makes the music APP more entertaining and gives the user a better experience.
It should be noted that the above is only a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention.

Claims (10)

  1. A method for converting audio sound production, characterized in that the method comprises:
    obtaining audio data to be converted and a conversion target object of the audio data to be converted, parsing the audio data to be converted to obtain a parsing result, and determining track information of the audio data to be converted according to the parsing result, wherein the track information comprises at least the timbre of the audio data to be converted;
    determining acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, and converting the track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.
  2. The method according to claim 1, characterized in that, before the obtaining of the audio data to be converted and the conversion target object, the method further comprises:
    obtaining acoustic spectrum information of at least one conversion target object, associating the acoustic spectrum information of the conversion target object with identification information of the conversion target object, and generating the acoustic spectrum information database.
  3. The method according to claim 2, characterized in that the obtaining of acoustic spectrum information of at least one object comprises:
    collecting the sound of the object, performing analog-to-digital conversion on the collected sound of the object to obtain digital audio data of the object, and parsing the object according to the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least syllable spectrum information of the object's pronunciation.
  4. The method according to claim 1, characterized in that the converting of the track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object comprises:
    tuning the timbre of the syllables in the track information of the audio data to be converted according to the timbre of the audio features in the acoustic spectrum information of the conversion target object.
  5. An apparatus for converting audio sound production, characterized in that the apparatus comprises:
    a parsing module, configured to obtain audio data to be converted and a conversion target object of the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine track information of the audio data to be converted according to the parsing result, wherein the track information comprises at least the timbre of the audio data to be converted;
    a conversion module, configured to determine acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, and convert the track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.
  6. The apparatus according to claim 5, characterized in that the apparatus further comprises:
    a generation module, configured to obtain acoustic spectrum information of at least one conversion target object, associate the acoustic spectrum information of the conversion target object with identification information of the conversion target object, and generate the acoustic spectrum information database.
  7. The apparatus according to claim 6, characterized in that the generation module is specifically configured to:
    collect the sound of the object, perform analog-to-digital conversion on the collected sound of the object to obtain digital audio data of the object, and parse the object according to the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least syllable spectrum information of the object's pronunciation.
  8. The apparatus according to claim 6, characterized in that the conversion module is specifically configured to:
    tune the timbre of the syllables in the track information of the audio data to be converted according to the timbre of the audio features in the acoustic spectrum information of the conversion target object.
  9. A server, characterized by comprising: a processor and a memory for storing a computer program capable of running on the processor,
    wherein the processor is configured to perform the steps of the method according to any one of claims 1 to 4 when running the computer program.
  10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710752085.2A CN107481735A (en) 2017-08-28 2017-08-28 Method for converting audio sound production, server and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710752085.2A CN107481735A (en) 2017-08-28 2017-08-28 Method for converting audio sound production, server and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN107481735A true CN107481735A (en) 2017-12-15

Family

ID=60602945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710752085.2A Pending CN107481735A (en) 2017-08-28 2017-08-28 Method for converting audio sound production, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN107481735A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101359473A (en) * 2007-07-30 2009-02-04 国际商业机器公司 Auto speech conversion method and apparatus
JP5545935B2 (en) * 2009-09-04 2014-07-09 国立大学法人 和歌山大学 Voice conversion device and voice conversion method
CN102881283A (en) * 2011-07-13 2013-01-16 三星电子(中国)研发中心 Method and system for processing voice
CN103295574A (en) * 2012-03-02 2013-09-11 盛乐信息技术(上海)有限公司 Singing voice conversion device and method thereof
US20150025892A1 (en) * 2012-03-06 2015-01-22 Agency For Science, Technology And Research Method and system for template-based personalized singing synthesis
CN102982809A (en) * 2012-12-11 2013-03-20 中国科学技术大学 Conversion method for sound of speaker
CN105390141A (en) * 2015-10-14 2016-03-09 科大讯飞股份有限公司 Sound conversion method and sound conversion device
CN106205623A (en) * 2016-06-17 2016-12-07 福建星网视易信息系统有限公司 A kind of sound converting method and device
CN107093421A (en) * 2017-04-20 2017-08-25 深圳易方数码科技股份有限公司 A kind of speech simulation method and apparatus

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364658A (en) * 2018-03-21 2018-08-03 冯键能 Cyberchat method and server-side
CN110505496A (en) * 2018-05-16 2019-11-26 腾讯科技(深圳)有限公司 Live-broadcast control method and device, storage medium and electronic device
CN109348274A (en) * 2018-09-12 2019-02-15 咪咕音乐有限公司 Live broadcast interaction method and device and storage medium
CN109243477A (en) * 2018-10-17 2019-01-18 杭州兆华电子有限公司 A kind of audio repeating box
TWI685835B (en) * 2018-10-26 2020-02-21 財團法人資訊工業策進會 Audio playback device and audio playback method thereof
US11049490B2 (en) 2018-10-26 2021-06-29 Institute For Information Industry Audio playback device and audio playback method thereof for adjusting text to speech of a target character using spectral features
CN110062267A (en) * 2019-05-05 2019-07-26 广州虎牙信息科技有限公司 Live data processing method, device, electronic equipment and readable storage medium storing program for executing
CN110162660A (en) * 2019-05-28 2019-08-23 维沃移动通信有限公司 Audio-frequency processing method, device, mobile terminal and storage medium
CN110170170A (en) * 2019-05-30 2019-08-27 维沃移动通信有限公司 A kind of information display method and terminal device
WO2021128256A1 (en) * 2019-12-27 2021-07-01 深圳市优必选科技股份有限公司 Voice conversion method, apparatus and device, and storage medium
CN113259701A (en) * 2021-05-18 2021-08-13 游艺星际(北京)科技有限公司 Method and device for generating personalized timbre and electronic equipment
CN113259701B (en) * 2021-05-18 2023-01-20 游艺星际(北京)科技有限公司 Method and device for generating personalized timbre and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171215