CN107481735A

CN107481735A - Method for converting audio sound production, server and computer readable storage medium

Info

Publication number: CN107481735A
Application number: CN201710752085.2A
Authority: CN
Inventors: 冯祖学
Original assignee: Migu Cultural Technology Co Ltd; China Mobile Communications Group Co Ltd; MIGU Music Co Ltd
Current assignee: Migu Cultural Technology Co Ltd; China Mobile Communications Group Co Ltd; MIGU Music Co Ltd
Priority date: 2017-08-28
Filing date: 2017-08-28
Publication date: 2017-12-15

Abstract

The invention discloses a method for converting audio vocalization, comprising the following steps: acquiring audio data to be converted and a conversion target object for that audio data; parsing the audio data to be converted to obtain a parsing result; determining audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; determining the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database; and converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data. The invention also discloses a device for converting audio vocalization and a computer-readable storage medium.

Description

A method, server, and computer-readable storage medium for converting audio vocalization

Technical field

The present invention relates to audio processing technology, and in particular to a method, server, and computer-readable storage medium for converting audio vocalization.

Background

Although existing music apps offer an increasingly rich set of functions, these functions mainly concern aspects other than music playback, such as music social functions and music purchasing functions. For traditional music playback, the functions a music app can provide are still mainly tuning functions, such as adjusting melody and rhythm. The main purpose of such functions is clearly to give the user a better listening experience, and using these tuning functions also requires the user to have some basic musical knowledge, so the audience for the tuning functions that existing music apps can provide is small. Overall, therefore, the functions provided by existing music apps are still somewhat lacking in entertainment value; in particular, for the basic function of a music app, namely music playback, the functions that existing music apps provide are even less entertaining.

In daily life, every user tends to have one or several favorite singers. A user not only likes the songs these singers perform themselves, but may also wish that a favorite singer could sing other songs the user likes. However, the prior art currently offers no method for changing the singer of a song, so the functions provided by existing music apps cannot meet this user demand.

Summary of the invention

In view of this, embodiments of the present invention aim to provide a method, server, and computer-readable storage medium for converting audio vocalization, with which the original singer of a selected audio file can be changed to a singer the user likes, improving entertainment value and user experience.

To achieve the above objective, an embodiment of the present invention provides a method for converting audio vocalization:

Acquiring audio data to be converted and a conversion target object for the audio data to be converted; parsing the audio data to be converted to obtain a parsing result; and determining the audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted;

Determining the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database; and converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.

Before acquiring the audio data to be converted and the conversion target object, the method further includes:

Acquiring the acoustic spectrum information of at least one conversion target object, associating the acoustic spectrum information of the conversion target object with the identification information of the conversion target object, and generating the acoustic spectrum information database.

Acquiring the acoustic spectrum information of at least one object includes:

Collecting the sound of the object, performing analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and parsing the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.

Converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object includes:

Tuning the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.

An embodiment of the present invention provides a device for converting audio vocalization, the device including:

A parsing module, configured to acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine the audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted;

A conversion module, configured to determine the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, and convert the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.

The device further includes:

A generation module, configured to acquire the acoustic spectrum information of at least one conversion target object, associate the acoustic spectrum information of the conversion target object with the identification information of the conversion target object, and generate the acoustic spectrum information database.

The generation module is specifically configured to: collect the sound of an object, perform analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and parse the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.

The conversion module is specifically configured to: tune the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.

An embodiment of the present invention provides a server, including a processor and a memory for storing a computer program that can run on the processor,

wherein the processor, when running the computer program, performs:

Acquiring audio data to be converted and a conversion target object for the audio data to be converted; parsing the audio data to be converted to obtain a parsing result; determining the audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; determining the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database; and converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.

An embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method described above.

With the method, server, and computer-readable storage medium for converting audio vocalization provided by the embodiments of the present invention, audio data to be converted and a conversion target object for that audio data are acquired; the audio data to be converted is parsed to obtain a parsing result; the audio track information of the audio data to be converted is determined according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted; the acoustic spectrum information of the conversion target object is determined in a preset acoustic spectrum information database; and the audio track information of the audio data to be converted is converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data. In this way, the selected audio data is parsed to obtain its audio track information, and that audio track information is converted according to the acoustic spectrum information of the chosen conversion target object, yielding audio data that has the audio features of the conversion target object. This makes the music app more entertaining and gives the user a better experience.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the method for converting audio vocalization according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the device for converting audio vocalization according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of the first embodiment of the present invention.

Detailed description

To provide a more thorough understanding of the features and technical content of the embodiments of the present invention, their implementation is described in detail below.

Fig. 1 is a schematic flowchart of the method for converting audio vocalization according to an embodiment of the present invention. As shown in Fig. 1, the audio conversion method provided by the embodiment includes the following steps:

Step 101: Acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine the audio track information of the audio data to be converted according to the parsing result;

The audio track information comprises at least the timbre of the audio data to be converted.

Step 102: Determine the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, and convert the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.

Here, the timbre of the syllables in the audio track information of the audio data to be converted is tuned according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
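Steps 101 and 102 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the `Syllable` type, the `SPECTRUM_DB` dictionary, the singer identifiers, and the representation of timbre as a tuple of harmonic weights are all assumptions introduced for the example, and step 101 (parsing) is assumed to have already produced the track.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Syllable:
    """One parsed syllable of the audio track (illustrative fields)."""
    pitch_hz: float
    loudness: float
    timbre: Tuple[float, ...]  # relative harmonic weights, a stand-in for timbre


# Hypothetical preset database: target identifier -> timbre descriptor.
SPECTRUM_DB: Dict[str, Tuple[float, ...]] = {
    "singer_a": (1.0, 0.5, 0.25),
    "singer_b": (1.0, 0.2, 0.6),
}


def convert_track(track: List[Syllable], target_id: str) -> List[Syllable]:
    """Step 102: look up the target and retune each syllable's timbre."""
    target_timbre = SPECTRUM_DB[target_id]
    # Pitch and loudness are kept; only the timbre is replaced.
    return [replace(s, timbre=target_timbre) for s in track]


# Step 101 is assumed to have produced this track information already.
track = [
    Syllable(220.0, 0.7, SPECTRUM_DB["singer_a"]),
    Syllable(247.0, 0.6, SPECTRUM_DB["singer_a"]),
]
converted = convert_track(track, "singer_b")
```

Keeping pitch and loudness untouched while swapping only the timbre field mirrors the text's statement that the conversion tunes the timbre of the syllables; a real system would also have to resynthesize audio from the modified track.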

In practice, converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data can also be implemented as follows:

Acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine, according to the parsing result, each piece of text information in the audio data to be converted and the pronunciation syllable corresponding to that text information;

Determine, in the preset acoustic spectrum information database, the conversion target object's pronunciation information for the text information and the spectrum information of that pronunciation information; arrange the determined spectrum information of the pronunciation information according to the character order of the text information and perform audio conversion to determine the converted audio data.
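The text-based variant above amounts to a per-syllable lookup followed by ordering. A minimal sketch, assuming a hypothetical `PRONUNCIATION_DB` keyed by target identifier and syllable, with made-up spectrum values and syllables that are assumed to come from the parsing step:

```python
from typing import Dict, List, Tuple

# Hypothetical database: (target identifier, syllable) -> spectrum of that
# syllable as pronounced by the target. All values are made up.
PRONUNCIATION_DB: Dict[Tuple[str, str], List[float]] = {
    ("singer_b", "ai"): [0.9, 0.4, 0.1],
    ("singer_b", "hen"): [0.7, 0.5, 0.2],
    ("singer_b", "jian"): [0.6, 0.3, 0.3],
    ("singer_b", "dan"): [0.8, 0.2, 0.1],
}


def arrange_spectra(target_id: str, syllables: List[str]) -> List[List[float]]:
    """Return the target's syllable spectra in the order of the lyrics."""
    missing = [s for s in syllables if (target_id, s) not in PRONUNCIATION_DB]
    if missing:
        raise KeyError(f"no spectrum for {target_id}: {missing}")
    return [PRONUNCIATION_DB[(target_id, s)] for s in syllables]


lyrics = ["ai", "hen", "jian", "dan"]  # syllables assumed to come from parsing
sequence = arrange_spectra("singer_b", lyrics)
```

The ordered `sequence` would then be handed to the audio conversion stage, which the source does not describe in further detail.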

Before step 101, the audio conversion method provided by the embodiment of the present invention further includes the following step:

Acquire the acoustic spectrum information of at least one conversion target object, associate the acoustic spectrum information of the conversion target object with the identification information of the conversion target object, and generate the acoustic spectrum information database;

Here, the sound of the object is collected, analog-to-digital conversion is performed on the collected sound to obtain digital audio data of the object, and the digital audio data is parsed to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
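The database-building step can be sketched as below. This is an illustration under stated assumptions: `quantize` is a simple stand-in for the conversion of captured sound into digital audio data, `magnitude_spectrum` uses a naive discrete Fourier transform, and the function names, the `"singer_b"` identifier, and the `"la"` syllable are hypothetical.

```python
import cmath
import math
from typing import Dict, List


def quantize(analog: List[float], bits: int = 16) -> List[int]:
    """Stand-in for digitization: map values in [-1, 1] to signed integers."""
    scale = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, x)) * scale) for x in analog]


def magnitude_spectrum(samples: List[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0 .. n/2 (fine for short segments)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]


def register_target(db: Dict[str, Dict[str, List[float]]],
                    target_id: str,
                    syllable_recordings: Dict[str, List[float]]) -> None:
    """Associate per-syllable spectra with the target's identifier."""
    db[target_id] = {
        syllable: magnitude_spectrum(quantize(wave))
        for syllable, wave in syllable_recordings.items()
    }


db: Dict[str, Dict[str, List[float]]] = {}
# A toy "recording": two cycles of a sine wave in 16 samples.
wave = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
register_target(db, "singer_b", {"la": wave})
```

Keying the database by identifier is what lets the later conversion step retrieve a target's spectra from the user's selection alone.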

Fig. 2 is a schematic structural diagram of the device for converting audio vocalization according to an embodiment of the present invention. The audio conversion device includes:

A parsing module 201, configured to acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine the audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted;

A conversion module 202, configured to determine the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, and convert the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.

The parsing module 201 is specifically configured to:

After parsing the audio data to be converted, determine the audio features of at least one syllable of the audio data to be converted, wherein the audio features include the loudness, pitch, and timbre of the syllable;

Synthesize the determined audio features of the syllables of the audio data to be converted to obtain the audio track information of the audio data to be converted.
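The parsing module's per-syllable features can be sketched as follows. The choices here are assumptions, not taken from the source: syllables are approximated by fixed-length frames, loudness by the RMS level, pitch by the strongest DFT bin, and timbre by the normalized magnitude spectrum.

```python
import cmath
import math
from typing import Dict, List


def dft_magnitudes(frame: List[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0 .. n/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def syllable_features(frame: List[float], sample_rate: int) -> Dict[str, object]:
    """Loudness (RMS), pitch (strongest non-DC bin), timbre (spectrum shape)."""
    mags = dft_magnitudes(frame)
    peak = max(range(1, len(mags)), key=mags.__getitem__)
    total = sum(mags[1:]) or 1.0
    return {
        "loudness": math.sqrt(sum(x * x for x in frame) / len(frame)),
        "pitch_hz": peak * sample_rate / len(frame),
        "timbre": [m / total for m in mags[1:]],
    }


def track_info(samples: List[float], frame_len: int,
               sample_rate: int) -> List[Dict[str, object]]:
    """Combine the per-frame features into one track description."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [syllable_features(samples[i:i + frame_len], sample_rate)
            for i in starts]


rate = 64
# Toy signal: a 4 Hz tone followed by a quieter 8 Hz tone.
samples = [math.sin(2 * math.pi * 4 * t / rate) for t in range(32)]
samples += [0.5 * math.sin(2 * math.pi * 8 * t / rate) for t in range(32, 64)]
info = track_info(samples, frame_len=32, sample_rate=rate)
```

Real syllable segmentation and pitch tracking are considerably more involved; the sketch only shows how the three named features could be gathered into one track structure.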

The device further includes:

A generation module 203, configured to acquire the acoustic spectrum information of at least one conversion target object, associate the acoustic spectrum information of the conversion target object with the identification information of the conversion target object, and generate the acoustic spectrum information database.

The generation module 203 is specifically configured to: collect the sound of an object, perform analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and parse the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.

The conversion module 202 is specifically configured to: tune the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.

In the server, when the processor runs the computer program, it performs the steps of the method described above, including determining the audio track information of the audio data to be converted according to the parsing result, the steps carried out before acquiring the audio data to be converted and the conversion target object, and converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object.

The generation module 203 can be implemented by any type of volatile or non-volatile storage device, or a combination of them. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferromagnetic random access memory (FRAM), flash memory, magnetic surface storage, an optical disc, or compact disc read-only memory (CD-ROM); the magnetic surface storage can be disk storage or tape storage. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The generation module 203 described in the embodiments of the present invention is intended to include, without being limited to, these and any other suitable types of memory.

In an exemplary embodiment, the server can be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontroller units (MCU), microprocessors, or other electronic components, for performing the foregoing method.

The method for converting audio vocalization according to the embodiments of the present invention is further explained below, taking the case of changing a song's original singer in a music app as an example.

Embodiment 1

The first embodiment of the present invention provides a concrete implementation of the method for converting audio vocalization. As shown in Fig. 3, the method includes the following steps:

Step 301: Collect acoustic spectrum information and build an acoustic spectrum information library;

In practice, sound is a wave with a certain oscillation frequency, and a sound wave has physical parameters or characteristics such as oscillation frequency, amplitude, and waveform. It is these differing parameters and characteristics that give sound its many different auditory effects. Classified by the sound characteristics of various musical instruments, there are four forms of expression: pitch, volume, timbre, and figure. It is these different forms that determine the characteristics of each instrument's sound. Pitch is the form related to the oscillation frequency of the sound wave and rises with it: the higher the frequency, the higher the pitch, and the lower the frequency, the lower the pitch. Volume is the form related to the oscillation amplitude of the sound wave and grows with it: the larger the amplitude, the louder the sound, and the smaller the amplitude, the quieter the sound. In terms of the auditory effect we perceive directly, a high-pitched sound seems sharp and thin, while a low-pitched sound seems deep and full.
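The relationship between frequency and pitch, and between amplitude and volume, can be checked on synthetic tones. The sketch below is illustrative only: the 8000 Hz sample rate, the helper names, and the zero-crossing frequency estimate are assumptions chosen for simplicity, and the estimate is only accurate to about one cycle per second of signal.

```python
import math
from typing import List


def sine_wave(freq_hz: float, amplitude: float,
              sample_rate: int = 8000, duration_s: float = 1.0) -> List[float]:
    """Generate a pure tone with the given frequency and amplitude."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


def estimate_frequency(samples: List[float], sample_rate: int = 8000) -> float:
    """Estimate frequency by counting upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)


def peak_amplitude(samples: List[float]) -> float:
    """Largest absolute sample value, a simple proxy for volume."""
    return max(abs(x) for x in samples)


low = sine_wave(440, 0.5)    # lower pitch
high = sine_wave(880, 0.5)   # higher pitch, same volume
loud = sine_wave(440, 0.9)   # same pitch as `low`, higher volume
```

Here `estimate_frequency(high)` comes out roughly twice `estimate_frequency(low)`, while `peak_amplitude(loud)` exceeds `peak_amplitude(low)` with the estimated frequency unchanged.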

Timbre refers to the perceived quality of a sound, that is, the auditory effect a listener experiences; the voices of different people are told apart by timbre. For example, Li Guyi and Song Zuying are both sopranos, yet even if they sing the same song, listeners can still accurately tell them apart. This is the effect of timbre. Timbre is determined by the waveform of the sound wave. The standard waveform is a sine wave; the alternating current we use every day, for example, has a standard sine waveform. But the human voice, the sounds of various instruments, and the many sounds in nature usually have more complex waveforms, and it is these differently shaped waveforms that determine the timbres of different sounds. Besides being represented by a waveform (the time-domain representation of the sound), timbre can also be represented by a sound spectrum (the frequency-domain representation of the sound). Applying a Fourier transform to a short segment of the sound's waveform yields the sound spectrum corresponding to that segment.

Sounds with the same timbre may have many different waveforms, but their spectra are usually the same, so the sound spectrum is generally used as the main basis for distinguishing the timbres of different sounds.
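The point that different waveforms can share one spectrum can be demonstrated with two signals that contain the same harmonics at different relative phases. This is a minimal sketch using a naive discrete Fourier transform on a 64-sample segment; the harmonic amplitudes and the phase offset are arbitrary choices for the example.

```python
import cmath
import math
from typing import List


def magnitude_spectrum(samples: List[float]) -> List[float]:
    """Naive DFT magnitudes (normalized by n) for bins 0 .. n/2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]


def two_harmonics(phase2: float, n: int = 64) -> List[float]:
    """Fundamental plus a half-strength second harmonic with a phase offset."""
    return [math.sin(2 * math.pi * t / n)
            + 0.5 * math.sin(2 * math.pi * 2 * t / n + phase2)
            for t in range(n)]


wave_a = two_harmonics(0.0)
wave_b = two_harmonics(math.pi / 2)  # same harmonics, shifted phase

spec_a = magnitude_spectrum(wave_a)
spec_b = magnitude_spectrum(wave_b)

# The time-domain shapes differ noticeably...
waveforms_differ = max(abs(a - b) for a, b in zip(wave_a, wave_b)) > 0.1
# ...while the magnitude spectra agree up to rounding error.
spectra_match = all(abs(a - b) < 1e-9 for a, b in zip(spec_a, spec_b))
```

Because the magnitude spectrum discards phase, it stays fixed while the waveform changes, which is why it is a more stable descriptor of timbre than the raw waveform.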

In the embodiment of the present invention, to achieve the effect of imitating different people's voices, the sound of the person to be imitated must be collected in advance, and that person's acoustic spectrum information extracted from the collected audio data. Specifically, the music app can collect singers' sound information in advance and extract those singers' acoustic spectrum information from the collected audio information. Alternatively, the current user can record their own voice with the terminal's sound input device, for example a microphone, and upload it to the server through the music app so that the server extracts the user's acoustic spectrum information.

In practice, only 20 basic acoustic spectra need to be collected when gathering acoustic spectrum information. These 20 basic acoustic spectra can be combined into more than 400 acoustic spectrum combinations, and the user's voice can then be simulated with these combinations.
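The source does not say how the 20 basic spectra are combined. As a rough plausibility check only, and assuming (hypothetically) that a combination is an ordered pair of basic spectra, the count is 20 × 20 = 400, which is the order of magnitude the text cites:

```python
from itertools import product

# 20 hypothetical labels standing in for the basic acoustic spectra.
basic_spectra = [f"s{i:02d}" for i in range(1, 21)]

# Assumption: a "combination" is an ordered pair of basic spectra.
ordered_pairs = list(product(basic_spectra, repeat=2))
pair_count = len(ordered_pairs)  # 20 * 20
```

Other readings (for example, also counting single spectra or longer sequences) would give larger totals, consistent with the text's "more than 400".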

After the acoustic spectrum information of the user or of other singers has been collected, it can be saved in the server's acoustic spectrum information library in association with the name of the user or singer.

Step 302: Perform audio parsing on the song selected by the user whose singer is to be changed;

In practice, sound recorded or reproduced by analog equipment is analog audio, which becomes digital audio once digitized. The songs we usually hear through a music app are digital audio. Audio parsing here means taking a digital audio signal as the object of analysis and using digital signal processing as the means to extract a series of characteristics of the signal in the time domain and frequency domain.

Audio parsing is mainly implemented with the Fourier transform and signal sampling techniques. The Fourier transform is the basis of spectrum analysis. Spectrum analysis of a signal means studying the signal's frequency structure: finding how the amplitude, phase, and other properties of its components are distributed over frequency, and building various "spectra" with frequency as the horizontal axis, such as the amplitude spectrum and the phase spectrum.
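The amplitude spectrum and phase spectrum mentioned above can be computed from the complex output of a discrete Fourier transform. The sketch below uses only the standard library; the helper names, the 16 Hz sample rate, and the 2 Hz test tone are assumptions chosen to keep the numbers easy to follow, and a real system would use a fast Fourier transform instead of this naive version.

```python
import cmath
import math
from typing import List, Tuple


def dft(samples: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2), fine for short segments)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]


def amplitude_and_phase(samples: List[float],
                        sample_rate: float) -> List[Tuple[float, float, float]]:
    """Return (frequency in Hz, amplitude, phase in radians) for bins 0 .. n/2."""
    n = len(samples)
    rows = []
    for k, coefficient in enumerate(dft(samples)[: n // 2 + 1]):
        amplitude, phase = cmath.polar(coefficient)
        rows.append((k * sample_rate / n, amplitude / n, phase))
    return rows


rate = 16
# One second of a 2 Hz cosine sampled at 16 Hz.
segment = [math.cos(2 * math.pi * 2 * t / rate) for t in range(rate)]
rows = amplitude_and_phase(segment, rate)
# The phase of bins whose amplitude is near zero is numerical noise
# and should not be interpreted.
```

Plotting the second column of `rows` against the first gives the amplitude spectrum, and the third against the first gives the phase spectrum.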

By performing audio parsing on the song selected by the user, the embodiment of the present invention obtains the song's audio parameters, such as audio tracks, loudness, pitch, and waveform. Each audio track defines its own attributes, such as its timbre attribute. Because timbre determines how a listener tells sounds apart, the audio track obtained by parsing the song can be modified to change the sound of the song's singer.

Step 303: Determine the acoustic spectrum information corresponding to the singer selected by the user, and modify the audio track information of the song whose singer is to be changed according to the determined acoustic spectrum information;

In practice, the audio track information obtained in step 302 for the song whose singer is to be changed is modified according to the acoustic spectrum information corresponding to the singer selected by the user. Modifying the audio track changes the timbre of the song's singer, so that the voice singing the song is converted from that of the original singer to that of the singer selected by the user.

Embodiment 2

The method for converting audio vocalization according to the embodiments of the present invention is illustrated below by changing the singer of a specific song:

The current user wishes to hear the song 《Love Is Very Simple》, originally sung by Tao Zhe, sung in Sun Yanzi's voice. First, the music app performs audio parsing on the audio file of 《Love Is Very Simple》 to obtain the song's audio track. Next, Sun Yanzi's acoustic spectrum information is looked up on the server, and the vocal track of 《Love Is Very Simple》 is modified according to the acoustic spectrum information found for the singer Sun Yanzi. The result is the song 《Love Is Very Simple》 sung in Sun Yanzi's voice. In this way, the voice in 《Love Is Very Simple》 is converted from Tao Zhe's to Sun Yanzi's.

In practice, modifying the audio track takes very little time: modifying a song of about 10 MB takes roughly 15 s to 30 s, so the method provided by the present invention can quickly achieve the effect of changing the voice of a song's singer.

With the method, server, and computer-readable storage medium for converting audio vocalization provided by the embodiments of the present invention, audio data to be converted and a conversion object are acquired; the audio data to be converted is parsed to generate a parsing result; the audio track information of the audio data to be converted is determined according to the parsing result; the acoustic spectrum information of the conversion object is determined in a preset acoustic spectrum information database; and the audio track information of the audio data to be converted is converted according to the acoustic spectrum information of the conversion object to determine the converted audio data. In this way, in response to current user demand, a user can pick a song they like, then pick a singer they like, and have the song sung in that singer's voice. This achieves the effect of a favorite singer's voice singing other songs, makes the music app more entertaining, and gives the user a better experience.

It should be noted that the above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.

Claims

1. A method for converting audio vocalization, characterized in that the method includes:

Acquiring audio data to be converted and a conversion target object for the audio data to be converted, parsing the audio data to be converted to obtain a parsing result, and determining the audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted;

Determining the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, and converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.
2. The method according to claim 1, characterized in that, before acquiring the audio data to be converted and the conversion target object, the method further includes:

Acquiring the acoustic spectrum information of at least one conversion target object, associating the acoustic spectrum information of the conversion target object with the identification information of the conversion target object, and generating the acoustic spectrum information database.
3. The method according to claim 2, characterized in that acquiring the acoustic spectrum information of at least one object includes:

Collecting the sound of the object, performing analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and parsing the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
4. The method according to claim 1, characterized in that converting the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object includes:

Tuning the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
5. A device for converting audio vocalization, characterized in that the device includes:

A parsing module, configured to acquire audio data to be converted and a conversion target object for the audio data to be converted, parse the audio data to be converted to obtain a parsing result, and determine the audio track information of the audio data to be converted according to the parsing result, wherein the audio track information comprises at least the timbre of the audio data to be converted;

A conversion module, configured to determine the acoustic spectrum information of the conversion target object in a preset acoustic spectrum information database, and convert the audio track information of the audio data to be converted according to the acoustic spectrum information of the conversion target object to determine the converted audio data.
6. The device according to claim 5, characterized in that the device further includes:

A generation module, configured to acquire the acoustic spectrum information of at least one conversion target object, associate the acoustic spectrum information of the conversion target object with the identification information of the conversion target object, and generate the acoustic spectrum information database.
7. The device according to claim 6, characterized in that the generation module is specifically configured to:

Collect the sound of an object, perform analog-to-digital conversion on the collected sound to obtain digital audio data of the object, and parse the digital audio data to obtain the acoustic spectrum information of the object, wherein the acoustic spectrum information of the object comprises at least the syllable spectrum information of the object's pronunciation.
8. The device according to claim 6, characterized in that the conversion module is specifically configured to:

Tune the timbre of the syllables in the audio track information of the audio data to be converted according to the timbre among the audio features in the acoustic spectrum information of the conversion target object.
9. A server, characterized in that it includes a processor and a memory for storing a computer program that can run on the processor,

wherein the processor, when running the computer program, performs the steps of the method according to any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.