CN108922505A - Information processing method and device - Google Patents
- Publication number
- CN108922505A (application CN201810673919.5A)
- Authority
- CN
- China
- Prior art keywords
- information
- audio
- frequency information
- frequency
- user
- Prior art date
- Legal status
- Granted
Classifications
- G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters
- G10H1/0008: Details of electrophonic musical instruments; associated control or indicating means
- G10H1/40: Accompaniment arrangements; rhythm
- G10H2210/056: Musical analysis of a raw or encoded audio signal for extraction or identification of individual instrumental parts (e.g. melody, chords, bass); identification or separation of instrumental parts by their characteristic voices or timbres
- G10H2210/076: Musical analysis of a raw or encoded audio signal for extraction of timing and tempo; beat detection
Abstract
Embodiments of the invention disclose an information processing method and device. The method includes: collecting first audio information, where the first audio information includes at least one of melody information, rhythm information, and timbre information; and generating second audio information associated with the first audio information, where the content of the first audio information and the second audio information is at least partly different.
Description
Technical field
The present invention relates to the field of information technology, and in particular to an information processing method and device.
Background art
In existing audio playback systems, playing audio means playing audio that already exists in a local audio library or a remote audio library. However, neither the local audio library of an audio device nor a remote audio library can always satisfy the user's current listening demand; otherwise, the user has to search through a massive amount of audio to find the desired audio to play. As a result, existing audio playback systems are not intelligent enough, and the user experience is poor.
Summary of the invention
Embodiments of the present invention aim to provide an information processing method and device.
The technical solution of the invention is realized as follows. In a first aspect, an embodiment of the present invention provides an information processing method, including:
collecting first audio information, where the first audio information includes at least one of melody information, rhythm information, and timbre information;
generating second audio information associated with the first audio information, where the content of the first audio information and the second audio information is at least partly different.
In some embodiments, the content of the first audio information and the second audio information being at least partly different includes at least one of the following:
the playing duration of the second audio information is different from the playing duration of the first audio information;
second melody information of the second audio information is at least partly different from first melody information of the first audio information;
second rhythm information of the second audio information is at least partly different from first rhythm information of the first audio information;
second timbre information of the second audio information is at least partly different from first timbre information of the first audio information.
In some embodiments, the first timbre information and the second timbre information include at least one of:
first-class timbre information, where the first-class timbre information includes the timbre information of human voices; the timbre information of human voices includes at least one of: a male-voice timbre, a female-voice timbre, a child-voice timbre, and a mixed voice formed by mixing at least two classes of human voices;
second-class timbre information, where the second-class timbre information includes the timbre information of musical instruments;
third-class timbre information, where the third-class timbre information is timbre information other than that of human voices and musical instruments.
In some embodiments, generating the second audio information associated with the first audio information includes at least one of the following:
generating the second audio information according to audio attribute information of the first audio information;
generating the second audio information according to user attribute information corresponding to the first audio information.
In some embodiments, generating the second audio information according to the audio attribute information of the first audio information includes:
generating the second audio information according to at least one of a melody feature attribute, a rhythm feature attribute, a timbre feature attribute, a musical style attribute, and a music type attribute of the first audio information.
In some embodiments, generating the second audio information according to the user attribute information corresponding to the first audio information includes:
generating the second audio information according to at least one of user preference information, an audio playback record, emotional state information, and user indication information.
In some embodiments, generating the second audio information according to at least one of the user preference information, the audio playback record, the emotional state information, and the user indication information corresponding to the first audio information includes at least one of the following:
determining the duration of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generation of the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotional state information;
stopping generation of the second audio information according to the emotional state information;
resuming generation of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the emotional state information and the user indication information;
continuing to generate the second audio information according to the emotional state information and the user indication information;
stopping generation of the second audio information according to the emotional state information and the user indication information;
resuming generation of the second audio information according to the emotional state information and the user indication information.
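The branches above amount to a small generation controller driven by emotional state and user indications. The following sketch illustrates one possible shape of that controller; the class name, the signal values, and the emotion-to-duration table are all assumptions for illustration, not prescribed by the embodiments.

```python
# Hypothetical controller mapping emotional-state and user-indication
# inputs to actions on the second-audio generator (all names invented).

class GenerationController:
    """Controls generation of the second audio information."""

    def __init__(self):
        self.generating = False
        self.duration_s = None

    def apply(self, emotional_state=None, user_indication=None):
        # An explicit user indication takes precedence over inferred emotion.
        signal = user_indication or emotional_state
        if signal == "stop":
            self.generating = False
        elif signal in ("continue", "resume"):
            self.generating = True
        return self.generating

    def decide_duration(self, emotional_state, user_indication=None):
        # A user indication fixes the duration directly; otherwise an
        # assumed table maps the emotional state to a default length.
        if user_indication is not None:
            self.duration_s = user_indication
        else:
            defaults = {"happy": 240, "calm": 180, "sad": 120}
            self.duration_s = defaults.get(emotional_state, 180)
        return self.duration_s

controller = GenerationController()
running = controller.apply(user_indication="continue")
length = controller.decide_duration("sad")
```

Combined signals fall out of the same precedence rule: when both an emotional state and a user indication are present, the indication wins, matching the "according to the emotional state information and the user indication information" branches.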
In some embodiments, generating the second audio information associated with the first audio information includes:
processing the first audio information using an audio processing model, and outputting the second audio information.
In some embodiments, generating the second audio information associated with the first audio information includes at least one of the following:
generating first music-score information of the second audio information according to the first audio information;
generating first lyrics information of the second audio information according to the first audio information;
synthesizing the first music-score information and the first lyrics information generated according to the first audio information, to produce a song file corresponding to the second audio information.
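The score-plus-lyrics path can be illustrated as a three-stage pipeline: derive a score and lyrics from the first audio information, then combine them into a song record. This is a minimal sketch with invented stand-in functions and data shapes; the embodiments do not prescribe any particular representation for scores, lyrics, or song files.

```python
# Stand-in pipeline for score + lyrics synthesis (illustrative only).

def generate_score(first_audio):
    # Stand-in: reuse the captured melody notes as the score.
    return {"notes": first_audio["melody"]}

def generate_lyrics(first_audio):
    # Stand-in: one placeholder syllable per melody note.
    return ["la"] * len(first_audio["melody"])

def synthesize_song(score, lyrics):
    # Pair each note with a syllable to form the song-file content.
    return list(zip(score["notes"], lyrics))

first_audio = {"melody": ["C4", "E4", "G4"]}
song = synthesize_song(generate_score(first_audio),
                       generate_lyrics(first_audio))
```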
In a second aspect, an embodiment of the present invention provides an information processing device, characterized by including:
a collection module, configured to collect first audio information, where the first audio information includes at least one of melody information, rhythm information, and timbre information;
a generation module, configured to generate second audio information associated with the first audio information, where the content of the first audio information and the second audio information is at least partly different.
In some embodiments, the content of the first audio information and the second audio information being at least partly different includes at least one of the following:
the playing duration of the second audio information is different from the playing duration of the first audio information;
second melody information of the second audio information is at least partly different from first melody information of the first audio information;
second rhythm information of the second audio information is at least partly different from first rhythm information of the first audio information;
second timbre information of the second audio information is at least partly different from first timbre information of the first audio information.
In some embodiments, the first timbre information and the second timbre information include at least one of:
first-class timbre information, where the first-class timbre information includes the timbre information of human voices; the timbre information of human voices includes at least one of: a male-voice timbre, a female-voice timbre, a child-voice timbre, and a mixed voice formed by mixing at least two classes of human voices;
second-class timbre information, where the second-class timbre information includes the timbre information of musical instruments;
third-class timbre information, where the third-class timbre information is timbre information other than that of human voices and musical instruments.
In some embodiments, the generation module is specifically configured to perform at least one of the following:
generating the second audio information according to the audio attribute information of the first audio information;
generating the second audio information according to the user attribute information corresponding to the first audio information.
In some embodiments, the generation module is specifically configured to generate the second audio information according to at least one of a melody feature attribute, a rhythm feature attribute, a timbre feature attribute, a musical style attribute, and a music type attribute of the first audio information.
In some embodiments, the generation module is specifically configured to generate the second audio information according to at least one of the user preference information, the audio playback record, the emotional state information, and the user indication information.
In some embodiments, the generation module is specifically configured to perform at least one of the following:
determining the duration of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generation of the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotional state information;
stopping generation of the second audio information according to the emotional state information;
resuming generation of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the emotional state information and the user indication information;
continuing to generate the second audio information according to the emotional state information and the user indication information;
stopping generation of the second audio information according to the emotional state information and the user indication information;
resuming generation of the second audio information according to the emotional state information and the user indication information.
In some embodiments, the generation module is specifically configured to process the first audio information using an audio processing model and output the second audio information.
In some embodiments, the generation module is specifically configured to: generate first music-score information of the second audio information according to the first audio information; generate first lyrics information of the second audio information according to the first audio information; and synthesize the first music-score information and the first lyrics information to produce a song file corresponding to the second audio information.
With the information processing method and device provided by the embodiments of the present invention, after the first audio information is collected, second audio information associated with the first audio information can be generated automatically. In this way, second audio information that is related to the collected first audio information, and whose content is at least partly different from it, can be generated dynamically. This is equivalent to the electronic device being able to compose audio automatically based on one piece of collected audio information, so as to satisfy the user's current demand to listen to dynamically generated second audio information associated with the first audio information. The electronic device is thus characterized by high intelligence and high user satisfaction.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an information processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of generating second audio information provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an information processing device provided by an embodiment of the present invention;
Fig. 4 is a schematic melody diagram in twelve-tone equal temperament provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a standard sine wave provided by an embodiment of the present invention;
Fig. 6 is a schematic staff diagram of a music score provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a line of the music score shown in Fig. 6 provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of another music score line provided by an embodiment of the present invention.
Detailed description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments of the specification.
As shown in Fig. 1, an embodiment of the present invention provides an information processing method, including:
Step S110: collecting first audio information, where the first audio information includes at least one of melody information, rhythm information, and timbre information;
Step S120: generating second audio information associated with the first audio information, where the content of the first audio information and the second audio information is at least partly different.
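The two steps can be sketched end to end as follows. This is a minimal illustration under assumed data structures (the embodiments do not prescribe any): the first audio information is a record of optional melody, rhythm, and timbre fields, and the generation step is a trivial stand-in transformation that keeps the two pieces of content associated yet at least partly different.

```python
# Minimal sketch of steps S110 and S120 (assumed data shapes).

def collect_first_audio(melody=None, rhythm=None, timbre=None):
    # Step S110: at least one of the three components must be present.
    info = {"melody": melody, "rhythm": rhythm, "timbre": timbre}
    assert any(v is not None for v in info.values())
    return info

def generate_second_audio(first):
    # Step S120: produce associated, at-least-partly-different content;
    # here, transpose each melody note up by one semitone as a stand-in.
    second = dict(first)
    if first["melody"] is not None:
        second["melody"] = [n + 1 for n in first["melody"]]
    return second

first = collect_first_audio(melody=[60, 62, 64])   # MIDI note numbers
second = generate_second_audio(first)
```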
In this embodiment, the information processing method may be applied in a first electronic device. The first electronic device collects the first audio information using a microphone or the like. The first audio information may be audio information generated by capturing any sound in the space where the first electronic device is located; for example, a user hums a song, and the first electronic device captures the humming to generate the first audio information.
In step S120, the first electronic device may generate the second audio information by itself according to the first audio information; alternatively, the first audio information, or information related to the first audio information, may be submitted to a second electronic device, which generates the second audio information, and the first electronic device then receives the second audio information from the second electronic device.
In this embodiment, the second audio information is generated dynamically according to the first audio information. The second audio information may be dynamically generated, completely new audio information that exists neither locally on the first electronic device nor on any second electronic device connected to it. In this embodiment, the first audio information may include at least one of:
melody information, which forms a melody when played;
rhythm information, which produces a certain rhythm when played;
timbre information, which determines the character of the sound's vibration, so that the sounds perceived by the user differ in auditory perception.
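As one illustration of the melody component, a crude fundamental-frequency estimate can be obtained from raw samples by autocorrelation. This sketch is an assumption for illustration only: the embodiments do not specify how melody information is extracted, and real systems use far more robust estimators.

```python
import math

# Crude pitch estimate by autocorrelation (illustrative assumption).

def estimate_pitch(samples, sample_rate):
    """Return the dominant fundamental frequency in Hz."""
    n = len(samples)
    best_lag, best_corr = 0, 0.0
    # Search lags corresponding to roughly 80-1000 Hz.
    for lag in range(sample_rate // 1000, sample_rate // 80):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# Quarter second of a 440 Hz tone at an 8 kHz sampling rate.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 4)]
pitch = estimate_pitch(tone, sr)
```

Because the lag grid is integer-valued, the estimate is only accurate to within a few hertz here (the true period, about 18.18 samples, rounds to lag 18).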
In this embodiment, the second audio information is generated based on the first audio information, so the first audio information and the second audio information are associated. This association is reflected in the fact that the first audio information and the second audio information may be at least partly identical. At the same time, the first electronic device itself, or the second electronic device connected to the first electronic device, obtains through processing audio information that is at least partly different from the first audio information; this difference may be reflected in at least one of melody information, rhythm information, and timbre information.
In this way, after the user hums a few phrases to the first electronic device, the first electronic device itself, or a second electronic device at its request, generates and plays second audio information associated with the first audio information obtained from the humming. This is equivalent to the first electronic device automatically playing the second audio information and/or automatically composing and playing the second audio information based on the user's humming. This ensures that the audio the user hears may be different every time, satisfies this special listening demand of the user, and thereby improves user satisfaction.
Since the first audio information collected by the first electronic device originates from the user's humming, from audio the user has chosen to play, or from sound the user controls in some other way, the first audio information originates, in short, from the user's control and characterizes the user's current wishes, such as the wish to listen or to compose. If the second audio information is generated according to the first audio information, the generated second audio information likewise reflects the user's wishes or satisfies the user's demand. In this way, the second audio information is generated dynamically so as to satisfy the user's current demand, which improves the intelligence of the electronic device and the user's satisfaction with it.
In some embodiments, the content of the first audio information and the second audio information being at least partly different includes at least one of the following:
the playing duration of the second audio information is different from the playing duration of the first audio information;
second melody information of the second audio information is at least partly different from first melody information of the first audio information;
second rhythm information of the second audio information is at least partly different from first rhythm information of the first audio information;
second timbre information of the second audio information is at least partly different from first timbre information of the first audio information.
The playing duration of the first audio information at a predetermined playback rate is a first duration; the playing duration of the second audio information at the same predetermined playback rate is a second duration; and the first duration is different from the second duration. For example, the playing duration of the first audio information equals the duration of the user's humming, which may be just a few seconds, for example 5 seconds or 10 seconds. The second duration of the second audio information may be greater than the first duration; for example, the playing duration of the second audio information may equal the average playing duration of a song, for example 2 minutes, 3 minutes, or any duration between 2 and 5 minutes. This is equivalent to the user humming a few phrases and thereby triggering the first electronic device to obtain a song generated from the humming, which the first electronic device then plays. If the user's hummings differ, the resulting songs differ; and even if the user's humming is the same, the song corresponding to the dynamically composed second audio information may also differ. This satisfies the user's varied listening demands and improves the intelligence of the electronic device and the user's satisfaction with it.
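The hum-to-song duration relationship above can be captured by a tiny mapping function. The stretch factor and the 2-5 minute clamp are invented for illustration; the embodiments only require that the second duration may exceed the first.

```python
# Toy duration policy (assumed, not prescribed by the embodiments):
# a hum of a few seconds maps to a song-length second duration,
# clamped to the 2-5 minute range mentioned in the example.

def second_duration(hum_seconds, stretch=30, lo=120, hi=300):
    """Derive the second audio's duration (s) from the hum's duration."""
    return max(lo, min(hi, hum_seconds * stretch))
```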
In some embodiments, the first timbre information and the second timbre information include at least one of:
first-class timbre information, where the first-class timbre information includes the timbre information of human voices; the timbre information of human voices includes at least one of: a male-voice timbre, a female-voice timbre, a child-voice timbre, and a mixed voice formed by mixing at least two classes of human voices;
second-class timbre information, where the second-class timbre information includes the timbre information of musical instruments;
third-class timbre information, where the third-class timbre information is timbre information other than that of human voices and musical instruments.
If timbre is distinguished by the sounding body, timbre can be divided into at least the above three kinds: human voice, instrument sound, and other timbres besides human voice and instrument sound, for example various onomatopoeic sounds simulated without instruments.
Human voice can further be divided into male voice, female voice, child voice, and various mixed voices, for example the timbre of a mixed male and female voice, a mixed adult-male and child voice, or a mixed adult-female and child voice.
In this embodiment, the male voice is the sound produced by a man after the voice-change period, and may also be called an adult male voice; the female voice may be the sound produced by a woman after the voice-change period, and may also be called an adult female voice; the child voice includes the various human voices before the voice-change period.
The second class of timbre is the timbre information of musical instruments, for example the timbre information of percussion instruments, string instruments, wind instruments, and other kinds of instruments.
The third class of timbre information may include various sounds such as the sounds of electrical appliances, the sounds of doors and windows opening and closing, and the sounds of animals.
As shown in Fig. 2, the step S120 may include at least one of:
generating the second audio-frequency information according to the audio attribute information of the first audio-frequency information;
generating the second audio-frequency information according to the user attribute information corresponding to the first audio-frequency information.
The audio attribute information here can be information extracted from the first audio-frequency information, including but not limited to melody information, rhythm information, timbre information, song-style information, music type information, and the like.
The user attribute information can be the user attribute information of the user who uttered the first audio-frequency information, the user attribute information of the user holding the first electronic equipment, or the user attribute information of the user to whom the account of the voice application running on the first electronic equipment belongs.
The user attribute information may include various information such as gender, age, region, occupation, and hobby.
In the present embodiment, when the second audio-frequency information is generated, it can be generated not only in combination with the information of the first audio-frequency information itself, but also in combination with the audio attribute information and/or the user attribute information.
The generating of the second audio-frequency information according to the audio attribute information of the first audio-frequency information includes at least one of:
generating the second audio-frequency information according to at least one of the melody characteristic attribute, rhythm characteristic attribute, timbre characteristic attribute, song-style attribute, and music type attribute of the first audio-frequency information.
The melody characteristic attribute describes the melody feature of the first audio-frequency information; for example, the melody characteristic attribute describes whether the melody of the first audio-frequency information is impassioned or soothing.
The rhythm characteristic attribute describes the rhythm feature of the first audio-frequency information, for example, whether the first audio-frequency information is music in 2/4 time or in 3/4 time.
The timbre characteristic attribute describes the timbre feature of the first audio-frequency information, for example, whether the timbre of the first audio-frequency information is mainly a male voice or mainly a female voice, and whether it is the timbre of an instrument, the timbre of a human voice, or another kind of timbre.
The song-style attribute describes the music style or school of the first audio-frequency information.
The music type attribute describes the music type of the first audio-frequency information, for example, rock music, country music, or another type of music.
In some embodiments, the generating of the second audio-frequency information according to the user attribute information corresponding to the first audio-frequency information includes:
generating the second audio-frequency information according to at least one of the user preference information, audio playback record information, emotional state information, and user instruction information.
For example, the user may input his or her own preference information, such as favorite audio; the preference information, such as the audio the user prefers, may also be generated automatically from the user's playback record.
In some embodiments, the user preference information may also include the singer the user prefers; the second audio-frequency information is then generated with the timbre of that singer.
The audio playback record information records the audio played by the electronic equipment within a historical time period, the audio attribute information of the played audio, and the like.
The emotional state information may include the user's current emotional state obtained through image acquisition or audio collection, by facial expression analysis in an image or emotional-state extraction from sound; according to the emotional state information of the user, information such as the melody and rhythm of the second audio-frequency information is determined.
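A minimal sketch of mapping an emotional-state reading to melody/rhythm parameters of the second audio-frequency information. The score range, the linear mapping, and the BPM numbers are assumptions made purely for illustration; the disclosure does not specify a concrete mapping.

```python
def emotion_to_music_params(emotion_score: float) -> dict:
    """Map an emotion score in [0.0, 1.0] (0 = calm, 1 = excited) to
    rhythm (tempo in BPM) and melody (pitch range in semitones).
    Assumed linear mapping, invented for this sketch."""
    if not 0.0 <= emotion_score <= 1.0:
        raise ValueError("emotion score must lie in [0, 1]")
    tempo_bpm = 60 + emotion_score * 80          # calm -> 60 BPM, excited -> 140 BPM
    pitch_range = 7 + round(emotion_score * 12)  # wider melodic range when excited
    return {"tempo_bpm": tempo_bpm, "pitch_range_semitones": pitch_range}

print(emotion_to_music_params(0.5))  # {'tempo_bpm': 100.0, 'pitch_range_semitones': 13}
```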
In some embodiments, the first audio-frequency information and the second audio-frequency information may also include lyrics information and the like; in other embodiments, the first audio-frequency information and the second audio-frequency information may also include language information of the lyrics pronunciation, the language information determining, for example, whether the second audio-frequency information is played in Chinese, in English, or in another language.
In some embodiments, the generating of the second audio-frequency information according to at least one of the user preference information, audio playback record information, emotional state information, and user instruction information corresponding to the first audio-frequency information includes at least one of:
determining the duration of the second audio-frequency information according to the emotional state information; for example, if the emotional state information shows that the user wants to keep listening, e.g., the user's expression shows immersion in the music, the second audio-frequency information is played with a longer duration; if two alternatives exist when the duration is determined, the longer one can be chosen; in some embodiments, the emotional state information can also be scored, and with the score as input, the duration is calculated with a specific function;
determining the duration of the second audio-frequency information according to the user instruction information; for example, the user issues, through gesture operation, voice operation, line-of-sight operation, or the like, an instruction such as stop playing, continue playing, or extend playing, and the duration is determined on this basis.
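The scoring-plus-function approach to duration, together with the "choose the longer alternative" rule, can be sketched as follows. The base duration and the linear function are invented for the example; the disclosure only says that a specific function of the score is used.

```python
def duration_from_emotion(score: float,
                          base_seconds: float = 30.0,
                          max_extra: float = 90.0) -> float:
    """Compute the playing duration of the second audio-frequency information
    from an emotional-state score in [0, 1]; higher engagement -> longer.
    The linear form and constants are assumptions for this sketch."""
    return base_seconds + max_extra * max(0.0, min(1.0, score))

def choose_duration(candidates: list) -> float:
    """When alternative durations exist, pick the longer one, as described."""
    return max(candidates)

print(choose_duration([duration_from_emotion(0.5), 60.0]))  # 75.0
```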
The generating of the second audio-frequency information further includes one or more of the following:
continuing to generate the second audio-frequency information according to the user instruction information;
resuming generation of the second audio-frequency information according to the user instruction information;
stopping generation of the second audio-frequency information according to the user instruction information;
continuing to generate the second audio-frequency information according to the emotional state information;
stopping generation of the second audio-frequency information according to the emotional state information;
resuming generation of the second audio-frequency information according to the emotional state information;
determining the duration of the second audio-frequency information according to the emotional state information and the user instruction information;
continuing to generate the second audio-frequency information according to the emotional state information and the user instruction information;
stopping generation of the second audio-frequency information according to the emotional state information and the user instruction information;
resuming generation of the second audio-frequency information according to the emotional state information and the user instruction information.
In some embodiments, the step S120 may include: processing the first audio-frequency information with an audio processing model, and outputting the second audio-frequency information.
The audio processing model here can be various types of models, for example, various big-data models; the big-data model here can be a model generated by training with sample data, for example, a neural network model, a support vector machine model, a regression model, or the like. Through the big-data model, the second audio-frequency information is dynamically generated with the first audio-frequency information as input.
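As a loose illustration only (the disclosure does not describe a concrete model), the stand-in "model" below maps an input note sequence to a partly different output sequence, which is the input/output contract stated above. The note numbers, variation probability, and interval choices are all invented for the sketch.

```python
import random

def toy_audio_model(first_notes, seed=0):
    """Toy stand-in for a trained audio-processing model: given the note
    sequence of the first audio-frequency information, dynamically generate
    a second sequence that shares material with the input but is at least
    partly different. All parameters are assumptions for this sketch."""
    rng = random.Random(seed)
    second = []
    for note in first_notes:
        second.append(note)
        if rng.random() < 0.5:               # occasionally insert a variation
            second.append(note + rng.choice([-2, 2, 5]))
    return second

print(toy_audio_model([60, 62, 64, 65], seed=1))
```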
In some embodiments, the step S120 may include at least one of:
generating first music-score information of the second audio-frequency information according to the first audio-frequency information;
generating first lyrics information of the second audio-frequency information according to the first audio-frequency information;
synthesizing the first music-score information and the first lyrics information generated according to the first audio-frequency information, to generate a song file corresponding to the second audio-frequency information.
The music-score information here may include information converted into music notation, for example, a staff or numbered-musical-notation file.
The lyrics information may include lyrics in various languages, for example, lyrics written in Chinese.
In the present embodiment, the music-score information represents the aforementioned melody information, rhythm information, and the like.
In this way, after the second audio-frequency information is dynamically generated, it is also converted into the first music-score information and the first lyrics information and recorded, for example, in the form of a song file, so that if the user finds it pleasing, he or she can later tap to play the audio again; thus not only is the dynamic creation of the second audio realized, but the recording and subsequent playback of the song are also achieved.
In some embodiments, the method further includes:
forwarding the song file and/or the second audio-frequency information to a preset device, for example, a social server, to be published as a component of social information; or, for another example, submitting it to a multimedia information library for recorded storage or for download by others.
In some embodiments, the step S120 may include:
adjusting the first music-score information corresponding to the first audio-frequency information according to a preset musical rule, to generate second music-score information.
In some embodiments, the adjusting of the first music-score information according to the preset musical rule to generate the second music-score information includes at least one of:
adjusting the first music-score information according to a sine-wave rule, to generate second music-score information whose musical contour conforms to the sine-wave rule.
For example, the highest note of each beat of the first music-score information can be connected, and the connected highest notes are then adjusted with a waveform approximating a sine wave; the sine wave here can be a special-shaped wave whose similarity to a standard sine wave is greater than a first threshold and less than a second threshold. The second threshold is greater than the first threshold, and the first threshold and the second threshold can be values between 0 and 1.
For another example, the lowest note of each beat of the first music-score information can be connected, and the connected lowest notes are then adjusted with a waveform approximating a sine wave; the sine wave here can be a special-shaped wave whose similarity to a standard sine wave is greater than a third threshold and less than a fourth threshold. The third threshold is less than the fourth threshold, and the fourth threshold and the third threshold can be values between 0 and 1.
In other embodiments, the positions of the individual notes of the first music-score information in the frequency spectrum are connected, and the shape of the resulting line is the aforementioned approximate sine wave.
In some embodiments, the first threshold can be equal to the third threshold, and/or, the second threshold can be equal to the fourth threshold.
In some embodiments, the adjusting of the first music-score information according to the preset musical rule to generate the second music-score information includes:
adjusting the first music-score information according to the changing rule of the introduction-development-climax-conclusion structure of music, to generate second music-score information that conforms to this changing rule.
The changing rule of the introduction-development-climax-conclusion structure can be used to reflect changes of melody or rhythm; for example, this changing rule divides the audio into four periods: an introduction period, a development period, a climax period, and a conclusion period.
These four periods follow a predetermined sequence; for example, the sequence is: introduction period, development period, climax period, and then conclusion period. One complete second audio needs to contain these four periods. The melody differences and/or rhythm differences between any two of the four periods satisfy a predetermined relationship.
For example, the introduction period is constructed by collecting the first audio-frequency information; based on the first audio-frequency information, creation of the second audio-frequency information then automatically begins according to the changing rule of the introduction-development-climax-conclusion structure.
Taking melody as an example, the melody of the climax period is the most impassioned or the most subdued; the impassioned or subdued degree of the development period and the conclusion period is slightly weaker than that of the climax period, and the introduction period is weaker than the development period.
Taking rhythm as an example, the rhythm of the climax period is the fastest; the rhythm of the development period and the conclusion period is slower than that of the climax period, and the introduction period is slower than the development period.
As to how much weaker than the climax period the impassioned or subdued degree of the development period and the conclusion period is, or how much slower the rhythm is, a random number can be generated with a random function, and the generated random number is processed to generate the second audio-frequency information.
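The relative-intensity relations between the four periods, with the exact gaps drawn from a random function as just described, can be sketched as follows. The numeric intensity scale and the gap ranges are assumptions for the illustration.

```python
import random

def section_intensities(seed=None):
    """Assign each of the 4 periods a melody-intensity value such that
    climax > development, conclusion > introduction, with the exact gaps
    drawn from a random function. Scale and ranges are assumed."""
    rng = random.Random(seed)
    climax = 1.0
    develop = climax - rng.uniform(0.1, 0.3)    # slightly weaker than the climax
    conclude = climax - rng.uniform(0.1, 0.3)   # likewise slightly weaker
    intro = min(develop, conclude) - rng.uniform(0.1, 0.3)  # weakest period
    return {"introduction": intro, "development": develop,
            "climax": climax, "conclusion": conclude}

print(section_intensities(seed=7))
```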
Therefore, in some embodiments, the aforementioned audio processing model can also be an audio model that conforms to the sine-wave rule and/or the changing rule of the introduction-development-climax-conclusion structure, into which several random factors can be introduced when the above variations are handled, so that the generated second audio-frequency information is more variable.
For example, for second audio that conforms to the sine-wave rule, the fluctuation amplitude of the approximate sine wave can satisfy the aforementioned changing rule of the introduction-development-climax-conclusion structure, and the specific amplitude of each movement can be determined based on a random value generated by the random function.
In some embodiments, the method further includes:
generating lyrics information, wherein the lyrics information corresponds to the first music-score information, or the lyrics information corresponds to the second music-score information, the second music-score information being generated based on the first music-score information.
For example, lyrics input by the user's voice are received from a human-machine interaction interface, and the audio-frequency information corresponding to the lyrics is converted into lyrics information of a specific language.
In some embodiments, the generating of the lyrics information includes:
collecting the second audio-frequency information;
converting the second audio-frequency information into the lyrics information.
In some embodiments, the converting of the second audio-frequency information into the lyrics information includes:
generating pronunciation markup information according to the second audio-frequency information, wherein the pronunciation markup information here can be converted directly based on pronunciation and is not limited to a particular language;
placing the pronunciation markup information on a track in correspondence with the corresponding music-score information; in this way, the matching of the music-score information with the lyrics information is completed, so as to generate the second audio-frequency information and/or the song file.
Optionally, the lyrics information is automatically generated according to the music-score parameters of the music-score information corresponding to the lyrics information.
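The correspondence between pronunciation markup and the notes of a score track can be sketched under the simplifying assumption of one syllable per note (the disclosure elsewhere notes that the user may need to handle this mapping manually); the syllables and note names are hypothetical.

```python
def align_lyrics_to_score(syllables, notes):
    """Minimal sketch of placing each pronunciation markup on a track in
    correspondence with a note of the score; one-to-one alignment assumed,
    melisma/multi-note handling omitted."""
    if len(syllables) != len(notes):
        raise ValueError("one syllable per note assumed in this sketch")
    return list(zip(notes, syllables))

print(align_lyrics_to_score(["la", "la", "li"], ["C4", "D4", "E4"]))
# [('C4', 'la'), ('D4', 'la'), ('E4', 'li')]
```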
In some embodiments, the method further includes:
generating playing information, wherein the playing information is audio-frequency information generated by playing, with a target object, the first music-score information or the second music-score information corresponding to the first music-score information; the target object is at least one of a target musical instrument, a target organism, or a target item, wherein the target item is different from the target musical instrument and the target organism. The target object can be any of the aforementioned sounding bodies.
In some embodiments, the generating of the playing information includes:
generating the playing information according to the music-score parameters of the first music-score information or the second music-score information.
In other embodiments, the generating of the playing information according to the music-score parameters of the first music-score information or the second music-score information includes:
generating the playing information according to at least one of the rhythm parameter, song-style parameter, and emotion parameter of the first music-score information or the second music-score information.
In addition, the method further includes: synthesizing at least two of the music-score information, the lyrics information corresponding to the music-score information, and the playing information of the music-score information, to generate a song file, wherein the music-score information is the first music-score information or the second music-score information corresponding to the first music-score information.
In some embodiments, the method further includes: detecting a marking operation on the song file, and modifying the song file according to the marking operation. In this way, the user is allowed to modify the generated second audio-frequency information and obtain audio-frequency information that meets his or her own demands.
As shown in Fig. 3, the present embodiment provides an information processing unit, including:
an acquisition module 110, configured to acquire first audio-frequency information, wherein the first audio-frequency information includes at least one of melody information, rhythm information, and timbre information;
a generation module 120, configured to generate second audio-frequency information associated with the first audio-frequency information, wherein the content of the first audio-frequency information and the content of the second audio-frequency information are at least partly different.
In some embodiments, the content of the first audio-frequency information and the content of the second audio-frequency information being at least partly different includes at least one of:
the playing duration of the second audio-frequency information is different from the playing duration of the first audio-frequency information;
the second melody information of the second audio-frequency information is at least partly different from the first melody information of the first audio-frequency information;
the second rhythm information of the second audio-frequency information is at least partly different from the first rhythm information of the first audio-frequency information;
the second timbre information of the second audio-frequency information is at least partly different from the first timbre information of the first audio-frequency information.
In some embodiments, the first timbre information and the second timbre information each include at least one of:
first-class timbre information, wherein the first-class timbre information includes the timbre information of a human voice; the timbre information of the human voice includes at least one of: the timbre of a male voice, the timbre of a female voice, the timbre of a child's voice, and a mixed voice formed by mixing at least two classes of human voices;
second-class timbre information, wherein the second-class timbre information includes the timbre information of a musical instrument;
third-class timbre information, wherein the third-class timbre information is timbre information other than that of the human voice and the musical instrument.
In other embodiments, the generation module 120 is specifically configured to execute at least one of:
generating the second audio-frequency information according to the audio attribute information of the first audio-frequency information;
generating the second audio-frequency information according to the user attribute information corresponding to the first audio-frequency information.
In other embodiments, the generation module 120 is specifically configured to execute at least one of:
generating the second audio-frequency information according to at least one of the melody characteristic attribute, rhythm characteristic attribute, timbre characteristic attribute, song-style attribute, and music type attribute of the first audio-frequency information.
In other embodiments, the generation module 120 is specifically configured to generate the second audio-frequency information according to at least one of the user preference information, audio playback record information, emotional state information, and user instruction information.
In other embodiments, the generation module 120 is specifically configured to execute at least one of:
determining the duration of the second audio-frequency information according to the emotional state information;
determining the duration of the second audio-frequency information according to the user instruction information;
continuing to generate the second audio-frequency information according to the user instruction information;
resuming generation of the second audio-frequency information according to the user instruction information;
stopping generation of the second audio-frequency information according to the user instruction information;
continuing to generate the second audio-frequency information according to the emotional state information;
stopping generation of the second audio-frequency information according to the emotional state information;
resuming generation of the second audio-frequency information according to the emotional state information;
determining the duration of the second audio-frequency information according to the emotional state information and the user instruction information;
continuing to generate the second audio-frequency information according to the emotional state information and the user instruction information;
stopping generation of the second audio-frequency information according to the emotional state information and the user instruction information;
resuming generation of the second audio-frequency information according to the emotional state information and the user instruction information.
In other embodiments, the generation module 120 is specifically configured to process the first audio-frequency information with the audio processing model and output the second audio-frequency information.
In other embodiments, the generation module 120 is specifically configured to generate the first music-score information of the second audio-frequency information according to the first audio-frequency information; generate the first lyrics information of the second audio-frequency information according to the first audio-frequency information; and synthesize the first music-score information and the first lyrics information generated according to the first audio-frequency information, to generate a song file corresponding to the second audio-frequency information.
Several specific examples are provided below in combination with any of the above embodiments:
Example 1:
This example provides an audio processing system. System composition overview: the system is composed of a hardware system, software application programs, and cloud-network big data.
(1) Hardware system composition overview: a microphone input device with a high sampling rate, one electronic device with excellent computing performance, and a network bandwidth of not less than 50M;
(2) Software application composition overview: audio recording and playback applications based on DirectSound; a data processing module based on a central processing unit (CPU) plus a graphics processing unit (GPU); an AI learning module based on artificial intelligence (AI) sample collection, spectrum recognition, feature extraction, deep training, and prediction-and-analysis technology; a create, retrieve, update, and delete (Create Retrieve Update Delete, CRUD) module based on a database; a voice annotation generation module based on speech recognition; and an advanced audio codec module based on Advanced Audio Coding (AAC).
(3) Cloud-network big data: it stores massive uncoded music source data and the corresponding music language based on spectrogram coding; in addition, it stores massive brief music periods, with lengths ranging from 1 to 8 syllables.
The function of the system is mainly to generate, from the voice data of the user's humming captured by the input terminal and aided by cloud big data, a song that matches the user's creative intention.
The workflow of the audio processing system can be as follows:
(1) An acquisition countdown is started; after a 3 s countdown, the microphone serves as the input terminal to capture the sound-source data of the user's humming, which is transferred, without encoding, directly to a personal computer (Personal Computer, PC).
(2) Through signal processing, the PC obtains the spectrogram corresponding to the sound-source data, and a spectrum signal without timbre and without noise;
(3) Based on the user's instruction, a certain instrument is selected, the generated timbre-free and noise-free spectrum signal is rendered, and the generated music is then auditioned.
(4) Audio is trial-generated so that the user can audition the generated song (corresponding to the aforementioned second audio-frequency information) several times; each time, the music auditioned can be incrementally text-annotated by time periods based on the initial timestamps, distinguishing the parts the user finds satisfactory from the unsatisfactory ones. Then, with a strategy led by the user's changes and supplemented by the AI intelligent-cloud big data, the AI modifies the unsatisfactory parts again, for example, modifying at least one of rhythm, melody, and volume; after the user is satisfied, the spectrogram and the annotation file made for the spectrogram are saved, the annotation file being stored in a predetermined format, for example, as a file with the suffix name sll. The full spelling of sll is SweetLover Language.
(5) At this point, for a user who additionally has a performance demand:
<1> A pronunciation track using the 26 English letters can be inserted under the existing spectrogram (the pronunciation language families include, but are not limited to, the modern Chinese pronunciation family conforming to musical tones, the archaic Chinese pronunciation family, the Japanese pronunciation family, the English pronunciation family, the Roman pronunciation family, and the Latin pronunciation family), which is stored as a pronunciation file (referred to here as a music voice file, whose suffix name can be slv; the full spelling of slv is SweetLover Voice).
<2> Speech-recognition software can also be used to convert the pronunciation spoken by the user into the 26 English letters; however, if this is done, whether the pronunciation markup produced by speech recognition is wrong needs to be verified, and the user needs to handle manually which specific note on the spectrum each pronunciation markup corresponds to.
<3> Further, the user can directly record his or her own performance in WAV format and then import it for synthesis.
(6) Finally, for each spectrogram track and each pronunciation track to which the user has recorded music information, the system respectively generates the music of the specified instrument and the voice of the specified sound source (the generation rule of each spectrogram track is based on the corresponding sll file, and the generation rule of each pronunciation track is based on the slv file). These finally generated audio files are in uncoded, uncompressed WAV format. Each spectrogram track together with its corresponding music score (staff or numbered musical notation, optional) and each pronunciation track are saved again, the final rendering is done, and the audio source data in WAV format rendered for the selected instruments is exported (the rendering can be carried out by periods, with a different instrument usable for each period; the pronunciation files can likewise be processed by periods, with a different sound source applicable to each period).
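The per-period rendering option (a different instrument for each period) can be sketched as a simple rendering plan; the period boundaries and instrument names below are hypothetical.

```python
def render_plan(periods, instruments):
    """Sketch of the final rendering step: each time period of a spectrogram
    track may be rendered with a different instrument before the WAV export.
    Assumes one instrument per period; periods are (start_s, end_s) pairs."""
    if len(periods) != len(instruments):
        raise ValueError("one instrument per period in this sketch")
    return [{"start_s": s, "end_s": e, "instrument": inst}
            for (s, e), inst in zip(periods, instruments)]

print(render_plan([(0.0, 8.0), (8.0, 20.0)], ["piano", "violin"]))
```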
The technical solution provided by this example lowers the threshold of musical composition for ordinary users without advanced musical training and promotes the popularization of musical composition, allowing users to capture the inspiration that flows through their own lives; for professional music producers and composers, it provides tools of higher productivity.
Example 2:
The PC obtaining, through data processing, the spectrogram corresponding to the sound-source data and a spectrum signal without timbre and without noise may include:
the PC slices the audio stream sampled at high frequency by the microphone into equal parts from beginning to end, provisionally at the 128th-note time interval under an assumed BPM of 120.0 (i.e., a time interval of 0.5 s multiplied by 4 and divided by 128, which is 0.015625 s), generating audio frames;
using a hybrid data processing mode in which the CPU works serially and the GPU works in parallel, a parallel fast Fourier transform is performed and synchronized on a group of audio frames of maximum batch size each time, until all audio frames have been processed, so that the time-domain signal of each audio frame is changed into a frequency-domain signal (a signal with only frequency and energy information, and without any timbre);
the hybrid CPU-serial plus GPU-parallel data processing mode is then used again to find, in each audio frame, the frequency-domain signal with the greatest energy, which is recognized as the main signal; signal denoising is then performed to remove the other frequency-domain signals.
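The slicing and main-signal extraction above can be sketched as follows. A plain Python DFT stands in for the CPU+GPU batch FFT, and the 8 kHz sample rate and 440 Hz test tone are illustrative assumptions; only the frame interval (0.015625 s) comes from the example itself.

```python
import cmath
import math

FRAME_INTERVAL_S = 0.5 * 4 / 128   # 128th-note slice at an assumed BPM of 120 -> 0.015625 s

def dominant_frequency(frame, sample_rate):
    """Transform one audio frame to the frequency domain with a naive DFT
    (a real system would batch FFTs on the GPU) and keep only the
    maximum-energy positive bin as the 'main signal'."""
    n = len(frame)
    best_bin, best_energy = 0, -1.0
    for k in range(1, n // 2):                       # skip DC, positive bins only
        x = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        energy = abs(x) ** 2
        if energy > best_energy:
            best_bin, best_energy = k, energy
    return best_bin * sample_rate / n

# One frame of a 440 Hz humming tone sampled at 8 kHz:
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr)
        for t in range(int(sr * FRAME_INTERVAL_S))]  # 125 samples per frame
print(dominant_frequency(tone, sr))                  # bin resolution is 64 Hz here
```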
After noise removal, error correction needs to be carried out first. Because of certain limitations here (the twelve-tone equal temperament), and also because of the objective fact that the frequency of the humming may be inaccurate on the user's part, error correction must be carried out against standards like those in Fig. 4, based on the several groups of scales of the piano. Fig. 4 can be a kind of related information of the twelve-tone equal temperament, where the rows represent the scale and the columns represent the frequency.
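The error-correction step against the twelve-tone equal temperament can be sketched as snapping each measured humming frequency to the nearest equal-temperament pitch; the A4 = 440 Hz reference is the usual convention, assumed here rather than stated in the example.

```python
import math

A4 = 440.0  # assumed reference pitch for the equal-temperament grid

def snap_to_equal_temperament(freq_hz):
    """Error correction: snap a measured frequency to the nearest pitch of
    the twelve-tone equal temperament (piano) scale."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    semitones = round(12 * math.log2(freq_hz / A4))   # nearest semitone offset from A4
    return A4 * 2 ** (semitones / 12)

print(round(snap_to_equal_temperament(448.0), 2))   # slightly sharp humming -> 440.0
```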
After correction, the filtered-out signal needs to be enhanced and converted from the frequency domain back to the time domain. When the duration of the restored time-domain signal is less than the 0.015625 s time interval, a default portion must be appended: the frequency of the supplemented signal is consistent with that of the restored signal, and once the supplemented frames are arranged in time order to fill the full interval, the amplitude of the whole signal over the interval must follow a damped-vibration model (the phenomenon whereby the sound produced by a vibrating instrument gradually decays).
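The padding step above can be illustrated as follows. This sketch assumes the damped-vibration model is a simple exponential decay; the decay constant, sample rate, and function name are assumptions, not taken from the text.

```python
import math

FRAME_SEC = 0.015625      # 128th-note interval at the assumed BPM = 120
SAMPLE_RATE = 8000        # assumed sample rate for this sketch
DECAY = 6.0               # assumed exponential decay constant (1/s)

def pad_with_damping(signal, freq):
    """Extend a restored time-domain signal to fill the full frame
    interval: the appended part keeps the same frequency, and its
    amplitude follows a damped-vibration (decaying) envelope."""
    total = int(FRAME_SEC * SAMPLE_RATE)
    out = list(signal)
    for t in range(len(signal), total):
        sec = t / SAMPLE_RATE
        out.append(math.exp(-DECAY * sec) * math.sin(2 * math.pi * freq * sec))
    return out

# A restored signal 50 samples long is padded out to the full 125-sample frame.
padded = pad_with_damping([0.2] * 50, 440.0)
```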
A spectrogram is then generated from time span and scale height. Because the recording was processed by slicing it at equal time intervals, a long tone of constant pitch becomes several short notes after slicing; these identical short notes are not joined for now (for example, at an actual BPM = 120, the recording of one whole note E2, sliced by the 128th note at that BPM, is cut into exactly 128 audio frames). The specific joining is left to the composer to decide; the later AI only gives relevant intelligent prompts and suggestions where no mark exists, because in artistic creation
"5--- |" and "5555 |" are different things. The latter, however, can be detected by the AI, which prompts and asks the user whether it should be treated as one continuous long tone like the former.
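The AI prompt for repeated identical slices can be sketched as a run-length grouping over the per-frame pitches. The pitch labels, function name, and threshold of two repeats are illustrative assumptions.

```python
def suggest_long_tones(frame_pitches, min_run=2):
    """Find runs of identical consecutive slice pitches ("5555") and
    propose merging each run into one sustained tone ("5---").
    Returns (start_index, run_length, pitch) tuples for the user to
    confirm or reject."""
    suggestions = []
    i = 0
    while i < len(frame_pitches):
        j = i
        while j < len(frame_pitches) and frame_pitches[j] == frame_pitches[i]:
            j += 1
        if j - i >= min_run:
            suggestions.append((i, j - i, frame_pitches[i]))
        i = j
    return suggestions

# "5555" is flagged as a candidate long tone; the lone "3" is not.
runs = suggest_long_tones(["5", "5", "5", "5", "3", "2", "2"])
```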
During spectrogram generation, the shape of the spectrogram is saved; this shape is the key by which the subsequent AI, drawing on a large number of data templates, proposes creative suggestions (rapid shape-based matching, with amendment advice proposed according to user demand).
An example of the spectrogram corresponding to a melody excerpt is presented below:
2. For the generated noise-free, timbre-less spectrum signal, the user can choose a musical instrument with which to render it, and then audition the generated music; this amounts to selecting timbre information based on the second audio input by the user, and the timbre information may also be selected automatically by the AI model.
3. The user auditions the generated song several times and, each time, can add incremental text marks to segments of the auditioned music based on start timestamps.
First, for a novice user, the AI system gives prompts step by step; for a user who has confirmed an identity such as professional composer, the AI system instead shows the due prompts of every step across the entire movement at once. The following assumes a novice user and illustrates, in order, which prompts the AI gives:
(1) First, the AI system generates a preliminary suggestion. It prompts the user, on the spectrogram, as to whether several consecutive spectra of identical pitch can be regarded as one long tone (if the user does not handle this, the AI assumes by default that it is one continuous long tone), and then lets the user confirm with a mark. After the user confirms, the marked content can be stored in one complete markup-language file, *.sll, making it easy for the user to undo in time and restore the state before a change. Once the user has completed all the marks the user needs from the prompts, the step-2 prompt is executed;
(2) Then, the AI system prompts the user to select the instruments that will play specified segments (this practice is called the rendering setting). These instruments include, but are not limited to, common ones such as the piano and violin, and each refers to a recorded sound source of the real instrument playing every tone for a duration of 4 s at identical dynamics. Because the spectrogram saves the core frequency information of the melody, the instrument playing each specified segment is determined from the spectrogram together with the sound-source selection marks the user places on each region of it. When every segment in the entire spectrogram has at least one specified instrument (overlapping is allowed: a segment may carry both a piano mark and a violin mark, which in reality means the segment is to be played by both instruments at once), the user can confirm that the step-2 marking is complete.
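The mark storage with undo described in step (1) might look like this sketch. The class name, JSON serialization, and mark fields are illustrative assumptions; only the *.sll file itself comes from the text.

```python
import json

class MarkStore:
    """Keep the user's confirmed marks plus a history of snapshots,
    so an earlier state can be restored, as the *.sll file allows."""

    def __init__(self):
        self.marks = []       # e.g. {"start_frame": 0, "length": 4, "kind": "long_tone"}
        self._history = []

    def confirm(self, mark):
        self._history.append(list(self.marks))   # snapshot before the change
        self.marks.append(mark)

    def undo(self):
        """Restore the state before the most recent change."""
        if self._history:
            self.marks = self._history.pop()

    def dump(self):
        """Serialize all marks (stand-in for writing the .sll file)."""
        return json.dumps(self.marks)
```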
After the user's annotation is complete, the system executes the mixing algorithm and, from the marks and the spectrum, generates a lossless WAV audio file of the complete movement (this practice is called rendering). The AI system then asks whether the user wants to audition the segments the user has selected. The user can audition several times; each time, after marking a segment and requesting playback, the corresponding portion of the rendered WAV file can be auditioned. During audition, the user can insert timestamp marks; after the audition ends or is stopped, the user can also attach additional marks to a timestamp, to the span between two adjacent timestamps, or even to a segment of the user's own choosing, and then repeat steps (1)-(3), or request AI prompts on correcting the composed melody in the marked segment, whereupon the AI system gives the step-4 prompt.
For the segments the user annotates, the AI system mainly matches the spectrogram against the massive music data of the network cloud, immediately returns the attributes of the segment, including but not limited to: the spectrogram of the segment (its essential attribute), style, emotion, performing instruments, and so on, and queries the user for requirements:
<1>If the user is dissatisfied with the style or emotion, the user can input the desired style and emotion; the AI then matches spectrograms of that style and emotion and first generates an example melody in 1 = C for the user to audition. The user may at any time either edit this melody manually, or require the AI, using algorithms written from composition theory, to generate from a master template the actual segment that some composition technique produces for the user's selection, and confirm the suggestion. The algorithms written from composition theory are discussed later;
<2>If the user is dissatisfied with the instruments or volume of a segment, then after the user adds marks, the segment only needs to be re-rendered and the corresponding portion of the old WAV replaced; this part of the technology is simply DirectSound and file-I/O technology, and is not difficult to implement.
4. Machine singing (for users who need vocal performance):
The pronunciation-marking problem: given the positioning of the software, the pronunciation schemes of different countries and regions must be accommodated. Because early computers used English characters, the marking scheme likewise marks with English characters; and because early history saw a movement to romanize the pronunciation of many language families, most pronunciations carry no tone, so whether for Chinese characters, Japanese, Korean, etc., marking can use a 1-byte region code (representing at most 255 regions) plus English characters (serving as pronunciation characters);
Pronunciation marks and SweetLover voice-file numbering: because of certain phonetic features, especially when a long tone is sung, the drawn-out sound of the long tone is a sound in a steady state (whereas the onset of pronunciation is a consonant initiated by the throat, nose, and mouth, serving as the distinctive sound, followed by the breath in the steady state). Pronunciation marking therefore resembles annotating the stages of a flower opening: a consonant is annotated above a given syllable, and within the sustained part, every change of state is marked. For example, to sing the pinyin sound yuan, assuming this sound must be sung for 4 s, a singer who follows the laws of pronunciation will, as the mouth shape changes during the performance, produce the pronunciation sequence yu, yuan, an; the marks should be placed beneath the corresponding note and annotated in a similar manner.
<1>Assuming that in the current operating system 0x01 denotes locale 2052, an example is given as follows:
In the above mark, the 1-byte 0x01 indicates the region to which the pronunciation marking below it belongs, and the '-' immediately following indicates prolonging the sound an.
<2>If the user inputs by voice, the user should say the three sounds yu, yuan, an and then adjust their positions manually, which is comparatively laborious;
<3>If the user directly records their own singing as WAV and then imports it, that is more convenient.
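The 1-byte region code plus English pronunciation characters described above might be encoded as in this sketch, treating 0x01 as the region byte per the example; the function name and space-separated layout are assumptions.

```python
def encode_pronunciation_mark(region_code, syllables):
    """Pack a 1-byte region code (up to 255 regions) followed by ASCII
    pronunciation characters; a trailing '-' prolongs the previous sound.

    Example from the text: region 0x01 with the stages of "yuan" sung
    over a long tone, yu -> yuan -> an, then sustained."""
    if not 0 <= region_code <= 255:
        raise ValueError("region code must fit in one byte")
    body = " ".join(syllables).encode("ascii")
    return bytes([region_code]) + body

mark = encode_pronunciation_mark(0x01, ["yu", "yuan", "an-"])
```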
In some scenarios, all WAV files should use the same microphone input device, the same sample rate, channel count, and bit depth, to guard against deviations.
5. The AI judgement over cloud-network big data can be as follows:
How the AI module of the intelligent composing system uses cloud-network big data, combined with composition theory, to automatically generate segment completions. Some theoretical groundwork must come first:
{1}:Based on the standard of sine fitting, the segment spectrum generates the core rhythm according to a 4-part segmentation matching the literary qi-cheng-zhuan-he structure (starting, development, turn, ending). The core rhythm here is the main rhythm: it can be the most frequent rhythm in the second audio information, or the rhythm of the climax part.
Below is the staff-notation Canon in D major, quoted from Baidu Images (an unclear staff does not matter much). If the tadpole-like note heads in the staff are all connected, a spectrum-line fragment resembling Track 2 in the above picture can be formed. Be sure, after connecting everything, to arrange and join all the segments horizontally, i.e. the annotated end of segment 1 joins the beginning of segment 2 (the mark with serial number 1), the end of segment 2 joins the beginning of segment 3, and so on, all end to end. Arranged horizontally in this way, one finds that the rise-and-fall amplitude span within each segment is not very large; it does not form a jagged audio-spectrum graph resembling sharp noise, but an undulation that is entirely regular and similar, though not completely, to the sine function f(x) = sin x.
Fig. 5 is the waveform diagram of a standard sine wave.
Fig. 6 is a music score; Fig. 7 is the waveform diagram formed by connecting each note of the score shown in Fig. 6, and this waveform is to some degree similar to a sine wave.
Fig. 8 is the waveform diagram formed by connecting each note of another score, and this waveform is also similar to a sine wave.
Suppose the composition consists essentially of groups of 4 segments (the qi-cheng-zhuan-he structure: starting, development, turn, ending), each segment containing the same number of bars. Then, by sine fitting, suppose the first note of segment 1 lies on f(x) = sin x over x ∈ [0, 2π], with some point such as x = 5 as the starting point. Counting from that starting point, [5, 5 + π] is segment 1 (note that one segment may contain several bars, each bar of the same duration, e.g. all in 4/4 or all in 6/8; a changing time signature is generally not recommended), and [5 + π, 5 + 2π] is segment 2. The not-completely-similar part then appears in segment 3 or segment 4; it is not completely similar because, as with the rhythm of literary and artistic works, there must be coordinated, complete, symmetrical beauty, and also wave-upon-wave asymmetric, changeful, non-monotonous beauty.
Of these 4 segments, segment 1 is called the starting segment, segment 2 the development segment, segment 3 the turn/progressive segment, and segment 4 the ending segment, abbreviated to the four characters qi-cheng-zhuan-he (segment 3 carries the emotional progression, handled like the climax of a novel's structure; segment 4 can then also be regarded as either a turn segment or an ending segment).
Segment 4 is where the sine fitting is deliberately broken: segment 3 is allowed maximal similarity with segment 1 (the turn may also begin within segment 3), but segment 4 is not recommended to be identical with segment 2, so as not to cause auditory aesthetic fatigue.
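The sine-interval layout of the four segments assumed above can be written out directly. This is a sketch: the starting point x = 5 follows the text's example, and the function names and sampling density are illustrative.

```python
import math

def segment_intervals(x0=5.0, count=4, half_period=math.pi):
    """Assign each of the 4 segments (starting, development, turn,
    ending) one half-period interval of f(x) = sin x, beginning at x0."""
    return [(x0 + k * half_period, x0 + (k + 1) * half_period)
            for k in range(count)]

def contour(interval, points=8):
    """Sample the sine template over one segment's interval."""
    a, b = interval
    return [math.sin(a + (b - a) * t / (points - 1)) for t in range(points)]
```

Because sin is 2π-periodic, segment 3's template contour coincides exactly with segment 1's, matching the allowance of maximal similarity between them; it is segment 4 that the composer (or AI) must vary away from segment 2.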
The score shown in Fig. 8 now needs to be explained in conjunction with that figure. The numbered-notation part is marked, grouped, and given the necessary explanation. First, the time-signature marking of the numbered notation is poor: lyrical songs (hanmai-style shouted vocals also count) are mostly in 4/4, and marking them as 2/4 is not recommended. The grouping is then as follows:
Group 1: the prelude, corresponding to the aforementioned starting segment;
Group 2: the group corresponding to the aforementioned development segment; in the image, a sign resembling a colon plus one thin and one thick vertical line (:||) indicates that the part before it forms one group;
Groups 3 and 4: the groups corresponding to the aforementioned turn/progressive segment;
Group 5: the group corresponding to the aforementioned ending segment.
If composing follows correlation principles such as the sine fitting of {1}, and observes globally according to the overall spectrogram trend rather than some specific note or beat, a fairly good template can be generated.
Therefore, it must first be ensured that the cloud-network big data used by the AI module of the intelligent composing system stores certain audio spectrograms as data samples; all the algorithms are fitted on the basis of audio spectrograms. On the other hand, analysis of a large number of samples also finds that music conforming to the sine-fitting template (the template corresponding to the aforementioned sine-wave law) sounds comparatively more pleasing.
Sine fitting is thus a basic rule that comparatively melodious music needs to obey; even music of different emotional styles follows it.
Considering that music of different emotional styles has spectrograms with different overall trends, what the cloud-network big data used by the AI module of the intelligent composing system essentially needs to store are the spectrogram tendencies, based on sine fitting, of music of different styles. Then, once the spectrogram of the song the user hums is obtained, the stored characteristic spectrogram tendencies of distinct emotional styles can be used to analyse the style of the song the user is creating and the emotional changes within it, so that when the user begins refining and modifying selections, relatively intelligent suggestions can be given, for example: determining the song's emotional keynote, whether the style is consistent, the stylistic category of a given segment, and amendment advice for segments. Whether to adopt any of these, however, must be decided by the user.
Formula derivation {3}: the image obtained after a uniform proportional compression or expansion of the graph of f(x) = sin x is still a sinusoidal image. Applying this derivation to music: when the BPM is expanded or shrunk proportionally and the range of change is small (i.e. the deformation of the f(x) = sin x graph is not severe), the style of the whole song after the change differs little from the original. The size of the change is measured by the ratio of the coordinates before and after the change; the ratio is maximum vertical distance / one horizontal period. For the image of f(x) = sin x, for example, the ratio is 2/2π, and a uniform scaling leaves it unchanged.
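The change measure in derivation {3} can be computed directly. A uniform scaling of both axes leaves the ratio unchanged, which is why a small proportional BPM change preserves the style; the function name is an assumption.

```python
import math

def change_ratio(amplitude, period):
    """Ratio from derivation {3}: maximum vertical distance (2A)
    divided by one horizontal period."""
    return (2 * amplitude) / period

base = change_ratio(1.0, 2 * math.pi)     # f(x) = sin x  ->  2 / 2*pi
scaled = change_ratio(0.5, math.pi)       # the same graph shrunk uniformly 1:2
```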
Formula derivation {4}: the image obtained after a uniform vertical translation of the graph of f(x) = sin x is still a sinusoidal image. Applying this derivation to music: when the whole song is raised or lowered by several scale degrees according to the principle of twelve-tone equal temperament, the resulting spectrum trend graph is consistent with the original tendency, and such a change merely helps match the singer's vocal range.
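The whole-song transposition of derivation {4} is a single multiplication per note under equal temperament; this sketch's function name and example pitches are illustrative.

```python
def transpose(freqs, semitones):
    """Raise or lower every note by the same number of equal-temperament
    semitones; the ratios between notes (the spectrum trend) are preserved."""
    factor = 2 ** (semitones / 12)
    return [f * factor for f in freqs]

melody = [261.63, 329.63, 392.00]   # approximate Hz for C4, E4, G4
up_two = transpose(melody, 2)       # the whole song raised a whole tone
```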
Formula derivation {5}: applying the rule of an arithmetic progression to pitch, adjacent notes, bars, and segments can be given an arithmetic-progression translation, creating layered, progressive segments; similarly, when the rule is applied to time intervals rather than pitch, i.e. the times at which adjacent notes, bars, and segments are played gradually slow down or speed up, segments of gradually tense and impassioned, or gradually gentle and quiet, style can also be created.
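Derivation {5} applied to pitch can be sketched as an arithmetic-progression shift of successive bars; the step of one semitone per bar and the function name are assumptions.

```python
def progressive_shift(bars, semitone_step=1):
    """Translate each successive bar by an arithmetic progression of
    semitones (0, step, 2*step, ...), creating a layered, progressive
    feel. Applied to durations instead of pitches, the same rule would
    gradually accelerate or slow the playing."""
    out = []
    for i, bar in enumerate(bars):
        factor = 2 ** (i * semitone_step / 12)
        out.append([f * factor for f in bar])
    return out

# Three identical bars of A4 become a rising semitone-by-semitone ramp.
ramp = progressive_shift([[440.0], [440.0], [440.0]])
```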
Formula derivation {6}: for a melody conforming to the sine-fitting transformation, on the basis of its BPM, musical content of certain durations within it can be excerpted at difference durations following powers of 2 (2 to the power N), and the newly recombined segment still conforms to the sine-fitting image. The difference duration formed by the value of N may be no shorter than the 128th-note duration at that BPM, and no longer than the duration of one complete group of segments (qi-cheng-zhuan-he) at that BPM.
Formula derivation {7}: splitting a note of duration T into X identical notes of duration T/X leaves the style trend of the whole piece unchanged, while the emotion expressed changes slightly. The splitting scheme here requires that X be a power of 2, or the product of a power of 2 and 3, and the final value must not make T/X shorter than the 128th-note duration at that BPM, nor longer than the duration of the note before splitting at that BPM.
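The constraint on X in derivation {7} (a power of 2, or 3 times a power of 2, with T/X bounded below by the shortest note value) can be checked as in this sketch; the function name is an assumption.

```python
def valid_splits(T, min_duration):
    """Enumerate split counts X (a power of 2, or 3 times a power of 2)
    such that each of the X equal notes of duration T/X is no shorter
    than the smallest allowed note value (the 128th-note duration at
    the current BPM)."""
    xs = []
    x = 2
    while T / x >= min_duration:
        xs.append(x)
        x *= 2
    y = 3
    while T / y >= min_duration:
        xs.append(y)
        y *= 2
    return sorted(xs)
```

For a whole note at BPM = 120 (T = 2 s) and a 128th-note floor of 0.015625 s, the admissible counts run from 2 up to 128, skipping values like 5 that fit neither form.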
Formula derivation {8}: on the basis of {7}, each note formed after the split is varied so that they are not all the same tone, while the overall rise-and-fall gap varies little; the melody expressed then takes on a full, variation-rich tendency.
When the human ear hears the composite formed by multiple voice parts sounding simultaneously, even if the tones of the parts are completely identical, as long as the timbres differ there is a psychological sense of musical fullness (auditory satisfaction). To create such multi-timbre parts, the user can designate any single voice part and require that the segment of that part be duplicated and played with an instrument of the user's own choosing, creating a multi-part performance.
Chord-formula fitting: the user can designate any single voice part and have chords generated automatically according to chord formulas, for example the major triad as the group 1-3-5, the minor triad as the group 1-3b-5, as well as augmented triads, diminished triads, and other such additional parts.
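Chord-formula fitting as described above might be sketched like this. The mapping from scale degrees to semitone offsets is standard music theory; the function and table names are assumptions.

```python
# Semitone offsets above the root for common triad formulas:
# major 1-3-5, minor 1-3b-5, plus augmented and diminished.
CHORD_FORMULAS = {
    "major":      (0, 4, 7),
    "minor":      (0, 3, 7),
    "augmented":  (0, 4, 8),
    "diminished": (0, 3, 6),
}

def chord_frequencies(root_freq, quality):
    """Generate the frequencies of a triad from its root, under
    twelve-tone equal temperament."""
    return [root_freq * 2 ** (s / 12) for s in CHORD_FORMULAS[quality]]

c_major = chord_frequencies(261.63, "major")   # the 1-3-5 group over C4
```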
A grace note (equivalent to adding an ornamental part) can be generated for a specified note; the grace note is still completed intelligently according to that note's pitch and the spectrogram trend of the segment. The user may also adjust it, or require it to be generated according to the spectrogram fragment of an emotional style the user specifies.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation, for example: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be via interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or of other forms.
The units described above as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may all be integrated into one processing module, or each unit may individually serve as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by program instructions and related hardware; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The foregoing storage medium includes various media capable of storing program code, such as removable storage devices, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks, or optical discs.
The above description is merely a specific embodiment, but the protection scope of the present invention is not limited thereto; any change or replacement readily conceivable to those familiar with the art within the technical scope disclosed by the present invention shall be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.
Claims (10)
1. An information processing method, characterized by comprising:
collecting first audio information, wherein the first audio information comprises at least one of: melody information, rhythm information, and timbre information;
generating second audio information associated with the first audio information, wherein the content of the first audio information and the second audio information is at least partly different.
2. The method according to claim 1, characterized in that
the content of the first audio information and the second audio information being at least partly different comprises at least one of:
the playing duration of the second audio information is different from the playing duration of the first audio information;
the second melody information of the second audio information is at least partly different from the first melody information of the first audio information;
the second rhythm information of the second audio information is at least partly different from the first rhythm information of the first audio information;
the second timbre information of the second audio information is at least partly different from the first timbre information of the first audio information.
3. The method according to claim 2, characterized in that
the first timbre information and the second timbre information comprise at least one of:
first-class timbre information, wherein the first-class timbre information comprises the timbre information of a human voice; the timbre information of the human voice comprises at least one of: the timbre of a male voice, the timbre of a female voice, the timbre of a child's voice, and a mixed voice formed by mixing at least two classes of human voice;
second-class timbre information, wherein the second-class timbre information comprises the timbre information of a musical instrument;
third-class timbre information, wherein the third-class timbre information is timbre information other than the human voice and the musical instrument.
4. The method according to claim 1, 2, or 3, characterized in that
generating the second audio information associated with the first audio information comprises at least one of:
generating the second audio information according to audio attribute information of the first audio information;
generating the second audio information according to user attribute information corresponding to the first audio information.
5. The method according to claim 4, characterized in that
generating the second audio information according to the audio attribute information of the first audio information comprises:
generating the second audio information according to at least one of the melody characteristic attribute, rhythm characteristic attribute, timbre characteristic attribute, song-style attribute, and music-type attribute of the first audio information.
6. The method according to claim 4, characterized in that
generating the second audio information according to the user attribute information corresponding to the first audio information comprises:
generating the second audio information according to at least one of user preference information, audio playing record information, emotional state information, and user indication information.
7. The method according to claim 6, characterized in that
generating the second audio information according to at least one of the user preference information, audio playing record information, emotional state information, and user indication information corresponding to the first audio information comprises at least one of:
determining the duration of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generation of the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotional state information;
stopping generation of the second audio information according to the emotional state information;
resuming generation of the second audio information according to the emotional state information;
determining the duration of the second audio according to the emotional state information and the user indication information;
continuing to generate the second audio according to the emotional state information and the user indication information;
stopping generation of the second audio according to the emotional state information and the user indication information;
resuming generation of the second audio according to the emotional state information and the user indication information.
8. The method according to claim 1, 2, or 3, characterized in that generating the second audio information associated with the first audio information comprises:
processing the first audio information with an audio processing model, and outputting the second audio information.
9. The method according to claim 1, 2, or 3, characterized in that
generating the second audio information associated with the first audio information comprises at least one of:
generating first music-score information of the second audio information according to the first audio information;
generating first lyrics information of the second audio information according to the first audio information;
synthesizing the first music-score information and the first lyrics information generated according to the first audio information, to generate a song file corresponding to the second audio information.
10. An information processing device, characterized by comprising:
a collection module, configured to collect first audio information, wherein the first audio information comprises at least one of: melody information, rhythm information, and timbre information;
a generation module, configured to generate second audio information associated with the first audio information, wherein the content of the first audio information and the second audio information is at least partly different.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810673919.5A CN108922505B (en) | 2018-06-26 | 2018-06-26 | Information processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108922505A true CN108922505A (en) | 2018-11-30 |
CN108922505B CN108922505B (en) | 2023-11-21 |
Family
ID=64421511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810673919.5A Active CN108922505B (en) | 2018-06-26 | 2018-06-26 | Information processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108922505B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070885A (en) * | 2019-02-28 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Audio originates point detecting method and device |
CN113066458A (en) * | 2021-03-17 | 2021-07-02 | 平安科技(深圳)有限公司 | Melody generation method, device and equipment based on LISP-like chain data and storage medium |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060230909A1 (en) * | 2005-04-18 | 2006-10-19 | Lg Electronics Inc. | Operating method of a music composing device |
EP1785891A1 (en) * | 2005-11-09 | 2007-05-16 | Sony Deutschland GmbH | Music information retrieval using a 3D search algorithm |
US20070131094A1 (en) * | 2005-11-09 | 2007-06-14 | Sony Deutschland GmbH | Music information retrieval using a 3D search algorithm |
CN101313477A (en) * | 2005-12-21 | 2008-11-26 | LG Electronics Inc. | Music generating device and operating method thereof |
US20140052282A1 (en) * | 2012-08-17 | 2014-02-20 | Be Labs, Llc | Music generator |
CN103854644A (en) * | 2012-12-05 | 2014-06-11 | Communication University of China | Automatic duplicating method and device for single-track polyphonic music signals |
US20150179157A1 (en) * | 2013-12-20 | 2015-06-25 | Samsung Electronics Co., Ltd. | Multimedia apparatus, music composing method thereof, and song correcting method thereof |
CN105161081A (en) * | 2015-08-06 | 2015-12-16 | Cai Yusheng | APP humming composition system and method thereof |
CN106649586A (en) * | 2016-11-18 | 2017-05-10 | Tencent Music Entertainment (Shenzhen) Co., Ltd. | Audio file playing method and device |
CN106652997A (en) * | 2016-12-29 | 2017-05-10 | Tencent Music Entertainment (Shenzhen) Co., Ltd. | Audio synthesis method and terminal |
CN107863095A (en) * | 2017-11-21 | 2018-03-30 | Guangzhou Kugou Computer Technology Co., Ltd. | Audio signal processing method, device and storage medium |
CN108197185A (en) * | 2017-12-26 | 2018-06-22 | Nubia Technology Co., Ltd. | Music recommendation method, terminal and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108922505B (en) | 2023-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108806655B (en) | Automatic generation of songs | |
US9595256B2 (en) | System and method for singing synthesis | |
CN108806656B (en) | Automatic generation of songs | |
CN108492817B (en) | Song data processing method based on virtual idol and singing interaction system | |
Datta et al. | Signal analysis of Hindustani classical music | |
Umbert et al. | Expression control in singing voice synthesis: Features, approaches, evaluation, and challenges | |
Schneider | Music and gestures: A historical introduction and survey of earlier research | |
JP7424359B2 (en) | Information processing device, singing voice output method, and program | |
CN107430849A (en) | Sound control apparatus, audio control method and sound control program | |
JP5598516B2 (en) | Voice synthesis system for karaoke and parameter extraction device | |
JP2022092032A (en) | Singing synthesis system and singing synthesis method | |
Nikolsky | Emergence of the distinction between “verbal” and “musical” in early childhood development | |
CN108922505B (en) | Information processing method and device | |
Quinto et al. | Composers and performers have different capacities to manipulate arousal and valence. | |
Berliner | The art of Mbira: Musical inheritance and legacy | |
JP4808641B2 (en) | Caricature output device and karaoke device | |
TWI377558B (en) | Singing synthesis systems and related synthesis methods | |
CN110782866A (en) | Singing sound converter | |
Lebon | The Versatile Vocalist: Singing Authentically in Contrasting Styles and Idioms | |
Subramanian | Modelling gamakas of Carnatic music as a synthesizer for sparse prescriptive notation | |
Daffern | Distinguishing characteristics of vocal techniques in the specialist performance of early music | |
Kouroupetroglou et al. | Formant tuning in Byzantine chanting | |
Hardman | Experiencing Sonic Change: Acoustic Properties as Form-and Meter-Bearing Elements in Popular Music Vocals | |
Chrysochoidis et al. | Formant tuning in Byzantine chant | |
Nguyen | A Study on Correlates of Acoustic Features to Emotional Singing Voice Synthesis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||