CN108922505A - Information processing method and device - Google Patents
- Publication number
- CN108922505A (application CN201810673919.5A)
- Authority
- CN
- China
- Prior art keywords
- information
- audio
- frequency information
- frequency
- user
- Prior art date
- Legal status
- Granted
Classifications
- G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters
- G10H1/0008: Details of electrophonic musical instruments; associated control or indicating means
- G10H1/40: Accompaniment arrangements; rhythm
- G10H2210/056: Musical analysis of a raw or encoded audio signal for extraction or identification of individual instrumental parts (e.g. melody, chords, bass); identification or separation of instrumental parts by their characteristic voices or timbres
- G10H2210/076: Musical analysis of a raw or encoded audio signal for extraction of timing and tempo; beat detection
Abstract
Embodiments of the invention disclose an information processing method and device. The method includes: collecting first audio information, where the first audio information includes at least one of melody information, rhythm information, and timbre information; and generating second audio information associated with the first audio information, where the content of the first audio information and the second audio information is at least partly different.
Description
Technical field
The present invention relates to the field of information technology, and in particular to an information processing method and device.
Background art
In existing audio playback systems, playing audio means playing audio that already exists in a local audio library or a remote audio library. However, neither the local audio library of an audio device nor a remote audio library can always satisfy the user's current listening demand; otherwise, the user has to search through a massive amount of audio to find the desired audio to play. As a result, existing audio playback systems are not intelligent enough, and the user experience is poor.
Summary of the invention
Embodiments of the present invention aim to provide an information processing method and device.
The technical solution of the invention is realized as follows. In a first aspect, an embodiment of the present invention provides an information processing method, including:
collecting first audio information, where the first audio information includes at least one of melody information, rhythm information, and timbre information;
generating second audio information associated with the first audio information, where the content of the first audio information and the second audio information is at least partly different.
In some embodiments, the content of the first audio information and the second audio information being at least partly different includes at least one of the following:
the playing duration of the second audio information is different from the playing duration of the first audio information;
second melody information of the second audio information is at least partly different from first melody information of the first audio information;
second rhythm information of the second audio information is at least partly different from first rhythm information of the first audio information;
second timbre information of the second audio information is at least partly different from first timbre information of the first audio information.
In some embodiments, the first timbre information and the second timbre information include at least one of:
first-class timbre information, where the first-class timbre information includes the timbre information of human voices; the timbre information of human voices includes at least one of: a male-voice timbre, a female-voice timbre, a child-voice timbre, and a mixed voice formed by mixing at least two classes of human voices;
second-class timbre information, where the second-class timbre information includes the timbre information of musical instruments;
third-class timbre information, where the third-class timbre information is timbre information other than that of human voices and musical instruments.
In some embodiments, generating the second audio information associated with the first audio information includes at least one of the following:
generating the second audio information according to audio attribute information of the first audio information;
generating the second audio information according to user attribute information corresponding to the first audio information.
In some embodiments, generating the second audio information according to the audio attribute information of the first audio information includes:
generating the second audio information according to at least one of a melody feature attribute, a rhythm feature attribute, a timbre feature attribute, a musical style attribute, and a music type attribute of the first audio information.
In some embodiments, generating the second audio information according to the user attribute information corresponding to the first audio information includes:
generating the second audio information according to at least one of user preference information, an audio playback record, emotional state information, and user indication information.
In some embodiments, generating the second audio information according to at least one of the user preference information, the audio playback record, the emotional state information, and the user indication information corresponding to the first audio information includes at least one of the following:
determining the duration of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generation of the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotional state information;
stopping generation of the second audio information according to the emotional state information;
resuming generation of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the emotional state information and the user indication information;
continuing to generate the second audio information according to the emotional state information and the user indication information;
stopping generation of the second audio information according to the emotional state information and the user indication information;
resuming generation of the second audio information according to the emotional state information and the user indication information.
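The branches above amount to a small generation controller driven by emotional state and user indications. The following sketch illustrates one possible shape of that controller; the class name, the signal values, and the emotion-to-duration table are all assumptions for illustration, not prescribed by the embodiments.

```python
# Hypothetical controller mapping emotional-state and user-indication
# inputs to actions on the second-audio generator (all names invented).

class GenerationController:
    """Controls generation of the second audio information."""

    def __init__(self):
        self.generating = False
        self.duration_s = None

    def apply(self, emotional_state=None, user_indication=None):
        # An explicit user indication takes precedence over inferred emotion.
        signal = user_indication or emotional_state
        if signal == "stop":
            self.generating = False
        elif signal in ("continue", "resume"):
            self.generating = True
        return self.generating

    def decide_duration(self, emotional_state, user_indication=None):
        # A user indication fixes the duration directly; otherwise an
        # assumed table maps the emotional state to a default length.
        if user_indication is not None:
            self.duration_s = user_indication
        else:
            defaults = {"happy": 240, "calm": 180, "sad": 120}
            self.duration_s = defaults.get(emotional_state, 180)
        return self.duration_s

controller = GenerationController()
running = controller.apply(user_indication="continue")
length = controller.decide_duration("sad")
```

Combined signals fall out of the same precedence rule: when both an emotional state and a user indication are present, the indication wins, matching the "according to the emotional state information and the user indication information" branches.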
In some embodiments, generating the second audio information associated with the first audio information includes:
processing the first audio information using an audio processing model, and outputting the second audio information.
In some embodiments, generating the second audio information associated with the first audio information includes at least one of the following:
generating first music-score information of the second audio information according to the first audio information;
generating first lyrics information of the second audio information according to the first audio information;
synthesizing the first music-score information and the first lyrics information generated according to the first audio information, to produce a song file corresponding to the second audio information.
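The score-plus-lyrics path can be illustrated as a three-stage pipeline: derive a score and lyrics from the first audio information, then combine them into a song record. This is a minimal sketch with invented stand-in functions and data shapes; the embodiments do not prescribe any particular representation for scores, lyrics, or song files.

```python
# Stand-in pipeline for score + lyrics synthesis (illustrative only).

def generate_score(first_audio):
    # Stand-in: reuse the captured melody notes as the score.
    return {"notes": first_audio["melody"]}

def generate_lyrics(first_audio):
    # Stand-in: one placeholder syllable per melody note.
    return ["la"] * len(first_audio["melody"])

def synthesize_song(score, lyrics):
    # Pair each note with a syllable to form the song-file content.
    return list(zip(score["notes"], lyrics))

first_audio = {"melody": ["C4", "E4", "G4"]}
song = synthesize_song(generate_score(first_audio),
                       generate_lyrics(first_audio))
```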
In a second aspect, an embodiment of the present invention provides an information processing device, characterized by including:
a collection module, configured to collect first audio information, where the first audio information includes at least one of melody information, rhythm information, and timbre information;
a generation module, configured to generate second audio information associated with the first audio information, where the content of the first audio information and the second audio information is at least partly different.
In some embodiments, the content of the first audio information and the second audio information being at least partly different includes at least one of the following:
the playing duration of the second audio information is different from the playing duration of the first audio information;
second melody information of the second audio information is at least partly different from first melody information of the first audio information;
second rhythm information of the second audio information is at least partly different from first rhythm information of the first audio information;
second timbre information of the second audio information is at least partly different from first timbre information of the first audio information.
In some embodiments, the first timbre information and the second timbre information include at least one of:
first-class timbre information, where the first-class timbre information includes the timbre information of human voices; the timbre information of human voices includes at least one of: a male-voice timbre, a female-voice timbre, a child-voice timbre, and a mixed voice formed by mixing at least two classes of human voices;
second-class timbre information, where the second-class timbre information includes the timbre information of musical instruments;
third-class timbre information, where the third-class timbre information is timbre information other than that of human voices and musical instruments.
In some embodiments, the generation module is specifically configured to perform at least one of the following:
generating the second audio information according to the audio attribute information of the first audio information;
generating the second audio information according to the user attribute information corresponding to the first audio information.
In some embodiments, the generation module is specifically configured to generate the second audio information according to at least one of a melody feature attribute, a rhythm feature attribute, a timbre feature attribute, a musical style attribute, and a music type attribute of the first audio information.
In some embodiments, the generation module is specifically configured to generate the second audio information according to at least one of the user preference information, the audio playback record, the emotional state information, and the user indication information.
In some embodiments, the generation module is specifically configured to perform at least one of the following:
determining the duration of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generation of the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotional state information;
stopping generation of the second audio information according to the emotional state information;
resuming generation of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the emotional state information and the user indication information;
continuing to generate the second audio information according to the emotional state information and the user indication information;
stopping generation of the second audio information according to the emotional state information and the user indication information;
resuming generation of the second audio information according to the emotional state information and the user indication information.
In some embodiments, the generation module is specifically configured to process the first audio information using an audio processing model and output the second audio information.
In some embodiments, the generation module is specifically configured to: generate first music-score information of the second audio information according to the first audio information; generate first lyrics information of the second audio information according to the first audio information; and synthesize the first music-score information and the first lyrics information to produce a song file corresponding to the second audio information.
With the information processing method and device provided by the embodiments of the present invention, after the first audio information is collected, second audio information associated with the first audio information can be generated automatically. In this way, second audio information that is related to the collected first audio information, and whose content is at least partly different from it, can be generated dynamically. This is equivalent to the electronic device being able to compose audio automatically based on one piece of collected audio information, so as to satisfy the user's current demand to listen to dynamically generated second audio information associated with the first audio information. The electronic device is thus characterized by high intelligence and high user satisfaction.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an information processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of generating second audio information provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an information processing device provided by an embodiment of the present invention;
Fig. 4 is a schematic melody diagram in twelve-tone equal temperament provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a standard sine wave provided by an embodiment of the present invention;
Fig. 6 is a schematic staff diagram of a music score provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a line of the music score shown in Fig. 6 provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of another music score line provided by an embodiment of the present invention.
Detailed description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments of the specification.
As shown in Fig. 1, an embodiment of the present invention provides an information processing method, including:
Step S110: collecting first audio information, where the first audio information includes at least one of melody information, rhythm information, and timbre information;
Step S120: generating second audio information associated with the first audio information, where the content of the first audio information and the second audio information is at least partly different.
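The two steps can be sketched end to end as follows. This is a minimal illustration under assumed data structures (the embodiments do not prescribe any): the first audio information is a record of optional melody, rhythm, and timbre fields, and the generation step is a trivial stand-in transformation that keeps the two pieces of content associated yet at least partly different.

```python
# Minimal sketch of steps S110 and S120 (assumed data shapes).

def collect_first_audio(melody=None, rhythm=None, timbre=None):
    # Step S110: at least one of the three components must be present.
    info = {"melody": melody, "rhythm": rhythm, "timbre": timbre}
    assert any(v is not None for v in info.values())
    return info

def generate_second_audio(first):
    # Step S120: produce associated, at-least-partly-different content;
    # here, transpose each melody note up by one semitone as a stand-in.
    second = dict(first)
    if first["melody"] is not None:
        second["melody"] = [n + 1 for n in first["melody"]]
    return second

first = collect_first_audio(melody=[60, 62, 64])   # MIDI note numbers
second = generate_second_audio(first)
```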
In this embodiment, the information processing method may be applied in a first electronic device. The first electronic device collects the first audio information using a microphone or the like. The first audio information may be audio information generated by capturing any sound in the space where the first electronic device is located; for example, a user hums a song, and the first electronic device captures the humming to generate the first audio information.
In step S120, the first electronic device may generate the second audio information by itself according to the first audio information; alternatively, the first audio information, or information related to the first audio information, may be submitted to a second electronic device, which generates the second audio information, and the first electronic device then receives the second audio information from the second electronic device.
In this embodiment, the second audio information is generated dynamically according to the first audio information. The second audio information may be dynamically generated, completely new audio information that exists neither locally on the first electronic device nor on any second electronic device connected to it. In this embodiment, the first audio information may include at least one of:
melody information, which forms a melody when played;
rhythm information, which produces a certain rhythm when played;
timbre information, which determines the character of the sound's vibration, so that the sounds perceived by the user differ in auditory perception.
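As one illustration of the melody component, a crude fundamental-frequency estimate can be obtained from raw samples by autocorrelation. This sketch is an assumption for illustration only: the embodiments do not specify how melody information is extracted, and real systems use far more robust estimators.

```python
import math

# Crude pitch estimate by autocorrelation (illustrative assumption).

def estimate_pitch(samples, sample_rate):
    """Return the dominant fundamental frequency in Hz."""
    n = len(samples)
    best_lag, best_corr = 0, 0.0
    # Search lags corresponding to roughly 80-1000 Hz.
    for lag in range(sample_rate // 1000, sample_rate // 80):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# Quarter second of a 440 Hz tone at an 8 kHz sampling rate.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 4)]
pitch = estimate_pitch(tone, sr)
```

Because the lag grid is integer-valued, the estimate is only accurate to within a few hertz here (the true period, about 18.18 samples, rounds to lag 18).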
In this embodiment, the second audio information is generated based on the first audio information, so the first audio information and the second audio information are associated. This association is reflected in the fact that the first audio information and the second audio information may be at least partly identical. At the same time, the first electronic device itself, or the second electronic device connected to the first electronic device, obtains through processing audio information that is at least partly different from the first audio information; this difference may be reflected in at least one of melody information, rhythm information, and timbre information.
In this way, after the user hums a few phrases to the first electronic device, the first electronic device itself, or a second electronic device at its request, generates and plays second audio information associated with the first audio information obtained from the humming. This is equivalent to the first electronic device automatically playing the second audio information and/or automatically composing and playing the second audio information based on the user's humming. This ensures that the audio the user hears may be different every time, satisfies this special listening demand of the user, and thereby improves user satisfaction.
Since the first audio information collected by the first electronic device originates from the user's humming, from audio the user has chosen to play, or from sound the user controls in some other way, the first audio information originates, in short, from the user's control and characterizes the user's current wishes, such as the wish to listen or to compose. If the second audio information is generated according to the first audio information, the generated second audio information likewise reflects the user's wishes or satisfies the user's demand. In this way, the second audio information is generated dynamically so as to satisfy the user's current demand, which improves the intelligence of the electronic device and the user's satisfaction with it.
In some embodiments, the content of the first audio information and the second audio information being at least partly different includes at least one of the following:
the playing duration of the second audio information is different from the playing duration of the first audio information;
second melody information of the second audio information is at least partly different from first melody information of the first audio information;
second rhythm information of the second audio information is at least partly different from first rhythm information of the first audio information;
second timbre information of the second audio information is at least partly different from first timbre information of the first audio information.
The playing duration of the first audio information at a predetermined playback rate is a first duration; the playing duration of the second audio information at the same predetermined playback rate is a second duration; and the first duration is different from the second duration. For example, the playing duration of the first audio information equals the duration of the user's humming, which may be just a few seconds, for example 5 seconds or 10 seconds. The second duration of the second audio information may be greater than the first duration; for example, the playing duration of the second audio information may equal the average playing duration of a song, for example 2 minutes, 3 minutes, or any duration between 2 and 5 minutes. This is equivalent to the user humming a few phrases and thereby triggering the first electronic device to obtain a song generated from the humming, which the first electronic device then plays. If the user's hummings differ, the resulting songs differ; and even if the user's humming is the same, the song corresponding to the dynamically composed second audio information may also differ. This satisfies the user's varied listening demands and improves the intelligence of the electronic device and the user's satisfaction with it.
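The hum-to-song duration relationship above can be captured by a tiny mapping function. The stretch factor and the 2-5 minute clamp are invented for illustration; the embodiments only require that the second duration may exceed the first.

```python
# Toy duration policy (assumed, not prescribed by the embodiments):
# a hum of a few seconds maps to a song-length second duration,
# clamped to the 2-5 minute range mentioned in the example.

def second_duration(hum_seconds, stretch=30, lo=120, hi=300):
    """Derive the second audio's duration (s) from the hum's duration."""
    return max(lo, min(hi, hum_seconds * stretch))
```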
In some embodiments, the first timbre information and the second timbre information include at least one of:
first-class timbre information, where the first-class timbre information includes the timbre information of human voices; the timbre information of human voices includes at least one of: a male-voice timbre, a female-voice timbre, a child-voice timbre, and a mixed voice formed by mixing at least two classes of human voices;
second-class timbre information, where the second-class timbre information includes the timbre information of musical instruments;
third-class timbre information, where the third-class timbre information is timbre information other than that of human voices and musical instruments.
If timbre is distinguished by the sounding body, timbre can be divided into at least the above three kinds: human voice, instrument sound, and other timbres besides human voice and instrument sound, for example various onomatopoeic sounds simulated without instruments.
Human voice can further be divided into male voice, female voice, child voice, and various mixed voices, for example the timbre of a mixed male and female voice, a mixed adult-male and child voice, or a mixed adult-female and child voice.
In this embodiment, the male voice is the sound produced by a man after the voice-change period, and may also be called an adult male voice; the female voice may be the sound produced by a woman after the voice-change period, and may also be called an adult female voice; the child voice includes the various human voices before the voice-change period.
The second class of timbre is the timbre information of musical instruments, for example the timbre information of percussion instruments, string instruments, wind instruments, and other kinds of instruments.
The third class of timbre information may include various sounds such as the sounds of electrical appliances, the sounds of doors and windows opening and closing, and the sounds of animals.
As shown in Fig. 2, the step S120 may include at least one of:
generating the second audio-frequency information according to the audio attribute information of the first audio-frequency information;
generating the second audio-frequency information according to the user attribute information corresponding to the first audio-frequency information.
The audio attribute information here can be information extracted from the first audio-frequency information, including but not limited to melody information, rhythm information, timbre information, song-style information, music type information, and the like.
The user attribute information can be the user attribute information of the user who uttered the first audio-frequency information, the user attribute information of the user holding the first electronic equipment, or the user attribute information of the user to whom the account of the voice application running on the first electronic equipment belongs.
The user attribute information may include various information such as gender, age, region, occupation, and hobby.
In the present embodiment, when the second audio-frequency information is generated, it can be generated not only in combination with the information of the first audio-frequency information itself, but also in combination with the audio attribute information and/or the user attribute information.
The generating of the second audio-frequency information according to the audio attribute information of the first audio-frequency information includes at least one of:
generating the second audio-frequency information according to at least one of the melody characteristic attribute, rhythm characteristic attribute, timbre characteristic attribute, song-style attribute, and music type attribute of the first audio-frequency information.
The melody characteristic attribute describes the melody feature of the first audio-frequency information; for example, the melody characteristic attribute describes whether the melody of the first audio-frequency information is impassioned or soothing.
The rhythm characteristic attribute describes the rhythm feature of the first audio-frequency information, for example, whether the first audio-frequency information is music in 2/4 time or in 3/4 time.
The timbre characteristic attribute describes the timbre feature of the first audio-frequency information, for example, whether the timbre of the first audio-frequency information is mainly a male voice or mainly a female voice, and whether it is the timbre of an instrument, the timbre of a human voice, or another kind of timbre.
The song-style attribute describes the music style or school of the first audio-frequency information.
The music type attribute describes the music type of the first audio-frequency information, for example, rock music, country music, or another type of music.
In some embodiments, the generating of the second audio-frequency information according to the user attribute information corresponding to the first audio-frequency information includes:
generating the second audio-frequency information according to at least one of the user preference information, audio playback record information, emotional state information, and user instruction information.
For example, the user may input his or her own preference information, such as favorite audio; the preference information, such as the audio the user prefers, may also be generated automatically from the user's playback record.
In some embodiments, the user preference information may also include the singer the user prefers; the second audio-frequency information is then generated with the timbre of that singer.
The audio playback record information records the audio played by the electronic equipment within a historical time period, the audio attribute information of the played audio, and the like.
The emotional state information may include the user's current emotional state obtained through image acquisition or audio collection, by facial expression analysis in an image or emotional-state extraction from sound; according to the emotional state information of the user, information such as the melody and rhythm of the second audio-frequency information is determined.
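A minimal sketch of mapping an emotional-state reading to melody/rhythm parameters of the second audio-frequency information. The score range, the linear mapping, and the BPM numbers are assumptions made purely for illustration; the disclosure does not specify a concrete mapping.

```python
def emotion_to_music_params(emotion_score: float) -> dict:
    """Map an emotion score in [0.0, 1.0] (0 = calm, 1 = excited) to
    rhythm (tempo in BPM) and melody (pitch range in semitones).
    Assumed linear mapping, invented for this sketch."""
    if not 0.0 <= emotion_score <= 1.0:
        raise ValueError("emotion score must lie in [0, 1]")
    tempo_bpm = 60 + emotion_score * 80          # calm -> 60 BPM, excited -> 140 BPM
    pitch_range = 7 + round(emotion_score * 12)  # wider melodic range when excited
    return {"tempo_bpm": tempo_bpm, "pitch_range_semitones": pitch_range}

print(emotion_to_music_params(0.5))  # {'tempo_bpm': 100.0, 'pitch_range_semitones': 13}
```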
In some embodiments, the first audio-frequency information and the second audio-frequency information may also include lyrics information and the like; in other embodiments, the first audio-frequency information and the second audio-frequency information may also include language information of the lyrics pronunciation, the language information determining, for example, whether the second audio-frequency information is played in Chinese, in English, or in another language.
In some embodiments, the generating of the second audio-frequency information according to at least one of the user preference information, audio playback record information, emotional state information, and user instruction information corresponding to the first audio-frequency information includes at least one of:
determining the duration of the second audio-frequency information according to the emotional state information; for example, if the emotional state information shows that the user wants to keep listening, e.g., the user's expression shows immersion in the music, the second audio-frequency information is played with a longer duration; if two alternatives exist when the duration is determined, the longer one can be chosen; in some embodiments, the emotional state information can also be scored, and with the score as input, the duration is calculated with a specific function;
determining the duration of the second audio-frequency information according to the user instruction information; for example, the user issues, through gesture operation, voice operation, line-of-sight operation, or the like, an instruction such as stop playing, continue playing, or extend playing, and the duration is determined on this basis.
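The scoring-plus-function approach to duration, together with the "choose the longer alternative" rule, can be sketched as follows. The base duration and the linear function are invented for the example; the disclosure only says that a specific function of the score is used.

```python
def duration_from_emotion(score: float,
                          base_seconds: float = 30.0,
                          max_extra: float = 90.0) -> float:
    """Compute the playing duration of the second audio-frequency information
    from an emotional-state score in [0, 1]; higher engagement -> longer.
    The linear form and constants are assumptions for this sketch."""
    return base_seconds + max_extra * max(0.0, min(1.0, score))

def choose_duration(candidates: list) -> float:
    """When alternative durations exist, pick the longer one, as described."""
    return max(candidates)

print(choose_duration([duration_from_emotion(0.5), 60.0]))  # 75.0
```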
The generating of the second audio-frequency information further includes one or more of the following:
continuing to generate the second audio-frequency information according to the user instruction information;
resuming generation of the second audio-frequency information according to the user instruction information;
stopping generation of the second audio-frequency information according to the user instruction information;
continuing to generate the second audio-frequency information according to the emotional state information;
stopping generation of the second audio-frequency information according to the emotional state information;
resuming generation of the second audio-frequency information according to the emotional state information;
determining the duration of the second audio-frequency information according to the emotional state information and the user instruction information;
continuing to generate the second audio-frequency information according to the emotional state information and the user instruction information;
stopping generation of the second audio-frequency information according to the emotional state information and the user instruction information;
resuming generation of the second audio-frequency information according to the emotional state information and the user instruction information.
In some embodiments, the step S120 may include: processing the first audio-frequency information with an audio processing model, and outputting the second audio-frequency information.
The audio processing model here can be various types of models, for example, various big-data models; the big-data model here can be a model generated by training with sample data, for example, a neural network model, a support vector machine model, a regression model, or the like. Through the big-data model, the second audio-frequency information is dynamically generated with the first audio-frequency information as input.
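As a loose illustration only (the disclosure does not describe a concrete model), the stand-in "model" below maps an input note sequence to a partly different output sequence, which is the input/output contract stated above. The note numbers, variation probability, and interval choices are all invented for the sketch.

```python
import random

def toy_audio_model(first_notes, seed=0):
    """Toy stand-in for a trained audio-processing model: given the note
    sequence of the first audio-frequency information, dynamically generate
    a second sequence that shares material with the input but is at least
    partly different. All parameters are assumptions for this sketch."""
    rng = random.Random(seed)
    second = []
    for note in first_notes:
        second.append(note)
        if rng.random() < 0.5:               # occasionally insert a variation
            second.append(note + rng.choice([-2, 2, 5]))
    return second

print(toy_audio_model([60, 62, 64, 65], seed=1))
```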
In some embodiments, the step S120 may include at least one of:
generating first music-score information of the second audio-frequency information according to the first audio-frequency information;
generating first lyrics information of the second audio-frequency information according to the first audio-frequency information;
synthesizing the first music-score information and the first lyrics information generated according to the first audio-frequency information, to generate a song file corresponding to the second audio-frequency information.
The music-score information here may include information converted into music notation, for example, a staff or numbered-musical-notation file.
The lyrics information may include lyrics in various languages, for example, lyrics written in Chinese.
In the present embodiment, the music-score information represents the aforementioned melody information, rhythm information, and the like.
In this way, after the second audio-frequency information is dynamically generated, it is also converted into the first music-score information and the first lyrics information and recorded, for example, in the form of a song file, so that if the user finds it pleasing, he or she can later tap to play the audio again; thus not only is the dynamic creation of the second audio realized, but the recording and subsequent playback of the song are also achieved.
In some embodiments, the method further includes:
forwarding the song file and/or the second audio-frequency information to a preset device, for example, a social server, to be published as a component of social information; or, for another example, submitting it to a multimedia information library for recorded storage or for download by others.
In some embodiments, the step S120 may include:
adjusting the first music-score information corresponding to the first audio-frequency information according to a preset musical rule, to generate second music-score information.
In some embodiments, the adjusting of the first music-score information according to the preset musical rule to generate the second music-score information includes at least one of:
adjusting the first music-score information according to a sine-wave rule, to generate second music-score information whose musical contour conforms to the sine-wave rule.
For example, the highest note of each beat of the first music-score information can be connected, and the connected highest notes are then adjusted with a waveform approximating a sine wave; the sine wave here can be a special-shaped wave whose similarity to a standard sine wave is greater than a first threshold and less than a second threshold. The second threshold is greater than the first threshold, and the first threshold and the second threshold can be values between 0 and 1.
For another example, the lowest note of each beat of the first music-score information can be connected, and the connected lowest notes are then adjusted with a waveform approximating a sine wave; the sine wave here can be a special-shaped wave whose similarity to a standard sine wave is greater than a third threshold and less than a fourth threshold. The third threshold is less than the fourth threshold, and the fourth threshold and the third threshold can be values between 0 and 1.
In other embodiments, the positions of the individual notes of the first music-score information in the frequency spectrum are connected, and the shape of the resulting line is the aforementioned approximate sine wave.
In some embodiments, the first threshold can be equal to the third threshold, and/or, the second threshold can be equal to the fourth threshold.
In some embodiments, the adjusting of the first music-score information according to the preset musical rule to generate the second music-score information includes:
adjusting the first music-score information according to the changing rule of the introduction-development-climax-conclusion structure of music, to generate second music-score information that conforms to this changing rule.
The changing rule of the introduction-development-climax-conclusion structure can be used to reflect changes of melody or rhythm; for example, this changing rule divides the audio into four periods: an introduction period, a development period, a climax period, and a conclusion period.
These four periods follow a predetermined sequence; for example, the sequence is: introduction period, development period, climax period, and then conclusion period. One complete second audio needs to contain these four periods. The melody differences and/or rhythm differences between any two of the four periods satisfy a predetermined relationship.
For example, the introduction period is constructed by collecting the first audio-frequency information; based on the first audio-frequency information, creation of the second audio-frequency information then automatically begins according to the changing rule of the introduction-development-climax-conclusion structure.
Taking melody as an example, the melody of the climax period is the most impassioned or the most subdued; the impassioned or subdued degree of the development period and the conclusion period is slightly weaker than that of the climax period, and the introduction period is weaker than the development period.
Taking rhythm as an example, the rhythm of the climax period is the fastest; the rhythm of the development period and the conclusion period is slower than that of the climax period, and the introduction period is slower than the development period.
As to how much weaker than the climax period the impassioned or subdued degree of the development period and the conclusion period is, or how much slower the rhythm is, a random number can be generated with a random function, and the generated random number is processed to generate the second audio-frequency information.
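The relative-intensity relations between the four periods, with the exact gaps drawn from a random function as just described, can be sketched as follows. The numeric intensity scale and the gap ranges are assumptions for the illustration.

```python
import random

def section_intensities(seed=None):
    """Assign each of the 4 periods a melody-intensity value such that
    climax > development, conclusion > introduction, with the exact gaps
    drawn from a random function. Scale and ranges are assumed."""
    rng = random.Random(seed)
    climax = 1.0
    develop = climax - rng.uniform(0.1, 0.3)    # slightly weaker than the climax
    conclude = climax - rng.uniform(0.1, 0.3)   # likewise slightly weaker
    intro = min(develop, conclude) - rng.uniform(0.1, 0.3)  # weakest period
    return {"introduction": intro, "development": develop,
            "climax": climax, "conclusion": conclude}

print(section_intensities(seed=7))
```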
Therefore, in some embodiments, the aforementioned audio processing model can also be an audio model that conforms to the sine-wave rule and/or the changing rule of the introduction-development-climax-conclusion structure, into which several random factors can be introduced when the above variations are handled, so that the generated second audio-frequency information is more variable.
For example, for second audio that conforms to the sine-wave rule, the fluctuation amplitude of the approximate sine wave can satisfy the aforementioned changing rule of the introduction-development-climax-conclusion structure, and the specific amplitude of each movement can be determined based on a random value generated by the random function.
In some embodiments, the method further includes:
generating lyrics information, wherein the lyrics information corresponds to the first music-score information, or the lyrics information corresponds to the second music-score information, the second music-score information being generated based on the first music-score information.
For example, lyrics input by the user's voice are received from a human-machine interaction interface, and the audio-frequency information corresponding to the lyrics is converted into lyrics information of a specific language.
In some embodiments, the generating of the lyrics information includes:
collecting the second audio-frequency information;
converting the second audio-frequency information into the lyrics information.
In some embodiments, the converting of the second audio-frequency information into the lyrics information includes:
generating pronunciation markup information according to the second audio-frequency information, wherein the pronunciation markup information here can be converted directly based on pronunciation and is not limited to a particular language;
placing the pronunciation markup information on a track in correspondence with the corresponding music-score information; in this way, the matching of the music-score information with the lyrics information is completed, so as to generate the second audio-frequency information and/or the song file.
Optionally, the lyrics information is automatically generated according to the music-score parameters of the music-score information corresponding to the lyrics information.
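The correspondence between pronunciation markup and the notes of a score track can be sketched under the simplifying assumption of one syllable per note (the disclosure elsewhere notes that the user may need to handle this mapping manually); the syllables and note names are hypothetical.

```python
def align_lyrics_to_score(syllables, notes):
    """Minimal sketch of placing each pronunciation markup on a track in
    correspondence with a note of the score; one-to-one alignment assumed,
    melisma/multi-note handling omitted."""
    if len(syllables) != len(notes):
        raise ValueError("one syllable per note assumed in this sketch")
    return list(zip(notes, syllables))

print(align_lyrics_to_score(["la", "la", "li"], ["C4", "D4", "E4"]))
# [('C4', 'la'), ('D4', 'la'), ('E4', 'li')]
```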
In some embodiments, the method further includes:
generating playing information, wherein the playing information is audio-frequency information generated by playing, with a target object, the first music-score information or the second music-score information corresponding to the first music-score information; the target object is at least one of a target musical instrument, a target organism, or a target item, wherein the target item is different from the target musical instrument and the target organism. The target object can be any of the aforementioned sounding bodies.
In some embodiments, the generating of the playing information includes:
generating the playing information according to the music-score parameters of the first music-score information or the second music-score information.
In other embodiments, the generating of the playing information according to the music-score parameters of the first music-score information or the second music-score information includes:
generating the playing information according to at least one of the rhythm parameter, song-style parameter, and emotion parameter of the first music-score information or the second music-score information.
In addition, the method further includes: synthesizing at least two of the music-score information, the lyrics information corresponding to the music-score information, and the playing information of the music-score information, to generate a song file, wherein the music-score information is the first music-score information or the second music-score information corresponding to the first music-score information.
In some embodiments, the method further includes: detecting a marking operation on the song file, and modifying the song file according to the marking operation. In this way, the user is allowed to modify the generated second audio-frequency information and obtain audio-frequency information that meets his or her own demands.
As shown in Fig. 3, the present embodiment provides an information processing unit, including:
an acquisition module 110, configured to acquire first audio-frequency information, wherein the first audio-frequency information includes at least one of melody information, rhythm information, and timbre information;
a generation module 120, configured to generate second audio-frequency information associated with the first audio-frequency information, wherein the content of the first audio-frequency information and the content of the second audio-frequency information are at least partly different.
In some embodiments, the content of the first audio-frequency information and the content of the second audio-frequency information being at least partly different includes at least one of:
the playing duration of the second audio-frequency information is different from the playing duration of the first audio-frequency information;
the second melody information of the second audio-frequency information is at least partly different from the first melody information of the first audio-frequency information;
the second rhythm information of the second audio-frequency information is at least partly different from the first rhythm information of the first audio-frequency information;
the second timbre information of the second audio-frequency information is at least partly different from the first timbre information of the first audio-frequency information.
In some embodiments, the first timbre information and the second timbre information each include at least one of:
first-class timbre information, wherein the first-class timbre information includes the timbre information of a human voice; the timbre information of the human voice includes at least one of: the timbre of a male voice, the timbre of a female voice, the timbre of a child's voice, and a mixed voice formed by mixing at least two classes of human voices;
second-class timbre information, wherein the second-class timbre information includes the timbre information of a musical instrument;
third-class timbre information, wherein the third-class timbre information is timbre information other than that of the human voice and the musical instrument.
In other embodiments, the generation module 120 is specifically configured to execute at least one of:
generating the second audio-frequency information according to the audio attribute information of the first audio-frequency information;
generating the second audio-frequency information according to the user attribute information corresponding to the first audio-frequency information.
In other embodiments, the generation module 120 is specifically configured to execute at least one of:
generating the second audio-frequency information according to at least one of the melody characteristic attribute, rhythm characteristic attribute, timbre characteristic attribute, song-style attribute, and music type attribute of the first audio-frequency information.
In other embodiments, the generation module 120 is specifically configured to generate the second audio-frequency information according to at least one of the user preference information, audio playback record information, emotional state information, and user instruction information.
In other embodiments, the generation module 120 is specifically configured to execute at least one of:
determining the duration of the second audio-frequency information according to the emotional state information;
determining the duration of the second audio-frequency information according to the user instruction information;
continuing to generate the second audio-frequency information according to the user instruction information;
resuming generation of the second audio-frequency information according to the user instruction information;
stopping generation of the second audio-frequency information according to the user instruction information;
continuing to generate the second audio-frequency information according to the emotional state information;
stopping generation of the second audio-frequency information according to the emotional state information;
resuming generation of the second audio-frequency information according to the emotional state information;
determining the duration of the second audio-frequency information according to the emotional state information and the user instruction information;
continuing to generate the second audio-frequency information according to the emotional state information and the user instruction information;
stopping generation of the second audio-frequency information according to the emotional state information and the user instruction information;
resuming generation of the second audio-frequency information according to the emotional state information and the user instruction information.
In other embodiments, the generation module 120 is specifically configured to process the first audio-frequency information with the audio processing model and output the second audio-frequency information.
In other embodiments, the generation module 120 is specifically configured to generate the first music-score information of the second audio-frequency information according to the first audio-frequency information; generate the first lyrics information of the second audio-frequency information according to the first audio-frequency information; and synthesize the first music-score information and the first lyrics information generated according to the first audio-frequency information, to generate a song file corresponding to the second audio-frequency information.
Several specific examples are provided below in combination with any of the above embodiments:
Example 1:
This example provides an audio processing system. System composition overview: the system is composed of a hardware system, software application programs, and cloud-network big data.
(1) Hardware system composition overview: a microphone input device with a high sampling rate, one electronic device with excellent computing performance, and a network bandwidth of not less than 50M;
(2) Software application composition overview: audio recording and playback applications based on DirectSound; a data processing module based on a central processing unit (CPU) plus a graphics processing unit (GPU); an AI learning module based on artificial intelligence (AI) sample collection, spectrum recognition, feature extraction, deep training, and prediction-and-analysis technology; a create, retrieve, update, and delete (Create Retrieve Update Delete, CRUD) module based on a database; a voice annotation generation module based on speech recognition; and an advanced audio codec module based on Advanced Audio Coding (AAC).
(3) Cloud-network big data: it stores massive uncoded music source data and the corresponding music language based on spectrogram coding; in addition, it stores massive brief music periods, with lengths ranging from 1 to 8 syllables.
The function of the system is mainly to generate, from the voice data of the user's humming captured by the input terminal and aided by cloud big data, a song that matches the user's creative intention.
The workflow of the audio processing system can be as follows:
(1) An acquisition countdown is started; after a 3 s countdown, the microphone serves as the input terminal to capture the sound-source data of the user's humming, which is transferred, without encoding, directly to a personal computer (Personal Computer, PC).
(2) Through signal processing, the PC obtains the spectrogram corresponding to the sound-source data, and a spectrum signal without timbre and without noise;
(3) Based on the user's instruction, a certain instrument is selected, the generated timbre-free and noise-free spectrum signal is rendered, and the generated music is then auditioned.
(4) Audio is trial-generated so that the user can audition the generated song (corresponding to the aforementioned second audio-frequency information) several times; each time, the music auditioned can be incrementally text-annotated by time periods based on the initial timestamps, distinguishing the parts the user finds satisfactory from the unsatisfactory ones. Then, with a strategy led by the user's changes and supplemented by the AI intelligent-cloud big data, the AI modifies the unsatisfactory parts again, for example, modifying at least one of rhythm, melody, and volume; after the user is satisfied, the spectrogram and the annotation file made for the spectrogram are saved, the annotation file being stored in a predetermined format, for example, as a file with the suffix name sll. The full spelling of sll is SweetLover Language.
(5) At this point, for a user who additionally has a performance demand:
<1> A pronunciation track using the 26 English letters can be inserted under the existing spectrogram (the pronunciation language families include, but are not limited to, the modern Chinese pronunciation family conforming to musical tones, the archaic Chinese pronunciation family, the Japanese pronunciation family, the English pronunciation family, the Roman pronunciation family, and the Latin pronunciation family), which is stored as a pronunciation file (referred to here as a music voice file, whose suffix name can be slv; the full spelling of slv is SweetLover Voice).
<2> Speech-recognition software can also be used to convert the pronunciation spoken by the user into the 26 English letters; however, if this is done, whether the pronunciation markup produced by speech recognition is wrong needs to be verified, and the user needs to handle manually which specific note on the spectrum each pronunciation markup corresponds to.
<3> Further, the user can directly record his or her own performance in WAV format and then import it for synthesis.
(6) Finally, for each spectrogram track and each pronunciation track to which the user has recorded music information, the system respectively generates the music of the specified instrument and the voice of the specified sound source (the generation rule of each spectrogram track is based on the corresponding sll file, and the generation rule of each pronunciation track is based on the slv file). These finally generated audio files are in uncoded, uncompressed WAV format. Each spectrogram track together with its corresponding music score (staff or numbered musical notation, optional) and each pronunciation track are saved again, the final rendering is done, and the audio source data in WAV format rendered for the selected instruments is exported (the rendering can be carried out by periods, with a different instrument usable for each period; the pronunciation files can likewise be processed by periods, with a different sound source applicable to each period).
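The per-period rendering option (a different instrument for each period) can be sketched as a simple rendering plan; the period boundaries and instrument names below are hypothetical.

```python
def render_plan(periods, instruments):
    """Sketch of the final rendering step: each time period of a spectrogram
    track may be rendered with a different instrument before the WAV export.
    Assumes one instrument per period; periods are (start_s, end_s) pairs."""
    if len(periods) != len(instruments):
        raise ValueError("one instrument per period in this sketch")
    return [{"start_s": s, "end_s": e, "instrument": inst}
            for (s, e), inst in zip(periods, instruments)]

print(render_plan([(0.0, 8.0), (8.0, 20.0)], ["piano", "violin"]))
```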
The technical solution provided by this example lowers the threshold of musical composition for ordinary users without advanced musical training and promotes the popularization of musical composition, allowing users to capture the inspiration that flows through their own lives; for professional music producers and composers, it provides tools of higher productivity.
Example 2:
The PC obtaining, through data processing, the spectrogram corresponding to the sound-source data and a spectrum signal without timbre and without noise may include:
the PC slices the audio stream sampled at high frequency by the microphone into equal parts from beginning to end, provisionally at the 128th-note time interval under an assumed BPM of 120.0 (i.e., a time interval of 0.5 s multiplied by 4 and divided by 128, which is 0.015625 s), generating audio frames;
using a hybrid data processing mode in which the CPU works serially and the GPU works in parallel, a parallel fast Fourier transform is performed and synchronized on a group of audio frames of maximum batch size each time, until all audio frames have been processed, so that the time-domain signal of each audio frame is changed into a frequency-domain signal (a signal with only frequency and energy information, and without any timbre);
the hybrid CPU-serial plus GPU-parallel data processing mode is then used again to find, in each audio frame, the frequency-domain signal with the greatest energy, which is recognized as the main signal; signal denoising is then performed to remove the other frequency-domain signals.
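The slicing and main-signal extraction above can be sketched as follows. A plain Python DFT stands in for the CPU+GPU batch FFT, and the 8 kHz sample rate and 440 Hz test tone are illustrative assumptions; only the frame interval (0.015625 s) comes from the example itself.

```python
import cmath
import math

FRAME_INTERVAL_S = 0.5 * 4 / 128   # 128th-note slice at an assumed BPM of 120 -> 0.015625 s

def dominant_frequency(frame, sample_rate):
    """Transform one audio frame to the frequency domain with a naive DFT
    (a real system would batch FFTs on the GPU) and keep only the
    maximum-energy positive bin as the 'main signal'."""
    n = len(frame)
    best_bin, best_energy = 0, -1.0
    for k in range(1, n // 2):                       # skip DC, positive bins only
        x = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        energy = abs(x) ** 2
        if energy > best_energy:
            best_bin, best_energy = k, energy
    return best_bin * sample_rate / n

# One frame of a 440 Hz humming tone sampled at 8 kHz:
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr)
        for t in range(int(sr * FRAME_INTERVAL_S))]  # 125 samples per frame
print(dominant_frequency(tone, sr))                  # bin resolution is 64 Hz here
```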
After noise removal, error correction needs to be carried out first. Because of certain limitations here (the twelve-tone equal temperament), and also because of the objective fact that the frequency of the humming may be inaccurate on the user's part, error correction must be carried out against standards like those in Fig. 4, based on the several groups of scales of the piano. Fig. 4 can be a kind of related information of the twelve-tone equal temperament, where the rows represent the scale and the columns represent the frequency.
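The error-correction step against the twelve-tone equal temperament can be sketched as snapping each measured humming frequency to the nearest equal-temperament pitch; the A4 = 440 Hz reference is the usual convention, assumed here rather than stated in the example.

```python
import math

A4 = 440.0  # assumed reference pitch for the equal-temperament grid

def snap_to_equal_temperament(freq_hz):
    """Error correction: snap a measured frequency to the nearest pitch of
    the twelve-tone equal temperament (piano) scale."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    semitones = round(12 * math.log2(freq_hz / A4))   # nearest semitone offset from A4
    return A4 * 2 ** (semitones / 12)

print(round(snap_to_equal_temperament(448.0), 2))   # slightly sharp humming -> 440.0
```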
After correction, the filtered-out signal needs to be enhanced and converted from the frequency domain back to the time domain. When the duration of the restored time-domain signal is less than the 0.015625 s time interval, a default portion must be appended: the frequency of the supplemented signal is consistent with that of the restored signal, and once the supplemented frames are arranged in time order to fill the full interval, the amplitude of the whole signal over the interval must follow a damped-vibration model (the phenomenon whereby the sound produced by a vibrating instrument gradually decays).
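The padding step above can be illustrated as follows. This sketch assumes the damped-vibration model is a simple exponential decay; the decay constant, sample rate, and function name are assumptions, not taken from the text.

```python
import math

FRAME_SEC = 0.015625      # 128th-note interval at the assumed BPM = 120
SAMPLE_RATE = 8000        # assumed sample rate for this sketch
DECAY = 6.0               # assumed exponential decay constant (1/s)

def pad_with_damping(signal, freq):
    """Extend a restored time-domain signal to fill the full frame
    interval: the appended part keeps the same frequency, and its
    amplitude follows a damped-vibration (decaying) envelope."""
    total = int(FRAME_SEC * SAMPLE_RATE)
    out = list(signal)
    for t in range(len(signal), total):
        sec = t / SAMPLE_RATE
        out.append(math.exp(-DECAY * sec) * math.sin(2 * math.pi * freq * sec))
    return out

# A restored signal 50 samples long is padded out to the full 125-sample frame.
padded = pad_with_damping([0.2] * 50, 440.0)
```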
A spectrogram is then generated from time span and scale height. Because the recording was processed by slicing it at equal time intervals, a long tone of constant pitch becomes several short notes after slicing; these identical short notes are not joined for now (for example, at an actual BPM = 120, the recording of one whole note E2, sliced by the 128th note at that BPM, is cut into exactly 128 audio frames). The specific joining is left to the composer to decide; the later AI only gives relevant intelligent prompts and suggestions where no mark exists, because in artistic creation
"5--- |" and "5555 |" are different things. The latter, however, can be detected by the AI, which prompts and asks the user whether it should be treated as one continuous long tone like the former.
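The AI prompt for repeated identical slices can be sketched as a run-length grouping over the per-frame pitches. The pitch labels, function name, and threshold of two repeats are illustrative assumptions.

```python
def suggest_long_tones(frame_pitches, min_run=2):
    """Find runs of identical consecutive slice pitches ("5555") and
    propose merging each run into one sustained tone ("5---").
    Returns (start_index, run_length, pitch) tuples for the user to
    confirm or reject."""
    suggestions = []
    i = 0
    while i < len(frame_pitches):
        j = i
        while j < len(frame_pitches) and frame_pitches[j] == frame_pitches[i]:
            j += 1
        if j - i >= min_run:
            suggestions.append((i, j - i, frame_pitches[i]))
        i = j
    return suggestions

# "5555" is flagged as a candidate long tone; the lone "3" is not.
runs = suggest_long_tones(["5", "5", "5", "5", "3", "2", "2"])
```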
During spectrogram generation, the shape of the spectrogram is saved; this shape is the key by which the subsequent AI, drawing on a large number of data templates, proposes creative suggestions (rapid shape-based matching, with amendment advice proposed according to user demand).
An example of the spectrogram corresponding to a melody excerpt is presented below:
2. For the generated noise-free, timbre-less spectrum signal, the user can choose a musical instrument with which to render it, and then audition the generated music; this amounts to selecting timbre information based on the second audio input by the user, and the timbre information may also be selected automatically by the AI model.
3. The user auditions the generated song several times and, each time, can add incremental text marks to segments of the auditioned music based on start timestamps.
First, for a novice user, the AI system gives prompts step by step; for a user who has confirmed an identity such as professional composer, the AI system instead shows the due prompts of every step across the entire movement at once. The following assumes a novice user and illustrates, in order, which prompts the AI gives:
(1) First, the AI system generates a preliminary suggestion. It prompts the user, on the spectrogram, as to whether several consecutive spectra of identical pitch can be regarded as one long tone (if the user does not handle this, the AI assumes by default that it is one continuous long tone), and then lets the user confirm with a mark. After the user confirms, the marked content can be stored in one complete markup-language file, *.sll, making it easy for the user to undo in time and restore the state before a change. Once the user has completed all the marks the user needs from the prompts, the step-2 prompt is executed;
(2) Then, the AI system prompts the user to select the instruments that will play specified segments (this practice is called the rendering setting). These instruments include, but are not limited to, common ones such as the piano and violin, and each refers to a recorded sound source of the real instrument playing every tone for a duration of 4 s at identical dynamics. Because the spectrogram saves the core frequency information of the melody, the instrument playing each specified segment is determined from the spectrogram together with the sound-source selection marks the user places on each region of it. When every segment in the entire spectrogram has at least one specified instrument (overlapping is allowed: a segment may carry both a piano mark and a violin mark, which in reality means the segment is to be played by both instruments at once), the user can confirm that the step-2 marking is complete.
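The mark storage with undo described in step (1) might look like this sketch. The class name, JSON serialization, and mark fields are illustrative assumptions; only the *.sll file itself comes from the text.

```python
import json

class MarkStore:
    """Keep the user's confirmed marks plus a history of snapshots,
    so an earlier state can be restored, as the *.sll file allows."""

    def __init__(self):
        self.marks = []       # e.g. {"start_frame": 0, "length": 4, "kind": "long_tone"}
        self._history = []

    def confirm(self, mark):
        self._history.append(list(self.marks))   # snapshot before the change
        self.marks.append(mark)

    def undo(self):
        """Restore the state before the most recent change."""
        if self._history:
            self.marks = self._history.pop()

    def dump(self):
        """Serialize all marks (stand-in for writing the .sll file)."""
        return json.dumps(self.marks)
```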
After the user's annotation is complete, the system executes the mixing algorithm and, from the marks and the spectrum, generates a lossless WAV audio file of the complete movement (this practice is called rendering). The AI system then asks whether the user wants to audition the segments the user has selected. The user can audition several times; each time, after marking a segment and requesting playback, the corresponding portion of the rendered WAV file can be auditioned. During audition, the user can insert timestamp marks; after the audition ends or is stopped, the user can also attach additional marks to a timestamp, to the span between two adjacent timestamps, or even to a segment of the user's own choosing, and then repeat steps (1)-(3), or request AI prompts on correcting the composed melody in the marked segment, whereupon the AI system gives the step-4 prompt.
For the segments the user annotates, the AI system mainly matches the spectrogram against the massive music data of the network cloud, immediately returns the attributes of the segment, including but not limited to: the spectrogram of the segment (its essential attribute), style, emotion, performing instruments, and so on, and queries the user for requirements:
<1>If the user is dissatisfied with the style or emotion, the user can input the desired style and emotion; the AI then matches spectrograms of that style and emotion and first generates an example melody in 1 = C for the user to audition. The user may at any time either edit this melody manually, or require the AI, using algorithms written from composition theory, to generate from a master template the actual segment that some composition technique produces for the user's selection, and confirm the suggestion. The algorithms written from composition theory are discussed later;
<2>If the user is dissatisfied with the instruments or volume of a segment, then after the user adds marks, the segment only needs to be re-rendered and the corresponding portion of the old WAV replaced; this part of the technology is simply DirectSound and file-I/O technology, and is not difficult to implement.
4. Machine singing (for users who need vocal performance):
The pronunciation-marking problem: given the positioning of the software, the pronunciation schemes of different countries and regions must be accommodated. Because early computers used English characters, the marking scheme likewise marks with English characters; and because early history saw a movement to romanize the pronunciation of many language families, most pronunciations carry no tone, so whether for Chinese characters, Japanese, Korean, etc., marking can use a 1-byte region code (representing at most 255 regions) plus English characters (serving as pronunciation characters);
Pronunciation marks and SweetLover voice-file numbering: because of certain phonetic features, especially when a long tone is sung, the drawn-out sound of the long tone is a sound in a steady state (whereas the onset of pronunciation is a consonant initiated by the throat, nose, and mouth, serving as the distinctive sound, followed by the breath in the steady state). Pronunciation marking therefore resembles annotating the stages of a flower opening: a consonant is annotated above a given syllable, and within the sustained part, every change of state is marked. For example, to sing the pinyin sound yuan, assuming this sound must be sung for 4 s, a singer who follows the laws of pronunciation will, as the mouth shape changes during the performance, produce the pronunciation sequence yu, yuan, an; the marks should be placed beneath the corresponding note and annotated in a similar manner.
<1>Assuming that in the current operating system 0x01 denotes locale 2052, an example is given as follows:
In the above mark, the 1-byte 0x01 indicates the region to which the pronunciation marking below it belongs, and the '-' immediately following indicates prolonging the sound an.
<2>If the user inputs by voice, the user should say the three sounds yu, yuan, an and then adjust their positions manually, which is comparatively laborious;
<3>If the user directly records their own singing as WAV and then imports it, that is more convenient.
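The 1-byte region code plus English pronunciation characters described above might be encoded as in this sketch, treating 0x01 as the region byte per the example; the function name and space-separated layout are assumptions.

```python
def encode_pronunciation_mark(region_code, syllables):
    """Pack a 1-byte region code (up to 255 regions) followed by ASCII
    pronunciation characters; a trailing '-' prolongs the previous sound.

    Example from the text: region 0x01 with the stages of "yuan" sung
    over a long tone, yu -> yuan -> an, then sustained."""
    if not 0 <= region_code <= 255:
        raise ValueError("region code must fit in one byte")
    body = " ".join(syllables).encode("ascii")
    return bytes([region_code]) + body

mark = encode_pronunciation_mark(0x01, ["yu", "yuan", "an-"])
```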
In some scenarios, all WAV files should use the same microphone input device, the same sample rate, channel count, and bit depth, to guard against deviations.
5. The AI judgement over cloud-network big data can be as follows:
How the AI module of the intelligent composing system uses cloud-network big data, combined with composition theory, to automatically generate segment completions. Some theoretical groundwork must come first:
{1}:Based on the standard of sine fitting, the segment spectrum generates the core rhythm according to a 4-part segmentation matching the literary qi-cheng-zhuan-he structure (starting, development, turn, ending). The core rhythm here is the main rhythm: it can be the most frequent rhythm in the second audio information, or the rhythm of the climax part.
Below is the staff-notation Canon in D major, quoted from Baidu Images (an unclear staff does not matter much). If the tadpole-like note heads in the staff are all connected, a spectrum-line fragment resembling Track 2 in the above picture can be formed. Be sure, after connecting everything, to arrange and join all the segments horizontally, i.e. the annotated end of segment 1 joins the beginning of segment 2 (the mark with serial number 1), the end of segment 2 joins the beginning of segment 3, and so on, all end to end. Arranged horizontally in this way, one finds that the rise-and-fall amplitude span within each segment is not very large; it does not form a jagged audio-spectrum graph resembling sharp noise, but an undulation that is entirely regular and similar, though not completely, to the sine function f(x) = sin x.
Fig. 5 is the waveform diagram of a standard sine wave.
Fig. 6 is a music score; Fig. 7 is the waveform diagram formed by connecting each note of the score shown in Fig. 6, and this waveform is to some degree similar to a sine wave.
Fig. 8 is the waveform diagram formed by connecting each note of another score, and this waveform is also similar to a sine wave.
Suppose the composition consists essentially of groups of 4 segments (the qi-cheng-zhuan-he structure: starting, development, turn, ending), each segment containing the same number of bars. Then, by sine fitting, suppose the first note of segment 1 lies on f(x) = sin x over x ∈ [0, 2π], with some point such as x = 5 as the starting point. Counting from that starting point, [5, 5 + π] is segment 1 (note that one segment may contain several bars, each bar of the same duration, e.g. all in 4/4 or all in 6/8; a changing time signature is generally not recommended), and [5 + π, 5 + 2π] is segment 2. The not-completely-similar part then appears in segment 3 or segment 4; it is not completely similar because, as with the rhythm of literary and artistic works, there must be coordinated, complete, symmetrical beauty, and also wave-upon-wave asymmetric, changeful, non-monotonous beauty.
Of these 4 segments, segment 1 is called the starting segment, segment 2 the development segment, segment 3 the turn/progressive segment, and segment 4 the ending segment, abbreviated to the four characters qi-cheng-zhuan-he (segment 3 carries the emotional progression, handled like the climax of a novel's structure; segment 4 can then also be regarded as either a turn segment or an ending segment).
Segment 4 is where the sine fitting is deliberately broken: segment 3 is allowed maximal similarity with segment 1 (the turn may also begin within segment 3), but segment 4 is not recommended to be identical with segment 2, so as not to cause auditory aesthetic fatigue.
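The sine-interval layout of the four segments assumed above can be written out directly. This is a sketch: the starting point x = 5 follows the text's example, and the function names and sampling density are illustrative.

```python
import math

def segment_intervals(x0=5.0, count=4, half_period=math.pi):
    """Assign each of the 4 segments (starting, development, turn,
    ending) one half-period interval of f(x) = sin x, beginning at x0."""
    return [(x0 + k * half_period, x0 + (k + 1) * half_period)
            for k in range(count)]

def contour(interval, points=8):
    """Sample the sine template over one segment's interval."""
    a, b = interval
    return [math.sin(a + (b - a) * t / (points - 1)) for t in range(points)]
```

Because sin is 2π-periodic, segment 3's template contour coincides exactly with segment 1's, matching the allowance of maximal similarity between them; it is segment 4 that the composer (or AI) must vary away from segment 2.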
The score shown in Fig. 8 now needs to be explained in conjunction with that figure. The numbered-notation part is marked, grouped, and given the necessary explanation. First, the time-signature marking of the numbered notation is poor: lyrical songs (hanmai-style shouted vocals also count) are mostly in 4/4, and marking them as 2/4 is not recommended. The grouping is then as follows:
Group 1: the prelude, corresponding to the aforementioned starting segment;
Group 2: the group corresponding to the aforementioned development segment; in the image, a sign resembling a colon plus one thin and one thick vertical line (:||) indicates that the part before it forms one group;
Groups 3 and 4: the groups corresponding to the aforementioned turn/progressive segment;
Group 5: the group corresponding to the aforementioned ending segment.
If composing follows correlation principles such as the sine fitting of {1}, and observes globally according to the overall spectrogram trend rather than some specific note or beat, a fairly good template can be generated.
Therefore, it must first be ensured that the cloud-network big data used by the AI module of the intelligent composing system stores certain audio spectrograms as data samples; all the algorithms are fitted on the basis of audio spectrograms. On the other hand, analysis of a large number of samples also finds that music conforming to the sine-fitting template (the template corresponding to the aforementioned sine-wave law) sounds comparatively more pleasing.
Sine fitting is thus a basic rule that comparatively melodious music needs to obey; even music of different emotional styles follows it.
Considering that music of different emotional styles has spectrograms with different overall trends, what the cloud-network big data used by the AI module of the intelligent composing system essentially needs to store are the spectrogram tendencies, based on sine fitting, of music of different styles. Then, once the spectrogram of the song the user hums is obtained, the stored characteristic spectrogram tendencies of distinct emotional styles can be used to analyse the style of the song the user is creating and the emotional changes within it, so that when the user begins refining and modifying selections, relatively intelligent suggestions can be given, for example: determining the song's emotional keynote, whether the style is consistent, the stylistic category of a given segment, and amendment advice for segments. Whether to adopt any of these, however, must be decided by the user.
Formula derivation {3}: the image obtained after a uniform proportional compression or expansion of the graph of f(x) = sin x is still a sinusoidal image. Applying this derivation to music: when the BPM is expanded or shrunk proportionally and the range of change is small (i.e. the deformation of the f(x) = sin x graph is not severe), the style of the whole song after the change differs little from the original. The size of the change is measured by the ratio of the coordinates before and after the change; the ratio is maximum vertical distance / one horizontal period. For the image of f(x) = sin x, for example, the ratio is 2/2π, and a uniform scaling leaves it unchanged.
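The change measure in derivation {3} can be computed directly. A uniform scaling of both axes leaves the ratio unchanged, which is why a small proportional BPM change preserves the style; the function name is an assumption.

```python
import math

def change_ratio(amplitude, period):
    """Ratio from derivation {3}: maximum vertical distance (2A)
    divided by one horizontal period."""
    return (2 * amplitude) / period

base = change_ratio(1.0, 2 * math.pi)     # f(x) = sin x  ->  2 / 2*pi
scaled = change_ratio(0.5, math.pi)       # the same graph shrunk uniformly 1:2
```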
Formula derivation {4}: the image obtained after a uniform vertical translation of the graph of f(x) = sin x is still a sinusoidal image. Applying this derivation to music: when the whole song is raised or lowered by several scale degrees according to the principle of twelve-tone equal temperament, the resulting spectrum trend graph is consistent with the original tendency, and such a change merely helps match the singer's vocal range.
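The whole-song transposition of derivation {4} is a single multiplication per note under equal temperament; this sketch's function name and example pitches are illustrative.

```python
def transpose(freqs, semitones):
    """Raise or lower every note by the same number of equal-temperament
    semitones; the ratios between notes (the spectrum trend) are preserved."""
    factor = 2 ** (semitones / 12)
    return [f * factor for f in freqs]

melody = [261.63, 329.63, 392.00]   # approximate Hz for C4, E4, G4
up_two = transpose(melody, 2)       # the whole song raised a whole tone
```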
Formula derivation {5}: applying the rule of an arithmetic progression to pitch, adjacent notes, bars, and segments can be given an arithmetic-progression translation, creating layered, progressive segments; similarly, when the rule is applied to time intervals rather than pitch, i.e. the times at which adjacent notes, bars, and segments are played gradually slow down or speed up, segments of gradually tense and impassioned, or gradually gentle and quiet, style can also be created.
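Derivation {5} applied to pitch can be sketched as an arithmetic-progression shift of successive bars; the step of one semitone per bar and the function name are assumptions.

```python
def progressive_shift(bars, semitone_step=1):
    """Translate each successive bar by an arithmetic progression of
    semitones (0, step, 2*step, ...), creating a layered, progressive
    feel. Applied to durations instead of pitches, the same rule would
    gradually accelerate or slow the playing."""
    out = []
    for i, bar in enumerate(bars):
        factor = 2 ** (i * semitone_step / 12)
        out.append([f * factor for f in bar])
    return out

# Three identical bars of A4 become a rising semitone-by-semitone ramp.
ramp = progressive_shift([[440.0], [440.0], [440.0]])
```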
Formula derivation {6}: for a melody conforming to the sine-fitting transformation, on the basis of its BPM, musical content of certain durations within it can be excerpted at difference durations following powers of 2 (2 to the power N), and the newly recombined segment still conforms to the sine-fitting image. The difference duration formed by the value of N may be no shorter than the 128th-note duration at that BPM, and no longer than the duration of one complete group of segments (qi-cheng-zhuan-he) at that BPM.
Formula derivation {7}: splitting a note of duration T into X identical notes of duration T/X leaves the style trend of the whole piece unchanged, while the emotion expressed changes slightly. The splitting scheme here requires that X be a power of 2, or the product of a power of 2 and 3, and the final value must not make T/X shorter than the 128th-note duration at that BPM, nor longer than the duration of the note before splitting at that BPM.
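The constraint on X in derivation {7} (a power of 2, or 3 times a power of 2, with T/X bounded below by the shortest note value) can be checked as in this sketch; the function name is an assumption.

```python
def valid_splits(T, min_duration):
    """Enumerate split counts X (a power of 2, or 3 times a power of 2)
    such that each of the X equal notes of duration T/X is no shorter
    than the smallest allowed note value (the 128th-note duration at
    the current BPM)."""
    xs = []
    x = 2
    while T / x >= min_duration:
        xs.append(x)
        x *= 2
    y = 3
    while T / y >= min_duration:
        xs.append(y)
        y *= 2
    return sorted(xs)
```

For a whole note at BPM = 120 (T = 2 s) and a 128th-note floor of 0.015625 s, the admissible counts run from 2 up to 128, skipping values like 5 that fit neither form.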
Formula derivation {8}: on the basis of {7}, each note formed after the split is varied so that they are not all the same tone, while the overall rise-and-fall gap varies little; the melody expressed then takes on a full, variation-rich tendency.
When the human ear hears the composite formed by multiple voice parts sounding simultaneously, even if the tones of the parts are completely identical, as long as the timbres differ there is a psychological sense of musical fullness (auditory satisfaction). To create such multi-timbre parts, the user can designate any single voice part and require that the segment of that part be duplicated and played with an instrument of the user's own choosing, creating a multi-part performance.
Chord-formula fitting: the user can designate any single voice part and have chords generated automatically according to chord formulas, for example the major triad as the group 1-3-5, the minor triad as the group 1-3b-5, as well as augmented triads, diminished triads, and other such additional parts.
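Chord-formula fitting as described above might be sketched like this. The mapping from scale degrees to semitone offsets is standard music theory; the function and table names are assumptions.

```python
# Semitone offsets above the root for common triad formulas:
# major 1-3-5, minor 1-3b-5, plus augmented and diminished.
CHORD_FORMULAS = {
    "major":      (0, 4, 7),
    "minor":      (0, 3, 7),
    "augmented":  (0, 4, 8),
    "diminished": (0, 3, 6),
}

def chord_frequencies(root_freq, quality):
    """Generate the frequencies of a triad from its root, under
    twelve-tone equal temperament."""
    return [root_freq * 2 ** (s / 12) for s in CHORD_FORMULAS[quality]]

c_major = chord_frequencies(261.63, "major")   # the 1-3-5 group over C4
```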
A grace note (equivalent to adding an ornamental part) can be generated for a specified note; the grace note is still completed intelligently according to that note's pitch and the spectrogram trend of the segment. The user may also adjust it, or require it to be generated according to the spectrogram fragment of an emotional style the user specifies.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation, for example: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be via interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or of other forms.
The units described above as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may all be integrated into one processing module, or each unit may individually serve as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by program instructions and related hardware; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The foregoing storage medium includes various media capable of storing program code, such as removable storage devices, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks, or optical discs.
The above description is merely a specific embodiment, but the protection scope of the present invention is not limited thereto; any change or replacement readily conceivable to those familiar with the art within the technical scope disclosed by the present invention shall be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.
Claims (10)
1. An information processing method, characterized by comprising:
collecting first audio information, wherein the first audio information comprises at least one of: melody information, rhythm information, and timbre information;
generating second audio information associated with the first audio information, wherein the content of the first audio information and the second audio information is at least partly different.
2. The method according to claim 1, characterized in that
the content of the first audio information and the second audio information being at least partly different comprises at least one of:
the playing duration of the second audio information is different from the playing duration of the first audio information;
the second melody information of the second audio information is at least partly different from the first melody information of the first audio information;
the second rhythm information of the second audio information is at least partly different from the first rhythm information of the first audio information;
the second timbre information of the second audio information is at least partly different from the first timbre information of the first audio information.
3. The method according to claim 2, characterized in that
the first timbre information and the second timbre information comprise at least one of:
first-class timbre information, wherein the first-class timbre information comprises the timbre information of a human voice; the timbre information of the human voice comprises at least one of: the timbre of a male voice, the timbre of a female voice, the timbre of a child's voice, and a mixed voice formed by mixing at least two classes of human voice;
second-class timbre information, wherein the second-class timbre information comprises the timbre information of a musical instrument;
third-class timbre information, wherein the third-class timbre information is timbre information other than the human voice and the musical instrument.
4. The method according to claim 1, 2, or 3, characterized in that
generating the second audio information associated with the first audio information comprises at least one of:
generating the second audio information according to audio attribute information of the first audio information;
generating the second audio information according to user attribute information corresponding to the first audio information.
5. The method according to claim 4, characterized in that
generating the second audio information according to the audio attribute information of the first audio information comprises:
generating the second audio information according to at least one of the melody characteristic attribute, rhythm characteristic attribute, timbre characteristic attribute, song-style attribute, and music-type attribute of the first audio information.
6. The method according to claim 4, characterized in that
generating the second audio information according to the user attribute information corresponding to the first audio information comprises:
generating the second audio information according to at least one of user preference information, audio playing record information, emotional state information, and user indication information.
7. The method according to claim 6, characterized in that
generating the second audio information according to at least one of the user preference information, audio playing record information, emotional state information, and user indication information corresponding to the first audio information comprises at least one of:
determining the duration of the second audio information according to the emotional state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generation of the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotional state information;
stopping generation of the second audio information according to the emotional state information;
resuming generation of the second audio information according to the emotional state information;
determining the duration of the second audio according to the emotional state information and the user indication information;
continuing to generate the second audio according to the emotional state information and the user indication information;
stopping generation of the second audio according to the emotional state information and the user indication information;
resuming generation of the second audio according to the emotional state information and the user indication information.
8. The method according to claim 1, 2, or 3, characterized in that generating the second audio information associated with the first audio information comprises:
processing the first audio information with an audio processing model, and outputting the second audio information.
9. The method according to claim 1, 2, or 3, characterized in that
generating the second audio information associated with the first audio information comprises at least one of:
generating first music-score information of the second audio information according to the first audio information;
generating first lyrics information of the second audio information according to the first audio information;
synthesizing the first music-score information and the first lyrics information generated according to the first audio information, to generate a song file corresponding to the second audio information.
10. An information processing device, characterized by comprising:
a collection module, configured to collect first audio information, wherein the first audio information comprises at least one of: melody information, rhythm information, and timbre information;
a generation module, configured to generate second audio information associated with the first audio information, wherein the content of the first audio information and the second audio information is at least partly different.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810673919.5A CN108922505B (en) | 2018-06-26 | 2018-06-26 | Information processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108922505A true CN108922505A (en) | 2018-11-30 |
CN108922505B CN108922505B (en) | 2023-11-21 |
Family
ID=64421511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810673919.5A Active CN108922505B (en) | 2018-06-26 | 2018-06-26 | Information processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108922505B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070885A (en) * | 2019-02-28 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Audio originates point detecting method and device |
CN113066458A (en) * | 2021-03-17 | 2021-07-02 | 平安科技(深圳)有限公司 | Melody generation method, device and equipment based on LISP-like chain data and storage medium |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060230909A1 (en) * | 2005-04-18 | 2006-10-19 | Lg Electronics Inc. | Operating method of a music composing device |
EP1785891A1 (en) * | 2005-11-09 | 2007-05-16 | Sony Deutschland GmbH | Music information retrieval using a 3D search algorithm |
US20070131094A1 (en) * | 2005-11-09 | 2007-06-14 | Sony Deutschland GmbH | Music information retrieval using a 3D search algorithm |
CN101313477A (en) * | 2005-12-21 | 2008-11-26 | LG Electronics Inc. | Music generating device and operating method thereof |
US20140052282A1 (en) * | 2012-08-17 | 2014-02-20 | Be Labs, Llc | Music generator |
CN103854644A (en) * | 2012-12-05 | 2014-06-11 | Communication University of China | Automatic duplicating method and device for single-track polyphonic music signals |
US20150179157A1 (en) * | 2013-12-20 | 2015-06-25 | Samsung Electronics Co., Ltd. | Multimedia apparatus, music composing method thereof, and song correcting method thereof |
CN105161081A (en) * | 2015-08-06 | 2015-12-16 | Cai Yusheng | APP humming composition system and method thereof |
CN106649586A (en) * | 2016-11-18 | 2017-05-10 | Tencent Music Entertainment (Shenzhen) Co., Ltd. | Audio file playing method and device |
CN106652997A (en) * | 2016-12-29 | 2017-05-10 | Tencent Music Entertainment (Shenzhen) Co., Ltd. | Audio synthesis method and terminal |
CN107863095A (en) * | 2017-11-21 | 2018-03-30 | Guangzhou Kugou Computer Technology Co., Ltd. | Audio signal processing method, device and storage medium |
CN108197185A (en) * | 2017-12-26 | 2018-06-22 | Nubia Technology Co., Ltd. | Music recommendation method, terminal and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108922505B (en) | 2023-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108806655B (en) | Automatic generation of songs | |
US9595256B2 (en) | System and method for singing synthesis | |
CN108806656B (en) | Automatic generation of songs | |
CN108492817B (en) | Song data processing method based on virtual idol and singing interaction system | |
Datta et al. | Signal analysis of Hindustani classical music | |
Umbert et al. | Expression control in singing voice synthesis: Features, approaches, evaluation, and challenges | |
Schneider | Music and gestures: A historical introduction and survey of earlier research | |
JP7424359B2 (en) | Information processing device, singing voice output method, and program | |
CN107430849A (en) | Sound control apparatus, audio control method and sound control program | |
JP5598516B2 (en) | Voice synthesis system for karaoke and parameter extraction device | |
JP2022092032A (en) | Singing synthesis system and singing synthesis method | |
Nikolsky | Emergence of the distinction between “verbal” and “musical” in early childhood development | |
CN108922505B (en) | Information processing method and device | |
Quinto et al. | Composers and performers have different capacities to manipulate arousal and valence. | |
Berliner | The art of Mbira: Musical inheritance and legacy | |
JP4808641B2 (en) | Caricature output device and karaoke device | |
TWI377558B (en) | Singing synthesis systems and related synthesis methods | |
CN110782866A (en) | Singing sound converter | |
Lebon | The Versatile Vocalist: Singing Authentically in Contrasting Styles and Idioms | |
Subramanian | Modelling gamakas of Carnatic music as a synthesizer for sparse prescriptive notation | |
Daffern | Distinguishing characteristics of vocal techniques in the specialist performance of early music | |
Kouroupetroglou et al. | Formant tuning in Byzantine chanting | |
Hardman | Experiencing Sonic Change: Acoustic Properties as Form-and Meter-Bearing Elements in Popular Music Vocals | |
Chrysochoidis et al. | Formant tuning in Byzantine chant | |
Nguyen | A Study on Correlates of Acoustic Features to Emotional Singing Voice Synthesis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||