CN108922505B - Information processing method and device - Google Patents


Info

Publication number
CN108922505B
Authority
CN
China
Prior art keywords
information
audio
audio information
generating
tone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810673919.5A
Other languages
Chinese (zh)
Other versions
CN108922505A (en)
Inventor
方田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN201810673919.5A
Publication of CN108922505A
Application granted
Publication of CN108922505B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/0008 - Associated control or indicating means
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/36 - Accompaniment arrangements
    • G10H1/40 - Rhythm
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 - Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/056 - Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 - Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076 - Musical analysis for extraction of timing, tempo; Beat detection

Abstract

An embodiment of the invention discloses an information processing method and device. The method comprises the following steps: collecting first audio information, wherein the first audio information comprises at least one of melody information, rhythm information, and tone information; and generating second audio information associated with the first audio information, wherein the content of the first audio information is at least partially different from that of the second audio information.

Description

Information processing method and device
Technical Field
The present invention relates to the field of information technologies, and in particular, to an information processing method and apparatus.
Background
In existing audio playback systems, playback is limited to audio already present in a local audio library or a remote audio library. However, the local or remote audio library of the audio device may not meet the user's current listening needs, or the user must search an enormous catalogue for the audio he or she actually wants to hear. Existing audio playback systems are therefore not intelligent enough, and the user experience is not good enough.
Disclosure of Invention
The embodiment of the invention provides an information processing method and device.
The technical scheme of the invention is realized as follows: in a first aspect, an embodiment of the present invention provides an information processing method, including:
Collecting first audio information, wherein the first audio information comprises: at least one of melody information, rhythm information, and tone information;
generating second audio information associated with the first audio information, wherein the content of the first audio information is at least partially different from that of the second audio information.
In some embodiments, the first audio information is at least partially different from the second audio information in content, including at least one of:
the playing time length of the second audio information is different from that of the first audio information;
the second melody information of the second audio information is at least partially different from the first melody information of the first audio information;
the second tempo information of the second audio information is at least partially different from the first tempo information of the first audio information;
the second tone color information of the second audio information is different from at least a portion of the first tone color information of the first audio information.
In some embodiments, the first tone color information and the second tone color information include at least one of:
a first type of tone color information, wherein the first type of tone color information includes tone color information of the human voice; the tone color information of the human voice includes at least one of the following: the tone of a male voice, the tone of a female voice, the tone of a child voice, and the tone of a mixed human voice formed by mixing at least two human voices;
a second type of tone color information, wherein the second type of tone color information includes tone color information of musical instruments;
a third type of tone color information, wherein the third type of tone color information is tone color information other than that of the human voice and the musical instrument.
In some embodiments, the generating the second audio information associated with the first audio information includes at least one of:
generating the second audio information according to the audio attribute information of the first audio information;
and generating the second audio information according to the user attribute information corresponding to the first audio information.
In some embodiments, the generating the second audio information according to the audio attribute information of the first audio information includes at least one of the following:
generating the second audio information according to at least one of a melody characteristic attribute, a rhythm characteristic attribute, a tone characteristic attribute, a music style attribute, and a music type attribute of the first audio information.
In some embodiments, the generating the second audio information according to the user attribute information corresponding to the first audio information includes:
and generating second audio information according to at least one of the user preference information, the audio playing record information, the emotion state information and the user indication information.
In some embodiments, the generating the second audio information according to at least one of the user preference information, the audio playing record information, the emotion state information and the user indication information corresponding to the first audio information includes at least one of:
determining the duration of the second audio information according to the emotion state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generating the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotion state information;
stopping generating the second audio information according to the emotion state information;
resuming generation of the second audio information according to the emotion state information;
determining the duration of the second audio information according to the emotion state information and the user indication information;
continuing to generate the second audio information according to the emotion state information and the user indication information;
stopping generating the second audio information according to the emotion state information and the user indication information;
and resuming generation of the second audio information according to the emotion state information and the user indication information.
In some embodiments, the generating the second audio information associated with the first audio information includes:
and processing the first audio information by using an audio processing model, and outputting the second audio information.
In some embodiments, the generating the second audio information associated with the first audio information includes at least one of:
generating first music score information of the second audio information according to the first audio information;
generating first lyric information of the second audio information according to the first audio information;
and synthesizing the first music score information and the first lyric information generated according to the first audio information to generate a song file corresponding to the second audio information.
A second aspect is an information processing apparatus, comprising:
the collection module is used for collecting first audio information, wherein the first audio information comprises: at least one of melody information, rhythm information, and tone information;
and the generation module is used for generating second audio information associated with the first audio information, wherein the first audio information is at least partially different from the second audio information in content.
In some embodiments, the first audio information is at least partially different from the second audio information in content, including at least one of:
the playing time length of the second audio information is different from that of the first audio information;
the second melody information of the second audio information is at least partially different from the first melody information of the first audio information;
the second tempo information of the second audio information is at least partially different from the first tempo information of the first audio information;
the second tone color information of the second audio information is different from at least a portion of the first tone color information of the first audio information.
In some embodiments, the first tone color information and the second tone color information include at least one of:
a first type of tone color information, wherein the first type of tone color information includes tone color information of the human voice; the tone color information of the human voice includes at least one of the following: the tone of a male voice, the tone of a female voice, the tone of a child voice, and the tone of a mixed human voice formed by mixing at least two human voices;
a second type of tone color information, wherein the second type of tone color information includes tone color information of musical instruments;
a third type of tone color information, wherein the third type of tone color information is tone color information other than that of the human voice and the musical instrument.
In some embodiments, the generating module is specifically configured to perform at least one of:
generating the second audio information according to the audio attribute information of the first audio information;
and generating the second audio information according to the user attribute information corresponding to the first audio information.
In some embodiments, the generating module is specifically configured to perform at least one of:
generating the second audio information according to at least one of a melody characteristic attribute, a rhythm characteristic attribute, a tone characteristic attribute, a music style attribute, and a music type attribute of the first audio information.
In some embodiments, the generating module is specifically configured to generate the second audio information according to at least one of the user preference information, the audio playing record information, the emotional state information, and the user indication information.
In some embodiments, the generating module is specifically configured to perform at least one of:
determining the duration of the second audio information according to the emotion state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generating the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotion state information;
stopping generating the second audio information according to the emotion state information;
resuming generation of the second audio information according to the emotion state information;
determining the duration of the second audio information according to the emotion state information and the user indication information;
continuing to generate the second audio information according to the emotion state information and the user indication information;
stopping generating the second audio information according to the emotion state information and the user indication information;
and resuming generation of the second audio information according to the emotion state information and the user indication information.
In some embodiments, the generating module is specifically configured to process the first audio information by using an audio processing model, and output the second audio information.
In some embodiments, the generating module is specifically configured to: generate first music score information of the second audio information according to the first audio information; generate first lyric information of the second audio information according to the first audio information; and synthesize the first music score information and the first lyric information generated according to the first audio information to generate a song file corresponding to the second audio information.
With the information processing method and device, after the first audio information is collected, second audio information related to it can be generated automatically; that is, second audio information that is related to the first audio information yet at least partially different in content is dynamically generated from the collected first audio information. This is equivalent to the electronic device automatically creating audio based on the collected audio information. The user's need to listen to second audio information that is related to the first audio information and dynamically generated is thereby met, and the electronic device is characterized by high intelligence and high user satisfaction.
Drawings
Fig. 1 is a flow chart of a first information processing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of generating second audio information according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a first information processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic diagram relating to twelve-tone equal temperament according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a standard sine wave according to an embodiment of the present invention;
fig. 6 is a staff diagram of a music score according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a connecting line drawn on the score of FIG. 6 according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a connecting line on another score according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further elaborated below by referring to the drawings in the specification and the specific embodiments.
As shown in fig. 1, an embodiment of the present invention provides an information processing method, including:
step S110: collecting first audio information, wherein the first audio information comprises: at least one of melody information, rhythm information, and tone information;
step S120: second audio information associated with the first audio information is generated, wherein the content of the first audio information is at least partially different from that of the second audio information.
In this embodiment, the information processing method may be applied to a first electronic device. The first electronic device collects the first audio information through a microphone or the like, and the first audio information may be audio information generated by collecting any sound in the space where the first electronic device is located; for example, the first electronic device collects a song hummed by the user and generates the first audio information.
In step S120, the first electronic device may itself generate the second audio information from the first audio information, or it may submit the first audio information, or information related to it, to a second electronic device; in that case the second audio information is generated by the second electronic device and received by the first electronic device from the second electronic device.
In this embodiment, the second audio information is dynamically generated according to the first audio information, and may be completely new audio information that exists neither locally on the first electronic device nor on the second electronic device connected to it. The first audio information may include at least one of:
melody information, which forms a melody when played;
rhythm information, which produces a certain rhythm when played;
tone information, which determines the character of the sound fluctuations, so that different tones are perceived differently by the ear.
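For illustration only, the following minimal Python sketch shows one possible in-memory representation of the collected first audio information; the field names and units are assumptions of this sketch and are not part of the embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FirstAudioInfo:
    """Hypothetical container for the collected first audio information."""
    # Melody information: a sequence of (pitch in Hz, duration in seconds) pairs.
    melody: List[tuple] = field(default_factory=list)
    # Rhythm information: estimated tempo in beats per minute and beat positions (seconds).
    tempo_bpm: Optional[float] = None
    beat_times: List[float] = field(default_factory=list)
    # Tone (timbre) information: e.g. "male voice", "female voice", "piano".
    timbre_label: Optional[str] = None

# Example: a short hummed phrase captured by the first electronic device.
hummed = FirstAudioInfo(
    melody=[(440.0, 0.5), (494.0, 0.5), (523.3, 1.0)],
    tempo_bpm=120.0,
    beat_times=[0.0, 0.5, 1.0, 1.5],
    timbre_label="female voice",
)
```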
In this embodiment, the second audio information is generated based on the first audio information, and the two are correlated: the correlation may be that the first audio information and the second audio information are at least partially the same. At the same time, after processing by the first electronic device itself, or by the second electronic device connected to it, audio information at least partially different from the first audio information is obtained, and the difference may lie in at least one of melody information, rhythm information, and tone information.
Therefore, after the user hums a few phrases to the first electronic device, the first electronic device generates, or requests the second electronic device to generate, second audio information related to the first audio information collected from the humming, and plays it. The first electronic device thus automatically creates and plays the second audio information based on the user's humming; the audio heard by the user may differ each time, the user's special listening requirements are met, and the user's satisfaction is improved.
Because the first audio information originates from sound produced by the user's humming and collected by the first electronic device, or from audio selected for playback by the user, or is otherwise controlled by the user, it reflects the user's control and represents the user's current listening or creative intention. If the second audio information is generated from the first audio information, it can reflect the user's intention or meet the user's requirements, so that the second audio information is dynamically generated to satisfy the user's current needs, and the intelligence of the electronic device and the user's satisfaction are improved.
In some embodiments, the first audio information is at least partially different from the second audio information in content, including at least one of:
the playing time length of the second audio information is different from that of the first audio information;
the second melody information of the second audio information is at least partially different from the first melody information of the first audio information;
the second tempo information of the second audio information is at least partially different from the first tempo information of the first audio information;
the second tone color information of the second audio information is different from at least a portion of the first tone color information of the first audio information.
The playing time length of the first audio information played at a preset playing rate is a first duration; the playing time length of the second audio information played at the preset playing rate is a second duration; the first duration is different from the second duration. For example, the playing duration of the first audio information equals the duration of the user's humming and may be several seconds, for example 5 or 10 seconds. The second duration of the second audio information may be greater than the first duration; for example, the playing duration of the second audio information may equal the average playing duration of a song, for example 2 minutes, 3 minutes, or any duration between 2 and 5 minutes. The first electronic device is thus triggered to obtain and play a song generated from the humming, so that different humming yields different songs, and even the same humming can yield a different dynamically produced song corresponding to the second audio information. Different listening requirements of the user are thereby met, and the intelligence of the electronic device and the user's satisfaction are improved.
In some embodiments, the first tone color information and the second tone color information include at least one of:
a first type of tone color information, wherein the first type of tone color information includes tone color information of the human voice; the tone color information of the human voice includes at least one of the following: the tone of a male voice, the tone of a female voice, the tone of a child voice, and the tone of a mixed human voice formed by mixing at least two human voices;
a second type of tone color information, wherein the second type of tone color information includes tone color information of musical instruments;
a third type of tone color information, wherein the third type of tone color information is tone color information other than that of the human voice and the musical instrument.
The tone color is classified according to the sounding body, and the tone color may be classified into at least three tone colors described above, human voice, instrument voice, and tone colors other than human voice and instrument voice, for example, various simulated sounds using non-instrument simulation.
The human voice can be divided into male voice, female voice, child voice, and various mixed human voices, for example a mix of male and female voices, a mix of an adult male voice and a child voice, or a mix of an adult female voice and a child voice.
The male voice in this embodiment is the voice of a man after the voice-change period, which may also be called an adult male voice; the female voice may be the voice of a woman after the voice-change period, which may also be called an adult female voice.
The child voice includes the various human voices before the voice-change period.
The second type of tone is tone information of musical instruments, for example, tone information of percussion instruments, tone information of string instruments, tone information of wind instruments, and the like.
A third type of timbre information may include: sound of electronic devices, sound of door and window opening and closing, sound of animals, and the like.
As shown in fig. 2, the step S120 may include at least one of:
generating the second audio information according to the audio attribute information of the first audio information;
and generating the second audio information according to the user attribute information corresponding to the first audio information.
The audio attribute information here may be information extracted from the first audio information, including but not limited to melody information, rhythm information, tone information, music style information, music type information, and the like.
The user attribute information may be user attribute information of a sounding user of the first audio information, or user attribute information of a holding user of the first electronic device, or user attribute information of a user to which an application account of an audio application running by the first electronic device belongs.
The user attribute information may include: sex, age, region, occupation, preference, etc.
In this embodiment, when the second audio information is generated, not only the information itself of the first audio information but also the audio attribute information and/or the user attribute information are combined.
The generating the second audio information according to the audio attribute information of the first audio information includes at least one of the following steps:
generating the second audio information according to at least one of a melody characteristic attribute, a rhythm characteristic attribute, a tone characteristic attribute, a music style attribute, and a music type attribute of the first audio information.
The melody feature attribute describes a melody feature of the first audio information, for example, whether the melody feature of the first audio information is aggressive or relaxed using the melody feature attribute.
The tempo feature attribute describes a tempo feature of the first audio information, for example, the first audio information is 2/4 beats of music, 3/4 beats of music, or the like.
The tone characteristic attribute describes tone characteristics of the first audio information, for example, whether the tone of the first audio information is dominant in male voices, dominant in female voices, tone of musical instruments, tone of human voices, or tone of other types.
The music style attribute describes the music style or genre of the first audio information.
The music type describes a music type of the first audio information, for example, whether rock music, country music, or other types of music.
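As a hedged illustration of how such attribute information might steer the generation of the second audio information, the sketch below maps a few assumed attribute values to generation parameters; the attribute names, the tempo values, and the instrument choices are hypothetical and not prescribed by the embodiment.

```python
from dataclasses import dataclass

@dataclass
class AudioAttributes:
    """Assumed audio attribute information extracted from the first audio information."""
    melody_character: str   # e.g. "soothing" or "agitated"
    time_signature: str     # e.g. "2/4", "3/4"
    dominant_timbre: str    # e.g. "male voice", "instrument"
    music_style: str        # e.g. "ballad", "folk"
    music_type: str         # e.g. "rock", "country"

def pick_generation_parameters(attrs: AudioAttributes) -> dict:
    """Choose tempo and instrumentation for the second audio information (illustrative only)."""
    tempo = 140 if attrs.music_type == "rock" else 90
    if attrs.melody_character == "soothing":
        tempo = min(tempo, 100)  # keep a calm melody from being rendered too fast
    instrument = "electric guitar" if attrs.music_type == "rock" else "piano"
    return {"tempo_bpm": tempo, "time_signature": attrs.time_signature, "instrument": instrument}

params = pick_generation_parameters(
    AudioAttributes("soothing", "3/4", "female voice", "ballad", "country"))
print(params)  # {'tempo_bpm': 90, 'time_signature': '3/4', 'instrument': 'piano'}
```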
In some embodiments, the generating the second audio information according to the user attribute information corresponding to the first audio information includes:
and generating second audio information according to at least one of the user preference information, the audio playing record information, the emotion state information and the user indication information.
For example, the user may input his or her own preference information, such as preferred audio; alternatively, preference information such as the audio the user prefers can be generated automatically from the user's play records.
In some embodiments the user preference information may further include the user's singer preference; the second audio information may then be generated using the tone color of the singer the user prefers.
The audio play record information may include: audio played by the electronic device in the history time, audio attribute information of the played audio, and the like are recorded.
The emotional state information may include the user's current emotional state, obtained by image acquisition or audio acquisition and by analyzing the facial expression in the image or extracting the emotional state from the voice; the melody, rhythm, and other information of the second audio information are then determined according to the user's emotional state information.
In some embodiments the first audio information and the second audio information may further include lyric information and the like; in other embodiments they may further include language information of the lyric pronunciation, which determines, for example, whether the second audio information is played in Chinese, English, or another language.
In some embodiments, the generating the second audio information according to at least one of the user preference information, the audio playing record information, the emotion state information and the user indication information corresponding to the first audio information includes at least one of:
determining the duration of the second audio information according to the emotion state information; for example, if the emotion state information indicates that the user still wants to listen, for example the user's expression shows that he or she is immersed in the music, the second audio information is played for a longer period; if two alternative durations have been determined, the longer one may be selected. In some embodiments the emotional state information may also be scored, the score used as input, and one of the durations calculated with a particular function (a sketch of such a function is given after this list);
determining the duration of the second audio information according to the user indication information; for example, the user indicates stopping playback, continuing playback, lengthening playback, or the like through a gesture operation, a voice operation, a gaze operation, or the like, and the duration is determined accordingly.
The generating the second audio information further includes one or more of:
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generating the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotion state information;
stopping generating the second audio information according to the emotion state information;
resuming generation of the second audio information according to the emotion state information;
determining the duration of the second audio information according to the emotion state information and the user indication information;
continuing to generate the second audio information according to the emotion state information and the user indication information;
stopping generating the second audio information according to the emotion state information and the user indication information;
and resuming generation of the second audio information according to the emotion state information and the user indication information.
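As mentioned above, the emotional state information may be scored and a duration computed from the score by a particular function. The sketch below shows one assumed form of such a function together with a simple handler for user indication information; the linear mapping, the 2-5 minute bounds, and the indication keywords are illustrative assumptions.

```python
def duration_from_emotion(emotion_score: float,
                          min_seconds: float = 120.0,
                          max_seconds: float = 300.0) -> float:
    """Map an emotion score in [0, 1] (0 = disengaged, 1 = fully immersed)
    to a playing duration for the second audio information.
    The linear form and the 2-5 minute bounds are assumptions for illustration."""
    score = max(0.0, min(1.0, emotion_score))
    return min_seconds + score * (max_seconds - min_seconds)

def duration_from_indication(indication: str, current: float) -> float:
    """Adjust the duration according to assumed user indication keywords."""
    if indication == "stop":
        return 0.0
    if indication == "extend":
        return current + 60.0
    return current  # "continue" or unrecognised indications leave the duration unchanged

d = duration_from_emotion(0.8)             # 264.0 s when the user appears immersed in the music
d = duration_from_indication("extend", d)  # a gesture or voice operation asks to lengthen playback
```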
In some embodiments, the step S120 may include: and processing the first audio information by using an audio processing model, and outputting the second audio information.
The audio processing model here may be any of various types of model, for example various big data models; a big data model here may be a model generated by training with sample data, for example a neural network model, a support vector machine model, a regression model, and the like. The second audio information is dynamically generated by the big data model with the first audio information as input.
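A minimal sketch, assuming PyTorch, of what the interface of such an audio processing model could look like: a recurrent network reads frame-level features of the first audio information and emits note parameters for the second audio information. The architecture, feature layout, and output meaning are assumptions for illustration, not the model of the embodiment.

```python
import torch
import torch.nn as nn

class AudioGenerationModel(nn.Module):
    """Illustrative audio processing model: first-audio features in, second-audio note parameters out."""
    def __init__(self, feature_dim: int = 8, hidden_dim: int = 64, out_dim: int = 3):
        super().__init__()
        self.encoder = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        # Predict (pitch, duration, velocity) per generated step -- an assumed output layout.
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, first_audio_features: torch.Tensor) -> torch.Tensor:
        # first_audio_features: (batch, frames, feature_dim)
        hidden_states, _ = self.encoder(first_audio_features)
        return self.head(hidden_states)  # (batch, frames, out_dim)

model = AudioGenerationModel()
features = torch.randn(1, 200, 8)        # 200 analysis frames of the collected humming
second_audio_params = model(features)    # would be decoded into melody/rhythm of the second audio
```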
In some embodiments, the step S120 may include at least one of:
generating first music score information of the second audio information according to the first audio information;
generating first lyric information of the second audio information according to the first audio information;
and synthesizing the first music score information and the first lyric information generated according to the first audio information to generate a song file corresponding to the second audio information.
The music score information here may include musical scores, for example staff notation or numbered musical notation files.
The lyric information may include: lyrics written in various languages, for example, lyrics written in chinese, and the like.
In this embodiment, the music score information characterizes the aforementioned melody information, rhythm information, and the like.
Thus, after the second audio information is dynamically generated, it is also converted into the first music score information and the first lyric information, for example recorded as a song file; if the user likes the result, he or she can click to play the audio again. In this way, not only is dynamic creation of the second audio realized, but recording and subsequent playing of songs is realized as well. In some embodiments, the method further comprises:
forwarding the song file and/or the second audio information to a predetermined device, for example a social server, for posting as part of social information; as another example, submitting the recording to a multimedia information repository for storage or download by others.
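To make the recording of the song file concrete, the sketch below assembles first music score information and first lyric information into a simple file; the JSON layout, the field names, and the output path are assumed for illustration and are not a format prescribed by the embodiment.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class ScoreNote:
    pitch: str      # e.g. "E4"
    beats: float    # length in beats

@dataclass
class SongFile:
    score: List[ScoreNote]   # first music score information
    lyrics: List[str]        # first lyric information, one entry per note or phrase
    tempo_bpm: float

def save_song(song: SongFile, path: str) -> None:
    """Write the synthesized score and lyrics to disk for later trial playing or sharing."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(song), f, ensure_ascii=False, indent=2)

song = SongFile(score=[ScoreNote("C4", 1.0), ScoreNote("E4", 1.0), ScoreNote("G4", 2.0)],
                lyrics=["la", "la", "la"],
                tempo_bpm=96.0)
save_song(song, "second_audio_song.json")  # hypothetical output file name
```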
In some embodiments, the step S120 may include:
adjusting the first music score information corresponding to the first audio information according to a preset musical rule to generate second music score information.
In some embodiments, the adjusting the first music score information according to a preset musical rule to generate second music score information includes at least one of:
adjusting the first music score information according to a sine wave rule to generate second music score information satisfying the sine wave rule.
For example, the highest tones of each beat of the first music score information may be connected and the connection then adjusted toward a waveform approximating a sine wave, where the "sine wave" may be a deformed wave whose similarity to a standard sine wave is greater than a first threshold and less than a second threshold. The second threshold is greater than the first threshold, and both may be values between 0 and 1.
For another example, the lowest tones of each beat of the first music score information may be connected and the connection then adjusted toward a waveform approximating a sine wave, where the "sine wave" may be a deformed wave whose similarity to a standard sine wave is greater than a third threshold and less than a fourth threshold. The third threshold is smaller than the fourth threshold, and both may be values between 0 and 1.
In other embodiments, the positions of the respective tones of the first music score information on the staff are connected, and the connecting line is shaped to approximate the sine wave.
In some embodiments, the first threshold may be equal to the third threshold, and/or the second threshold may be equal to the fourth threshold.
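One way to picture the sine wave rule is sketched below, assuming NumPy: the per-beat highest pitches are treated as a contour whose normalised correlation with a standard sine wave must fall between the two thresholds. The use of correlation as the "similarity" and the example threshold values are assumptions of the sketch.

```python
import numpy as np

def sine_similarity(beat_top_pitches: np.ndarray, cycles: float = 1.0) -> float:
    """Similarity in [0, 1] between the per-beat highest-pitch contour and a standard sine wave."""
    contour = beat_top_pitches - beat_top_pitches.mean()
    if not contour.any():
        return 0.0
    t = np.linspace(0.0, 2.0 * np.pi * cycles, len(contour))
    reference = np.sin(t)
    corr = np.dot(contour, reference) / (np.linalg.norm(contour) * np.linalg.norm(reference))
    return abs(float(corr))

def satisfies_sine_rule(beat_top_pitches: np.ndarray,
                        first_threshold: float = 0.6,
                        second_threshold: float = 0.95) -> bool:
    """True when the contour is sine-like but not an exact sine (a 'deformed' sine wave)."""
    s = sine_similarity(beat_top_pitches)
    return first_threshold < s < second_threshold

pitches = np.array([60, 64, 67, 64, 60, 57, 55, 57], dtype=float)  # MIDI numbers of beat tops
print(satisfies_sine_rule(pitches))  # True for this arch-shaped contour
```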
In some embodiments, the adjusting the first music score information according to a preset musical rule to generate second music score information includes:
adjusting the first music score information according to a qi-cheng-zhuan-he variation rule (the classical beginning-development-turn-conclusion structure), so as to generate second music score information satisfying that variation rule.
The qi-cheng-zhuan-he variation rule can be used to reflect changes of melody or rhythm; for example, it divides the audio into four music pieces: a beginning music piece, a receiving (development) music piece, a climax music piece, and an ending music piece.
The four music pieces satisfy a predetermined order, namely: beginning music piece, receiving music piece, climax music piece, ending music piece. A complete second audio needs to contain these four music pieces, and the melody difference and/or rhythm difference between any two of them satisfies a predetermined relationship.
For example, the collected first audio information is used to construct the beginning portion of the beginning music piece; on that basis, the second audio information is then created automatically according to the qi-cheng-zhuan-he variation rule.
Taking the melody as an example, the melody of the climax music piece is the most impassioned or the most subdued; the receiving music piece and the ending music piece are somewhat less impassioned or subdued than the climax music piece, and the beginning music piece less so than the receiving music piece.
Taking the rhythm as an example, the rhythm of the climax music piece is the fastest; the rhythms of the receiving music piece and the ending music piece are slower than that of the climax music piece, and the beginning music piece is slower than the receiving music piece.
As for how much less impassioned or subdued than the climax music piece, or how much slower in rhythm, the receiving music piece and the ending music piece should be, a random number can be generated by a random function and the processing performed on the basis of that random number, thereby generating the second audio information.
Therefore, in some embodiments, the aforementioned audio processing model may also be an audio model that satisfies the sine wave rule and/or the qi-cheng-zhuan-he variation rule; when handling the above variations, introducing random numbers can make the generated second audio information more varied.
For example, the second audio satisfying the sine wave rule may also satisfy the qi-cheng-zhuan-he variation rule, and the specific intensity of an individual passage may be determined based on a random value generated by a random function.
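The four-piece structure and the random variation just described can be pictured with the minimal sketch below; the base tempo, the scaling factors, and the random ranges are illustrative assumptions rather than values given by the embodiment.

```python
import random

PIECE_ORDER = ["beginning", "receiving", "climax", "ending"]

def plan_pieces(base_tempo_bpm, seed=None):
    """Assign each of the four music pieces a tempo so that the climax piece is fastest,
    the receiving and ending pieces are slower than the climax, and the beginning piece
    is the slowest, with randomised margins so that repeated generations differ."""
    rng = random.Random(seed)
    tempi = {
        "beginning": base_tempo_bpm * rng.uniform(0.70, 0.85),
        "receiving": base_tempo_bpm * rng.uniform(0.90, 1.00),
        "climax": base_tempo_bpm * rng.uniform(1.10, 1.25),
        "ending": base_tempo_bpm * rng.uniform(0.85, 0.95),
    }
    return [{"piece": name, "tempo_bpm": round(tempi[name], 1)} for name in PIECE_ORDER]

for piece in plan_pieces(120.0, seed=7):
    print(piece)  # one dict per piece; the climax piece always gets the highest tempo
```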
In some embodiments, the method further comprises:
generating lyric information, wherein the lyric information corresponds to the first music score information or the lyric information corresponds to the second music score information, and the second music score information is generated based on the first music score information.
For example, lyrics input by the user's voice are received through a human-computer interaction interface, and the audio information corresponding to the lyrics is converted into lyric information in a specific language.
In some embodiments, the generating lyric information includes:
collecting second audio information;
the second audio information is converted into the lyric information.
In some embodiments, the converting the second audio information into the lyric information comprises:
generating pronunciation annotation information according to the second audio information; the pronunciation annotation information can be converted directly on the basis of pronunciation and is not limited to a particular language;
aligning, according to the pronunciation annotation information, each pronunciation label with the corresponding music score information on the audio track; the matching of the music score information with the lyric information is thus completed, so that the second audio information and/or the song file can be generated.
Optionally, the lyric information may be generated automatically according to the score parameters of the music score information corresponding to the lyric information.
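The alignment of pronunciation labels to the score can be pictured with the toy sketch below, which simply pairs labels with notes one-to-one; real alignment would also handle timing and one-syllable-to-many-notes cases, and the data layout here is an assumption.

```python
def align_pronunciation_to_score(pronunciations, score_notes):
    """Pair each pronunciation label (e.g. a romanised syllable) with a note of the score.

    `pronunciations` is a list of syllable strings; `score_notes` is a list of note names.
    Extra notes are left without lyrics, mirroring the manual check the user performs.
    """
    aligned = []
    for index, note in enumerate(score_notes):
        syllable = pronunciations[index] if index < len(pronunciations) else None
        aligned.append({"note": note, "syllable": syllable})
    return aligned

track = align_pronunciation_to_score(["sweet", "lo", "ver"], ["C4", "E4", "G4", "C5"])
# [{'note': 'C4', 'syllable': 'sweet'}, ..., {'note': 'C5', 'syllable': None}]
```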
In some embodiments, the method further comprises:
generating performance information, wherein the performance information is audio information generated by a target playing the first music score information, or second music score information corresponding to the first music score information; the target is at least one of a target musical instrument, a target living being, or a target object, the target object being different from the target musical instrument and the target living being. The target may be any of the sounding bodies described above.
In some embodiments, the generating performance information includes:
generating the performance information according to the score parameters of the first music score information or the second music score information.
In other embodiments, the generating the performance information according to the score parameters of the first music score information or the second music score information includes: generating the performance information according to the rhythm parameter, the music style parameter, and the emotion of the first music score information or the second music score information.
In some embodiments, the method further comprises: synthesizing at least two of melody information, lyric information corresponding to the melody information, and performance information of the melody information to generate a song file, wherein the melody information is the first melody information or second melody information corresponding to the first melody information.
In some embodiments, the method further comprises: detecting a labeling operation; and trial-playing the song file or modifying the song file according to the labeling operation. The user is thus allowed to modify the generated second audio information and obtain audio information that meets his or her own requirements.
As shown in fig. 3, the present embodiment provides an information processing apparatus including:
the acquisition module 110 is configured to acquire first audio information, where the first audio information includes: at least one of melody information, rhythm information, and tone information;
the generating module 120 is configured to generate second audio information associated with the first audio information, where the first audio information is at least partially different from the second audio information in content.
In some embodiments, the first audio information is at least partially different from the second audio information in content, including at least one of:
the playing time length of the second audio information is different from that of the first audio information;
the second melody information of the second audio information is at least partially different from the first melody information of the first audio information;
the second tempo information of the second audio information is at least partially different from the first tempo information of the first audio information;
the second tone color information of the second audio information is different from at least a portion of the first tone color information of the first audio information.
In some embodiments, the first tone color information and the second tone color information include at least one of:
a first type of tone color information, wherein the first type of tone color information includes tone color information of the human voice; the tone color information of the human voice includes at least one of the following: the tone of a male voice, the tone of a female voice, the tone of a child voice, and the tone of a mixed human voice formed by mixing at least two human voices;
a second type of tone color information, wherein the second type of tone color information includes tone color information of musical instruments;
a third type of tone color information, wherein the third type of tone color information is tone color information other than that of the human voice and the musical instrument.
In other embodiments, the generating module 120 is specifically configured to perform at least one of the following:
generating the second audio information according to the audio attribute information of the first audio information;
and generating the second audio information according to the user attribute information corresponding to the first audio information.
In still other embodiments, the generating module 120 is specifically configured to perform at least one of:
generating the second audio information according to at least one of a melody characteristic attribute, a rhythm characteristic attribute, a tone characteristic attribute, a music style attribute, and a music type attribute of the first audio information.
In still other embodiments, the generating module 120 is specifically configured to generate the second audio information according to at least one of the user preference information, the audio playing record information, the emotional state information, and the user indication information.
In still other embodiments, the generating module 120 is specifically configured to perform at least one of:
determining the duration of the second audio information according to the emotion state information;
determining the duration of the second audio information according to the user indication information;
continuing to generate the second audio information according to the user indication information;
resuming generation of the second audio information according to the user indication information;
stopping generating the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotion state information;
stopping generating the second audio information according to the emotion state information;
resuming generation of the second audio information according to the emotion state information;
determining the duration of the second audio information according to the emotion state information and the user indication information;
continuing to generate the second audio information according to the emotion state information and the user indication information;
stopping generating the second audio information according to the emotion state information and the user indication information;
and resuming generation of the second audio information according to the emotion state information and the user indication information.
In still other embodiments, the generating module 120 is specifically configured to process the first audio information by using an audio processing model, and output the second audio information.
In still other embodiments, the generating module 120 is specifically configured to generate first music score information of the second audio information according to the first audio information; generate first lyric information of the second audio information according to the first audio information; and synthesize the first music score information and the first lyric information generated according to the first audio information to generate a song file corresponding to the second audio information.
Several specific examples are provided below in connection with any of the embodiments described above:
example 1:
the present example provides an audio processing system that constitutes a brief description: the system is composed of a hardware system, a software application program and cloud network big data.
(1) Hardware system overview: a microphone input device with a high sampling rate, an electronic device with good computing performance, and network bandwidth of not less than 50 Mbps;
(2) Software application overview: an audio recording and playing application based on DirectSound; a data processing module based on the central processing unit (CPU) and the graphics processing unit (GPU); an AI learning module based on artificial intelligence (AI) sample collection, spectrum recognition, feature extraction, deep training, analysis, and prediction techniques; a speech annotation generation module based on speech recognition; an audio codec module based on Advanced Audio Coding (AAC); and a database Create/Retrieve/Update/Delete (CRUD) module.
(3) Cloud network big data: storage of massive uncoded music source data and the corresponding music language based on spectrogram coding, and storage of massive short music pieces of unequal length, from 1 to 8 syllables.
The main function of the system is that the input end captures the voice data of the user's humming and, with the assistance of cloud big data, generates a track that matches the user's creative intention.
The workflow of the audio processing system may be as follows:
(1) A timed-acquisition countdown is started; when the 3-second countdown ends, the microphone serving as the input end captures the user's hummed sound source data, which is transmitted directly, without encoding, to a personal computer (PC).
(2) Through signal processing, the PC obtains the spectrogram corresponding to the sound source data and a spectrum signal stripped of timbre and noise;
(3) Based on a user instruction, a certain musical instrument is selected, the generated timbre-free, noise-free spectrum signal is rendered with it to generate music, and the result is then auditioned.
(4) The generated audio is trial-played so that the user can conveniently audition the generated track (corresponding to the second audio information) several times; on each audition, incremental text labels are added to the music on the basis of the music pieces of the initial time-stamp intervals, distinguishing the parts considered satisfactory from the unsatisfactory ones. Then, with the user's modifications as the primary input and the AI with cloud big data as assistance, the AI re-modifies the unsatisfactory parts, for example at least one of rhythm, melody, and volume, until the user is satisfied; the spectrogram and its label file are saved, the label file being stored in a preset format, for example a file format with the suffix sll (sll being short for SweetLover Language).
(5) At this point, for a user who additionally wants the track to be sung:
<1> A pronunciation track labeled with the 26 English letters can be inserted under the existing spectrogram (the pronunciation language system includes but is not limited to the modern Chinese, old Chinese, Japanese, English, Roman, and Latin pronunciation systems) and stored as a pronunciation file (here called a music pronunciation file; its suffix may be slv, short for SweetLover Voice).
<2> The pronunciation spoken by the user can also be converted into the 26-letter labels by speech-recognition software, but doing so requires verifying whether the recognized pronunciation labels are wrong, and which note of the spectrum each pronunciation label is aligned to has to be handled manually by the user.
<3> Alternatively, the user's own singing can be recorded directly in WAV format and then imported for synthesis.
(6) Finally, for each spectrogram track and each pronunciation track of the recorded music information, the user generates the music of the designated instrument and the voice of the designated sound source, respectively (each spectrogram track is generated according to its corresponding sll file, and each pronunciation track according to its slv file). The finally generated sound files are in WAV format without compression encoding. Each spectrogram track and the corresponding music score (staff or numbered notation, as selected) are saved again, each pronunciation track is rendered last, and WAV-format audio source data rendered with the selected instrument is output (rendering can be performed per music piece, and each music piece can be played by a different instrument; similarly, a pronunciation file can be rendered per music piece, with each piece using a different sound source).
The technical solution provided by this example lowers the threshold of music creation for ordinary users without advanced musical training, promotes the popularization of music creation, lets users capture their flashes of inspiration about life, and provides professional music producers and composers with a more productive production tool.
Example 2:
the PC obtains a corresponding spectrogram of sound source data and a spectrum signal without tone and noise through data processing, which may include:
the PC samples the data after the microphone at high frequency, and temporarily slices the audio stream equally divided from the beginning to the end according to the next 128-symbol time interval (namely, the time interval of 0.5s multiplied by 4 divided by 128 is 0.015625 s) assuming that BPM=120.0, so as to generate an audio frame of one slice;
using a mixed data processing mode based on CPU serial plus GPU parallel, carrying out parallel fast Fourier transform and synchronization on a group of maximum audio frames each time until all audio frames are processed, thereby converting the time domain signal of each audio frame into a frequency domain signal (only frequency and energy information, no signal of any tone);
and then a mixed data processing mode based on CPU serial plus GPU parallel is used for finding out the frequency domain signal with the maximum energy in each audio frame, and the frequency domain signal is regarded as a main signal, and then signal denoising is carried out to remove other frequency domain signals.
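A minimal NumPy sketch of the slicing and per-frame dominant-frequency extraction described above, under the stated assumption of BPM = 120.0; the 44.1 kHz sampling rate and the "keep only the strongest bin" denoising are simplifying assumptions.

```python
import numpy as np

SAMPLE_RATE = 44_100                     # assumed microphone sampling rate (Hz)
BPM = 120.0
FRAME_SECONDS = (60.0 / BPM) * 4 / 128   # 128th-note interval: 0.5 s * 4 / 128 = 0.015625 s
FRAME_SAMPLES = int(round(SAMPLE_RATE * FRAME_SECONDS))

def dominant_frequencies(samples: np.ndarray) -> np.ndarray:
    """Slice the recording into 128th-note frames and keep only the strongest
    frequency of each frame, discarding all other bins as noise."""
    n_frames = len(samples) // FRAME_SAMPLES
    frames = samples[: n_frames * FRAME_SAMPLES].reshape(n_frames, FRAME_SAMPLES)
    spectra = np.abs(np.fft.rfft(frames, axis=1))           # frequency/energy only, no timbre
    freqs = np.fft.rfftfreq(FRAME_SAMPLES, d=1.0 / SAMPLE_RATE)
    return freqs[np.argmax(spectra, axis=1)]                 # one dominant frequency per frame

# Example: two seconds of a hummed 440 Hz tone -> 128 frames, each dominated by the bin
# closest to 440 Hz (limited by the coarse per-frame frequency resolution of ~64 Hz).
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
print(dominant_frequencies(np.sin(2 * np.pi * 440.0 * t))[:4])
```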
After the noise is removed, error correction is needed. Because of the constraints of twelve-tone equal temperament, and also for objective reasons (the frequency of the user's humming may be inaccurate), error correction must be performed against several octaves of the piano scale according to a standard similar to FIG. 4. FIG. 4 may be a twelve-tone scale representation of information relating to twelve-tone equal temperament; the columns represent frequencies.
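The error-correction step can be sketched as snapping each extracted frequency to the nearest pitch of twelve-tone equal temperament; the A4 = 440 Hz reference is an assumption of the sketch.

```python
import math

A4_HZ = 440.0   # assumed reference pitch

def snap_to_equal_temperament(freq_hz: float) -> float:
    """Correct an extracted frequency to the nearest twelve-tone equal-temperament pitch,
    compensating for inaccurate humming."""
    if freq_hz <= 0:
        return 0.0
    semitones = round(12 * math.log2(freq_hz / A4_HZ))   # nearest semitone offset from A4
    return A4_HZ * 2 ** (semitones / 12)

print(snap_to_equal_temperament(448.0))   # 440.0  (slightly sharp humming pulled back to A4)
print(snap_to_equal_temperament(430.0))   # 440.0  (still within half a semitone of A4)
```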
After correction, the screened signal needs to be enhanced and restored from the frequency domain to the time domain. When the duration of the restored time-domain signal is less than the 0.015625 s interval, the missing part must be supplemented: the frequency of the supplemented signal is kept consistent with that of the restored signal, and when the supplemented signal fills the interval in time order, the amplitude over the whole interval must be made consistent with a damped-vibration model (that is, the phenomenon that the sound produced by a vibrating instrument gradually decays).
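A minimal sketch of the padding step described above: the restored tone is extended to the full 0.015625 s frame at the same frequency, and the whole frame is shaped with an exponentially decaying envelope as a simple stand-in for the damped-vibration model; the decay constant is an assumption.

```python
import numpy as np

SAMPLE_RATE = 44_100
FRAME_SECONDS = 0.015625          # 128th-note interval at BPM = 120 (see the sketch above)

def pad_and_damp(restored: np.ndarray, freq_hz: float, decay_per_second: float = 8.0) -> np.ndarray:
    """Extend a restored time-domain tone to the full frame length with the same frequency,
    then apply a damped-vibration (exponential decay) amplitude envelope."""
    frame_len = int(round(SAMPLE_RATE * FRAME_SECONDS))
    if len(restored) < frame_len:
        t_pad = np.arange(len(restored), frame_len) / SAMPLE_RATE
        padding = np.sin(2 * np.pi * freq_hz * t_pad)      # same frequency, phase-continuous
        restored = np.concatenate([restored, padding])
    t = np.arange(frame_len) / SAMPLE_RATE
    envelope = np.exp(-decay_per_second * t)                # gradually attenuating amplitude
    return restored[:frame_len] * envelope

tone = np.sin(2 * np.pi * 440.0 * np.arange(300) / SAMPLE_RATE)   # shorter than one frame
frame = pad_and_damp(tone, 440.0)
```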
then generating spectrograms according to the time length and scale, because the recording is processed according to the time slicing, a long tone with unchanged tone becomes a plurality of short tones after slicing, but the same short tones are not connected temporarily (such as recording of a whole note E2 at actual bpm=120, slicing according to a 128-component note at actual bpm=120, and cutting into 128 audio frames), the specific connection also needs to be decided by the composer himself, and the later AI only gives relevant intelligent prompts and suggestions under the unmarked area, because the following music pieces are different for artistic creation:
"5- - - - - - - - - -" and "5 5 5 5" - - - "are different. The latter is detected by the AI and gives a prompt asking the user if he considers it as a continuous long tone like the former.
In the generation of the spectrogram, the shape of the spectrogram is saved, and the spectrogram is the key of the subsequent AI to make authoring suggestions according to a large number of data templates (quick matching based on the shape and making modification suggestions according to the requirements of users).
The following is an example of a corresponding spectral pattern of melody segments:
2. the user may select a certain musical instrument, render the generated spectrum signal without tone and noise, generate music, and listen on trial, which is equivalent to selecting tone information of the second audio based on user input, or automatically select tone information by the AI model.
3. The user listens to the generated track for a plurality of times, and each time the music listened to in the trial is subjected to incremental text labeling based on the music piece of the initial time stamp interval.
Firstly, the AI system gives prompts according to the appointed steps for novice users, and for users with the identity of professional composers and the like, the AI system can simultaneously display the prompts in each step on the whole music path. The following is assumed to be a novice user, thereby sequentially explaining which cues the AI will give:
(1) Firstly, an AI system generates a preliminary suggestion which is used for prompting a user whether a plurality of frequency spectrums with the same height can be regarded as a long sound or not (if the user does not process the frequency spectrums, the AI defaults to assume that the frequency spectrums are a continuous long sound), then the user uses a mark to confirm, and the mark content is stored into a generated complete mark language file after the user confirms a place every time;
(2) Next, the AI system prompts the user to select an instrument (this practice is called a rendering setting) playing the specified music piece, and these instruments include, but are not limited to, a common piano, violin, and the like. These instruments refer to the recording sound sources of the real instruments playing each tone for 4s in a duration under the same strength, because the spectrogram stores the core frequency information of the melody, the playing instrument of the music piece is designated according to the spectrogram and the sound source selection label between the various areas in the spectrogram by the user, when any music piece in the whole spectrogram has at least 1 designated instrument corresponding to the music piece (the condition that a certain music piece is allowed to have both piano label and violin label is allowed to exist), the reality meaning that the music piece needs to be played by the two instruments at the same time is that the user can choose to confirm that the 2 nd step label is completed.
After the user marks, the system will execute a mixing algorithm, and generates the complete music lossless WAV audio file according to the marks and the frequency spectrum (such a method is called rendering), at this time, the AI system will prompt the user whether to listen to the music playing selected by the user, the user can listen to the music playing for several times, each time the user can add marks to a certain music playing and request to play the corresponding part of the rendered WAV file to listen to the music playing, the user can insert a time stamp mark in the listening process, or after the listening is finished or terminated, the time stamp mark time points or even music playing between 2 adjacent time points selected by the user himself/herself are added marks, then repeating (1) (2) (3), or requesting the AI system to correct the music playing mark, and the AI system will give the prompt of step 4.
The AI system is used for matching massive music data of the cloud of the network according to the spectrogram, and timely giving out the attribute of the music, including but not limited to: a spectrogram (essential attribute) of the music piece, a style, emotion, a musical instrument, and the like, and inquires the user about the user's requirements:
<1> if the style and emotion are not satisfied, the user can input the wanted style emotion, then AI generates 1=C example frequency mother template for the user to listen on trial according to the spectrogram matching the style emotion, the user can choose to directly and manually modify the mother template or ask AI to use the related algorithm compiled according to the composing theory to generate the actual music passage generated by a composing skill selected by the user through the mother template to make advice for confirmation, so as to make the algorithm compiled according to the composing theory discussed later;
If the volume of the musical instrument and the music piece is unsatisfactory, the user only needs to add marks, and then render the music piece again to replace the corresponding part of the old WAV, and the technology is the DirectSound and FILE I/O related technology, so that the method is not difficult to realize.
4. Machine singing (for users with singing requirements):
pronunciation annotation problem: according to the qualitative of the software, pronunciation schemes in different countries are required to be considered, and the early-stage computers use English characters, so that the labeling scheme is also to select English character labeling; because of the motion of the pronunciation system Luo Mahua in the early stage, most of the pronunciation without tone, whether Chinese characters, japanese, korean and the like, can be marked by using 1 byte region codes (maximally representing 255 regions) plus English characters (serving as pronunciation characters);
pronunciation annotation and SweetLover Voice file coding: because some features of the aspect of pronunciation, especially when a long sound is generated, the sound generated when the long sound is prolonged is a sound in a stable state, (and the pronunciation is that a consonant initiated by the mouth, tongue and nose is used as a feature sound and then is formed by the breath in the stable state), the labeling of the aspect of pronunciation is similar to the labeling method of the opening process of flowers, the consonant is labeled on a specific syllable, and the sound with state change is labeled each time in the part of the prolonged sound. If the sound of the singing pinyin is provided that the sound needs to be singed for 4s and singer singing follows pronunciation rules, the pronunciation change accompanied by mouth shape change in the singing process should be: yu-yuan-an, the label should be under the corresponding note, in a similar manner.
<1> now assume that 0x01 is represented as 2052 in the operating system, an example is given as follows:
in the above notation, 0x01 of 1 byte indicates the region to which the following pronunciation annotation method belongs, and immediately following an-means that the extension of the tone of an is made.
<2> when the user uses voice input, three voices of yuyuuanan should be said, and then the user adjusts the position himself, which is very hard;
<3> it is more convenient if the user directly records his own singing voice as WAV and then imports it.
In some scenarios, all WAVs should use the same microphone input device, the same sample rate, channel number, bit depth, preventing bias.
5. The AI determination of cloud network big data may be as follows:
how the AI module of the intelligent composition system uses cloud network big data and automatically generates music piece complement by combining the theory of composition. Firstly, theoretical bedding is needed:
{1}: the music piece frequency spectrum needs to generate a core rhythm according to 4 segments (bearing and turning) conforming to literature pairs and rhyme sentences based on sine fitting standards. The core rhythm is the main rhythm and can be the most rhythm in the second audio information; or the cadence of the climax part.
The following is that the staff version D is large-tuning kannong, and is led from hundred-degree pictures, so that the staff is not understood, if the heads of the tadpoles in the staff are all connected, the spectrum segments similar to Track 2 in the pictures can be formed, all music pieces need to be transversely arranged and connected after all the notes are connected, namely the tail of music piece 1 and the beginning of music piece 2 (all marked by serial number 1) marked in the notes of me are transversely arranged, the tail of music piece 2 and the beginning … … of music piece 3 are transversely arranged in an end-to-end connection by the way, the ascending and descending amplitude span of each music piece period can be found not to be very large, the saw-tooth audio spectrogram similar to sharp noise can not be formed, and all the formed audio spectrogram is similar to the sine function f (x) =sin x fluctuation law but not completely similar.
Fig. 5 is a waveform diagram of a standard sine wave.
FIG. 6 is a musical score; fig. 7 is a waveform diagram of the connection of the individual sounds of the score of fig. 6, which waveform diagram is somewhat similar to a sine wave.
Fig. 8 is a waveform diagram of the connection of individual tones of another music score, which is also similar to a sine wave.
Assuming that the created music piece is basically a group of 4 music pieces (take-up and take-off), each music piece contains a number of bars of the same number, then, according to the sine fitting, assuming that the 1 st note of the 1 st music piece is f (x) =sin x, a point on x e [0,2 pi ] such as x=5 is taken as a starting point, then, from this starting point, there is: [5, 5+pi ] is the 1 st music piece (note that 1 music piece may contain several bars, each bar has the same duration, for example, 4/4 beats, or 6/8 beats, but the situation of changing beats is not recommended in general, while [ 5+pi, 5+2pi ] is the 2 nd music piece, then the incompletely similar place appears in the 3 rd music piece or the 4 th music piece, and the incompletely similar reason is because the characteristics similar to the prosody aspect of the literature art work are needed to have a coordinated, complete and symmetrical beauty, and also needs to have the wave-like asymmetric, rich in variation and non-monotonic beauty.
The 4 music pieces are called a starting music piece, a 2 nd music piece is called a receiving music piece, a 3 rd music piece is called a turning/progressive music piece, a 4 th music piece is called an ending music piece, and the 4 th music piece is called 4 words for short: take up and turn (treat the 3 rd music passage as a climax part of emotion progression similar to a novel structure, then the 4 th music passage can also be regarded as a turning music passage or an ending music passage.
Music piece 4 must break the cycle of the sinusoidal fit, i.e. allow music piece 3 and music piece 1 to have maximum similarity (the break may also start at music piece 3), but music piece 4 does not recommend the same as music piece 2, in order to avoid audible aesthetic fatigue.
The description will now be made in connection with a score shown in a music chart 8 shown in fig. 8.
The content of the numbered musical notation parts is marked and grouped and explained as necessary. Firstly, beat parts of the numbered musical notation are poorly marked, the number of the lyrics (shouting, etc. also calculated) is more than 4/4 beats, the marking of 2/4 beats is not recommended, and then grouping is carried out.
Packet 1: a pre-playing part corresponding to the grouping of the initial music piece;
group 2: corresponding to the grouping of the receiving music piece, the image is similar to 1 colon plus 1 thin 1 thick 2 vertical lines (|l), and the former part can be judged to be a group;
Packets 3 and 4: packets corresponding to the aforementioned break/progression music pieces.
Group 5: packets corresponding to the ending music passage described above.
If the composition is composed, a better template can be generated according to the related principles such as {1} sine fitting and the like and according to the complete spectrogram trend instead of a specific note and beat, the whole observation is carried out.
Therefore, firstly, it is necessary to ensure that a certain audio spectrogram is stored in cloud network big data used by an AI module of the intelligent composition system to be used as a data sample, and all algorithms are fitted based on the audio spectrogram. On the other hand, analysis of a large number of samples also found that music pieces conforming to a sinusoidal fit template (a template corresponding to the aforementioned sine wave law) were perceived as relatively audible.
Sinusoidal fitting is a fundamental proposition to which relatively better-sounding music needs to adhere, even though music of different emotional styles will follow this law.
Considering that music with different emotion styles has spectrograms with different trend of the overall trend, the trend of the music spectrograms with different styles based on sine fitting is essentially required to be stored in cloud network big data applied by an AI module of the intelligent composition system. Therefore, when the spectrogram of the user humming song is obtained, the song style created by the user and the emotion change in the song can be analyzed according to the stored characteristic spectrogram trend with obvious emotion style, so that relative intelligent advice is given when the user starts to make modifications such as selection refinement. Such as: the determination of the mood of the song, whether the styles are consistent, the style of the music piece, the modification advice of the music piece, etc., but the taking or not must be decided by the user.
The formula derives {3}, when the f (x) =sin x function image is subjected to overall equal-proportion stretching compression transformation, the obtained image is still a sinusoidal image. Therefore, when deriving the formula to create music, when expanding or shrinking the BPM in equal proportion, and when the variation range is not large (i.e. f (x) =sin x function image deformation is not serious), the variation of the style of the whole changed music is not large with the original style, and the measurement of the size of the variation range is determined by calculating the ratio of the horizontal coordinates before and after the variation, wherein the ratio is: the maximum distance in the longitudinal direction/one period in the transverse direction, for example, for the image f (x) =sinx, the ratio is 2/2 pi, and does not divide.
The formula derives {4}, when the f (x) =sin x function image is subjected to overall equidistant up/down translation transformation, the resulting image remains a sinusoidal image. Therefore, when the formula is deduced to create music, the whole music rises/falls by a plurality of musical scales according to the principle of twelve halving rate, the obtained music frequency spectrum trend graph is consistent with the original trend, and only the change is more helpful to match the musical range of singers.
The formula deduces {5}, according to the rule of mathematical arithmetic progression, the rule is applied to tone height, and adjacent notes, bars and music pieces can be subjected to translational transformation of the arithmetic progression, so that music pieces with progressive layering can be created; similarly, when the rule is applied at intervals other than at intervals of high or low pitch, i.e., the time of playing each note, bar, or music piece is gradually slowed down or accelerated, a music piece with gradually tense, jerky, or gradually gentle and quiet style can be created.
The formula deduces {6}, and extracts the music performance content of a period of time in the music according to N times of idempotent differential time length of 2 by taking BPM as a reference, wherein the reconstructed new music still accords with the sinusoidal fitting image, and the differential time length formed by the value of N can not be lower than the next 32 partials Fu Shichang of the BPM or higher than the next complete set of music (rising and falling) time length of the BPM.
The formula derives {7}, a note with a duration of T is split into X identical notes with a duration of T/X, the trend of the style of the whole piece of music is unchanged, expressed emotion and the like are slightly changed, the splitting scheme requires that X is taken as the power N of 2 or as the product of the power N of 2 and 3, and the final value cannot enable the duration of T/X to be lower than the duration of the next 32 partials Fu Shichang of the BPM or higher than the duration of the note before splitting under the BPM.
The formula deduces {8}, and on the basis of {7}, each note formed after splitting is changed so that the notes are not all the same tone, but the integral fluctuation gap is not changed greatly, and the expressed melody presents plump and varied trends.
When the human ear hears the synthesized sound formed by the multiple sound sources at the same time, even if the tones of the multiple sound sources are completely identical and the timbres are different, the sense of fullness of the music (auditory satisfaction) is psychologically perceived.
Creating sound parts with different tone colors, the user can designate any single sound part, and the music piece of the sound part is required to be completely duplicated by using a self-formulated musical instrument, so that a multi-sound part playing is created.
The chord formula fit may specify any single part that is required to automatically generate chords according to the chord formula, such as major tri-chords of the group 1 3b 5, and other multi-parts like minor tri-chords, major tri-chords, minor tri-chords, etc. of the group 1 3b 5.
The decorative sound can be generated for the designated notes (which is equivalent to adding a decorative sound part), the decorative sound can be intelligently complemented according to the trend of the notes in the spectrogram of the music section, and the user can also adjust or request to generate the decorative sound according to the spectrogram section of an emotion style designated by the user.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above described device embodiments are only illustrative, e.g. the division of the units is only one logical function division, and there may be other divisions in practice, such as: multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. In addition, the various components shown or discussed may be coupled or directly coupled or communicatively coupled to each other via some interface, whether indirectly coupled or communicatively coupled to devices or units, whether electrically, mechanically, or otherwise.
The units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing module, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk or an optical disk, or the like, which can store program codes.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An information processing method, characterized by comprising:
collecting first audio information, wherein the first audio information comprises: at least one of melody information, rhythm information and tone information, wherein the first audio information is generated after any sound in a space where the first electronic equipment is located is collected;
generating second audio information associated with the first audio information; the generating second audio information associated with the first audio information includes at least: adjusting first music spectrum information corresponding to the first audio information according to a preset rhythm to generate second music spectrum information corresponding to the second audio information; the preset rhythms comprise sine wave laws, in particular: connecting the highest sound of each beat of the first curvelet information, and then adjusting the highest sound by using a waveform similar to a sine wave, wherein the similarity between the sine wave and a standard sine wave is a special-shaped wave with a similarity greater than a first threshold value and less than a second threshold value; the second threshold is greater than the first threshold, and both the first threshold and the second threshold can be values between 0 and 1; or connecting the lowest tones of each beat of the first music score information, and then adjusting the lowest tones with a waveform similar to a sine wave, wherein the sine wave can be a special-shaped wave with the similarity with a standard sine wave being larger than a third threshold value and smaller than a fourth threshold value; the third threshold is smaller than the fourth threshold, and the fourth threshold and the third threshold can be values between 0 and 1;
Wherein the first audio information is at least partially different from the content of the second audio information.
2. The method of claim 1, wherein the step of determining the position of the substrate comprises,
the first audio information is at least partially different from the second audio information in content, including at least one of:
the playing time length of the second audio information is different from that of the first audio information;
the second melody information of the second audio information is at least partially different from the first melody information of the first audio information;
the second tempo information of the second audio information is at least partially different from the first tempo information of the first audio information;
the second tone color information of the second audio information is different from at least a portion of the first tone color information of the first audio information.
3. The method of claim 2, wherein the step of determining the position of the substrate comprises,
the first tone color information and the second tone color information include at least one of:
a first type of tone color information, wherein the first type of tone color information includes: tone information of the human voice; the tone color information of the voice includes at least one of the following: the mixed human voice is formed by mixing at least two human voices, namely the tone of male voice, the tone of female voice, the tone of child voice and the tone of at least two human voices;
A second type of tone color information, wherein the second type of tone color information includes: tone information of the musical instrument;
third-class tone information, wherein the third-class tone information is: and the voice and tone information outside the musical instrument.
4. A method according to claim 1, 2 or 3, characterized in that,
the generating second audio information associated with the first audio information includes at least one of:
generating the second audio information according to the audio attribute information of the first audio information;
and generating the second audio information according to the user attribute information corresponding to the first audio information.
5. The method of claim 4, wherein the step of determining the position of the first electrode is performed,
the generating the second audio information according to the audio attribute information of the first audio information comprises at least one of the following steps:
and generating the second audio information according to at least one of melody characteristic attribute, rhythm characteristic attribute, tone characteristic attribute, wind attribute and music type attribute of the first audio information.
6. The method of claim 4, wherein the step of determining the position of the first electrode is performed,
the generating the second audio information according to the user attribute information corresponding to the first audio information includes:
Generating second audio information according to at least one of user preference information, audio playing record information, emotion state information and user indication information.
7. The method of claim 6, wherein the step of providing the first layer comprises,
generating second audio information according to at least one of user preference information, audio playing record information, emotion state information and user indication information corresponding to the first audio information, wherein the second audio information comprises at least one of the following:
determining the duration of the second audio information according to the emotion state information;
determining the duration of the second audio information according to the user indication information;
continuously generating the second audio information according to the user indication information;
restoring to generate the second audio information according to the user indication information;
stopping generating the second audio information according to the user indication information;
continuing to generate the second audio information according to the emotion state information;
stopping generating the second audio information according to the emotion state information;
restoring generation of the second audio information based on the emotional state information
Determining the duration of the second audio according to the emotion state information and the user indication information;
Continuously generating the second audio according to the emotion state information and the user indication information;
stopping generating the second audio according to the emotion state information and the user indication information;
and restoring to generate the second audio according to the emotion state information and the user indication information.
8. A method according to claim 1, 2 or 3, wherein said generating second audio information associated with said first audio information comprises:
and processing the first audio information by using an audio processing model, and outputting the second audio information.
9. A method according to claim 1, 2 or 3, characterized in that,
the generating second audio information associated with the first audio information includes at least one of:
generating first music spectrum information of the second audio information according to the first audio information;
generating first lyric information of the second audio information according to the first audio information;
and synthesizing the first music spectrum information and the first lyric information generated according to the first audio information, and generating a song file corresponding to the second audio information.
10. An information processing apparatus, characterized by comprising:
The collection module is used for collecting first audio information, wherein the first audio information comprises: at least one of melody information, rhythm information and tone information, wherein the first audio information is generated after any sound in a space where the first electronic equipment is located is collected;
a generating module, configured to generate second audio information associated with the first audio information, where the generating of the second audio information associated with the first audio information includes at least: adjusting first music spectrum information corresponding to the first audio information according to a preset rhythm to generate second music spectrum information corresponding to the second audio information; the preset rhythms comprise sine wave laws, in particular: connecting the highest sound of each beat of the first curvelet information, and then adjusting the highest sound by using a waveform similar to a sine wave, wherein the similarity between the sine wave and a standard sine wave is a special-shaped wave with a similarity greater than a first threshold value and less than a second threshold value; the second threshold is greater than the first threshold, and both the first threshold and the second threshold can be values between 0 and 1; or connecting the lowest tones of each beat of the first music score information, and then adjusting the lowest tones with a waveform similar to a sine wave, wherein the sine wave can be a special-shaped wave with the similarity with a standard sine wave being larger than a third threshold value and smaller than a fourth threshold value; the third threshold is smaller than the fourth threshold, and the fourth threshold and the third threshold can be values between 0 and 1; wherein the first audio information is at least partially different from the content of the second audio information.
CN201810673919.5A 2018-06-26 2018-06-26 Information processing method and device Active CN108922505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810673919.5A CN108922505B (en) 2018-06-26 2018-06-26 Information processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810673919.5A CN108922505B (en) 2018-06-26 2018-06-26 Information processing method and device

Publications (2)

Publication Number Publication Date
CN108922505A CN108922505A (en) 2018-11-30
CN108922505B true CN108922505B (en) 2023-11-21

Family

ID=64421511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810673919.5A Active CN108922505B (en) 2018-06-26 2018-06-26 Information processing method and device

Country Status (1)

Country Link
CN (1) CN108922505B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070885B (en) * 2019-02-28 2021-12-24 北京字节跳动网络技术有限公司 Audio starting point detection method and device
CN113066458A (en) * 2021-03-17 2021-07-02 平安科技(深圳)有限公司 Melody generation method, device and equipment based on LISP-like chain data and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1785891A1 (en) * 2005-11-09 2007-05-16 Sony Deutschland GmbH Music information retrieval using a 3D search algorithm
CN101313477A (en) * 2005-12-21 2008-11-26 Lg电子株式会社 Music generating device and operating method thereof
CN103854644A (en) * 2012-12-05 2014-06-11 中国传媒大学 Automatic duplicating method and device for single track polyphonic music signals
CN105161081A (en) * 2015-08-06 2015-12-16 蔡雨声 APP humming composition system and method thereof
CN106652997A (en) * 2016-12-29 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Audio synthesis method and terminal
CN106649586A (en) * 2016-11-18 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Playing method of audio files and device of audio files
CN107863095A (en) * 2017-11-21 2018-03-30 广州酷狗计算机科技有限公司 Acoustic signal processing method, device and storage medium
CN108197185A (en) * 2017-12-26 2018-06-22 努比亚技术有限公司 A kind of music recommends method, terminal and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101203904A (en) * 2005-04-18 2008-06-18 Lg电子株式会社 Operating method of a music composing device
US8812144B2 (en) * 2012-08-17 2014-08-19 Be Labs, Llc Music generator
KR20150072597A (en) * 2013-12-20 2015-06-30 삼성전자주식회사 Multimedia apparatus, Method for composition of music, and Method for correction of song thereof

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1785891A1 (en) * 2005-11-09 2007-05-16 Sony Deutschland GmbH Music information retrieval using a 3D search algorithm
CN101313477A (en) * 2005-12-21 2008-11-26 Lg电子株式会社 Music generating device and operating method thereof
CN103854644A (en) * 2012-12-05 2014-06-11 中国传媒大学 Automatic duplicating method and device for single track polyphonic music signals
CN105161081A (en) * 2015-08-06 2015-12-16 蔡雨声 APP humming composition system and method thereof
CN106649586A (en) * 2016-11-18 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Playing method of audio files and device of audio files
CN106652997A (en) * 2016-12-29 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Audio synthesis method and terminal
CN107863095A (en) * 2017-11-21 2018-03-30 广州酷狗计算机科技有限公司 Acoustic signal processing method, device and storage medium
CN108197185A (en) * 2017-12-26 2018-06-22 努比亚技术有限公司 A kind of music recommends method, terminal and computer readable storage medium

Also Published As

Publication number Publication date
CN108922505A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
US10789921B2 (en) Audio extraction apparatus, machine learning apparatus and audio reproduction apparatus
CN108806656B (en) Automatic generation of songs
CN108806655B (en) Automatic generation of songs
Juslin et al. Expression and communication of emotion in music performance
Schneider Music and gestures: A historical introduction and survey of earlier research
CN106652984A (en) Automatic song creation method via computer
Umbert et al. Expression control in singing voice synthesis: Features, approaches, evaluation, and challenges
JP2016136251A (en) Automatic transcription of musical content and real-time musical accompaniment
Datta et al. Signal analysis of Hindustani classical music
Parncutt Accents and expression in piano performance
CN108053814B (en) Speech synthesis system and method for simulating singing voice of user
Canazza et al. Caro 2.0: an interactive system for expressive music rendering
Umbert et al. Generating singing voice expression contours based on unit selection
Coutinho et al. Singing and emotion
CN108922505B (en) Information processing method and device
Hill et al. Low-level articulatory synthesis: A working text-to-speech solution and a linguistic tool1
Danielsen et al. Shaping rhythm: Timing and sound in five groove-based genres
Emmerson Timbre composition in electroacoustic music
CN110782866A (en) Singing sound converter
Oh et al. LOLOL: Laugh Out Loud On Laptop.
Hosken The pocket: a theory of beats as domains
Siegel Timbral Transformations in Kaija Saariaho's From the Grammar of Dreams
CN110853457B (en) Interactive music teaching guidance method
Subramanian Modelling gamakas of Carnatic music as a synthesizer for sparse prescriptive notation
Blaauw Modeling timbre for neural singing synthesis: methods for data-efficient, reduced effort voice creation, and fast and stable inference

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant