CN113178182A - Information processing method, information processing device, electronic equipment and storage medium

Information processing method, information processing device, electronic equipment and storage medium

Info

Publication number
CN113178182A
CN113178182A
Authority
CN
China
Prior art keywords
processing
music
word
file
mixing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110448803.3A
Other languages
Chinese (zh)
Inventor
苑盛成
陈正扬
刘晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Smart Sound Technology Co ltd
Original Assignee
Beijing Smart Sound Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Smart Sound Technology Co ltd filed Critical Beijing Smart Sound Technology Co ltd
Priority to CN202110448803.3A priority Critical patent/CN113178182A/en
Publication of CN113178182A publication Critical patent/CN113178182A/en
Pending legal-status Critical Current

Classifications

    • G10H 1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece (G10H: electrophonic musical instruments; G10H 1/00: details of electrophonic musical instruments; G10H 1/0008: associated control or indicating means)
    • G10L 13/047: Architecture of speech synthesisers (G10L: speech analysis or synthesis, speech recognition, speech or audio coding or decoding; G10L 13/00-13/04: speech synthesis, text-to-speech systems and details thereof)
    • G10H 2210/101: Music composition or musical creation; tools or processes therefor (G10H 2210/00: aspects or methods of musical processing having intrinsic musical character)
    • G10H 2210/105: Composing aid, e.g. for supporting creation, edition or modification of a piece of music
    • G10H 2210/111: Automatic composing, i.e. using predefined musical rules
    • H04L 67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] (H04L 67/00: network arrangements or protocols for supporting network services or applications; H04L 67/01: protocols)

Abstract

The application discloses an information processing method, an information processing device, electronic equipment and a storage medium. The specific implementation scheme is as follows: obtain shared music material, the music material comprising non-original audio for characterizing at least one of a character, a style, and an emotion; and perform audio synthesis according to the music material and at least one processing logic among material processing, word and music processing, music editing processing, sound mixing processing, master tape post-processing and duplication checking processing, to obtain a target object. With the method and device, various types of original music can be produced automatically or semi-automatically, a variety of music synthesis application scenarios are supported, and the efficiency of music content production is greatly improved.

Description

Information processing method, information processing device, electronic equipment and storage medium
Technical Field
The present application relates to the field of digital music, and in particular, to an information processing method and apparatus, an electronic device, and a storage medium.
Background
Since the information revolution, the way music and multimedia are distributed has changed within a short time. This change has led to a dramatic increase in market demand for various types of music: a great deal of original music is required, whether for singles, albums, MVs and karaoke, in which music is the main element of popular or artistic creation; for short videos, advertisements, animations, trailers and film works, which use music as an auxiliary element; or for radio stations, broadcasters and public spaces, which use music as background content. How to provide high-quality original music that meets user requirements quickly and at low cost has become a technical problem to be solved urgently in this field.
Disclosure of Invention
The application provides an information processing method, an information processing device, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided an information processing method including:
obtaining shared musical material, the musical material comprising: non-original audio for characterizing at least one of a character, a style, and an emotion;
and carrying out audio synthesis according to the music material and at least one processing logic of material processing, word and music processing, music editing processing, sound mixing processing, master tape post processing and duplication checking processing to obtain a target object.
According to another aspect of the present application, there is provided an information processing apparatus including:
an acquisition module for acquiring shared musical material, the musical material comprising: non-original audio for characterizing at least one of a character, a style, and an emotion;
and the synthesis module is used for carrying out audio synthesis according to the music material and at least one processing logic of material processing, word and music processing, music editing processing, sound mixing processing, master tape post-processing and duplication checking processing to obtain a target object.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as provided by any one of the embodiments of the present application.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided by any one of the embodiments of the present application.
By adopting the method and the device, the shared music materials can be obtained, one or more modes are selected from material processing, word and music processing, music composing processing, sound mixing processing, master tape post-processing and duplication checking processing to process the music materials, and finally the synthesized audio object is obtained.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic flowchart of an information processing method according to an embodiment of the present application;
FIG. 2 is a schematic flow diagram of various subsystem operations according to an embodiment of the present application;
FIG. 3 is a schematic view of a workflow of further subsystems according to an embodiment of the present application;
FIG. 4 is a schematic view of a workflow of further subsystems according to embodiments of the present application;
FIG. 5 is a detailed workflow diagram according to an embodiment of the present application;
FIG. 6 is a schematic post-production subsystem workflow diagram according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an original audit subsystem workflow according to an embodiment of the application;
FIG. 8 is a block diagram of an information processing apparatus according to an embodiment of the present application;
fig. 9 is a block diagram of an electronic device for implementing the information processing method according to the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. The term "at least one" herein means any combination of at least two of any one or more of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C. The terms "first" and "second" used herein refer to and distinguish one from another in the similar art, without necessarily implying a sequence or order, or implying only two, such as first and second, to indicate that there are two types/two, first and second, and first and second may also be one or more.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present application.
Some of the English abbreviations used herein and their corresponding descriptions are shown in Table 1 below:
TABLE 1 (image not reproduced)
With the continuing enrichment of internet applications and the increasing maturity of mobile communication technologies such as 5G, audio and video information is no longer stored and transmitted by means of tapes, Compact Discs (CDs) or other physical media. People only need a mobile terminal (such as a mobile phone) or a community platform (such as a website) connected to the internet to listen to music from all over the world. People who once could rely only on letters to convey emotion can now share subjective feelings through streaming media at lower cost, higher speed and with richer content.
Revolutionary changes in the way music is distributed have created a large-scale demand for original music that is, however, difficult to meet in reality. On the one hand, there is a huge amount of music material on the market, and it may take hours or even days to search for suitable material for a piece only a few minutes long. Even when suitable material can be found, it often cannot be used directly because of license restrictions or the like. On the other hand, composing music oneself requires a great deal of expertise, such as harmony, counterpoint, orchestration, synthesizers, recording and mixing, which most music users do not have. People's ways of creating and using music are thus caught in a dilemma, and the music industry as a whole remains confined to a one-way consumption path of niche creation and mass listening that is difficult to break out of and expand.
Against this background, music creation that uses computers and artificial intelligence techniques to assist or replace humans has become a potential way to break this market deadlock. The approach aims to transfer the musical intelligence, inspiration and expertise embodied in music practitioners and outstanding works into computer systems. A user without a musical background can then operate the system without possessing that knowledge and let the computer synthesize the original music they need. Once a computer system understands how to correctly express emotion through music better than the general public does, such a scheme can greatly expand the ways in which people communicate emotion, and in turn expand the entire music industry and market.
Related music composition technologies are described as follows:
many algorithmic composing techniques and interactive music synthesis systems have provided the ability to assist the composer in synthesizing new music in a particular style, as well as a number of aids. But they are still human-oriented auxiliary creations in nature, and have strong music knowledge barriers, and cannot meet the requirement of non-musician users on full-automatic music synthesis. With the advent of the object-oriented Music synthesis language "Common Music" (Common Music) written in Lisp language, a series of functions including data preprocessing, event editing, playing, playback, and the like are provided, and hidden markov models and other probability-based models are widely used as core algorithms thereof. However, due to its overly tedious design, the user of this solution must simultaneously master the knowledge of the programmer and the musician, plus the limitations of the Lisp language itself, which is even higher knowledge barrier than the traditional techniques and therefore not widely accepted by the market.
In addition, there are many other music synthesis systems, built on various algorithms and models such as generative grammars, transition networks, chaos and self-similarity (fractals), genetic algorithms, cellular automata, neural networks and artificial intelligence. One example is Web-based software that can automatically create film soundtracks matching emotion tags. The user first controls the generation of the sound by entering emotion and style labels and positioning them in the video product. The system then prepares multiple pre-recorded loops (Loop) according to these tags and assembles the musical accompaniment. However, to use the system successfully, the user must understand how to perform loudness equalization, how to adjust parameters for each instrument, and how to identify instrument timbres (e.g., orchestra, synthesizer). This requires a level of musical literacy high enough that the system does not meet the conditions for use by the general public. In terms of music quality, the scheme's effect is limited, so it has not been accepted by the market and its range of application is small.
The scoprify system, which is likewise used for video scoring, divides music generation into three parts, melody, harmony and bass, supports the selection of instruments, styles and tempos, and uses artificial intelligence to generate a musical arrangement for the video. However, in terms of knowledge barriers it still requires the user to understand classical music terminology and to be able to identify the timbre of each instrument. Furthermore, it does not allow the music created by the system to be modified, nor does it allow the user to create music independently of a video.
In addition, there is the music master system SoniFire Pro. This system scores video content using a library of musical material and provides Web-based and desktop applications. Because it relies on a material library with limited variation schemes, the total amount of music it can offer the user is limited, the user must spend a significant amount of time adjusting the overall length of the musical piece, and the customization features are limited. The solution is therefore significantly constrained by its music material library and cannot be applied to the market at scale. There are also the Band in a Box system, the amp Score system and the like, each with a series of problems. Although some of the above techniques aim to assist musicians in creating music, and some even attempt to synthesize music fully automatically so as to complete one or more music production links automatically, to date none of them can replace manual work with automatic music generation, and none is widely used in the field.
Besides the knowledge barriers mentioned above, the related art also has a problem of music quality. The music creation industry has extremely high quality requirements at every link, such as lyric writing, melody writing, arrangement, mixing, vocal quality and master tape processing, and an error at any link can cause serious quality problems. At present, the quality requirements of these links are generally judged from the subjective feedback of musicians listening to the result, and a deterministic, universal evaluation standard can hardly be found. Yet such an evaluation index is an element that almost no existing automatic synthesis system can avoid. Automatic synthesis systems aimed at creating high-quality music have therefore not solved this problem well. Almost all high-quality music on the market is still created under human leadership in a Digital Audio Workstation (DAW) and has no conditions for automatic production.
In summary, the prior art cannot meet the general public's market demand for automatically synthesized music. First, some existing computer music production technologies have a narrow range of application, can only be used in specific scenarios (such as arpeggio generation), and do not realize an interpretable creation path. Second, some existing technologies demand an excessively high usage threshold: the user must have sufficient musical knowledge to operate them and produce music. Third, some existing technologies are too closed: all user operations must be completed within the system, and the user cannot use any model or material outside the system, so the freedom of creation is too low. Fourth, existing music-material splicing technologies cannot create music that differs from the content of the material library. Fifth, all existing computer music generation technologies, of whatever kind, can only assist humans in completing one or more production links among lyrics, melody, arrangement and mixing, and cannot achieve fully automatic or semi-automatic production detached from human operation.
From another perspective, most existing systems either do not allow operation by users without a musical background or provide only a limited total amount of music. Although a small number of systems can initially achieve the goal of letting the general public automatically synthesize music, they are strongly closed, cannot support model access or creation with a higher degree of freedom, and are therefore not sufficient as a solution in this field.
According to an embodiment of the present application, an information processing method is provided. Fig. 1 is a flowchart of the information processing method according to the embodiment of the present application. The method can be applied to an information processing apparatus which, when deployed in a terminal, a server or another processing device, can perform music material processing, word and song processing, composition processing, mixing processing, master tape post-processing, duplication checking processing, and the like. The terminal may be a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and so on. In some possible implementations, the method may also be implemented by a processor calling computer-readable instructions stored in a memory. As shown in fig. 1, the method includes:
s101, obtaining shared music materials, wherein the music materials comprise: non-original audio for characterizing at least one of a character, a style, and an emotion;
in one example, the music material is used only to provide music wisdom, theory, and inspiration for the music authoring process, not a specific piece of audio. In order to distinguish from musical materials used in the related art, the musical materials used in the present application may be referred to as musical ideas, which may include: lyrics, melody, chord, composition, timbre. It should be noted that the music material is different from the music material used in the traditional splicing technology, the traditional splicing technology does not change the music material itself, and the splicing technology is mainly used to combine a plurality of music materials into new audio, so that the audio produced by the traditional splicing technology directly contains the original material in the material library. In the present application, the music material library storing the music material stores the music idea after being processed by the system, and the music idea is an abstract expression of contents such as pitch, rhythm, and body. Furthermore, the music material can be changed, recombined and the like through an artificial intelligence algorithm and a model, so that the finally generated audio does not contain the original audio in the music material library.
And S102, carrying out audio synthesis according to the music material and at least one processing logic of material processing, word and music processing, music editing processing, sound mixing processing, master tape post processing and duplication checking processing to obtain a target object.
In one example, the target object may be a final audio file or an intermediate processing result, such as a word song file, a MIDI file, a "production process parameter file" corresponding to each processing logic. In one example, for example, the system may be implemented by a plurality of subsystems of the machine, such as a material preparation subsystem for implementing material processing, a word composition creation subsystem for implementing word composition processing, a composition mixing subsystem for implementing composition mixing processing, a post-production subsystem for master tape post-processing, and a creative review subsystem for reviewing, and after performing audio synthesis of word composition, mixing and master tape processing in a fully automatic or semi-automatic manner, an audio file (e.g., synthesized singing) is obtained, and the audio file at least meets at least one characteristic of characters, styles and emotions of the music material, so that a user can be supported to customize different creative audio files according to the requirement of the user. Further, not only the final audio file but also intermediate processing results, such as a vocabulary file, a MIDI file, a "production process parameter file" corresponding to each processing logic, may be obtained. Wherein, the production process parameter file is a system parameter file which can completely repeat the production process of the current audio file.
By adopting the method and the device, word making, music composing, sound mixing and master tape processing can be performed automatically or semi-automatically by a machine aiming at the music related information (including but not limited to characters, styles, emotions and the like) customized by a user, finally, a piece of original music is generated, the required original audio file can be obtained without the user having the background knowledge of the music specialty in the whole operation process, the efficiency of music content production is greatly improved, and the labor cost is reduced.
In one embodiment, the audio synthesis may be performed in a fully automatic manner according to the music material and all processing logic including material processing, word and music processing, music composition processing, sound mixing processing, master tape post-processing, and duplication checking processing, so as to obtain an original audio file, and the original audio file is used as the target object. Specifically, the full-automatic mode means that links such as music material preparation, word making, music composing, sound mixing, master tape processing and the like can be automatically executed by a machine, and any intervention of manual auxiliary operation is not needed.
In one embodiment, the audio synthesis may be performed in a semi-automatic manner by adding human assistance to any processing logic among material processing, word and song processing, composition processing, sound mixing processing, master tape post-processing and duplication checking processing according to the music material, so as to obtain an intermediate processing result or an original audio file, which is used as the target object. Specifically, the semi-automatic mode means that any step among the links of music material preparation, lyric writing, melody writing, arrangement, mixing and master tape processing allows the intervention of manual auxiliary operation and the replacement or modification of content, without affecting the automatic, machine-based execution of the other links. The intermediate processing result includes: a word and song file, a Musical Instrument Digital Interface (MIDI) file for audio production, and a production process parameter file corresponding to any of the processing logics.
In one embodiment, the music material may be used as sample data, and target model training is performed according to the sample data to obtain a trained target model. The trained target model corresponds to at least one type among material processing, word and song processing, composition processing, mixing processing, master tape post-processing and duplication checking processing. Specifically, the music material serving as sample data is trained by artificial intelligence processing, and the trained target model may include models based on statistical learning (e.g., neural network algorithms) and models based on expert systems. A model based on statistical learning performs parameter training on the music material used as a training set of sample data to obtain a set of internal parameters, so that the trained target model can finally generate corresponding music content using this parameter set; a model based on an expert system analyzes the music material as templates and then transfers the analysis results into new music content. A trained model may be connected to existing processing steps or modules via a model interface.
In one embodiment, when the processing logic is material processing, the method further includes: in response to a first input operation, determining a parameter space and randomization parameters for at least one of lyric writing, melody writing and arrangement; and obtaining a word and song parameter file according to the parameter space and the randomization parameters. For example, the material processing may be implemented by a material preparation subsystem. Specifically, in response to the input operation, the material preparation subsystem receives user input, determines the parameter space and provides randomized parameters according to the user input, and finally outputs the word and song parameter file. The word and song parameter file is a system parameter file capable of completely repeating the current music production process, and contains at least the necessary input data and the system parameters required for production, such as for lyric writing, melody writing and arrangement.
In one embodiment, when the processing logic is word and song processing, the method further includes: obtaining word and song parameters in response to a second input operation; obtaining a main melody MIDI sequence and a corresponding lyric text sequence according to the word and song parameters; and obtaining a word and song file according to the main melody MIDI sequence and the lyric text sequence. For example, a word and song creation subsystem may be used to implement the word and song processing. Specifically, after the word and song parameter file is obtained, the word and song parameters in it can be parsed, and based on these parameters, lyric and melody creation is carried out to obtain the main melody MIDI sequence and the corresponding lyric text sequence, from which the word and song file is obtained. An artificial intelligence melody model and an artificial intelligence lyric model may be employed in the creation process (these models may take the music material as sample data and be trained on it; the model type is not limited to RNN, Transformer, BERT, GPT2 or other language models), so that the artificial intelligence melody model is guided to compose based on the word and song parameters and the artificial intelligence lyric model is guided to write lyrics based on the word and song parameters.
In an embodiment, when the processing logic is a composition processing or a mixing processing, the processing logic further includes: responding to the third input operation to obtain a word song file and a composing parameter; and obtaining files for composing and mixing sound according to the word and music files and the composing parameters, wherein the files for composing and mixing sound are used for representing at least one characteristic of single-track MIDI, multi-track MIDI, musical instruments and timbres. For example, the composing and mixing subsystem may be used to implement composing and mixing, and the composing and mixing subsystem may receive the word and music files output by the word and music creation subsystem, and output the files to obtain the composing and mixing in combination with the composing parameters. The compilation and mixing subsystem can adopt an artificial intelligence compilation model and an artificial intelligence mixing model (the model can adopt the music materials as sample data and carry out model training according to the sample data, the model type is not limited to RNN, Transformer, BERT, GPT2 or other language models and the like), so as to guide the artificial intelligence compilation model to compile music based on compilation parameters and guide the artificial intelligence mixing model to mix sound based on the compilation parameters.
In one embodiment, when the processing logic is master tape post-processing, the method further includes: responding to the fourth input operation to obtain a word and music file and a file for editing music and mixing sound; performing audio rendering processing according to the word and music file and the files for editing music and mixing sound, and obtaining an audio file according to an audio rendering result; wherein the audio rendering process comprises: single track synthesis, rendering and track combination, and audio processing related to volume adjustment. For example, the post-processing of the master tape can be realized by a post-processing subsystem, and the post-processing subsystem performs audio mixing and master tape processing on the word and song files, the composing and audio mixing files, and outputs the audio files (such as an integral digital audio file or an audio file composed of a plurality of audio tracks).
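As a rough illustration of the track-combination and volume-adjustment part of this rendering stage, the following sketch (in Python, assuming each instrument track has already been rendered to a floating-point waveform at a common sample rate; all names are illustrative and not taken from the embodiment) sums per-track waveforms with per-track gains and normalizes the result:

```python
# Hedged sketch: combine already-rendered per-track waveforms and apply a simple
# master volume adjustment. This is an assumed illustration, not the patented method.
import numpy as np

def combine_tracks(tracks: list[np.ndarray], gains: list[float]) -> np.ndarray:
    """Sum per-track waveforms with per-track gain, then normalize the mix."""
    length = max(len(t) for t in tracks)
    mix = np.zeros(length, dtype=np.float64)
    for waveform, gain in zip(tracks, gains):
        mix[: len(waveform)] += gain * waveform
    peak = np.max(np.abs(mix))
    if peak > 0:
        mix = mix / peak * 0.9  # leave headroom before master tape processing
    return mix
```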
In one embodiment, when the processing logic is duplication checking processing, the method further includes: obtaining a word and song file and a file for arrangement and mixing in response to a fifth input operation; performing duplication checking on the word and song file and the arrangement and mixing file, respectively, based on similarity comparison with the music material, to obtain duplication checking results; if the compared similarity is below a threshold, the duplication checking result is determined to match the expected target. For example, original-content auditing, including the duplication checking processing, is implemented by an original auditing subsystem, which can automatically verify the similarity between the original content obtained by the above subsystems (such as the intermediate processing results of the word and song creation subsystem, the composition and mixing subsystem, and the post-production subsystem) and the music material stored in the music material library, so as to ensure that the similarity between the generated original content and the original music material is below a specified threshold (for example, a specified proportion). The lower the similarity, the less the generated content resembles any music material used by the machine in the learning, training and application stages, which guarantees the originality of the generated content.
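A minimal sketch of this duplication-checking idea is shown below. It assumes the generated content and each stored musical idea can be reduced to comparable note sequences; the similarity metric (positional note overlap) and the threshold value are illustrative assumptions, since the embodiment does not fix a concrete metric.

```python
# Hedged sketch: accept generated content only if its highest similarity to any
# stored music idea stays below a configured threshold.
def passes_originality_check(generated_notes: list[int],
                             material_library: list[list[int]],
                             threshold: float = 0.3) -> bool:
    def similarity(a: list[int], b: list[int]) -> float:
        if not a or not b:
            return 0.0
        matches = sum(1 for x, y in zip(a, b) if x == y)
        return matches / max(len(a), len(b))

    highest = max((similarity(generated_notes, m) for m in material_library),
                  default=0.0)
    return highest < threshold
```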
By adopting the method and the system, the artificial intelligence model and the expert system are accessed through the model interface, original music can be automatically or semi-automatically produced, the model interface not only supports different types of models to be accessed into the system, but also can support a human-computer interaction interface for manual operation of a user, so that the application freedom degree of the technical scheme is improved, the efficiency of music content production is greatly improved, and unnecessary manual operation can be eliminated from the music production; compared with the traditional music storage mode, the parameter management method and the data storage mode used in the production process have smaller volume and more complete information. Further, repeated production of music can be realized using a parameter file (e.g., a word composition parameter file, a composition parameter file), and thus the aforementioned parameter file can be used as a proof material for a music production process; in addition, the high-degree-of-freedom flow of the interface and the modularization enables the technical scheme to be used as a bottom layer technology to support various music synthesis application scenes; finally, the duplication checking process in the present application can provide similarity checking, protecting the originality of the musical content and the originality of the original material.
Application example:
this application relates to a library of music material and five subsystems: under normal conditions, a music material library and five subsystems cooperatively process and complete tasks, wherein a macro work flow of each subsystem is shown as a figure 2, wherein the first processing flow of the application embodiment comprises the following contents:
Step 1: obtain shared music material, the music material comprising non-original audio for characterizing at least one of a character, a style, and an emotion;
Step 2: perform audio synthesis according to the music material and at least one processing logic among material processing, word and music processing, composition processing, sound mixing processing, master tape post-processing and duplication checking processing, to obtain a target object.
Wherein the musical material of the first step and all the material involved in the second step are derived from or stored in a shared library of musical materials. The music material library is responsible for providing human-authored music content for all steps as system-produced raw material.
The traditional splicing technology does not change the materials, and the materials are mainly combined into new audio by utilizing the splicing technology, so that the produced audio directly contains the original materials in a material library; different from the traditional technology, the content of the material library is stored in the method, the music idea processed by the system, namely the abstract expression of the content such as pitch, rhythm, body and the like, and new music is generated without adopting a direct splicing mode. In addition, the similarity degree of the new music content and the original materials is also detected in the duplication checking processing steps after the application, so that the phenomenon of direct splicing is prevented. If the similarity is too high, the verification of the system cannot be passed. This ensures that the musical material in this application is only used to provide musical wisdom, theory and inspiration for the music authoring process, not a specific piece of audio. In order to distinguish from the material used in the conventional art, the musical material used in the present application is also referred to as musical idea.
Specifically, musical ideas fall into five categories (a minimal data-structure sketch follows the list):
(1) Lyrics: textual information, word counts and vowels in units of phrases, provided in text format or an equivalent manner;
(2) Melody: pitch and rhythm information of the monophonic part in units of phrases, provided in single-track MIDI, MusicXML format or an equivalent manner;
(3) Chord: chord information in units of paragraphs, provided in MusicXML format with popular chord notation or an equivalent manner;
(4) Arrangement template: performance information of the orchestration and organization in units of paragraphs, provided in multi-track MIDI format or an equivalent manner;
(5) Timbre template: specific instrument or vocal timbre material, provided as audio or in an equivalent form.
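The five categories above might, for example, be modelled as simple data records; the field names below are illustrative assumptions rather than structures defined in this application:

```python
# Hedged sketch of possible music-idea records; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Lyrics:                 # phrase-level text, word counts and vowels
    phrases: list[str]

@dataclass
class Melody:                 # phrase-level pitch/rhythm (e.g. from single-track MIDI)
    notes: list[tuple[int, float]]        # (MIDI pitch, duration in beats)

@dataclass
class Chord:                  # paragraph-level chord progression in popular notation
    progression: list[str]                # e.g. ["C", "Am", "F", "G"]

@dataclass
class ArrangementTemplate:    # paragraph-level multi-track performance information
    tracks: dict[str, list[tuple[int, float]]] = field(default_factory=dict)

@dataclass
class TimbreTemplate:         # instrument or vocal timbre material
    audio_path: str
```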
The music material library is only responsible for storing music ideas, and no provision is made for how other parts are used. Therefore, the other steps can freely decide how these musical ideas are used. For example, a model based on statistical learning (e.g., a neural network algorithm) may perform parameter training on music ideas as a training set, resulting in a set of internal parameter sets, which are ultimately used by the model to generate music content; the expert system based model will analyze music ideas as templates and migrate the analysis results to new music content.
It should be emphasized that the system does not use the conventional material splicing technology unless the user actively closes the duplication checking processing step corresponding to the original auditing subsystem. This is because there is a possibility that the duplication checking process conflicts with the stitching technique. The technology based on material splicing may generate content exactly the same as the musical idea in the material library with a certain probability, and thus, the duplication checking processing module may not be used to implement the duplication checking processing. The music material library is used in the application, music wisdom, theory and inspiration are provided for the automatic creation process of music, a new audio frequency is generated based on the music idea provided by the music material library, the splicing of direct original audio frequency materials in the prior art is avoided, the produced content is different from the original materials, and the high flexibility and originality are realized.
In one example, the target object is obtained by performing audio synthesis through at least one processing logic of material processing, word and music processing, music composing processing, sound mixing processing, master tape post-processing and duplicate checking processing, and the process can be fully automatic or semi-automatic, wherein the fully automatic means that all the links can be completed by a machine without any human operation intervention, and the semi-automatic means that any step in the links allows human intervention and content replacement or modification without affecting the automatic operation of other links. Each of the above processing logics is executed or realized by a subsystem, specifically, the steps of material processing are executed by the material preparation subsystem, the steps of vocabulary processing are executed by the vocabulary creation subsystem, the steps of compilation processing or mixing processing are executed by the compilation mixing subsystem, the steps of master tape post processing are executed by the post production subsystem, and the steps of review processing are executed by the original auditing subsystem, fig. 2 shows the work of each subsystem in the whole workflow of music creation, and the workflow shown in fig. 2 can be executed automatically or semi-automatically, so that the efficiency of music content production is greatly improved, and unnecessary manual operations can be eliminated from music production.
During the execution process, according to different specific types of the authoring content, adaptive adjustment can be made on the steps, such as adjusting the sequence of some steps or skipping some steps. Specifically, as shown in fig. 3, there is no need for creating lyrics for the video soundtrack requirement, and therefore, the artificial intelligence word-making step included in the word composition subsystem can be skipped; as shown in fig. 4, the execution sequence can be adjusted, that is, the steps in the song composition and mixing subsystem responsible for composing the song are executed first, and then the steps in the word composition and composing subsystem responsible for composing the lyrics are executed. The method and the device can adjust the sequence of the execution steps according to the specific types of the creation contents, have high flexibility, and are suitable for various types of music creation.
In one example, the music material is used as sample data, and target model training is performed according to the sample data to obtain a trained target model; the trained target model comprises: at least one type of material processing, word and song processing, composition processing, mixing processing, mother tape post processing and duplication checking processing. The method can train samples in an artificial intelligence mode to generate various models and use the models in the process of music creation, and particularly, the trained models can be accessed through a model interface.
Fig. 5 is a detailed workflow diagram according to an embodiment of the present application, in which the functional blocks marked (1) are executed by the material preparation subsystem, the functional blocks marked (2) by the word and song creation subsystem, the functional blocks marked (3) by the composition and mixing subsystem, the functional blocks marked (4) by the post-production subsystem, and the functional blocks marked (5) by the original auditing subsystem.
"start" and "finish" in the schematic refer to the starting and ending points of a single production run; in a single production process, a user is required to input non-original audio with characteristics such as style, emotion and the like, and after material processing steps such as randomization, word and song parameters and the like, the word and song processing steps are carried out, wherein artificial intelligence composition or artificial intelligence word composition is included, the two steps are provided with corresponding model interfaces, a word and song file is obtained after word composition, and after repeated searching of the word and song, the next step of composition work can be carried out; certainly, for some music which does not need to be composed by vocabularies, the music can be processed directly by materials and then enters into the processes of composing and mixing sound based on non-original audio input by users, wherein artificial intelligence composing has a corresponding model interface, a MIDI file is obtained after composing and mixing sound, and the next step of rendering work can be carried out after composition and duplication checking; after receiving the audio frequency after checking the weight, the audio frequency can be rendered, the step is also provided with a corresponding model interface, an audio frequency file is obtained after rendering, and the whole music creation process is completed after recording parameters.
The artificial intelligence melody module, the artificial intelligence arrangement module and the audio rendering module are each provided with a model interface, which is used to dock algorithms, models and human-computer interaction interfaces capable of completing the corresponding functions. A model can be connected to the system as long as it satisfies the following constraints: 1) it passes parameters in the manner specified in this application; 2) it provides the system with a determined parameter list and the corresponding parameter space; 3) when the parameters are unchanged, the model produces a unique, unchanged result; and 4) when the parameters are varied, the probability that the model produces a different result approaches 1. Under these constraints, models and operation interfaces can be freely connected or replaced, and other methods and devices compatible with the interface can be fully extended within the technical scheme. The design of the model interface also increases the freedom with which the technical scheme can be applied: the system can support connecting different models and can also support a human-computer interaction interface for manual operation by the user.
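The four constraints above can be read as an abstract interface contract. The sketch below (Python; names are illustrative assumptions, not an API defined by this application) expresses them: a pluggable model must declare its parameter space and must behave deterministically for a fixed parameter set.

```python
# Hedged sketch of the model-interface contract implied by constraints 1)-4).
from abc import ABC, abstractmethod
from typing import Any

class PluggableModel(ABC):
    @abstractmethod
    def parameter_space(self) -> dict[str, list[Any]]:
        """Return the parameter list and the allowed values for each parameter (constraint 2)."""

    @abstractmethod
    def generate(self, params: dict[str, Any]) -> Any:
        """Produce output for the given parameters (constraint 1).

        Must be deterministic: identical params always yield the identical result
        (constraint 3), and changing params should, with probability close to 1,
        change the result (constraint 4).
        """
```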
Thanks to the high-freedom-degree flow of interfaces and modularization, the technical scheme can serve as an underlying technology supporting various music synthesis application scenarios. For example: AI-based music synthesis and performance systems, in whole or in part; performance systems used in human bands; virtual instrument systems; physical instruments, toys and robots; multimedia soundtrack systems; digital audio workstations and companion plug-ins; integrated circuit systems with hard-wired processes; programmable music composition engines; internet-based systems, websites, cloud services, application services, data services, mobile services, enterprise-level system support, mobile terminals, web communities and the like; and the sharing of music, scores, project files, data files and other content produced by the system.
In one example, the step of processing the material is performed by the material preparation subsystem, which specifically includes: in response to a first input operation, processing and determining a parameter space and a randomization parameter for at least one of composition, composing and composing; and obtaining a word parameter file according to the parameter space and the randomization parameter.
Specifically, the main purpose of the material preparation subsystem is to provide necessary input data and system parameters required for production for word making, music making and music composing based on user input, and the main task comprises converting the 'non-musical language' in the first input operation of the user into a 'musical language' which can be understood by the computer, and based on the 'non-musical language', retrieving corresponding musical ideas from the material library to provide parameters for the word composition and music composing and mixing subsystem. By "non-music language" is meant a language that does not require background knowledge of the music profession, such as emotions, scenes, and the like. "music language" refers to the musical terminology, such as tempo, key, speed, pitch, dynamics, rhythm, chord, etc., which make up the model inputs of the vocabulary creation and composition mixing subsystem. For example, the following table 2 lists the parameters in the material preparation subsystem, which is only one example of the system parameters, and the application is not limited thereto.
TABLE 2 (image not reproduced: example parameters of the material preparation subsystem and their value ranges)
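As a purely hypothetical illustration of the kind of mapping Table 2 describes, the dictionary below translates a user-facing emotion word into candidate ranges for musical parameters; the specific values are assumptions, not taken from the table:

```python
# Hedged sketch: map "non-musical language" (an emotion word) to candidate ranges
# of "musical language" parameters. Values are illustrative assumptions.
EMOTION_TO_MUSIC_PARAMS = {
    "happy":   {"tempo_bpm": (110, 140), "mode": "major", "meter": "4/4"},
    "sad":     {"tempo_bpm": (60, 80),   "mode": "minor", "meter": "4/4"},
    "intense": {"tempo_bpm": (140, 180), "mode": "minor", "meter": "4/4"},
}

def parameter_ranges_for(emotion: str) -> dict:
    """Return candidate value ranges of musical parameters for an emotion word."""
    return EMOTION_TO_MUSIC_PARAMS.get(emotion, EMOTION_TO_MUSIC_PARAMS["happy"])
```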
The parameters required by the system are (a_1 ∈ A_1, …, a_N ∈ A_N), where a_i is the i-th parameter used by the above method and A_i is the value range of a_i. The specific implementation principle of the material preparation subsystem is as follows:
(1) Using logic judgment and database retrieval techniques, map the "non-musical language" input by the user to a reasonable parameter value range (A_1, …, A_N).
(2) Set the iteration count p = 1.
(3) Using independent uniform sampling, randomly select a set of values (a_1, …, a_N) within the above ranges as the execution parameters of the method.
(4) Cache the execution parameters and perform the subsequent operations with this set of parameters.
(5) Submit the execution result to the original auditing subsystem for checking; if the check passes, skip to step (8), otherwise continue.
(6) If the check does not pass, increase the iteration count: p = p + 1. Randomly select an index i from {1, …, N} and randomly modify the corresponding parameter a_i within its reasonable value range A_i, leaving the other parameters unchanged.
(7) Cache the execution parameters and jump to step (4).
(8) After the whole music composition process is completed, record the cached execution parameters and end execution.
Steps (3) and (6) above are referred to in the system as "randomization"; the cached execution parameters (steps (4) and (7)) are called "word composition parameters" or "composition parameters" respectively, according to the object that uses them.
Because the scheme supports the cooperative work of different word-making, music-making and music-composing algorithms, different models can depend on different parameters. Thus, the parameter space is not a fixed set. Only after the model is selected, the parameter space is determined accordingly. In actual use, all parameters and corresponding value ranges thereof are stored in the system in the form of a dictionary table and are finally recorded and saved in the form of a JSON file, other types of self-explanatory files or serialized files. Using the data records recorded in this file, in conjunction with the library, the current production can be completely repeated (i.e., producing identical result files).
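A compact sketch of steps (1) to (8) and of the JSON parameter record described above is given below. The helpers `execute_production` and `audit_passes` stand in for the downstream subsystems and the original auditing subsystem respectively; all names and the discrete value ranges are illustrative assumptions.

```python
# Hedged sketch of the material preparation loop (steps (2)-(8)) with a JSON
# record of the cached execution parameters. Names are illustrative assumptions.
import json
import random

def prepare_and_run(value_ranges: dict[str, list], execute_production, audit_passes,
                    out_path: str = "production_params.json") -> dict:
    names = list(value_ranges)
    p = 1                                                           # step (2)
    params = {n: random.choice(value_ranges[n]) for n in names}     # step (3): randomization
    while True:
        result = execute_production(params)                         # step (4)
        if audit_passes(result):                                    # step (5)
            break
        p += 1                                                       # step (6)
        i = random.choice(names)
        params[i] = random.choice(value_ranges[i])                   # re-sample one parameter
    with open(out_path, "w", encoding="utf-8") as f:                 # step (8): record parameters
        json.dump({"iterations": p, "params": params}, f, ensure_ascii=False, indent=2)
    return params
```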
In summary, the task of the material preparation subsystem is to receive user input, determine the parameter space, provide randomized parameters and record the parameter file, where the parameter file is saved as a self-explanatory file or a serialized file. This operation accurately converts the "non-musical language" input by the user into a "musical language" the computer can understand, freeing the process from scene limitations, so that a user without musical knowledge can obtain relevant music simply by entering their own ideas. In addition, compared with traditional music project storage, this scheme provides a parameterized storage format for music files that is small in size and complete in information. Repeated production of music can be realized using the parameter file, the probability that music files produced with different parameters are identical is almost zero, and the parameters cannot be recovered from the music file alone. Therefore, technically, the parameter file can serve as proof of the music production process, demonstrating a music producer's production process and thereby protecting its originality.
In one example, the step of processing the vocabulary is performed by the vocabulary creation subsystem, which specifically includes: responding to the second input operation to obtain a word graph parameter; obtaining a main melody MIDI sequence and a corresponding lyric text sequence according to the lyric parameters; and obtaining a word song file according to the MIDI sequence of the main melody and the text sequence of the lyrics.
Specifically, the main task of the word and song creation subsystem is to guide the model to write lyrics and melodies according to the word and song parameters, produce an original main melody MIDI sequence and a corresponding lyric text sequence, and form a "word and song file". The second input operation inputs the word and song parameters into the system; the word and song parameters already generated by the material preparation subsystem can be selected, and they include:
(1) Tempo and beat information, where tempo refers to the number of quarter notes performed per minute and beat refers to the number of quarter notes contained in each measure of the music.
(2) Key signature and chord information, where the key signature refers to how the tonic changes over time during the performance and the chord information refers to how the chords change over time during the performance.
(3) Phrase word-count information, i.e., the number of words or syllables in the lyrics separated by punctuation marks.
(4) Other parameters required by the artificial intelligence melody and lyric models.
The word and song parameters are input into the word and song creation subsystem, which outputs a word and song file containing the following information:
(1) Melody information, i.e., how the main melody (for a song, the melody sung by the lead vocal; otherwise, the melody played by the lead instrument) varies over time during the performance.
(2) Lyric information, i.e., the correspondence between the sung lyrics and the main melody during the performance. A sketch of one possible organization of this file follows.
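As an illustration only, the word and song file could be organized along the following lines; the field names here are assumptions for the sketch, not the patent's actual file format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MelodyNote:
    pitch: int              # MIDI pitch number, e.g. 60 for middle C
    start_quarters: float   # onset time in quarter notes
    length_quarters: float
    syllable: Optional[str] = None  # lyric syllable sung on this note, if any

@dataclass
class WordSongFile:
    tempo_bpm: float
    beats_per_bar: int
    notes: List[MelodyNote]

    def lyrics_text(self) -> str:
        """Recover the lyric text sequence from the syllables aligned to the melody."""
        return "".join(n.syllable for n in self.notes if n.syllable)
```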
The subsystem's artificial intelligence composition and word-making interface can be connected to different models. Specifically, the system can use a recurrent neural network (RNN), a Transformer, a pre-trained model (such as BERT or GPT-2), or another language model as its main structure, trained on the contents of the music material library. At run time, the system takes the word and song parameters as global or local conditions and outputs the melody and lyric results by random sampling. To ensure that the output is repeatable, the pseudo-random number seed used in the sampling (i.e., the initial parameter for generating the pseudo-random numbers) must be supplied as an external parameter. A sketch of such seed-controlled sampling is given below.
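The following minimal sketch illustrates seed-controlled conditional sampling, with a placeholder in place of the actual RNN/Transformer forward pass; the function and parameter names are assumptions made for this sketch.

```python
import numpy as np

def next_token_probs(history, conditions):
    # Placeholder for an RNN/Transformer forward pass; here a uniform
    # distribution over a toy vocabulary of MIDI pitches 60..71.
    return np.full(12, 1.0 / 12.0)

def sample_melody(conditions, length, seed):
    rng = np.random.default_rng(seed)   # external pseudo-random seed
    history = []
    for _ in range(length):
        probs = next_token_probs(history, conditions)
        pitch = 60 + rng.choice(12, p=probs)
        history.append(int(pitch))
    return history

melody_a = sample_melody({"key": "C", "tempo_bpm": 120}, length=16, seed=7)
melody_b = sample_melody({"key": "C", "tempo_bpm": 120}, length=16, seed=7)
assert melody_a == melody_b  # identical seed and conditions give identical output
```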
In conclusion, the task of the word and song creation subsystem is to receive the word and song parameters and output a word and song file; both lyric generation and melody generation can connect to artificial intelligence modules through the model interface.
In one example, the step of composing or mixing is performed by a composing and mixing subsystem, which specifically includes: responding to the third input operation to obtain a word song file and a composing parameter; and obtaining files for composing and mixing sound according to the word and music files and the composing parameters, wherein the files for composing and mixing sound are used for representing at least one characteristic of single-track MIDI, multi-track MIDI, musical instruments and timbres.
Specifically, the main task of the composition and mixing subsystem is to arrange music according to the word and song file and the composition parameters, forming a multi-track, multi-instrument composition MIDI file and a composition engineering file that carries instrument, timbre and other information. The third input operation is to input the composition parameters, which include:
(1) tempo, beat information, where tempo refers to the number of quarter notes performed by the music per minute and beat refers to the number of quarter notes contained in each measure of the music.
(2) Key number and chord information, wherein the key number refers to the change of the tonic with time in the musical performance process, and the chord refers to the change of the chord with time in the musical performance process.
(3) Paragraph information, i.e., the evolution process of the musical theme, includes the type and length of each paragraph.
(4) The word and song file, i.e., the output of the word and song creation subsystem.
(5) Other parameters required by the artificial intelligence composition model.
The composition output file comprises the following information:
(1) Multi-track MIDI event information: the start time, end time and pitch of each performed note of each instrument, plus velocity, pitch-bend and controller information.
(2) Timbre control parameters of the multi-track instruments: the names and parameter changes of the samplers, synthesizers and effectors used. A sketch of one possible organization of this output is given after this list.
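As an illustration, the composition output described above could be organized as follows; the field names are assumptions made for this sketch rather than the patent's actual engineering-file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MidiEvent:
    start_quarters: float
    end_quarters: float
    pitch: int
    velocity: int

@dataclass
class Track:
    instrument: str                                         # e.g. "piano", "bass"
    events: List[MidiEvent] = field(default_factory=list)   # per-track MIDI events
    timbre: Dict[str, float] = field(default_factory=dict)  # e.g. {"reverb_wet": 0.2}
    volume_gain_db: float = 0.0
    pan: float = 0.0                                         # -1.0 (left) .. 1.0 (right)

@dataclass
class CompositionProject:
    tempo_bpm: float
    tracks: List[Track] = field(default_factory=list)
```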
Similar to the word and song creation subsystem, the artificial intelligence composition interface of the composition and mixing subsystem can also connect to different models. Taking an expert system as an example, the composition interface can connect to a chord analyzer that uses music theory knowledge to analyze chord connections and functions, and then reassigns pitch and controller parameters according to the arrangement textures in the material library to form a new MIDI file.
In summary, the task of the composition mixing subsystem is to receive composition parameters, output MIDI files and composition engineering files, and also to access different artificial intelligence models through model interfaces for composition.
In one example, the step of performing the post-processing on the master tape is performed by a post-production subsystem, which specifically includes: responding to the fourth input operation to obtain a word and music file and a file for editing music and mixing sound; performing audio rendering processing according to the word and music file and the files for editing music and mixing sound, and obtaining an audio file according to an audio rendering result; wherein the audio rendering process comprises: single track synthesis, rendering and track combination, and audio processing related to volume adjustment.
Specifically, the post-production subsystem mainly collects the word and song file and the composition file to perform mixing and mastering, forming an audio file that has undergone audio rendering. The input of the fourth input operation includes:
(1) The word and song file, i.e., the output file of the word and song creation subsystem.
(2) The composition file, i.e., the output file of the composition and mixing subsystem.
(3) Other parameters required by the audio rendering model.
The output of the post-production subsystem includes:
(1) a whole digital audio file.
(2) Audio of particular tracks. For example, for a song with vocals, an unaccompanied vocal track and an accompaniment track with the vocals removed may be output.
The post-production subsystem executes in three steps: single-track synthesis, rendering and track merging, and volume check. For each step the system presets a default solution, while also providing a corresponding model interface that can replace the default solution, as shown in FIG. 6.
The specific implementation process of the three steps is as follows:
(1) In the single-track synthesis stage, the system traverses each track of the multi-track MIDI event information in the composition file and synthesizes single-track audio using the corresponding instrument timbre control parameters. In the default scheme, instrument tracks are rendered with a virtual instrument such as a sampler or synthesizer. The model interface may add custom models for particular tracks; for example, a vocal track may connect to a neural network algorithm or to voice sampling, synthesis and splicing techniques to synthesize a virtual voice.
(2) In the rendering and track-merging stage, the system renders the single-track synthesis result of each track using the effector parameter information recorded in the composition file, and superimposes the sounds according to each track's parameters such as volume gain and pan. In the default solution, the system builds an input-effector-mixer-output audio node chain for each track using Audio Unit (AU) technology, supporting both real-time playback and offline rendering. The model interface can add custom rendering methods; for example, the same functions can be implemented with Virtual Studio Technology (VST) or a front-end music interaction framework such as Tone.js.
(3) In the volume check stage, the system traverses each track of audio generated in the steps above, calculates the total volume after the tracks are merged, and confirms that it meets the loudness standard. If it does not, the system returns to step (2) and redistributes reasonable volume proportions so that the loudness of the instruments stays balanced. In the default solution, the system measures the integrated loudness (in LUFS) of the merged audio against the European Broadcasting Union loudness standard (EBU R 128), ensuring that the value lies within the proper range. A sketch of such a check is given after this list.
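A minimal sketch of such a loudness check is shown below. It assumes the third-party pyloudnorm package for integrated loudness measurement (ITU-R BS.1770, the basis of EBU R 128) and uses a simple uniform gain correction; the target value and tolerance are illustrative assumptions.

```python
import numpy as np
import pyloudnorm as pyln  # assumed third-party dependency

TARGET_LUFS = -14.0   # assumed target loudness; the actual range is a design choice
TOLERANCE = 1.0
SAMPLE_RATE = 44100

def merged_loudness(tracks):
    mix = np.sum(tracks, axis=0)      # superimpose the per-track audio
    meter = pyln.Meter(SAMPLE_RATE)   # BS.1770-style loudness meter
    return mix, meter.integrated_loudness(mix)

def volume_check(tracks):
    mix, loudness = merged_loudness(tracks)
    if abs(loudness - TARGET_LUFS) <= TOLERANCE:
        return mix                                    # loudness is within range
    gain = 10.0 ** ((TARGET_LUFS - loudness) / 20.0)  # uniform gain correction
    mix, _ = merged_loudness([t * gain for t in tracks])
    return mix
```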
In summary, the tasks of the post-production subsystem include: receiving a word song file and a song editing file, and outputting an audio file; the basic flow of the post-production subsystem comprises the following steps: single track synthesis, rendering and track combination and volume check.
In one example, the step of the duplicate checking process is performed by the original auditing subsystem, and specifically includes: responding to the fifth input operation to obtain a word and music file and a file for editing music and mixing sound; respectively carrying out duplication checking processing on the word and music file and the files for editing music and mixing sound based on similarity comparison with the music material to obtain duplication checking processing results; if the similarity comparison is lower than the threshold, the duplication checking result is determined to be matched with the expected target.
The main task of the originality auditing subsystem is to compare the contents of the word and song file and the composition file with the music materials recorded in the material library and ensure that the similarity does not exceed the specified proportion, so that the produced content differs from the input materials. The duplicate checking comprises two categories: word-and-song duplicate checking and composition duplicate checking. In the first category, word-and-song duplicate checking, the word and song parameters are received, artificial intelligence composition and artificial intelligence word-making are carried out, the resulting word and song file is taken as the original content, and the word-and-song duplicate check is executed on it; subsequent composition may proceed after the check passes. In the second category, composition duplicate checking, the composition parameters are received, artificial intelligence arrangement is carried out, the resulting MIDI file is taken as the original content, and the composition duplicate check is executed on it; subsequent audio rendering may proceed after the check passes, as shown in FIG. 5. The system builds a separate model for each kind of musical material in the material library to calculate similarity. The specific scheme is as follows:
For lyrics, the original material sequence is recorded as $L = (s_1, s_2, \ldots, s_N)$, where $L$ is the complete lyrics of one song and $s_i$ is the $i$-th phrase in the lyrics. Denote $(L^{(1)}, \ldots, L^{(P)})$ as all the lyrics in the material library, and $\tilde{L} = (\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_K)$ as the lyrics to be detected that the system has generated.
First, the repetition between the lyrics to be detected and the lyrics of each song in the material library is counted phrase by phrase. For each phrase $\tilde{s}_k$, define the set

$R_k = \{\, i \mid L^{(i)} \text{ contains a phrase that contains, or is contained in, } \tilde{s}_k \,\}$

That is, if $L^{(i)}$ contains any phrase that includes $\tilde{s}_k$ (or vice versa), the index $i$ is recorded in $R_k$.
After traversing all the lyrics in the material library, the number of elements in $R_k$ leads to the following conclusions:
$|R_k| = 0$ means the $k$-th phrase does not repeat any phrase in the material library; the phrase is considered non-repeated.
$|R_k| = 1$ means the $k$-th phrase has a containment relationship with the lyrics of exactly one song in the material library; the phrase is considered to have a specific repetition with the $i$-th material.
$|R_k| > 1$ means the $k$-th phrase has a containment relationship with the lyrics of several songs in the material library. This usually means the phrase is a common expression that may reasonably be reused; it is therefore considered repeated, but without a specific repetition.
On the basis, two similarity indexes are defined. The first similarity measure is the repetition rate (Rep value), as shown in equation (1):
Figure BDA0003037976460000205
wherein the content of the first and second substances,
Figure BDA0003037976460000207
for the logical decision function, the argument is 1 if true, otherwise 0. The mathematical meaning of the repetition rate is the rate of phrases with specific repetition phenomena to the total phrases. If the Rep value is higher, it means that the song has a 'east spelling and west gathering' condition, i.e. the phrase is directly selected from different materials for splicing. The subsystem needs to avoid too high a repetition rate.
The second similarity index is the maximum quoted proportion (Cite value), as shown in equation (2):

$\mathrm{Cite} = \max_i \mathrm{Cite}_i, \qquad \mathrm{Cite}_i = \frac{1}{K}\sum_{k=1}^{K}\mathbb{1}\left(i \in R_k\right)$  (2)

The mathematical meaning of the maximum quoted proportion is the overall proportion of phrases that repeat (including non-specific repetition) a single material. A higher Cite value means the song shares a great deal with one particular material; in that case, even if the Rep value is low (i.e. there is no obvious specific repetition and every sentence is a common expression), the song should not be regarded as an original work. The subsystem therefore also avoids an excessively high Cite value. A sketch of computing both indexes is given below.
For melody, the modeling process differs from that of lyrics because a melody carries definite beat information, has multi-dimensional attributes such as pitch and rhythm, and its repetition judgment is ambiguous with respect to rhythm. Record the original material sequence as $M = \langle K, T, (m_1, m_2, \ldots, m_M)\rangle$ with $m_j = (p_j, t_j)$, where $M$ is the melody of one song, $K$ is the key signature of the natural major key (or its relative key) in which the melody is written, $T$ is the beat information, $m_j$ is the $j$-th melody note, $p_j$ is the MIDI pitch of the $j$-th note, and $t_j$ is its time information in units of quarter notes. Denote $(M^{(1)}, \ldots, M^{(P)})$ as all the melodies in the material library, and $\tilde{M} = \langle \tilde{K}, \tilde{T}, (\tilde{m}_1, \tilde{m}_2, \ldots)\rangle$ as the melody to be detected that the system has generated.
First, the repetition between the melody to be detected and each single melody line in the material library is counted. The key signatures and beats of the two melody lines are unified according to the following steps:

Transpose the pitch information of the material melody to the key of the target melody: $\hat{p}^{(i)}_j = p^{(i)}_j + (\tilde{K} - K^{(i)})$.

Stretch the time information of the material melody proportionally to the beat of the target melody: $\hat{t}^{(i)}_j = t^{(i)}_j \cdot \tilde{T} / T^{(i)}$.

At this point $\hat{M}^{(i)}$ and $\tilde{M}$ are comparable under the same key and beat. Next, it is necessary to compare whether the two melodies have similar parts. The scheme uses a cross-correlation function with one-dimensional Gaussian blur to estimate melody similarity. The specific method is as follows:
A time window of $w$ bars (usually $w = 8$) is chosen, and the calculation proceeds according to equations (3)-(7):

$x_p(t) = \begin{cases} 1, & \text{a note of pitch } p \text{ sounds at time } t \\ 0, & \text{otherwise} \end{cases}$  (3)

$y_p(t) = \int_w x_p(\tau)\, G(t-\tau)\, d\tau$  (4)

$G(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{t^2}{2\sigma^2}\right)$  (5)

$r_i(l) = \sum_p \int_w \tilde{x}_p(t)\, y^{(i)}_p(t+l)\, dt$  (6)

$\hat{r}_i = \max_l r_i(l)$  (7)

where $x_p(t)$ is the response function for the occurrence of pitch $p$ at time $t$, $y_p(t)$ is the one-dimensional Gaussian-blurred version of $x_p(t)$, $G(t)$ is the one-dimensional Gaussian function used in the blurring, $r_i(l)$ is the cross-correlation function, with lag $l$, between the response function of the target melody and the blurred function of the $i$-th material melody in the material library, and $\hat{r}_i$ is the maximum of the cross-correlation function.

The mathematical meaning of these formulas is: take a melody from the material library, form a sliding time window of length $w$ bars, apply Gaussian blur to its pitch response, and compute the cross-correlation with the target melody. The larger the value of $\hat{r}_i$, the higher the degree of overlap.
Taking a threshold $r$ as the boundary, a set can be constructed for each melody slice $k$: $R_k = \{\, i \mid \hat{r}^{(k)}_i \ge r \,\}$, where $\hat{r}^{(k)}_i$ is the maximum cross-correlation between the $k$-th slice of the target melody and the $i$-th material melody. Its mathematical meaning is quite similar to that of $R_k$ in the lyric examination.
$|R_k| = 0$ means the $k$-th melody slice does not repeat any melody in the material library; the slice has no repetition.
$|R_k| = 1$ means the similarity between the $k$-th melody slice and the melody of exactly one song in the material library exceeds the threshold; the slice has a specific repetition with the $i$-th material.
$|R_k| > 1$ means the similarity between the $k$-th melody slice and the melodies of several songs in the material library exceeds the threshold. This usually means the melody is a common melody or a familiar melodic idiom; since such melodies can reasonably be reused, the slice is considered repeated, but without a specific repetition.
The subsequent processing steps are the same as in the lyric scheme: similarity is measured by the repetition rate Rep and the maximum quoted proportion Cite. A sketch of the melody comparison is given below.
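A minimal sketch of the melody comparison, for two melodies already normalized to the same key and beat, might look as follows; the time resolution, the use of a circular shift in place of a true sliding window, and the function names are simplifying assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

STEPS_PER_QUARTER = 4   # assumed time resolution
NUM_PITCHES = 128       # full MIDI pitch range

def rasterize(notes, total_steps):
    """notes: iterable of (midi_pitch, onset_in_quarter_notes) pairs."""
    grid = np.zeros((NUM_PITCHES, total_steps))
    for pitch, onset in notes:
        step = int(round(onset * STEPS_PER_QUARTER))
        if 0 <= step < total_steps and 0 <= pitch < NUM_PITCHES:
            grid[pitch, step] = 1.0
    return grid

def max_cross_correlation(target_notes, material_notes, total_steps, sigma=2.0):
    x = rasterize(target_notes, total_steps)                       # target melody response
    y = gaussian_filter1d(rasterize(material_notes, total_steps),  # blurred material melody
                          sigma=sigma, axis=1)
    scores = [np.sum(x * np.roll(y, lag, axis=1))                  # circular-shift simplification
              for lag in range(total_steps)]
    return max(scores)
```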
For chord progressions, the system represents them as character strings and performs the similarity calculation in the same manner as for lyrics.
For arrangement material, because most textures (such as piano arpeggios) are materials that can be reused extensively, the system only considers the important melodic instruments therein and performs the similarity calculation in the same manner as for melodies.
In summary, no matter which kind of material is involved, the system obtains two unified similarity indexes after the calculation: the repetition rate Rep and the maximum quoted proportion Cite.
After the calculation is finished, the system enters the auditing stage. Music that exceeds a given threshold is discarded and re-produced following the flow of the material preparation subsystem. Music that does not exceed the threshold passes the audit, and the system outputs the audio results together with the results of the originality audit: the overall Rep value, the overall Cite value, and the $\mathrm{Cite}_i$ value of each music material. A sketch of this audit gate is given below.
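The following minimal sketch illustrates the audit gate; the threshold names and values are assumptions for illustration only.

```python
REP_THRESHOLD = 0.2    # assumed limits; the actual proportions are design choices
CITE_THRESHOLD = 0.3

def audit(rep: float, cite: float, cite_per_material: list) -> dict:
    passed = rep <= REP_THRESHOLD and cite <= CITE_THRESHOLD
    return {
        "passed": passed,  # False means: discard and re-produce from material preparation
        "rep": rep,
        "cite": cite,
        "cite_per_material": cite_per_material,
    }
```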
Finally, since similarity and originality are not equivalent, the system skips part of the audits in actual operation in the following cases:
Because chords carry no attribution of originality, the system only calculates the similarity index for them without auditing.
Since not all lyric materials are original works in the copyright sense, the system only audits against the lyric materials whose copyright is definite.
In summary, the originality auditing subsystem calculates the overall Rep value, the overall Cite value and the $\mathrm{Cite}_i$ value of each music material for the target music, compares the content of the word and song file and the composition file with the music materials recorded in the material library to ensure that the similarity is not higher than the specified proportion, and checks whether the music carries a risk of infringement; the process is shown in fig. 7. Music that does not exceed the threshold passes the audit phase; otherwise it re-enters the material preparation subsystem to be re-produced. For music that finally passes the audit, the system outputs the calculation results in addition to the audio file.
According to an embodiment of the present application, there is provided an information processing apparatus, and fig. 8 is a schematic diagram of a configuration of the information processing apparatus according to the embodiment of the present application, and as shown in fig. 8, the information processing apparatus includes: an obtaining module 81 for obtaining shared musical material, the musical material comprising: non-original audio for characterizing at least one of a character, a style, and an emotion; and the synthesis module 82 is used for performing audio synthesis according to the music material and at least one processing logic of material processing, word and music processing, music composition processing, sound mixing processing, master tape post-processing and duplication checking processing to obtain a target object.
In one embodiment, the synthesis module is configured to: and according to the music material and all processing logics including material processing, word and music processing, music composing processing, sound mixing processing, master tape post processing and duplicate checking processing, carrying out audio synthesis in a full-automatic mode to obtain an original audio file, and taking the original audio file as the target object.
In one embodiment, the synthesis module is configured to: and adding artificial assistance according to the music material and any processing logic of material processing, word and music processing, music editing processing, sound mixing processing, master tape post processing and duplication checking processing, performing audio synthesis in a semi-automatic mode to obtain an intermediate processing result or an original audio file, and taking the intermediate processing result or the original audio file as the target object.
In one embodiment, the apparatus further comprises an artificial intelligence module configured to: taking the music material as sample data, and performing target model training according to the sample data to obtain a trained target model; the trained target model comprises: at least one type of material processing, word and song processing, composition processing, mixing processing, mother tape post processing and duplication checking processing.
In one embodiment, in the case that the processing logic is material processing, the apparatus further includes a material preparation module configured to: in response to a first input operation, determine a parameter space and randomization parameters for at least one of word-making, music-making and composing; and obtain a word and song parameter file according to the parameter space and the randomization parameters.
In one embodiment, in the case that the processing logic is word and song processing, the apparatus further includes a word and song creation module configured to: respond to the second input operation to obtain word and song parameters; obtain a main melody MIDI sequence and a corresponding lyric text sequence according to the word and song parameters; and obtain a word and song file according to the main melody MIDI sequence and the lyric text sequence.
In one embodiment, in the case that the processing logic is composition processing or mixing processing, the apparatus further includes a composition and mixing module configured to: respond to the third input operation to obtain a word and song file and composition parameters; and obtain files for composing and mixing sound according to the word and song file and the composition parameters, wherein the files for composing and mixing sound are used for representing at least one characteristic of single-track MIDI, multi-track MIDI, musical instruments and timbres.
In one embodiment, in the case that the processing logic is master tape post-processing, the apparatus further includes a post-processing module configured to: respond to the fourth input operation to obtain a word and song file and files for composing and mixing sound; perform audio rendering according to the word and song file and the files for composing and mixing sound, and obtain an audio file from the audio rendering result; wherein the audio rendering process comprises: single-track synthesis, rendering and track merging, and audio processing related to volume adjustment.
In one embodiment, in the case that the processing logic is duplicate checking, the apparatus further includes an originality audit module configured to: respond to the fifth input operation to obtain a word and song file and files for composing and mixing sound; perform duplicate checking on the word and song file and the files for composing and mixing sound respectively, based on similarity comparison with the music material, to obtain duplicate checking results; and if the similarity comparison is lower than the threshold, determine that the duplicate checking result matches the expected target.
The functions of each module in each apparatus in the embodiment of the present application may refer to corresponding descriptions in the above method, and are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 9 is a block diagram of an electronic device for implementing the information processing method according to the embodiment of the present application. The electronic device may be the aforementioned deployment device or proxy device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 9, the electronic apparatus includes: one or more processors 801, memory 802, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 9 illustrates an example of a processor 801.
The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the information processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the information processing method provided by the present application.
The memory 802, as a non-transitory computer-readable storage medium, may be used for storing non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules (e.g., the acquisition module, the synthesis module, and the like shown in fig. 8) corresponding to the information processing method in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the information processing method in the above-described method embodiments.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the information processing method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means, and are exemplified by a bus in fig. 9.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (20)

1. An information processing method, characterized in that the method comprises:
obtaining shared musical material, said musical material comprising: non-original audio for characterizing at least one of a character, a style, and an emotion;
and carrying out audio synthesis according to the music material and at least one processing logic of material processing, word and music processing, music editing processing, sound mixing processing, master tape post processing and duplicate checking processing to obtain a target object.
2. The method of claim 1, wherein the audio synthesizing according to the music material and at least one of processing logic including material processing, word and song processing, song composition processing, sound mixing processing, mother tape post-processing and duplication checking processing to obtain the target object comprises:
and according to the music materials and all processing logics including material processing, word and music processing, music composing processing, sound mixing processing, master tape post processing and duplicate checking processing, carrying out audio synthesis in a full-automatic mode to obtain an original audio file, and taking the original audio file as the target object.
3. The method of claim 1, wherein the audio synthesizing according to the music material and at least one of processing logic including material processing, word and song processing, song composition processing, sound mixing processing, mother tape post-processing and duplication checking processing to obtain the target object comprises:
and adding artificial assistance according to the music materials and any processing logic of material processing, word and music processing, music editing processing, sound mixing processing, master tape post processing and duplicate checking processing, performing audio synthesis in a semi-automatic mode to obtain an intermediate processing result or an original audio file, and taking the intermediate processing result or the original audio file as the target object.
4. The method of claim 1, further comprising:
taking the music material as sample data, and performing target model training according to the sample data to obtain a trained target model;
the trained target model comprises: at least one type of material processing, word and song processing, composition processing, mixing processing, mother tape post processing and duplication checking processing.
5. The method according to any one of claims 1-4, wherein if the processing logic is material processing, further comprising:
in response to a first input operation, determining a parameter space and a randomization parameter for at least one of word-making, music-making and composing;
and obtaining a word parameter file according to the parameter space and the randomization parameters.
6. The method of any of claims 1-4, wherein if the processing logic is lexical processing, further comprising:
responding to the second input operation to obtain a word and song parameter;
obtaining a main melody digital interface MIDI sequence and a corresponding lyric text sequence according to the lyric parameters;
and obtaining a word song file according to the main melody MIDI sequence and the lyric text sequence.
7. The method according to any one of claims 1-4, wherein in the case that the processing logic is a composition process or a mixing process, further comprising:
responding to the third input operation to obtain a word song file and a composing parameter;
and obtaining files for composing and mixing sound according to the word song files and the composing parameters, wherein the files for composing and mixing sound are used for representing at least one characteristic of single-track MIDI, multi-track MIDI, musical instruments and timbres.
8. The method according to any of claims 1-4, wherein if the processing logic is mastering, further comprising:
responding to the fourth input operation to obtain a word and music file and a file for editing music and mixing sound;
performing audio rendering processing according to the word and music files and the files for composing and mixing sound, and obtaining audio files according to audio rendering results;
wherein the audio rendering process comprises: single track synthesis, rendering and track combination, and audio processing related to volume adjustment.
9. The method according to any of claims 1-4, wherein if the processing logic is a duplicate checking process, further comprising:
responding to the fifth input operation to obtain a word and music file and a file for editing music and mixing sound;
respectively carrying out duplication checking processing on the word and music files and the files for editing music and mixing sound based on similarity comparison with the music materials to obtain duplication checking processing results;
and if the similarity comparison is lower than a threshold value, determining that the duplication checking processing result is matched with an expected target.
10. An information processing apparatus characterized in that the apparatus comprises:
an obtaining module, configured to obtain shared music materials, where the music materials include: non-original audio for characterizing at least one of a character, a style, and an emotion;
and the synthesis module is used for carrying out audio synthesis according to the music material and at least one processing logic of material processing, word and music processing, music editing processing, sound mixing processing, master tape post-processing and duplication checking processing to obtain a target object.
11. The apparatus of claim 10, wherein the synthesis module is configured to:
and according to the music materials and all processing logics including material processing, word and music processing, music composing processing, sound mixing processing, master tape post processing and duplicate checking processing, carrying out audio synthesis in a full-automatic mode to obtain an original audio file, and taking the original audio file as the target object.
12. The apparatus of claim 10, wherein the synthesis module is configured to:
and adding artificial assistance according to the music materials and any processing logic of material processing, word and music processing, music editing processing, sound mixing processing, master tape post processing and duplicate checking processing, performing audio synthesis in a semi-automatic mode to obtain an intermediate processing result or an original audio file, and taking the intermediate processing result or the original audio file as the target object.
13. The apparatus of claim 10, further comprising an artificial intelligence module to:
taking the music material as sample data, and performing target model training according to the sample data to obtain a trained target model;
the trained target model comprises: at least one type of material processing, word and song processing, composition processing, mixing processing, mother tape post processing and duplication checking processing.
14. The apparatus of any of claims 10-13, further comprising a material preparation module, for use if the processing logic is material processing,
in response to a first input operation, determining a parameter space and a randomization parameter for at least one of word-making, music-making and composing;
and obtaining a word parameter file according to the parameter space and the randomization parameters.
15. The apparatus of any one of claims 10-13, wherein the processing logic, when processing the vocabulary, further comprises a vocabulary authoring module to:
responding to the second input operation to obtain a word and song parameter;
obtaining a main melody digital interface MIDI sequence and a corresponding lyric text sequence according to the lyric parameters;
and obtaining a word song file according to the main melody MIDI sequence and the lyric text sequence.
16. The apparatus of any of claims 10-13, wherein the processing logic, in the case of an assembly process or a mixing process, further comprises an assembly mixing module to:
responding to the third input operation to obtain a word song file and a composing parameter;
and obtaining files for composing and mixing sound according to the word song files and the composing parameters, wherein the files for composing and mixing sound are used for representing at least one characteristic of single-track MIDI, multi-track MIDI, musical instruments and timbres.
17. The apparatus of any of claims 10-13, further comprising a post-processing module to, where the processing logic is master tape post-processing,
responding to the fourth input operation to obtain a word and music file and a file for editing music and mixing sound;
performing audio rendering processing according to the word and music files and the files for composing and mixing sound, and obtaining audio files according to audio rendering results;
wherein the audio rendering process comprises: single track synthesis, rendering and track combination, and audio processing related to volume adjustment.
18. The apparatus of any of claims 10-13, further comprising an originality review module to, if the processing logic is a duplicate checking process,
responding to the fifth input operation to obtain a word and music file and a file for editing music and mixing sound;
respectively carrying out duplication checking processing on the word and music files and the files for editing music and mixing sound based on similarity comparison with the music materials to obtain duplication checking processing results;
and if the similarity comparison is lower than a threshold value, determining that the duplication checking processing result is matched with an expected target.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
CN202110448803.3A 2021-04-25 2021-04-25 Information processing method, information processing device, electronic equipment and storage medium Pending CN113178182A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110448803.3A CN113178182A (en) 2021-04-25 2021-04-25 Information processing method, information processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110448803.3A CN113178182A (en) 2021-04-25 2021-04-25 Information processing method, information processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113178182A true CN113178182A (en) 2021-07-27

Family

ID=76925544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110448803.3A Pending CN113178182A (en) 2021-04-25 2021-04-25 Information processing method, information processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113178182A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106652984A (en) * 2016-10-11 2017-05-10 张文铂 Automatic song creation method via computer
KR20180070340A (en) * 2016-12-16 2018-06-26 아주대학교산학협력단 System and method for composing music by using artificial intelligence
CN109785818A (en) * 2018-12-18 2019-05-21 武汉西山艺创文化有限公司 A kind of music music method and system based on deep learning
CN110599985A (en) * 2018-06-12 2019-12-20 阿里巴巴集团控股有限公司 Audio content generation method, server side equipment and client side equipment
CN112331234A (en) * 2020-10-27 2021-02-05 北京百度网讯科技有限公司 Song multimedia synthesis method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11017750B2 (en) Method of automatically confirming the uniqueness of digital pieces of music produced by an automated music composition and generation system while satisfying the creative intentions of system users
CN108806656B (en) Automatic generation of songs
US20190237051A1 (en) Method of and system for controlling the qualities of musical energy embodied in and expressed by digital music to be automatically composed and generated by an automated music composition and generation engine
CN108806655B (en) Automatic generation of songs
Humphrey et al. An introduction to signal processing for singing-voice analysis: High notes in the effort to automate the understanding of vocals in music
Goto et al. Music interfaces based on automatic music signal analysis: new ways to create and listen to music
Gupta et al. Deep learning approaches in topics of singing information processing
Chandna et al. A deep-learning based framework for source separation, analysis, and synthesis of choral ensembles
CN113178182A (en) Information processing method, information processing device, electronic equipment and storage medium
Ortega et al. Phrase-level modeling of expression in violin performances
Zhu et al. A Survey of AI Music Generation Tools and Models
Zhou et al. AnimeTAB: A new guitar tablature dataset of anime and game music
Thompson IV Creating Musical Scores Inspired by the Intersection of Human Speech and Music Through Model-Based Cross Synthesis
Fazekas Semantic Audio Analysis Utilities and Applications.
US20240038205A1 (en) Systems, apparatuses, and/or methods for real-time adaptive music generation
Lu et al. Towards the Implementation of an Automatic Composition System for Popular Songs
Zheng et al. FT-GAN: Fine-Grained Tune Modeling for Chinese Opera Synthesis
CN113177635A (en) Information processing method, information processing device, electronic equipment and storage medium
Herremans et al. International Workshop on Deep Learning and Music
Rao Accepted Papers

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination