WO2019232959A1 - Artificial intelligence-based composing method and system, computer device and storage medium - Google Patents

Artificial intelligence-based composing method and system, computer device and storage medium

Info

Publication number
WO2019232959A1
WO2019232959A1 (PCT/CN2018/104715; CN 2018104715 W)
Authority
WO
WIPO (PCT)
Prior art keywords
layer
lstm
music
time series
network
Prior art date
Application number
PCT/CN2018/104715
Other languages
French (fr)
Chinese (zh)
Inventor
王义文
刘奡智
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019232959A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H — ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 — Details of electrophonic musical instruments
    • G10H1/0008 — Associated control or indicating means
    • G10H1/0025 — Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H — ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 — Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101 — Music composition or musical creation; tools or processes therefor
    • G10H2210/111 — Automatic composing, i.e. using predefined musical rules

Definitions

  • the present application relates to the field of information technology, and in particular, to a composition method, system, computer equipment, and storage medium based on artificial intelligence.
  • the existing automatic composition methods based on artificial intelligence technology mainly include automatic composition based on heuristic search and automatic composition based on genetic algorithm.
  • however, the existing automatic composition based on heuristic search is suitable only for short pieces, and its search efficiency decreases exponentially as the length of the piece increases; the method is therefore impractical for long pieces of music.
  • the automatic composition method based on genetic algorithms inherits some typical disadvantages of genetic algorithms, such as heavy dependence on the initial population and the difficulty of selecting genetic operators precisely.
  • a composition method based on artificial intelligence including:
  • the note information includes a playback start time, a playback duration, and a pitch value of each note
  • a composition system based on artificial intelligence including:
  • An acquisition module, configured to acquire note information, where the note information includes a playback start time, a playback duration, and a pitch value of each note;
  • the creation module is set to create a music time series and perform autoregressive integrated moving average ARIMA model prediction to obtain the time series prediction results;
  • a construction module configured to construct a music prediction model according to the note information and a time series prediction result
  • the composition module is configured to determine a topology structure of the music prediction model, combine the music prediction model and the determined topology structure according to the note information and the time series prediction result, obtain the predicted music, and implement automatic composition.
  • a computer device includes a memory and a processor.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, the processor is caused to perform the following steps:
  • the note information includes a playback start time, a playback duration, and a pitch value of each note
  • a storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the following steps:
  • the note information includes a playback start time, a playback duration, and a pitch value of each note
  • the aforementioned artificial intelligence-based composition method, system, computer device, and storage medium acquire note information, where the note information includes the playback start time, playback duration, and pitch value of each note; create a music time series and perform autoregressive integrated moving average ARIMA model prediction to obtain a time series prediction result; and establish a single-layer long short-term memory network LSTM.
  • the single-layer LSTM contains LSTM blocks, and each LSTM block contains a gate that determines whether the input is important, whether it can be remembered, and whether it can be output.
  • the initial state is randomly initialized; each step corresponds to one input value, and the input value is the word vector corresponding to each word.
  • the Dropout mechanism is used to randomly delete some units of the hidden layer while keeping the input and output neurons unchanged, and multiple basic LSTM units are aggregated into one using a multilayer RNN unit.
  • the state of the RNN unit is controlled through the gate structure, and information is deleted from or added to it; each time an RNN unit is added, a basic LSTM unit is constructed anew.
  • an initial value is set for the initial state of each layer of the network, the weight parameters are initialized, and the back-propagation mechanism adjusts the weight parameters in the two-layer network in reverse, layer by layer; iteration improves the training accuracy of the network, optimizes the loss function, and constructs a music prediction model.
  • the topological structure of the music prediction model is determined, and the music prediction model and the determined topological structure are combined with the note information and the time series prediction result to obtain the predicted music and realize automatic composition.
  • the production process is greatly simplified: users can participate in creation and generate different songs through different inputs, with no need for midway intervention; the approach is flexible and rich.
  • FIG. 1 is a flowchart of a composition method based on artificial intelligence in an embodiment
  • FIG. 2 is a flowchart of constructing a music prediction model based on note information and time series prediction results in an embodiment
  • FIG. 3 is a flowchart of establishing a single-layer long-term and short-term memory network LSTM using a Dropout mechanism to establish a hidden layer in an embodiment
  • FIG. 4 is a flowchart of copying a single-layer network into a double-layer network according to a single-layer LSTM according to an embodiment
  • FIG. 5 is a structural block diagram of a composition system based on artificial intelligence in an embodiment;
  • FIG. 6 is a structural block diagram of a building module in an embodiment
  • FIG. 7 is a structural block diagram of an establishing unit in an embodiment
  • FIG. 8 is a structural block diagram of a duplication unit in an embodiment.
  • an artificial intelligence-based composition method includes the following steps:
  • Step S101 Acquire note information, where the note information includes a play start time, a play duration, and a pitch value of each note;
  • LSTM: Long Short-Term Memory
  • Intelligent composition expresses musical characteristics in two dimensions, pitch and duration, and requires only that the user input notes into the system. The machine can then learn from the characteristics of the arrangement of the notes and write a more complete and richer tune that simulates a musician's performance, so that the resulting piece combines agility and firmness.
  • Step S102 Create a music time series, perform autoregressive integrated moving average ARIMA model prediction, and obtain a time series prediction result;
  • the stationarity of the music time series is generally checked with a statistical test; some operations, such as taking logarithms or differencing, can also be used to make the time series stationary. ARIMA model prediction is then performed to obtain a stable time series prediction result.
  • ARIMA is short for the Autoregressive Integrated Moving Average model.
  • the data sequence formed by the predicted object over time is regarded as a random sequence, and a certain mathematical model is used to approximate this sequence. Once this model is identified, future values can be predicted from past and present values of the time series.
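As an illustration of this idea, the following sketch (an assumption for exposition, not code from the patent) differences a short made-up pitch sequence to make it stationary, fits a simple AR(1) model to the differences by least squares, and integrates the forecast back to the original scale — the "AR", "I", and trivial "MA(0)" parts of an ARIMA(1,1,0) model:

```python
# Minimal ARIMA(1,1,0) sketch: difference ("I"), fit AR(1) by least squares,
# forecast the next difference, then integrate back to the original scale.
# The pitch sequence below is a made-up example, not data from the patent.

def arima_110_forecast(series):
    # First-order differencing makes a trending series more stationary.
    diffs = [series[i + 1] - series[i] for i in range(len(series) - 1)]
    # Least-squares AR(1) fit on the differences: d_t ~ phi * d_{t-1}.
    num = sum(diffs[i] * diffs[i - 1] for i in range(1, len(diffs)))
    den = sum(d * d for d in diffs[:-1])
    phi = num / den if den else 0.0
    next_diff = phi * diffs[-1]          # predicted next difference
    return series[-1] + next_diff        # integrate back

pitches = [60, 62, 64, 65, 67, 69, 71]   # an ascending scale in MIDI pitch
print(round(arima_110_forecast(pitches), 2))
```

A real system would use a full ARIMA(p, d, q) fit with model selection; the sketch only shows how past and present values of the series determine the predicted future value.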
  • Step S103 construct a music prediction model according to the note information and the time series prediction result
  • the advantage of the dual-channel long short-term memory network (LSTM) used in this technical solution is that it can "smartly" forget long-term memories that are no longer needed, learn new and useful information, and store the useful information back into long-term memory. For example, if previously played movements have no effect on the present, the movement information is removed, and the melody of the latest movement is recorded in "long-term memory".
  • in the LSTM, the input is x_t, the output is y_t, the weight matrix from the input layer to the hidden layer is W, and the weight matrix from the hidden layer to the hidden layer is U; the state is summarized and carried forward as an aid for the next input.
  • Step S104 Determine the topological structure of the music prediction model, combine the music prediction model and the determined topological structure according to the note information and the time series prediction result, obtain the predicted music, and implement automatic composition.
  • the topology is a predetermined neural network structure.
  • a recurrent neural network (RNN) is used as an example.
  • the topology includes two independent RNNs and a connection unit; the two independent RNNs are named LF_RNN and HF_RNN and are used for low-frequency and high-frequency multi-frequency feature combination, respectively.
  • Frame-level audio features of the music corresponding to the music file are extracted; frame-level audio features carrying frequency-band information are then obtained from a combination model of these frame-level audio features and pre-built music frequency-band characteristics; and predicted music is obtained from the band-aware frame-level audio features and the pre-built music prediction model, so that automatic composition can be achieved.
  • performing the autoregressive integrated moving average ARIMA model prediction includes: examining the audio, the sound frames, and their laws of variation according to the autocorrelation function and the partial autocorrelation function to identify the stationarity of the time series; smoothing any non-stationary time series; selecting an ARIMA model according to the identification rules of the time series and estimating its parameters; performing hypothesis testing to diagnose whether the residual sequence is white noise; and performing prediction analysis with the model that passed the test.
  • the ARIMA model prediction procedure is: test the audio, the sound frames, and their variation according to the autocorrelation function and partial autocorrelation function to identify the stationarity of the time series; smooth the non-stationary time series; select an ARIMA model according to the identification rules of the time series and estimate its parameters; perform a hypothesis test to diagnose whether the residual sequence is white noise; and perform predictive analysis with the tested model.
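The identification step can be illustrated with a hand-rolled sample autocorrelation function (a generic sketch on stand-in data, not the patent's implementation): a slowly decaying ACF at successive lags suggests a non-stationary series, while a quickly decaying or alternating one suggests stationarity.

```python
# Sample autocorrelation function (ACF), used in the identification step to
# judge stationarity. The two series below are illustrative stand-ins for a
# music time series, not data from the patent.

def acf(series, lag):
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

trend = list(range(20))                    # non-stationary: strong trend
noise = [(-1) ** t for t in range(20)]     # stationary: alternating values

print(round(acf(trend, 1), 2), round(acf(noise, 1), 2))
```

The trending series shows a lag-1 autocorrelation near 1, while the stationary alternating series shows a strongly negative one; the partial autocorrelation function would be computed analogously on the residual structure.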
  • constructing a music prediction model according to the note information and the time series prediction result includes:
  • Step S201 Establish a single-layer long-term and short-term memory network LSTM, and use the Dropout mechanism to establish a hidden layer;
  • LSTM (long short-term memory) is a kind of time-recurrent neural network containing LSTM blocks. An LSTM block may be described as an intelligent network unit because it can memorize values for an indefinite length of time; a gate in the block determines whether an input is important enough to be remembered and whether it can be output.
  • the initial state is initialized randomly, with one input per step; the actual input is the word vector corresponding to each word.
  • the hidden state has a configurable number of hidden nodes; the state of the last step can be taken as the output, or the states of all steps can be weighted or directly averaged as the output, adjusted flexibly according to the specific task.
  • Step S202 copy the single-layer network into a double-layer network according to the single-layer LSTM;
  • LSTM has the form of a chain of repeating neural network modules. Unlike the single layer of a standard recurrent module, each repeating module has four neural network layers that interact in a special way.
  • the horizontal line represents the state of the unit and has a linear interaction, which can ensure that the information is passed down.
  • Information is selectively passed through a structure consisting of a sigmoid neural network layer and a pointwise multiplication operation.
  • the sigmoid layer maps each variable to a value between 0 and 1, describing how much of each component should be let through.
  • a sigmoid layer is a neural network layer that uses the sigmoid function as its activation function, applying it to each input; the hyperbolic tangent function (tanh) can also be chosen.
  • LSTM has three such gates: the "forget gate" sigmoid layer, which decides which information to discard from the cell state; the "input gate", a sigmoid layer together with a tanh layer that adds candidate values to the state, which decides what new information to store in the cell state; and the "output gate", a sigmoid layer and tanh function, which decides which parts of the cell state to output.
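The three gates can be written out directly. The following numpy sketch is a generic LSTM cell with made-up sizes and random weights, an illustration of the standard equations rather than anything specified in the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: input weights (4h x d), U: recurrent weights (4h x h)."""
    z = W @ x + U @ h_prev + b
    hsz = h_prev.shape[0]
    f = sigmoid(z[0*hsz:1*hsz])          # forget gate: what to discard from c
    i = sigmoid(z[1*hsz:2*hsz])          # input gate: what new info to store
    g = np.tanh(z[2*hsz:3*hsz])          # candidate values to add to the state
    o = sigmoid(z[3*hsz:4*hsz])          # output gate: what parts of c to emit
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state / output
    return h, c

rng = np.random.default_rng(0)
d, hsz = 3, 4                            # illustrative sizes
W = rng.normal(scale=0.1, size=(4*hsz, d))
U = rng.normal(scale=0.1, size=(4*hsz, hsz))
b = np.zeros(4*hsz)
h, c = lstm_step(rng.normal(size=d), np.zeros(hsz), np.zeros(hsz), W, U, b)
print(h.shape, c.shape)
```

Because the output is `o * tanh(c)`, each component of `h` is bounded in magnitude by 1, which keeps the recurrence numerically stable across long sequences.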
  • the dual-channel long short-term memory network, that is, two LSTM networks used together, can enhance the "learning" ability of the neural network, so that the intelligent system can fully "learn" the characteristics of the music and the style of musical performance.
  • multiple layers of networks are required.
  • the LSTM controls the state of the cell through a structure called a gate, and deletes information from or adds information to it. It is worth noting that each time a unit is added, a base LSTM unit needs to be constructed anew: the function declares its internal variables once per call, and without a fresh call those variables would be reused, resulting in an error. Set an initial value for the initial state of each layer; the zero_state method can also generate the initial value, but then the intermediate states cannot be displayed, so choose according to the actual application. Multi-layer LSTMs outperform single layers, a trend in line with the general law of multi-layer neural network modeling as the data dimension grows and nonlinear factors increase.
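The layer-stacking step can be sketched as follows (a generic numpy illustration, not the patent's code): each layer gets a freshly constructed base cell with its own parameter arrays — reusing one cell's variables across layers is exactly the error the passage warns about — and the hidden state of one layer becomes the input of the next. The initial state of every layer is set explicitly rather than with a zero_state-style helper, so intermediate states remain inspectable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_cell(rng, d_in, hsz):
    """Freshly construct a base LSTM cell: new parameter arrays on each call,
    mirroring the need to re-create the base unit for every added layer."""
    W = rng.normal(scale=0.1, size=(4*hsz, d_in))
    U = rng.normal(scale=0.1, size=(4*hsz, hsz))
    b = np.zeros(4*hsz)
    def step(x, h, c):
        z = W @ x + U @ h + b
        f, i = sigmoid(z[:hsz]), sigmoid(z[hsz:2*hsz])
        g, o = np.tanh(z[2*hsz:3*hsz]), sigmoid(z[3*hsz:])
        c = f * c + i * g
        return o * np.tanh(c), c
    return step

rng = np.random.default_rng(1)
d, hsz, layers = 3, 4, 2
cells = [make_cell(rng, d if k == 0 else hsz, hsz) for k in range(layers)]
# Explicit initial values for every layer's state, so they can be inspected.
states = [(np.zeros(hsz), np.zeros(hsz)) for _ in range(layers)]

x = rng.normal(size=d)
for k, cell in enumerate(cells):          # layer k's output feeds layer k+1
    h, c = cell(x, *states[k])
    states[k] = (h, c)
    x = h
print(x.shape)                            # output of the top layer
```

Each call to `make_cell` closes over its own `W`, `U`, `b`, so the two layers cannot accidentally share variables.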
  • Step S203: the weight parameters are initialized, and the back-propagation mechanism is used to adjust the weight parameters in the double-layer network in reverse, layer by layer; iteration improves the training accuracy of the network, optimizing the loss function and constructing a music prediction model.
  • the network model uses the gradient descent method to minimize the loss function, adjusting the weight parameters in the network layer by layer, and iterative training improves the network's accuracy.
  • when the parameters are updated, identical initial parameters would receive identical gradients in back-propagation and thus be updated identically. From the perspective of the cost function, the initial parameters also cannot be too large, so the initial weights should be very close to 0 but not equal to 0.
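The symmetry problem can be seen in a tiny numerical example (illustrative, not from the patent): when two hidden weights start at the same value, gradient descent updates them identically at every step, so they never become distinct; small random values near 0 break the tie.

```python
import random

# Two weights, one input, squared loss on a single sample (x=1, y=1).
# Prediction: y_hat = w1*x + w2*x. Both weights receive the same gradient,
# so a symmetric start stays symmetric forever.
def train(w1, w2, lr=0.1, steps=50):
    for _ in range(steps):
        y_hat = w1 + w2                    # x = 1
        grad = 2 * (y_hat - 1)             # d(loss)/d(w1) == d(loss)/d(w2)
        w1, w2 = w1 - lr * grad, w2 - lr * grad
    return w1, w2

a1, a2 = train(0.3, 0.3)                   # symmetric start: stays symmetric
random.seed(0)
b1, b2 = train(random.gauss(0, 0.01), random.gauss(0, 0.01))  # near 0, not 0
print(a1 == a2, b1 == b2)
```

In both runs the loss is driven to zero, but only the randomly initialized pair ends with two distinct weights, which is why near-zero random initialization is preferred over exact zeros or equal values.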
  • establishing a single-layer long-term and short-term memory network LSTM, and using the Dropout mechanism to establish a hidden layer include:
  • Step S301 Establish a single-layer long-term and short-term memory network LSTM.
  • the single-layer LSTM contains LSTM blocks.
  • the LSTM block contains a gate. The gate determines whether the input is important, whether it can be remembered, and whether it can be output.
  • the training process of a neural network is to forward the input through the network and then back-propagate the error.
  • the Dropout mechanism targets this process: some units of the hidden layer are randomly deleted and the process above is carried out.
  • this may include randomly deleting some hidden neurons in the network while keeping the input and output neurons unchanged; propagating the input forward through the modified network and then back-propagating the error through the same modified network; and repeating the operation for another batch of training samples, so as to achieve the effect of a voting mechanism.
  • for example, training five different neural networks on the same data may yield several different results; a voting mechanism can then determine the majority answer, which relatively improves network accuracy and robustness.
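A minimal sketch of the mechanism (generic, not the patent's implementation): a random binary mask zeroes a fraction of the hidden units on each pass while the input and output keep their sizes; re-drawing the mask for every batch effectively trains many "thinned" subnetworks whose combination acts like a vote.

```python
import random

def dropout(hidden, p_drop, rng):
    """Randomly delete (zero) hidden units with probability p_drop; surviving
    units are scaled by 1/(1-p_drop) (inverted dropout) so the expected
    activation is unchanged. Input and output dimensions stay the same."""
    keep = 1.0 - p_drop
    return [h * (1.0 / keep) if rng.random() < keep else 0.0 for h in hidden]

rng = random.Random(42)
hidden = [0.3, -1.2, 0.8, 0.5, -0.7, 1.1]
for batch in range(2):                     # a new mask for every training batch
    thinned = dropout(hidden, p_drop=0.5, rng=rng)
    print(thinned)
```

At test time no units are dropped (`p_drop=0`), and thanks to the inverted scaling no extra correction of the weights is needed.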
  • robustness means that a control system maintains certain performance characteristics under some (structural or size) parameter perturbation. According to the definition of the performance concerned, it can be divided into stability robustness and performance robustness.
  • the fixed controller designed with the robustness of the closed-loop system as the target is called a robust controller.
  • Step S302 the initial state is initialized randomly, each step corresponds to an input value, and the input value is a word vector corresponding to each word;
  • the initial state is initialized randomly, one input per step, and the actual input is the word vector corresponding to each word.
  • the hidden state has a configurable number of hidden nodes; the state of the last step can be taken as the output, or the states of all steps can be weighted or directly averaged as the output, adjusted flexibly according to the specific task.
  • step S303 a Dropout mechanism is used to randomly delete some units in the hidden layer, keeping the input and output neurons unchanged.
  • the Dropout mechanism targets this process: some units of the hidden layer are randomly deleted and the process above is carried out.
  • this may include randomly deleting some hidden neurons in the network while keeping the input and output neurons unchanged; propagating the input forward through the modified network and then back-propagating the error through the same modified network; and repeating the operation for another batch of training samples, so as to achieve the effect of a voting mechanism.
  • replicating a single-layer network into a dual-layer network according to a single-layer LSTM includes:
  • step S401 a plurality of basic LSTM units are aggregated into one by using a multilayer RNN unit;
  • step S402 the state of the RNN unit is controlled through the gate structure, and information is deleted or added to it, and a basic LSTM unit is called again each time an RNN unit is added;
  • step S403 an initial value is set for an initial state of each layer of the network.
  • an artificial intelligence-based composition system includes: an acquisition module, configured to acquire note information, where the note information includes a playback start time, a playback duration, and a pitch value of each note; a creation module, configured to create a music time series and perform autoregressive integrated moving average ARIMA model prediction to obtain a time series prediction result; a construction module, configured to construct a music prediction model based on the note information and the time series prediction result; and a composition module, configured to determine the topology of the music prediction model and combine the music prediction model and the determined topology with the note information and time series prediction result to obtain the predicted music and realize automatic composition.
  • the creation module further includes: a recognition unit, configured to test the audio, the sound frames, and their variation according to the autocorrelation function and the partial autocorrelation function to identify the stationarity of the time series; a smoothing processing unit, configured to smooth the non-stationary time series; a parameter estimation unit, configured to select an ARIMA model according to the identification rules of the time series and estimate its parameters; a diagnostic unit, configured to perform hypothesis testing and diagnose whether the residual sequence is white noise; and a prediction unit, configured to perform prediction analysis using the model that passed the test.
  • a recognition unit, configured to test the audio, the sound frames, and their variation according to the autocorrelation function and the partial autocorrelation function to identify the stationarity of the time series
  • a smoothing processing unit, configured to smooth the non-stationary time series
  • a parameter estimation unit, configured to select an ARIMA model according to the identification rules of the time series and estimate its parameters
  • a diagnostic unit, configured to perform hypothesis testing and diagnose whether the residual sequence is white noise
  • the construction module further includes: an establishing unit, configured to establish a single-layer long short-term memory network LSTM and use the Dropout mechanism to establish a hidden layer; a replication unit, configured to replicate the single-layer network into a double-layer network based on the single-layer LSTM; and an optimization unit, configured to initialize the weight parameters and use the back-propagation mechanism to adjust the weight parameters in the double-layer network in reverse, layer by layer, improving the training accuracy of the network and optimizing the loss function to build a music prediction model.
  • an establishing unit, configured to establish a single-layer long short-term memory network LSTM and use the Dropout mechanism to establish a hidden layer
  • a replication unit, configured to replicate the single-layer network into a double-layer network based on the single-layer LSTM
  • an optimization unit, configured to initialize the weight parameters and use the back-propagation mechanism to adjust the weight parameters in the double-layer network in reverse, layer by layer, to improve the training accuracy of the network and optimize the loss function
  • the establishing unit further includes: an establishing subunit, configured to establish a single-layer long short-term memory network LSTM, where the single-layer LSTM includes an LSTM block and the LSTM block includes a gate that determines whether the input is important, whether it can be remembered, and whether it can be output; a corresponding subunit, configured to randomly initialize the initial state, where each step corresponds to one input value and the input value is the word vector corresponding to each word; and a deletion subunit, configured to use the Dropout mechanism to randomly delete parts of the hidden layer while keeping the input and output neurons unchanged.
  • the duplication unit further includes: a summary subunit, configured to use a multilayer RNN unit to aggregate multiple basic LSTM units into one; a deletion subunit, configured to control the state of the RNN unit through the gate structure and delete information from or add information to it, constructing a basic LSTM unit anew each time an RNN unit is added; and a setting subunit, configured to set initial values for the initial state of each layer of the network.
  • a computer device in one embodiment, includes a memory and a processor.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, the processor is caused to perform the steps of the artificial intelligence-based composition method in the foregoing embodiments.
  • a storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the steps of the artificial intelligence-based composition method in the foregoing embodiments.
  • the storage medium may be a non-volatile storage medium.
  • the program may be stored in a computer-readable storage medium.
  • the storage medium may include: Read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disks or optical disks, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed in the present application are an artificial intelligence-based composing method and system, a computer device and a storage medium. The method comprises: acquiring musical note information, the musical note information comprising a playback start time, a playback duration and a pitch value of each musical note; creating a music time series and performing autoregressive integrated moving average (ARIMA) model prediction to obtain a time series prediction result; constructing a music prediction model according to the musical note information and the time series prediction result; and determining a topological structure of the music prediction model and combining the music prediction model and the determined topological structure with the musical note information and the time series prediction result to obtain predicted music, thereby implementing automatic composition. The method greatly simplifies the production process; a user can participate in creation and generate different compositions through different inputs, without the need for midway intervention. The invention is flexible and rich.

Description

Artificial Intelligence-based Composition Method, System, Computer Device, and Storage Medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 4, 2018, with application number 201810561621.5 and entitled "Artificial Intelligence-based Composition Method, System, Computer Equipment, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of information technology, and in particular, to a composition method, system, computer equipment, and storage medium based on artificial intelligence.
Background
With the application of computer technology to music processing, computer music came into being. As a new-generation art form, computer music has gradually penetrated many aspects of music, including creation, instrument performance, education, and entertainment; the use of artificial intelligence for automatic composition is a relatively new research direction in computer music and has been highly valued by researchers in related fields in recent years.
The existing automatic composition methods based on artificial intelligence mainly include automatic composition based on heuristic search and automatic composition based on genetic algorithms. However, heuristic-search-based composition is suitable only for short pieces, and its search efficiency decreases exponentially as the length of the piece increases, making the method impractical for long pieces of music. The genetic-algorithm-based method inherits some typical disadvantages of genetic algorithms, such as heavy dependence on the initial population and the difficulty of selecting genetic operators precisely.
Summary of the Invention
In view of the disadvantages of current automatic composition methods, it is necessary to provide an artificial intelligence-based composition method, system, computer device, and storage medium.
An artificial intelligence-based composition method includes:
acquiring note information, where the note information includes a playback start time, a playback duration, and a pitch value of each note;
creating a music time series and performing autoregressive integrated moving average ARIMA model prediction to obtain a time series prediction result;
constructing a music prediction model according to the note information and the time series prediction result; and
determining the topological structure of the music prediction model, and combining the music prediction model and the determined topological structure with the note information and the time series prediction result to obtain the predicted music and realize automatic composition.
An artificial intelligence-based composition system includes:
an acquisition module, configured to acquire note information, where the note information includes a playback start time, a playback duration, and a pitch value of each note;
a creation module, configured to create a music time series and perform autoregressive integrated moving average ARIMA model prediction to obtain a time series prediction result;
a construction module, configured to construct a music prediction model according to the note information and the time series prediction result; and
a composition module, configured to determine the topological structure of the music prediction model and combine the music prediction model and the determined topological structure with the note information and the time series prediction result to obtain the predicted music and realize automatic composition.
A computer device includes a memory and a processor. The memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
acquiring note information, where the note information includes a playback start time, a playback duration, and a pitch value of each note;
creating a music time series and performing autoregressive integrated moving average ARIMA model prediction to obtain a time series prediction result;
constructing a music prediction model according to the note information and the time series prediction result; and
determining the topological structure of the music prediction model, and combining the music prediction model and the determined topological structure with the note information and the time series prediction result to obtain the predicted music and realize automatic composition.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring note information, the note information including a playback start time, a playback duration, and a pitch value of each note;
creating a music time series and performing autoregressive integrated moving average (ARIMA) model prediction to obtain a time series prediction result;
constructing a music prediction model according to the note information and the time series prediction result;
determining a topological structure of the music prediction model, and obtaining predicted music according to the note information, the time series prediction result, the music prediction model, and the determined topological structure, thereby implementing automatic composition.
In the above artificial intelligence-based composition method, system, computer device, and storage medium, note information is acquired, the note information including a playback start time, a playback duration, and a pitch value of each note; a music time series is created and autoregressive integrated moving average (ARIMA) model prediction is performed to obtain a time series prediction result; a single-layer long short-term memory network (LSTM) is established, the single-layer LSTM containing LSTM blocks, each LSTM block containing a gate that determines whether an input is important, whether it can be remembered, and whether it can be output; the initial state is randomly initialized, each step corresponds to one input value, and the input value is the word vector corresponding to each word; a Dropout mechanism is used to randomly remove some units of the hidden layer while keeping the input and output neurons unchanged; a multilayer RNN cell is used to aggregate multiple basic LSTM cells into one, the state of the RNN cell is controlled through the gate structure, and information is removed from or added to it, with a new basic LSTM cell instantiated each time an RNN cell is added; an initial value is set for the initial state of each network layer; the weight parameters are initialized, a back-propagation mechanism is used to adjust the weight parameters of the two-layer network in reverse, layer by layer, and the training accuracy of the network is improved through iteration to optimize the loss function and construct the music prediction model; the topological structure of the music prediction model is determined, and the predicted music is obtained according to the note information, the time series prediction result, the music prediction model, and the determined topological structure, thereby implementing automatic composition. The production process is greatly simplified; users can participate in the creation and generate different pieces from different inputs without intervening midway, making the system flexible and versatile.
BRIEF DESCRIPTION OF THE DRAWINGS
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are for the purpose of illustrating the preferred embodiments only and are not to be considered limiting of the present application.
FIG. 1 is a flowchart of an artificial intelligence-based composition method according to an embodiment;
FIG. 2 is a flowchart of constructing a music prediction model according to note information and a time series prediction result according to an embodiment;
FIG. 3 is a flowchart of establishing a single-layer long short-term memory network (LSTM) and establishing a hidden layer using a Dropout mechanism according to an embodiment;
FIG. 4 is a flowchart of replicating a single-layer network into a two-layer network based on a single-layer LSTM according to an embodiment;
FIG. 5 is a structural block diagram of an artificial intelligence-based composition system according to an embodiment;
FIG. 6 is a structural block diagram of a construction module according to an embodiment;
FIG. 7 is a structural block diagram of an establishing unit according to an embodiment;
FIG. 8 is a structural block diagram of a replication unit according to an embodiment.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present application and are not intended to limit it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the term "comprising" used in the specification of the present application indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As a preferred embodiment, as shown in FIG. 1, an artificial intelligence-based composition method includes the following steps:
Step S101: acquire note information, the note information including a playback start time, a playback duration, and a pitch value of each note.
A long short-term memory network (LSTM) is a type of recurrent neural network (RNN) that solves the problem that long-term "memory" cannot be exploited in a traditional recurrent neural network, and thus has the ability to learn long-range dependencies. The present technical solution provides an artificial intelligence composition system based on a two-channel long short-term memory network (LSTM), which composes intelligently on the basis of expressing musical features in two dimensions, pitch and duration. After simple notes are input into the system, the machine can learn from the arrangement characteristics of the notes and write a more complete and richer piece, simulating a musician's performance, so that the resulting piece sounds lively rather than mechanical. Note information is acquired and the corresponding frame-level audio features are extracted; then, according to the frame-level audio features and a pre-built music frequency band feature combination model, frame-level audio features carrying frequency band information are obtained, and according to the frame-level audio features carrying frequency band information and a pre-built music prediction model, the predicted music is obtained, thereby implementing automatic composition.
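The note information described above can be sketched as a simple data structure. This is an illustrative assumption, not part of the patent: the `Note` class, its field names, and the MIDI-style pitch numbering are all hypothetical choices for the example.

```python
# Hypothetical sketch of the note information: each note carries a playback
# start time, a playback duration, and a pitch value, as described above.
from dataclasses import dataclass

@dataclass
class Note:
    start: float     # playback start time, in seconds (assumed unit)
    duration: float  # playback duration, in seconds (assumed unit)
    pitch: int       # pitch value, e.g. a MIDI note number (60 = middle C)

def to_feature_rows(notes):
    """Flatten notes, sorted by onset, into per-note feature rows a model could consume."""
    return [(n.start, n.duration, n.pitch) for n in sorted(notes, key=lambda n: n.start)]

melody = [Note(0.0, 0.5, 60), Note(0.5, 0.5, 62), Note(1.0, 1.0, 64)]
rows = to_feature_rows(melody)
print(rows)  # [(0.0, 0.5, 60), (0.5, 0.5, 62), (1.0, 1.0, 64)]
```

A real system would read such tuples from a MIDI or score file rather than constructing them by hand.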
Step S102: create a music time series, and perform autoregressive integrated moving average (ARIMA) model prediction to obtain a time series prediction result.
The stationarity of a music time series model is generally checked with statistical tests. If the time series is not stationary, it can be made stationary through operations such as taking logarithms or differencing, after which ARIMA model prediction is performed to obtain a prediction result for the stationary time series. ARIMA stands for the Autoregressive Integrated Moving Average model; it treats the data sequence formed by the forecast target over time as a random sequence and uses a mathematical model to approximately describe this sequence. Once the model has been identified, future values can be predicted from the past and present values of the time series.
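The stationarization step mentioned above (differencing, the "I" in ARIMA) can be sketched in a few lines. This is a minimal pure-Python illustration; a real system would use a statistics library, and the example series is invented:

```python
# Differencing removes a trend (making the series stationary), and the
# inverse operation restores the original scale after prediction.

def difference(series, d=1):
    """Apply d-th order differencing: y'_t = y_t - y_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def undifference(diffed, first_value):
    """Invert first-order differencing given the original first value."""
    out = [first_value]
    for delta in diffed:
        out.append(out[-1] + delta)
    return out

trend = [2.0, 4.0, 6.0, 8.0, 10.0]   # non-stationary: linear trend
stationary = difference(trend)       # [2.0, 2.0, 2.0, 2.0], a constant series
restored = undifference(stationary, trend[0])
print(stationary, restored)
```

Forecasts are made on the differenced (stationary) series and then integrated back with `undifference` to obtain predictions on the original scale.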
Step S103: construct a music prediction model according to the note information and the time series prediction result.
The advantage of the two-channel long short-term memory network (LSTM) adopted in this technical solution is that it can "intelligently" forget long-term memories that are no longer useful, learn new useful information, and store the useful information back into long-term memory. For example, if a previously played movement no longer affects the present, its information is discarded, and the melody of the latest movement is recorded into "long-term memory". Suppose the input of the LSTM is x_t and the output is y_t, the weight from the input layer to the hidden layer is W, and the weight from the hidden layer to the hidden layer is U. The LSTM's memory capability comes from summarizing the past input states through the hidden-to-hidden weights, which then serve as an aid to the next input.
Step S104: determine the topological structure of the music prediction model, and obtain the predicted music according to the note information, the time series prediction result, the music prediction model, and the determined topological structure, thereby implementing automatic composition.
The topological structure of the music prediction model is determined, and the predicted music is obtained according to the note information, the time series prediction result, the music prediction model, and the determined topological structure, thereby implementing automatic composition. The topological structure is a paired ("hedged") neural network structure. This embodiment takes paired recurrent neural networks (RNNs) as an example: the topology includes two independent RNNs and a connection unit. The two independent RNNs, named LF_RNN and HF_RNN, are used for low-frequency-band multi-frequency feature combination and high-frequency-band multi-frequency feature combination, respectively. Frame-level audio features of the music corresponding to the music file are extracted; then, according to the frame-level audio features and the pre-built music frequency band feature combination model, frame-level audio features carrying frequency band information are obtained, and according to the frame-level audio features carrying frequency band information and the pre-built music prediction model, the predicted music is obtained, thereby implementing automatic composition.
In one embodiment, performing the autoregressive integrated moving average (ARIMA) model prediction includes: checking the audio, the audio frames, and their variation patterns according to the autocorrelation function and the partial autocorrelation function, to identify the stationarity of the time series; applying stationarization processing to a non-stationary time series; selecting an ARIMA model according to the identification rules of the time series, and estimating the parameters of the ARIMA model; performing hypothesis testing to diagnose whether the residual sequence is white noise; and performing predictive analysis using the model that has passed the tests.
The ARIMA model prediction procedure is as follows: check the audio, the audio frames, and their variation patterns according to the autocorrelation function and the partial autocorrelation function, to identify the stationarity of the time series; apply stationarization processing to a non-stationary time series; select an ARIMA model according to the identification rules of the time series, and estimate the parameters of the ARIMA model; perform hypothesis testing to diagnose whether the residual sequence is white noise; and perform predictive analysis using the model that has passed the tests.
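The identification step above relies on the autocorrelation function (ACF). A minimal sketch of the standard sample ACF, used here only to illustrate the stationarity check (a real system would use a library routine and proper significance bounds):

```python
# Sample autocorrelation at lags 0..max_lag. A stationary white-noise series
# has near-zero ACF at nonzero lags; a trending (non-stationary) series has a
# slowly decaying ACF. That contrast is what the identification step inspects.

def acf(series, max_lag):
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    out = []
    for k in range(max_lag + 1):
        cov = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
        out.append(cov / var)
    return out

trend = [float(t) for t in range(20)]  # an invented, clearly non-stationary series
r = acf(trend, 3)
print([round(v, 2) for v in r])  # r[0] is always 1.0; r[1..3] decay slowly
```

The partial autocorrelation function (PACF) would be computed analogously to choose the AR order.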
As shown in FIG. 2, in one embodiment, constructing a music prediction model according to the note information and the time series prediction result includes:
Step S201: establish a single-layer long short-term memory network (LSTM), and establish a hidden layer using a Dropout mechanism.
LSTM (long short-term memory) is a type of time-recurrent neural network containing LSTM blocks or other neural-network-like structures. An LSTM block may be described as an intelligent network unit because it can memorize values over indefinite lengths of time; the block contains a gate that determines whether an input is important enough to be remembered and whether it can be output. The initial state is randomly initialized, with one input per step; the actual input is the word vector corresponding to each word. As for the hidden state and the number of hidden nodes, the state of the last step may be taken as the output, or the states of all steps may be weighted or simply averaged as the output, adjusted flexibly according to the specific task.
Step S202: replicate the single-layer network into a two-layer network based on the single-layer LSTM.
An LSTM has the form of a chain of repeating neural network modules, where the repeating module has a distinctive structure: four neural network layers that interact in a special way. The horizontal line represents the cell state; it undergoes only linear interactions, which ensures that information is passed along. Information is let through selectively by structures consisting of a sigmoid neural network layer and a pointwise multiplication operation. The sigmoid layer maps variables to values between 0 and 1, describing how much of each component should pass the gate. Here, sigmoid refers to a neural network using the sigmoid function as its activation function; for each input, the sigmoid function is applied, and the hyperbolic tangent function (tanh) may also be chosen. By convention, 0 means "let nothing through", while 1 means "let everything through". An LSTM has three such gates: a "forget gate" sigmoid layer, which decides which information should be discarded from the cell state; an "input gate" layer, which decides which new information should be stored in the cell state, together with a tanh layer that adds candidate values to the state; and a sigmoid layer and tanh function that decide which parts of the cell state should be output. A two-channel long short-term memory network (LSTM) uses two LSTM networks together, which enhances the "learning" ability of the neural network, allowing the intelligent system to thoroughly "learn" the characteristics of the music and its performance style. For some complex sequences, a multilayer network is needed. To build the model, a multilayer RNN cell is first used to aggregate multiple basic LSTM cells into one; the LSTM controls the state of the cell through a structure called a gate, and removes information from or adds information to it. It is worth noting that each time a cell is added, a new basic LSTM cell must be instantiated, because the constructor declares its internal variables each time it is called; otherwise those variables would be reused, producing an error. An initial value is set for the initial state of each layer. The zero_state method can also be used to generate the initial values, but then the intermediate states cannot be explicitly controlled; the choice depends on the actual application. A multilayer LSTM outperforms a single layer, a trend consistent with the general rule that as the dimensionality of the data grows, nonlinear factors increase and multilayer neural network modeling is required.
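The gating scheme described above can be sketched as one forward step of a single LSTM cell. This is a minimal illustration with scalar weights for readability (real implementations use weight matrices); the particular weight values are invented:

```python
# One LSTM cell step: forget gate f, input gate i, candidate tanh layer g,
# output gate o, matching the three-gate description above.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of an LSTM cell; w holds (w_x, w_h, bias) per gate."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate values
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    c = f * c_prev + i * g   # cell state: keep part of the old, add the new
    h = o * math.tanh(c)     # hidden state / output
    return h, c

w = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "g", "o")}  # invented weights
h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):   # a tiny input sequence
    h, c = lstm_step(x, h, c, w)
print(round(h, 4), round(c, 4))
```

With the forget gate saturated at 1 and the input gate at 0, the cell state passes through unchanged, which is exactly the long-term "memory" behaviour described above.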
Step S203: initialize the weight parameters, use a back-propagation mechanism to adjust the weight parameters of the two-layer network in reverse, layer by layer, and iteratively improve the training accuracy of the network and optimize the loss function, so as to construct the music prediction model.
Before training begins, all weights are initialized with small, distinct random numbers. The network model minimizes the loss function by gradient descent, adjusting the weight parameters of the network in reverse, layer by layer, and improving the accuracy of the network through frequent training iterations. This involves data preprocessing, parameter initialization, batch normalization (BN) regularization, random Dropout, and the loss function; classification problems, regression problems, and gradient checking; and model checks before and during training, with parameter updates. If weights were identical, back-propagation would compute the same gradients and thus perform the same parameter updates; from the perspective of the cost function, the initial parameters must also not be too large, so the initial weights should be very close to 0 but not equal to 0.
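The initialization rule above can be sketched directly. This is an illustrative assumption: the Gaussian draw with a small scale and the plain SGD update are common conventions, not details taken from the patent:

```python
# Weights start as small, distinct random numbers near (but not equal to) 0,
# so neurons begin with different values and back-propagation does not
# compute identical updates for all of them (symmetry breaking).
import random

def init_weights(n, scale=0.01, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(0.0, scale) for _ in range(n)]

def sgd_step(w, grad, lr=0.1):
    """One gradient-descent update: w <- w - lr * dL/dw."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

w = init_weights(4)
print(w)                      # small, distinct values near 0
assert len(set(w)) == len(w)  # no two weights identical: symmetry is broken
w = sgd_step(w, grad=[0.5, -0.5, 0.0, 1.0])
```

Repeated `sgd_step` calls with gradients from back-propagation are what "iteratively improve the training accuracy" amounts to in this sketch.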
As shown in FIG. 3, in one embodiment, establishing a single-layer long short-term memory network (LSTM) and establishing a hidden layer using a Dropout mechanism include:
Step S301: establish a single-layer long short-term memory network (LSTM), the single-layer LSTM containing LSTM blocks, each LSTM block containing a gate that determines whether an input is important, whether it can be remembered, and whether it can be output.
The training procedure of a neural network is to propagate the input forward through the network and then back-propagate the error. The Dropout mechanism randomly removes some units of the hidden layer during this process and then carries out the procedure described above. In summary, the process may include: randomly removing some hidden neurons in the network while keeping the input and output neurons unchanged; propagating the input forward through the modified network and then back-propagating the error through the modified network; and repeating these operations for further batches of training samples, achieving the effect of a voting mechanism. For a fully connected neural network, training five different networks on the same data may yield several different results; a voting mechanism can then let the majority decide, which relatively improves the accuracy and robustness of the network. For a single neural network, if training is split into batches in this way, the different sub-networks may overfit to different degrees, but sharing a single loss function amounts to optimizing them simultaneously and averaging them, which can prevent overfitting fairly effectively. Dropout also reduces complex co-adaptation between neurons: when hidden-layer neurons are randomly removed, the fully connected network acquires a certain sparseness, which effectively weakens the synergistic effects of different features. That is, some features may depend on the joint action of hidden nodes in fixed relationships; the Dropout mechanism effectively suppresses situations in which certain features are effective only in the presence of other features, increasing the robustness of the neural network. Robustness means strength and resilience; it is the key to a system's survival under abnormal and dangerous conditions. For example, whether computer software avoids freezing or crashing under erroneous input, disk failure, network overload, or deliberate attack is a measure of that software's robustness. So-called "robustness" refers to a control system maintaining certain other performance characteristics under parameter perturbations of a given structure and magnitude. Depending on how performance is defined, it can be divided into stability robustness and performance robustness. A fixed controller designed with the robustness of the closed-loop system as the objective is called a robust controller.
Step S302: randomly initialize the initial state, each step corresponding to one input value, the input value being the word vector corresponding to each word.
The initial state is randomly initialized, with one input per step; the actual input is the word vector corresponding to each word. As for the hidden state and the number of hidden nodes, the state of the last step may be taken as the output, or the states of all steps may be weighted or simply averaged as the output, adjusted flexibly according to the specific task.
Step S303: use a Dropout mechanism to randomly remove some units of the hidden layer, keeping the input and output neurons unchanged.
The Dropout mechanism randomly removes some units of the hidden layer during this process and then carries out the procedure described above. In summary, the process may include: randomly removing some hidden neurons in the network while keeping the input and output neurons unchanged; propagating the input forward through the modified network and then back-propagating the error through the modified network; and repeating these operations for further batches of training samples, achieving the effect of a voting mechanism.
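Step S303 can be sketched as a mask over the hidden activations. One illustrative assumption here: the "inverted dropout" rescaling (dividing survivors by the keep probability) is a common convention added for the example, not something the patent specifies:

```python
# During training, each hidden unit is zeroed with probability p; the input
# and output neurons are untouched. Surviving units are rescaled so the
# expected activation is unchanged.
import random

def dropout(hidden, p, rng):
    keep = 1.0 - p
    return [h / keep if rng.random() < keep else 0.0 for h in hidden]

hidden = [0.8, -0.3, 1.2, 0.5, -0.7, 0.1]   # invented hidden-layer activations
rng = random.Random(42)
dropped = dropout(hidden, p=0.5, rng=rng)
print(dropped)  # [0.0, -0.6, 2.4, 1.0, 0.0, 0.0]
```

At inference time the mask is simply not applied; because of the rescaling, no further correction is needed.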
As shown in FIG. 4, in one embodiment, replicating the single-layer network into a two-layer network based on the single-layer LSTM includes:
Step S401: use a multilayer RNN cell to aggregate multiple basic LSTM cells into one;
Step S402: control the state of the RNN cell through the gate structure, and remove information from or add information to it, instantiating a new basic LSTM cell each time an RNN cell is added;
Step S403: set an initial value for the initial state of each network layer.
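Steps S401 to S403 can be sketched as a layer stack. A trivial recurrence stands in for the full LSTM gates so the stacking structure stays visible; the cell function and the toy input are assumptions of the example:

```python
# Each layer gets a freshly created cell (mirroring the note above that a new
# basic LSTM cell must be instantiated per layer, S402) and its own explicitly
# set initial state (S403). At each time step, layer k feeds layer k+1 (S401).
import math

def make_cell():
    """Create a fresh cell; a real system would build a new LSTM cell here."""
    def cell(x, state):
        new_state = math.tanh(state + x)   # toy recurrence in place of LSTM gates
        return new_state, new_state        # (output, new state)
    return cell

def stacked_step(x, cells, states):
    """Run one time step through the stack: each layer's output is the next layer's input."""
    new_states = []
    for cell, s in zip(cells, states):
        x, s = cell(x, s)
        new_states.append(s)
    return x, new_states

num_layers = 2
cells = [make_cell() for _ in range(num_layers)]   # one fresh cell per layer (S402)
states = [0.0] * num_layers                        # explicit initial values (S403)
out, states = stacked_step(1.0, cells, states)
print(round(out, 4))
```

Reusing a single cell object for every layer, instead of calling `make_cell()` per layer, would share the per-layer state and internal variables, which is exactly the reuse error the description above warns against.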
As shown in FIG. 5, in one embodiment, an artificial intelligence-based composition system is provided, including: an acquisition module, configured to acquire note information, the note information including a playback start time, a playback duration, and a pitch value of each note; a creation module, configured to create a music time series and perform autoregressive integrated moving average (ARIMA) model prediction to obtain a time series prediction result; a construction module, configured to construct a music prediction model according to the note information and the time series prediction result; and a composition module, configured to determine a topological structure of the music prediction model, and to obtain the predicted music according to the note information, the time series prediction result, the music prediction model, and the determined topological structure, thereby implementing automatic composition.
In one embodiment, the creation module further includes: a recognition unit, configured to check the audio, the audio frames, and their variation patterns according to the autocorrelation function and the partial autocorrelation function, to identify the stationarity of the time series; a stationarization processing unit, configured to apply stationarization processing to a non-stationary time series; a parameter estimation unit, configured to select an ARIMA model according to the identification rules of the time series and to estimate the parameters of the ARIMA model; a diagnosis unit, configured to perform hypothesis testing and to diagnose whether the residual sequence is white noise; and a prediction unit, configured to perform predictive analysis using the model that has passed the tests.
As shown in FIG. 6, in one embodiment, the construction module further includes: an establishing unit, configured to establish a single-layer long short-term memory network (LSTM) and to establish a hidden layer using a Dropout mechanism; a replication unit, configured to replicate the single-layer network into a two-layer network based on the single-layer LSTM; and an optimization unit, configured to initialize the weight parameters, to use a back-propagation mechanism to adjust the weight parameters of the two-layer network in reverse, layer by layer, and to iteratively improve the training accuracy of the network and optimize the loss function, so as to construct the music prediction model.
As shown in FIG. 7, in one embodiment, the establishing unit further includes: an establishing subunit, configured to establish a single-layer long short-term memory network (LSTM), the single-layer LSTM containing LSTM blocks, each LSTM block containing a gate that determines whether an input is important, whether it can be remembered, and whether it can be output; a correspondence subunit, configured to randomly initialize the initial state, each step corresponding to one input value, the input value being the word vector corresponding to each word; and a removal subunit, configured to use a Dropout mechanism to randomly remove some units of the hidden layer, keeping the input and output neurons unchanged.
As shown in FIG. 8, the replication unit further includes: an aggregation subunit, configured to use a multilayer RNN cell to aggregate multiple basic LSTM cells into one; a pruning subunit, configured to control the state of the RNN cell through the gate structure and to remove information from or add information to it, instantiating a new basic LSTM cell each time an RNN cell is added; and a setting subunit, configured to set an initial value for the initial state of each network layer.
In one embodiment, a computer device is provided. The computer device includes a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to implement the steps of the artificial intelligence-based composition method in the above embodiments.
In one embodiment, a storage medium storing computer-readable instructions is provided. When the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the steps of the artificial intelligence-based composition method in the above embodiments. The storage medium may be a non-volatile storage medium.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the foregoing embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The technical features of the above embodiments may be combined in any manner. For brevity, not every possible combination of these technical features has been described; nevertheless, as long as a combination of these technical features involves no contradiction, it should be regarded as falling within the scope of this specification.
The above embodiments merely express some exemplary embodiments of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent application. It should be noted that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

  1. An artificial-intelligence-based composing method, comprising:
    acquiring note information, the note information including a playback start time, a playback duration, and a pitch value of each note;
    creating a music time series and performing autoregressive integrated moving average (ARIMA) model prediction to obtain a time series prediction result;
    constructing a music prediction model according to the note information and the time series prediction result; and
    determining a topological structure of the music prediction model, and obtaining predicted music according to the note information and the time series prediction result in combination with the music prediction model and the determined topological structure, thereby implementing automatic composition.
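For illustration, the note information recited in claim 1 might be represented as follows; the `Note` class and `notes_to_series` helper are hypothetical names introduced for this sketch, not part of the application:

```python
from dataclasses import dataclass

@dataclass
class Note:
    start: float     # playback start time (e.g. in beats)
    duration: float  # playback duration
    pitch: int       # pitch value (e.g. a MIDI note number, 0-127)

def notes_to_series(notes):
    """Order the notes by playback start time and emit the pitch sequence,
    giving a music time series suitable for time-series prediction."""
    return [n.pitch for n in sorted(notes, key=lambda n: n.start)]

melody = [Note(0.0, 1.0, 60), Note(2.0, 0.5, 67), Note(1.0, 1.0, 64)]
print(notes_to_series(melody))  # -> [60, 64, 67]
```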
  2. The artificial-intelligence-based composing method according to claim 1, wherein performing the autoregressive integrated moving average (ARIMA) model prediction comprises:
    examining the audio, the audio frames, and their variation patterns according to the autocorrelation function and the partial autocorrelation function to identify the stationarity of the time series;
    stationarizing any non-stationary time series;
    selecting an ARIMA model according to the identification rules of the time series and estimating the parameters of the ARIMA model;
    performing a hypothesis test to diagnose whether the residual sequence is white noise; and
    performing predictive analysis using the model that has passed the test.
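For illustration, the ARIMA workflow recited in claim 2 can be approximated with a stdlib-only sketch: differencing stands in for the stationarization step, a least-squares AR(1) fit stands in for full ARIMA parameter estimation, and a lag-1 autocorrelation check on the residuals stands in for the white-noise hypothesis test. All names are assumptions; a real implementation would use a statistics library.

```python
import random

def difference(series):
    """First-order differencing: turns a linear trend into a constant."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t."""
    num = sum(a * b for a, b in zip(series, series[1:]))
    den = sum(a * a for a in series[:-1])
    return num / den

def lag1_autocorr(xs):
    """Lag-1 autocorrelation; near zero suggests a white-noise sequence."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

print(difference([1, 3, 5, 7]))  # -> [2, 2, 2]: the trend is removed

random.seed(0)
x = [0.0]
for _ in range(400):                      # simulate a stationary AR(1) process
    x.append(0.7 * x[-1] + random.gauss(0, 1))

phi = fit_ar1(x)                          # parameter estimate, close to 0.7
residuals = [b - phi * a for a, b in zip(x, x[1:])]
print(abs(lag1_autocorr(residuals)) < 0.2)  # residuals behave like white noise
```

Once the residual diagnosis passes, the fitted model would be used for the predictive-analysis step.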
  3. The artificial-intelligence-based composing method according to claim 1, wherein constructing the music prediction model according to the note information and the time series prediction result comprises:
    building a single-layer long short-term memory (LSTM) network and establishing a hidden layer using the Dropout mechanism;
    duplicating the single-layer network into a double-layer network according to the single-layer LSTM; and
    initializing weight parameters, adjusting the weight parameters in the double-layer network backwards layer by layer using a back-propagation mechanism, and iteratively improving the training accuracy of the network to optimize the loss function, thereby constructing the music prediction model.
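For illustration, the initialize-backpropagate-iterate loop recited in claim 3 can be reduced to a toy example: a single linear unit stands in for the double-layer LSTM, and the function names, targets, and learning rate are assumptions for the sketch.

```python
def train(samples, lr=0.1, epochs=50):
    """Initialise weight parameters, push the prediction error backwards to
    adjust them, and iterate so that the loss shrinks over training."""
    w, b = 0.5, 0.0                      # initialised weight parameters
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            pred = w * x + b             # forward pass
            err = pred - target
            total += err * err
            w -= lr * err * x            # backward pass: the loss gradient
            b -= lr * err                #   adjusts each parameter in turn
        losses.append(total / len(samples))
    return w, b, losses

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # target relation: y = 2x + 1
w, b, losses = train(samples)
print(round(w, 2), round(b, 2))  # parameters approach 2 and 1 as the loss falls
```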
  4. The artificial-intelligence-based composing method according to claim 3, wherein building the single-layer long short-term memory (LSTM) network and establishing the hidden layer using the Dropout mechanism comprises:
    building a single-layer LSTM network, the single-layer LSTM containing an LSTM block, the LSTM block including a gate that decides whether an input is important, whether it can be remembered, and whether it can be output;
    randomly initializing the initial state, each step corresponding to one input value, the input value being the word vector corresponding to each word; and
    using the Dropout mechanism to randomly drop some units of the hidden layer while keeping the input and output neurons unchanged.
  5. The artificial-intelligence-based composing method according to claim 3, wherein duplicating the single-layer network into a double-layer network according to the single-layer LSTM comprises:
    aggregating multiple basic LSTM units into one using a multi-layer RNN unit;
    controlling the state of the RNN unit through a gate structure, removing information from or adding information to the state, and re-invoking a basic LSTM unit each time an RNN unit is added; and
    setting an initial value for the initial state of each layer of the network.
  6. An artificial-intelligence-based composing system, comprising:
    an acquisition module configured to acquire note information, the note information including a playback start time, a playback duration, and a pitch value of each note;
    a creation module configured to create a music time series and perform autoregressive integrated moving average (ARIMA) model prediction to obtain a time series prediction result;
    a construction module configured to construct a music prediction model according to the note information and the time series prediction result; and
    a composing module configured to determine a topological structure of the music prediction model, and to obtain predicted music according to the note information and the time series prediction result in combination with the music prediction model and the determined topological structure, thereby implementing automatic composition.
  7. The artificial-intelligence-based composing system according to claim 6, wherein the creation module further includes:
    an identification unit configured to examine the audio, the audio frames, and their variation patterns according to the autocorrelation function and the partial autocorrelation function to identify the stationarity of the time series;
    a stationarization unit configured to stationarize any non-stationary time series;
    a parameter estimation unit configured to select an ARIMA model according to the identification rules of the time series and to estimate the parameters of the ARIMA model;
    a diagnosis unit configured to perform a hypothesis test to diagnose whether the residual sequence is white noise; and
    a prediction unit configured to perform predictive analysis using the model that has passed the test.
  8. The artificial-intelligence-based composing system according to claim 6, wherein the construction module further includes:
    an establishing unit configured to build a single-layer long short-term memory (LSTM) network and to establish a hidden layer using the Dropout mechanism;
    a duplication unit configured to duplicate the single-layer network into a double-layer network according to the single-layer LSTM; and
    an optimization unit configured to initialize weight parameters, to adjust the weight parameters in the double-layer network backwards layer by layer using a back-propagation mechanism, and to iteratively improve the training accuracy of the network to optimize the loss function, thereby constructing the music prediction model.
  9. The artificial-intelligence-based composing system according to claim 8, wherein the establishing unit further includes:
    an establishing subunit configured to build a single-layer long short-term memory (LSTM) network, the single-layer LSTM containing an LSTM block, the LSTM block including a gate that decides whether an input is important, whether it can be remembered, and whether it can be output;
    a correspondence subunit configured to randomly initialize the initial state, each step corresponding to one input value, the input value being the word vector corresponding to each word; and
    a deletion subunit configured to use the Dropout mechanism to randomly drop some units of the hidden layer while keeping the input and output neurons unchanged.
  10. The artificial-intelligence-based composing system according to claim 8, wherein the duplication unit further includes:
    an aggregation subunit configured to aggregate multiple basic LSTM units into one using a multi-layer RNN unit;
    a pruning subunit configured to control the state of the RNN unit through a gate structure, to remove information from or add information to the state, and to re-invoke a basic LSTM unit each time an RNN unit is added; and
    a setting subunit configured to set an initial value for the initial state of each layer of the network.
  11. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    acquiring note information, the note information including a playback start time, a playback duration, and a pitch value of each note;
    creating a music time series and performing autoregressive integrated moving average (ARIMA) model prediction to obtain a time series prediction result;
    constructing a music prediction model according to the note information and the time series prediction result; and
    determining a topological structure of the music prediction model, and obtaining predicted music according to the note information and the time series prediction result in combination with the music prediction model and the determined topological structure, thereby implementing automatic composition.
  12. The computer device according to claim 11, wherein, when performing the autoregressive integrated moving average (ARIMA) model prediction, the processor is caused to perform the following steps:
    examining the audio, the audio frames, and their variation patterns according to the autocorrelation function and the partial autocorrelation function to identify the stationarity of the time series;
    stationarizing any non-stationary time series;
    selecting an ARIMA model according to the identification rules of the time series and estimating the parameters of the ARIMA model;
    performing a hypothesis test to diagnose whether the residual sequence is white noise; and
    performing predictive analysis using the model that has passed the test.
  13. The computer device according to claim 11, wherein, when constructing the music prediction model according to the note information and the time series prediction result, the processor is caused to perform the following steps:
    building a single-layer long short-term memory (LSTM) network and establishing a hidden layer using the Dropout mechanism;
    duplicating the single-layer network into a double-layer network according to the single-layer LSTM; and
    initializing weight parameters, adjusting the weight parameters in the double-layer network backwards layer by layer using a back-propagation mechanism, and iteratively improving the training accuracy of the network to optimize the loss function, thereby constructing the music prediction model.
  14. The computer device according to claim 13, wherein, when building the single-layer long short-term memory (LSTM) network and establishing the hidden layer using the Dropout mechanism, the processor is caused to perform the following steps:
    building a single-layer LSTM network, the single-layer LSTM containing an LSTM block, the LSTM block including a gate that decides whether an input is important, whether it can be remembered, and whether it can be output;
    randomly initializing the initial state, each step corresponding to one input value, the input value being the word vector corresponding to each word; and
    using the Dropout mechanism to randomly drop some units of the hidden layer while keeping the input and output neurons unchanged.
  15. The computer device according to claim 13, wherein, when duplicating the single-layer network into a double-layer network according to the single-layer LSTM, the processor is caused to perform the following steps:
    aggregating multiple basic LSTM units into one using a multi-layer RNN unit;
    controlling the state of the RNN unit through a gate structure, removing information from or adding information to the state, and re-invoking a basic LSTM unit each time an RNN unit is added; and
    setting an initial value for the initial state of each layer of the network.
  16. A storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring note information, the note information including a playback start time, a playback duration, and a pitch value of each note;
    creating a music time series and performing autoregressive integrated moving average (ARIMA) model prediction to obtain a time series prediction result;
    constructing a music prediction model according to the note information and the time series prediction result; and
    determining a topological structure of the music prediction model, and obtaining predicted music according to the note information and the time series prediction result in combination with the music prediction model and the determined topological structure, thereby implementing automatic composition.
  17. The storage medium according to claim 16, wherein, when performing the autoregressive integrated moving average (ARIMA) model prediction, the one or more processors are caused to perform the following steps:
    examining the audio, the audio frames, and their variation patterns according to the autocorrelation function and the partial autocorrelation function to identify the stationarity of the time series;
    stationarizing any non-stationary time series;
    selecting an ARIMA model according to the identification rules of the time series and estimating the parameters of the ARIMA model;
    performing a hypothesis test to diagnose whether the residual sequence is white noise; and
    performing predictive analysis using the model that has passed the test.
  18. The storage medium according to claim 16, wherein, when constructing the music prediction model according to the note information and the time series prediction result, the one or more processors are caused to perform the following steps:
    building a single-layer long short-term memory (LSTM) network and establishing a hidden layer using the Dropout mechanism;
    duplicating the single-layer network into a double-layer network according to the single-layer LSTM; and
    initializing weight parameters, adjusting the weight parameters in the double-layer network backwards layer by layer using a back-propagation mechanism, and iteratively improving the training accuracy of the network to optimize the loss function, thereby constructing the music prediction model.
  19. The storage medium according to claim 18, wherein, when building the single-layer long short-term memory (LSTM) network and establishing the hidden layer using the Dropout mechanism, the one or more processors are caused to perform the following steps:
    building a single-layer LSTM network, the single-layer LSTM containing an LSTM block, the LSTM block including a gate that decides whether an input is important, whether it can be remembered, and whether it can be output;
    randomly initializing the initial state, each step corresponding to one input value, the input value being the word vector corresponding to each word; and
    using the Dropout mechanism to randomly drop some units of the hidden layer while keeping the input and output neurons unchanged.
  20. The storage medium according to claim 18, wherein, when duplicating the single-layer network into a double-layer network according to the single-layer LSTM, the one or more processors are caused to perform the following steps:
    aggregating multiple basic LSTM units into one using a multi-layer RNN unit;
    controlling the state of the RNN unit through a gate structure, removing information from or adding information to the state, and re-invoking a basic LSTM unit each time an RNN unit is added; and
    setting an initial value for the initial state of each layer of the network.
PCT/CN2018/104715 2018-06-04 2018-09-08 Artificial intelligence-based composing method and system, computer device and storage medium WO2019232959A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810561621.5A CN109192187A (en) 2018-06-04 2018-06-04 Composing method, system, computer equipment and storage medium based on artificial intelligence
CN201810561621.5 2018-06-04

Publications (1)

Publication Number Publication Date
WO2019232959A1 true WO2019232959A1 (en) 2019-12-12

Family

ID=64948568

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/104715 WO2019232959A1 (en) 2018-06-04 2018-09-08 Artificial intelligence-based composing method and system, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109192187A (en)
WO (1) WO2019232959A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120212B (en) * 2019-04-08 2023-05-23 华南理工大学 Piano auxiliary composition system and method based on user demonstration audio frequency style
CN110288965B (en) * 2019-05-21 2021-06-18 北京达佳互联信息技术有限公司 Music synthesis method and device, electronic equipment and storage medium
CN111583891B (en) * 2020-04-21 2023-02-14 华南理工大学 Automatic musical note vector composing system and method based on context information
CN114282937A (en) * 2021-11-18 2022-04-05 青岛亿联信息科技股份有限公司 Building economy prediction method and system based on Internet of things

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160379105A1 (en) * 2015-06-24 2016-12-29 Microsoft Technology Licensing, Llc Behavior recognition and automation using a mobile device
CN107481048A (en) * 2017-08-08 2017-12-15 哈尔滨工业大学深圳研究生院 A kind of financial kind price expectation method and system based on mixed model
CN107644630A (en) * 2017-09-28 2018-01-30 清华大学 Melody generation method and device based on neutral net
CN107993636A (en) * 2017-11-01 2018-05-04 天津大学 Music score modeling and generation method based on recurrent neural network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104952448A (en) * 2015-05-04 2015-09-30 张爱英 Method and system for enhancing features by aid of bidirectional long-term and short-term memory recurrent neural networks
CN107045867B (en) * 2017-03-22 2020-06-02 科大讯飞股份有限公司 Automatic composition method and device and terminal equipment
CN107102969A (en) * 2017-04-28 2017-08-29 湘潭大学 The Forecasting Methodology and system of a kind of time series data
CN107123415B (en) * 2017-05-04 2020-12-18 吴振国 Automatic song editing method and system
CN107121679A (en) * 2017-06-08 2017-09-01 湖南师范大学 Recognition with Recurrent Neural Network predicted method and memory unit structure for Radar Echo Extrapolation
CN107622329A (en) * 2017-09-22 2018-01-23 深圳市景程信息科技有限公司 The Methods of electric load forecasting of Memory Neural Networks in short-term is grown based on Multiple Time Scales
CN107769972B (en) * 2017-10-25 2019-12-10 武汉大学 Power communication network equipment fault prediction method based on improved LSTM


Also Published As

Publication number Publication date
CN109192187A (en) 2019-01-11


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18921391

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as the address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.03.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18921391

Country of ref document: EP

Kind code of ref document: A1