WO2022145145A1 - Information processing device, information processing method and information processing program - Google Patents
Information processing device, information processing method and information processing program
- Publication number
- WO2022145145A1 (PCT/JP2021/042384)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- series
- information
- data
- information processing
- input
- Prior art date
Classifications
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0475—Generative networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G10H1/0066—Transmission between separate instruments or between individual components of a musical system using a MIDI interface
- G10H2240/141—Library retrieval matching, i.e. matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
Definitions
- This disclosure relates to an information processing device, an information processing method, and an information processing program.
- Patent Document 1 discloses a method of selectively learning a feature amount designated by a user so that a sequence is generated in a manner desired by the user.
- In some cases, however, it is desirable to generate a series in which only a part is newly generated and the rest is maintained. Patent Document 1 gives no specific consideration to this point.
- One aspect of the present disclosure provides an information processing apparatus, an information processing method, and an information processing program capable of generating a series in which only a part is newly generated and the rest is maintained.
- In one aspect, the information processing apparatus includes a control means, a data input means for inputting series data, a machine learning model that generates new series data based on the series data input by the data input means, and a series data selection means that, when the machine learning model generates the new series data, selects target series data to be changed and/or context series data to be left unchanged.
- The control means either (i) generates new target series data that interpolates at least two pieces of series data already generated by the machine learning model, or (ii) generates new series data different from the series data already generated by the machine learning model.
- In another aspect, the information processing apparatus includes a generation unit that generates a series including a determined context series and a new target series, using input information and a trained model; the input information is information about a series in which a part consists of a target series and the rest consists of a context series, the series giving a series of information.
- When data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- In yet another aspect, the information processing apparatus includes such a generation unit and a user interface that accepts the input information and presents the generation result of the generation unit; here too, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- In one aspect, the information processing method includes generating, using such input information and a trained model, a series including the determined context series and a new target series; when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- In one aspect, the information processing program causes a computer to generate, using such input information and a trained model, a series including the determined context series and a new target series; when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- The information to be processed by the information processing apparatus according to the embodiment is a series (series data) that gives a series of information.
- Examples of series are music (music sequences, audio, etc.) and language (documents, poetry, etc.).
- In the following, the case where the series is a music sequence is mainly described as an example.
- FIG. 1 is a diagram showing an example of the appearance of the information processing apparatus according to the embodiment.
- the information processing apparatus 1 is realized, for example, by executing a predetermined program (software) on a general-purpose computer.
- the information processing apparatus 1 is a laptop used by the user U.
- the information processing apparatus 1 includes a user interface (user interface 10 in FIG. 5 described later) for exchanging information with the user U.
- the user interface may include a display, keyboard, touchpad, speaker and the like.
- the display may be a touch panel display.
- the information processing device 1 may be realized not only by a laptop but also by various devices such as a tablet terminal and a smartphone.
- FIGS. 2 to 4 are diagrams showing an example of an operation screen presented by the user interface.
- By an operation related to the item "series selection", a folder or the like is browsed and a file is selected.
- The series contained in the selected file is input and visualized.
- In this example, a music sequence that associates time with pitch values is displayed as the series.
- The entire series is denoted as series x in the figure.
- Hereinafter, an operation by the user U (a user operation) is simply referred to as an "operation".
- The series x is divided into a plurality of series by an operation related to the item "range specification". For example, part of the visualized series x is range-selected and divided into the selected part and the remaining part. The selected part of the divided series x is referred to as the target series (shown by hatching), and the rest is referred to as the context series x C.
- The target series is the part that is to be changed.
- The context series x C is the part that is to be left unchanged (maintained). Since the context series x C is not changed, it can be called the determined context series x C.
- the position information of the context series x C in the series x (corresponding to the position information R in FIG. 8 described later) is input.
- A series is generated by an operation related to the item "search". Details will be described later; when "normal generation" is specified, a series is generated based on the context series x C input by the operation related to "series selection" and the position information input by the operation related to "range specification" described above.
- the generated series A is visualized and displayed in a mode in which operations such as reproduction can be performed.
- Compared with the original series x (FIG. 2), the series A contains the same context series x C but differs in that it contains a new target series x T.
- The target series x T of the series A is denoted as target series x TA.
- An operation related to the item "search" generates a further series based on the series A as a starting point. Details will be described later; when "variation generation" is specified, a series is generated that includes a target series different from the target series x T of the already generated series.
- By an operation related to the item "feature specification", a feature of the series is specified. In this example, an arbitrary position (feature) in the latent space FS, which defines the feature amounts of series, is specified, and a series having that feature (the feature amount corresponding to the specified position) is generated.
- This series also includes a target series different from the target series x TA of the series A. Through these operations, a plurality of generated series are obtained, each containing a different new target series.
- the further generated series B and series C are visualized and displayed together with the series A in a mode in which operations such as designation and reproduction can be performed.
- The target series x T of the series B is denoted as target series x TB, and the target series x T of the series C as target series x TC.
- Hereinafter, the series A, the series B, and the series C may be collectively referred to simply as "the series A and the like".
- various modes of operation may be presented by the user interface.
- FIG. 5 is a diagram showing an example of a schematic configuration of an information processing device.
- the information processing apparatus 1 includes a storage unit 20 and a generation unit 30 in addition to the user interface 10 described above with reference to FIG.
- The user interface 10 functions as an input unit (reception unit) that receives information through user operations. It can be said to function as a data input means for inputting series data.
- The user interface 10 can also be said to function as a series data selection means that selects a target series (target series data) and/or a context series (context series data), for example as described above with reference to FIG. 2.
- the information received by the user interface 10 is referred to as "input information". Some examples of input information will be described.
- The input information includes information about a series.
- Here, the information about a series is information about a series that includes the determined context series x C. Examples of such input information are the information about the series x described above with reference to FIG. 2 and the information about generated series (the series A and the like) described above with reference to FIGS. 3 and 4.
- A generated series is a series generated by the generation unit 30 described later.
- the input information may include information that specifies at least one series among a plurality of generated series.
- An example of such input information is the information designating the series A and the like described above with reference to FIGS. 3 and 4.
- the input information may be information that specifies, for example, two series, series A and series B.
- the input information may include information that specifies the characteristics of the series.
- An example of such input information is information that specifies a position (characteristic of a series) in the latent space FS described above with reference to FIGS. 3 and 4.
- the user interface 10 has a function as an output unit (presentation unit) for presenting information to the user.
- the user interface 10 outputs the generation result of the generation unit 30 described later.
- the sequence A or the like is presented (screen display, sound output, etc.) in the manner described above with reference to FIGS. 3 and 4.
- The features of the series A and the like are presented as positions in the latent space FS. The user interface 10 can thus be said to function as a display means that displays positions in the latent space FS in a specifiable manner.
- the storage unit 20 stores various information used in the information processing device 1. As an example of the information stored in the storage unit 20, the trained model 21 and the information processing program 22 are illustrated.
- The trained model 21 is a model generated (trained) using training data so as to output data corresponding to a new target series x T when data corresponding to the input information described above is input.
- The trained model 21 can be said to be a machine learning model that generates new series data based on input series data.
- the generation unit 30 generates the corresponding data from the input information and inputs it to the trained model 21. Further, the generation unit 30 generates a corresponding series from the data output by the trained model 21.
- The input and output data of the trained model 21 include, for example, token sequences.
- The data input to the trained model 21 includes the tokens of the context series x C.
- The data output by the trained model 21 includes the tokens of a new target series x T. Tokens are described with reference to FIG. 6.
- FIG. 6 is a diagram showing an example of a token.
- a music sequence is shown as an example of a sequence.
- the horizontal axis shows the time, and the vertical axis shows the pitch value (MIDI pitch).
- One unit time corresponds to one bar period. That is, in this example, the series of information given by the series is music information indicating the pitch value of the sound for each time.
- the token sequence corresponding to the music sequence is shown.
- the token indicates either the pitch value of the sound or the duration of the sound.
- the first token and the second token are arranged in chronological order.
- the first token is a token indicating the generation and stop of each sound included in the sequence.
- the second token is a token indicating the period during which the state shown in the corresponding first token is maintained.
- the part represented by angle brackets ⁇ > corresponds to one token.
- For example, the token <ON, W, 60> is a first token indicating that sound generation at pitch value 60 of sound source W (W indicating, for example, the type of musical instrument) starts at time 0.
- The following token <SHIFT, 1> is the corresponding second token, indicating that the state shown in the corresponding first token (sound source W, pitch value 60) is maintained for one unit time. That is, SHIFT means that only time advances while the state shown in the immediately preceding token is held. The other ON and SHIFT tokens are read in the same way.
- The token <OFF, W, 60> is a first token indicating that sound generation at pitch value 60 of sound source W ends.
- The above is an example of the tokens of a series when the series is music. If the series is a language, the tokens are, for example, words.
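- As an illustration of this token format, a single note could be encoded into the ON/SHIFT/OFF pattern as in the following minimal sketch; the tuple representation and the helper name encode_note are assumptions for illustration, not part of the disclosure.

```python
# Minimal sketch of the token format described above (assumed representation).

def encode_note(source: str, pitch: int, duration: int) -> list:
    """Encode one note as the <ON, ...>, <SHIFT, ...>, <OFF, ...> pattern."""
    return [
        ("ON", source, pitch),   # first token: sound generation starts
        ("SHIFT", duration),     # second token: state held for `duration` unit times
        ("OFF", source, pitch),  # first token: sound generation stops
    ]

# A note at pitch value 60 on sound source "W", held for one unit time:
print(encode_note("W", 60, 1))
# -> [('ON', 'W', 60), ('SHIFT', 1), ('OFF', 'W', 60)]
```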
- FIG. 7 is a diagram showing an example of a schematic configuration of a trained model.
- the trained model 21 is a variational autoencoder (VAE) model, including an encoder ENC and a decoder DEC.
- Examples of architectures are Transformers and RNNs (Recurrent Neural Networks).
- An example of an RNN is the LSTM (Long Short-Term Memory) network.
- the encoder ENC outputs (extracts) the feature amount of the input token sequence.
- the decoder DEC generates (reconstructs) a sequence of tokens to be output from the feature amount output by the encoder ENC, for example, using the sequence of tokens having the highest probability.
- FIG. 8 is a diagram showing an example of learning.
- For learning, three models are used: an encoder model 211, a prior model 212, and a decoder model 213.
- In this example, the architectures of the encoder model 211 and the prior model 212 are Transformers or RNNs.
- The architecture of the decoder model 213 is a Transformer.
- The trained model 21 may include the prior model 212 and the decoder model 213 as the encoder ENC and the decoder DEC of FIG. 7 described above.
- the encoder model 211 gives a feature amount z.
- the feature amount z may be a vector indicating a position (point) in the latent space FS. It can be said that the position in the latent space FS indicates the characteristics of the series.
- The latent space FS is a multidimensional space, also called a latent feature space. In the embodiment, the latent space FS can be said to be a context-conditioned latent space, learned under the condition that the determined context series x C is maintained.
- The latent space FS of FIGS. 3 and 4 described above is a display (for example, two-dimensional) of some of its dimensions.
- the sequence x and the position information R are input to the encoder model 211.
- the position information R may be a variable j and a variable k as described below.
- The series x input to the encoder model 211 is illustrated as tokens s_1, ..., s_{k-1}, s_k, ..., s_j, s_{j+1}, ..., s_L.
- The subscript of each token indicates its order in the series.
- The variable j and the variable k give the position information R.
- The tokens s_1 to s_{k-1} (the first to the (k-1)-th) and the tokens s_j to s_L (the j-th to the L-th) are specified as the positions of the context series x C.
- The tokens s_k to s_{j-1} (the k-th to the (j-1)-th) are specified as the positions of the new target series x T to be generated later.
- In the encoder model 211, among the tokens whose positions are specified as described above, only the tokens of the context series x C are input to the RNN.
- The RNN outputs the feature amount z of the input context series x C (tokens).
- Since the encoder model 211 outputs the feature amount z when the series x and the position information R are input, it is represented by the expression q(z | x, R).
- The prior model 212 also gives the feature amount z, like the encoder model 211.
- The context series x C and the position information R are input to the prior model 212.
- In FIG. 8, the context series x C is shown as tokens s_1, ..., s_{k-1} and tokens s_{j+1}, ..., s_L.
- The remaining tokens are given as a predetermined token M. If there are a plurality of remaining tokens, they may all be given as the same token M. It can be said that the part of the series x other than the context series x C (the part where the new target series x T is generated later) is masked by the token M.
- The token M may be defined so as to give a feature amount different from any of the feature amounts corresponding to the tokens that can be input as tokens of the context series x C.
- The position information R is as described above.
- The tokens s_1 to s_{k-1} (the first to the (k-1)-th) and the tokens s_j to s_L (the j-th to the L-th) are specified as the positions of the context series x C.
- In the prior model 212, among the tokens whose positions are specified as described above, only the token M is input to the RNN.
- The RNN outputs the feature amount z of the input token M.
- Since the prior model 212 outputs the feature amount z when the context series x C and the position information R are input, it is represented by the expression p(z | x C, R).
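- The masked input described above can be sketched as follows; this is a non-authoritative illustration in which the list representation, the concrete value of MASK, and the 1-indexed convention follow the text above but are otherwise assumptions.

```python
MASK = "M"  # the predetermined token M (assumed concrete value)

def mask_target(x: list, k: int, j: int) -> list:
    """Keep the context tokens s_1..s_{k-1} and s_j..s_L and replace the
    target positions s_k..s_{j-1} (1-indexed, as in the text) with token M."""
    return x[: k - 1] + [MASK] * (j - k) + x[j - 1 :]

x = ["s1", "s2", "s3", "s4", "s5", "s6"]   # L = 6
print(mask_target(x, k=3, j=5))            # ['s1', 's2', 'M', 'M', 's5', 's6']
```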
- The decoder model 213 generates the tokens of a new target series x T based on the feature amount z and the tokens of the context series x C. Specifically, of the context series x C and the target series x T, the decoder model 213 reconstructs only the tokens of the target series x T. The tokens of the reconstructed target series x T and the tokens of the originally determined context series x C are combined, for example by the generation unit 30, to generate a series including the context series x C and the new target series x T.
- Since the decoder model 213 outputs a series in which only the target series x T is reconstructed when the feature amount z, the context series x C, and the position information R are input, it is represented by the expression p(x T | z, x C, R).
- In the example of FIG. 8, the decoder model 213 generates the tokens s_k, ..., s_j with reference to the tokens s_{j+1}, ..., s_L, B, s_1, ..., s_{k-1}.
- The tokens s_{j+1}, ..., s_L, B, s_1, ..., s_{k-1} retain their original positions in the series.
- The token B is a token indicating the start of the series.
- The encoder model 211, the prior model 212, and the decoder model 213 described above are trained so as to minimize loss functions.
- A loss function L rec and a loss function L pri are used as the loss functions.
- The parameters of the encoder model 211, the prior model 212, and the decoder model 213 are learned so as to minimize the sum of the loss function L rec and the loss function L pri.
- The loss function L rec is the error (reconstruction error) when the decoder model 213 reconstructs the target series using the feature amount z output by the prior model 212.
- The loss function L pri is the difference between the distributions of the encoder model 211 and the prior model 212 (prior error).
- An example of the prior error is the Kullback-Leibler (KL) divergence.
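- In formula form, the objective reconstructed from the description above (the relative weighting of the two terms is not specified in the disclosure) is:

```latex
\mathcal{L} = \mathcal{L}_{rec} + \mathcal{L}_{pri},
\qquad
\mathcal{L}_{pri} = D_{\mathrm{KL}}\big(q(z \mid x, R)\,\big\|\,p(z \mid x_C, R)\big),
```

where L rec is the reconstruction error of the decoder p(x T | z, x C, R) on the target tokens.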
- FIG. 9 is a flowchart showing an example of learning. As a premise, it is assumed that training data including a large number of various series x are prepared.
- In step S1, a mini-batch of series is acquired from the training data. For example, an arbitrary predetermined number (for example, 64) of series x are acquired (sampled) from the training data.
- In step S2, the position information is set.
- More specifically, the values of j and k of the position information R described above with reference to FIG. 8 are set (sampled) to arbitrary values.
- In step S3, the parameters are updated using the loss functions. For example, using the mini-batch and the position information obtained in steps S1 and S2, the parameters of the encoder model 211, the prior model 212, and the decoder model 213 are updated (learned) so as to minimize the sum of the loss function L rec and the loss function L pri, as described above with reference to FIG. 8.
- In step S4, if the number of learning iterations is less than a predetermined number (step S4: YES), the process returns to step S1.
- When the predetermined number is reached (step S4: NO), the processing of the flowchart ends.
- The trained model 21 is generated as described above.
- The parameters may also be updated by setting different position information for the same mini-batch.
- In that case, the processes of steps S2 and S3 may be repeated for the number of patterns of the set position information R.
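- Steps S1 to S4 can be summarized in the following sketch; loss_fn and update_fn are hypothetical stand-ins for the encoder/prior/decoder machinery and the optimizer, neither of which is specified in the disclosure.

```python
import random
from typing import Callable, Sequence

def sample_positions(L: int) -> tuple:
    """S2: sample position information R = (k, j) with 1 <= k < j <= L + 1,
    so that tokens s_k .. s_{j-1} become the target and the rest the context."""
    k = random.randint(1, L)
    j = random.randint(k + 1, L + 1)
    return k, j

def train(training_data: Sequence,
          loss_fn: Callable,      # computes L_rec + L_pri for one (x, k, j)
          update_fn: Callable,    # applies one gradient update from the loss
          num_steps: int = 10_000, batch_size: int = 64) -> None:
    """Sketch of the training loop of FIG. 9 (assumed structure)."""
    for _ in range(num_steps):                                  # S4: fixed number of iterations
        batch = random.sample(list(training_data), batch_size)  # S1: mini-batch of series x
        for x in batch:
            k, j = sample_positions(len(x))                     # S2: set position information
            update_fn(loss_fn(x, k, j))                         # S3: minimize L_rec + L_pri
```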
- the information processing program 22 is a program (software) for realizing the processing executed by the information processing device 1.
- the generation unit 30 uses the input information input to the user interface 10 and the trained model 21 to generate a sequence including a determined context sequence x C and a new target sequence x T.
- the generated series is a generated series (series A or the like) described above with reference to FIGS. 3 and 4.
- the generation unit 30 can be said to be a control means for generating a series.
- the function of the control means may be realized by a processor or the like (for example, the CPU 1100 in FIG. 14 described later).
- 10 to 12 are flowcharts showing an example of processing (information processing method) executed in the information processing apparatus.
- FIG. 10 shows an example of the first generation method.
- the generation unit 30 generates a sequence (for example, randomly) using the context sequence x C , the position information R, and the trained model 21.
- the first generation method is referred to as "normal generation" and is illustrated.
- In step S11, a feature amount is acquired (sampled) using the input context series, the position information, and the prior model.
- For example, the user interface 10 accepts the context series x C and the position information R as input information through the operations related to the items "series selection" and "range specification" described above with reference to FIG. 2.
- The generation unit 30 inputs the series consisting of the context series x C and the predetermined token M, together with the position information R, into the prior model 212.
- The prior model 212 outputs (extracts) the feature amount z corresponding to the token M.
- In step S12, a target series is generated using the context series, the feature amount, and the decoder.
- Using the trained model 21, the generation unit 30 inputs the context series x C used in step S11 and the acquired feature amount z into the decoder model 213, as described above with reference to FIG. 8.
- The decoder model 213 generates (reconstructs) the target series x T.
- In step S13, a series including the context series and the target series is generated.
- The generation unit 30 combines the context series x C used in step S12 and the generated new target series x T to generate a series including them.
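- Steps S11 to S13 can be sketched as follows; prior_model and decoder_model are hypothetical callables standing in for the corresponding parts of the trained model 21, and MASK is an assumed representation of the token M.

```python
MASK = "M"  # assumed representation of the predetermined token M

def generate_normal(ctx_left: list, ctx_right: list, target_len: int,
                    prior_model, decoder_model) -> list:
    """Sketch of "normal generation" (FIG. 10)."""
    k = len(ctx_left) + 1                                  # first target position (1-indexed)
    j = k + target_len                                     # one past the last target position
    masked = ctx_left + [MASK] * target_len + ctx_right    # context series plus token M
    z = prior_model(masked, (k, j))                        # S11: sample the feature amount z
    x_t = decoder_model(z, ctx_left + ctx_right, (k, j))   # S12: reconstruct target tokens
    return ctx_left + x_t + ctx_right                      # S13: combine context and new target
```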
- FIG. 11 shows an example of the second generation method.
- In the second generation method, the generation unit 30 generates a series including, as a new target series, a target series different from the target series of already generated series.
- Specifically, the generation unit 30 generates a series including, as a new target series, a target series having features between those of two designated series (interpolating the two series data).
- In the figure, the second generation method is referred to as "interpolation generation".
- In step S21, a feature amount different from the feature amounts of the designated plurality of series is specified.
- For example, the user interface 10 accepts, as input information, the information designating the series A and the series B and the information designating "interpolation generation", as in the examples of FIGS. 3 and 4 described above.
- Using the trained model 21, the generation unit 30 specifies, as the feature amount z AB, a position between the position of the feature amount z A of the series A and the position of the feature amount z B of the series B in the latent space FS. Since the trained model 21 has learned the latent space FS, such a feature amount z AB can be specified.
- The user interface 10 may provide a display or the like that allows the user to specify this intermediate position.
- In step S22, a target series is generated using the specified feature amount, the context series, and the decoder.
- Using the trained model 21, the generation unit 30 inputs the feature amount z AB specified in step S21 into the decoder model 213.
- The decoder model 213 generates a target series x TAB corresponding to the feature amount z AB.
- The target series x TAB thus obtained and the context series x C are combined to generate a new series AB.
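- Linear interpolation is one plausible reading of "a position between" the two feature amounts; the disclosure does not fix the exact formula, and alpha below is a hypothetical interpolation weight.

```python
def interpolate_features(z_a: list, z_b: list, alpha: float = 0.5) -> list:
    """S21: pick a point between z_A and z_B in the latent space FS."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(z_a, z_b)]

z_ab = interpolate_features([0.2, 1.0], [0.8, 0.0])
print(z_ab)  # [0.5, 0.5] -> fed to the decoder model to generate x_TAB
```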
- FIG. 12 shows an example of the third generation method. In the third generation method as well, the generation unit 30 generates a series including, as a new target series, a target series different from the target series of an already generated series. In the third generation method, a single designated series is sufficient.
- In the figure, the third generation method is referred to as "variation generation".
- In step S31, a feature amount in the vicinity of the feature amount of the designated series is specified.
- For example, the user interface 10 accepts, as input information, the information designating the series A and the information designating "variation generation" in the example of FIG. 3 or FIG. 4 described above.
- Using the trained model 21, the generation unit 30 specifies a feature amount z A′ at a position slightly moved from the position of the feature amount z A of the series A in the latent space FS. The movement is performed, for example, by adding noise to the feature amount z A. The noise may be sampled from a normal distribution in each dimension of the latent space FS.
- The mean and variance of the normal distribution may be arbitrary (e.g., mean 0, variance 0.01).
- In step S32, a target series is generated using the specified feature amount, the context series, and the decoder.
- Using the trained model 21, the generation unit 30 inputs the feature amount z A′ specified in step S31 into the decoder model 213.
- The decoder model 213 generates a target series x TA′ corresponding to the feature amount z A′.
- The target series x TA′ thus obtained and the context series x C are combined to generate a new series A′.
- A plurality of different feature amounts may be specified in step S31; in that case, as many new target series as the number of feature amounts (the number of variations) are generated, and consequently as many new series.
- The user interface 10 may provide a display or the like that allows the user to specify the number of variations.
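- The noise addition of step S31 can be sketched as follows, using the example values from the text (mean 0, variance 0.01, i.e. standard deviation 0.1); the function name and list representation are assumptions.

```python
import random

def vary_feature(z_a: list, stddev: float = 0.1) -> list:
    """S31: move z_A slightly by adding per-dimension Gaussian noise."""
    return [z + random.gauss(0.0, stddev) for z in z_a]

# Three variations of the same feature amount, each decoded into a new target series:
variations = [vary_feature([0.5, -0.3]) for _ in range(3)]
```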
- The source series and the generated series of variation generation may coincide with the source series and the generated series of interpolation generation.
- For example, suppose that the series B is generated by interpolation generation from the series A and the series C.
- In that case, variation generation from the series B can conversely generate the series A and the series C.
- the generation unit 30 may generate a series having the specified characteristics.
- the user interface 10 accepts information that specifies a position (characteristic of the series) in the latent space FS as input information, as described above with reference to FIGS. 3 and 4.
- the generation unit 30 inputs the feature amount at the specified position to the decoder model 213.
- the decoder model 213 generates a target sequence corresponding to the feature amount.
- the context series x C and the target series are combined to generate new series D, series E, series F, and the like.
- FIG. 13 is a diagram schematically showing an example of a series search.
- the search proceeds from the left side to the right side of the figure.
- the sequence A or the like is obtained by, for example, various generation methods described so far.
- the positions of the series A and the like in the latent space FS are schematically shown.
- From these, a further series search is performed. For example, interpolation generation may be performed as shown in the upper part of the figure. In this example, a series AB having features between the series A and the series B, and a series BC having features between the series B and the series C (both shown by white circles), are generated. Further series may be generated from the generated series AB, series BC, and the like by interpolation generation, variation generation, feature specification, and the like.
- variation generation may be performed as shown in the middle part of the figure.
- In this example, a series A′, a series A′′, and a series A′′′ (all shown by white circles), each having features obtained by adding noise to the features of the series A, are generated.
- Further series may be generated from the generated series A′, series A′′, series A′′′, and the like by interpolation generation, variation generation, feature specification, and the like.
- Feature specification may also be performed as shown in the lower part of the figure. In this example, a series D, a series E, and a series F (all shown by white circles) are generated.
- Further series may be generated from the generated series D, series E, series F, etc. by interpolation generation, variation generation, feature specification, and the like.
- the user U can repeat the generation of the sequence until the desired sequence is obtained.
- The user U can narrow the search toward a desired target series. For example, the user U can generate a series A to a series G each including a different target series, and then generate a further series by blending the favorite series B and series F through interpolation generation.
- The user U can also refine a favorite target series while making minor adjustments.
- For example, the user U can generate series that are similar to but slightly different from the series A (for example, series B to series E) by variation generation. Among those generated series, the ones close to the intended image (for example, the series C and the series E) can be blended through interpolation generation to produce a further series.
- FIG. 14 is a diagram showing an example of a hardware configuration of an information processing apparatus.
- the information processing apparatus 1 is realized by the computer 1000.
- the computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input / output interface 1600.
- Each part of the computer 1000 is connected by a bus 1050.
- the CPU 1100 operates based on the program stored in the ROM 1300 or the HDD 1400, and controls each part. For example, the CPU 1100 expands the program stored in the ROM 1300 or the HDD 1400 into the RAM 1200, and executes processing corresponding to various programs.
- the ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, a program depending on the hardware of the computer 1000, and the like.
- the HDD 1400 is a computer-readable recording medium that non-temporarily records a program executed by the CPU 1100 and data used by such a program.
- the HDD 1400 is a recording medium for recording an information processing program according to the present disclosure, which is an example of program data 1450.
- the communication interface 1500 is an interface for the computer 1000 to connect to an external network 1550 (for example, the Internet).
- the CPU 1100 receives data from another device or transmits data generated by the CPU 1100 to another device via the communication interface 1500.
- the input / output interface 1600 is an interface for connecting the input / output device 1650 and the computer 1000.
- the CPU 1100 receives data from an input device such as a keyboard or mouse via the input / output interface 1600. Further, the CPU 1100 transmits data to an output device such as a display, a speaker, or a printer via the input / output interface 1600. Further, the input / output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium (media).
- The media are, for example, optical recording media such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), magneto-optical recording media such as an MO (Magneto-Optical disk), tape media, magnetic recording media, or semiconductor memories.
- the CPU 1100 of the computer 1000 realizes the functions of the generation unit 30 and the like by executing the information processing program loaded on the RAM 1200.
- the HDD 1400 stores the program related to the present disclosure (information processing program 22 of the storage unit 20) and the data in the storage unit 20.
- the CPU 1100 reads the program data 1450 from the HDD 1400 and executes the program, but as another example, these programs may be acquired from another device via the external network 1550.
- FIG. 15 is a diagram showing an example of a schematic configuration of an RNN.
- the exemplified RNN includes an input layer, an intermediate layer, and an output layer. Some neurons contained in the layer are schematically illustrated by white circles.
- For example, the tokens described above with reference to FIG. 6 and the like are input to the input layer.
- In this example, the intermediate layer includes LSTM blocks, in which long-term dependencies have been learned, and is suitable for handling series such as music and documents (for example, time-series data).
- The output layer is a fully connected layer and outputs, for example, the tokens described above with reference to FIG. 6 and the like, together with their probabilities.
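- As a concrete, non-authoritative illustration of such an architecture, a minimal PyTorch module with an embedding input layer, an LSTM intermediate layer, and a fully connected output layer might look like this; all sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class TokenRNN(nn.Module):
    """Minimal sketch of the RNN of FIG. 15: embedding input layer,
    LSTM intermediate layer, fully connected output layer over the
    token vocabulary (sizes are arbitrary assumptions)."""
    def __init__(self, vocab_size: int = 512, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)            # input layer
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)    # intermediate layer (LSTM blocks)
        self.out = nn.Linear(hidden, vocab_size)                 # fully connected output layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h).softmax(dim=-1)   # token probabilities per time step
```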
- The disclosed technology is not limited to the above embodiment; some modifications are described below. For example, part or all of the functions of the storage unit 20 and the generation unit 30 may be realized outside the information processing apparatus 1 (for example, on an external server). In that case, the information processing apparatus 1 communicates with the external server, and the processing of the information processing apparatus 1 described so far is realized in the same way.
- The trained model 21 may also include the encoder model 211 as the encoder ENC. In this case, the encoder model 211 can be used, for example, to extract a feature amount from a series x that includes a target series, as described above with reference to FIG. 8.
- The information processing apparatus 1 described above is specified, for example, as follows. As described with reference to FIGS. 1 to 5 and the like, the information processing apparatus 1 includes a control means (generation unit 30), a data input means (user interface 10) for inputting series data (series x, etc.), a machine learning model (trained model 21) that generates new series data (for example, series A) based on the series data input by the data input means, and a series data selection means (user interface 10) that, when the machine learning model generates new series data, selects target series data to be changed (for example, target series x TA) and/or context series data to be left unchanged (context series x C).
- The control means (generation unit 30) either (i) generates new target series data that interpolates at least two pieces of series data already generated by the machine learning model (trained model 21) (for example, the series A and the series B), or (ii) generates new series data different from the series data already generated by the machine learning model (for example, the series A).
- The information processing apparatus 1 may further include a display means (user interface 10) that displays, in a specifiable manner, positions in the space (latent space FS) that defines the feature amounts of the series data learned by the machine learning model (trained model 21), and the control means (generation unit 30) may generate, as the new series data, series data having the feature amount corresponding to a designated position in that space.
- The information processing apparatus 1 is also specified as follows. As described with reference to FIGS. 1 to 5, the information processing apparatus 1 includes a generation unit 30 that generates a series (for example, the series A) including the determined context series x C and a new target series x T, using input information (information about a series in which a part consists of a target series and the rest consists of a context series, the series giving a series of information) and the trained model 21. When data corresponding to the input information is input, the trained model 21 outputs data corresponding to the new target series x T.
- The information processing apparatus 1 may include a user interface 10 that accepts the input information and presents the generation result of the generation unit 30.
- With the above configuration, a series including the determined context series x C and a new target series x T is generated.
- The context series x C constitutes a part of the series, and the target series x T constitutes the rest. Therefore, it is possible to generate a series in which only a part is newly generated and the rest is maintained.
- The input information (received by, for example, the user interface 10) may include the determined context series x C and the position information R of the determined context series x C within the series.
- Using such input information and the trained model 21, a series including the determined context series x C and a new target series x T can be generated.
- The input information (received by, for example, the user interface 10) may include information about a series generated by the generation unit 30, and the generation unit 30 may generate a series including, as a new target series, a target series different from the target series of the series generated by the generation unit 30 (for example, the target series x TA). This makes it possible to further generate series based on already generated series.
- The input information (received by, for example, the user interface 10) may include information designating at least one of a plurality of series generated by the generation unit 30 (for example, the series A), and the generation unit 30 may generate a series including, as a new target series, a target series different from the target series of the designated series (for example, the target series x TA). This allows further series to be generated based on the designated series.
- The input information (received by, for example, the user interface 10) may include information designating two series (for example, the series A and the series B) among a plurality of series generated by the generation unit 30, and the generation unit 30 may generate a series including, as a new target series, a target series having features between the target series of the two designated series (for example, the target series x TA and the target series x TB). This makes it possible to generate a series with features between the two designated series.
- The input information (received by, for example, the user interface 10) may include information designating a feature of a series (for example, a position in the latent space FS), and the generation unit 30 may generate a series having the designated feature. This makes it possible to generate a series with the designated feature.
- The data input to the trained model 21 may include the tokens of the determined context series x C (for example, tokens s_1, ..., s_{k-1} and tokens s_{j+1}, ..., s_L), and the data output by the trained model 21 may include the tokens of the new target series x T (for example, tokens s_k, ..., s_j).
- The input data may further include the predetermined token M.
- The series of information given by the series is music information indicating the pitch value of sound for each time, and a token may indicate at least one of the pitch value of a sound and the generation period of a sound.
- The trained model 21 can be used with such tokens as input and output data.
- The information processing method described with reference to FIGS. 10 to 12 and the like is also an aspect of the present disclosure.
- The information processing method includes generating, using the input information (information about a series in which a part consists of a target series and the rest consists of a context series, the series giving a series of information) and the trained model 21, a series including the determined context series x C and a new target series x T (step S13, step S22, and/or step S32); when data corresponding to the input information is input, the trained model 21 outputs data corresponding to the new target series x T. With such an information processing method as well, as described above, it is possible to generate a series in which only a part is newly generated and the rest is maintained.
- The information processing program 22 described with reference to FIG. 5 and the like is also an aspect of the present disclosure.
- The information processing program 22 causes a computer to generate, using the input information and the trained model 21, a series including the determined context series x C and a new target series x T (step S13, step S22, and/or step S32); when data corresponding to the input information is input, the trained model 21 outputs data corresponding to the new target series x T.
- the present technology can also have the following configurations.
- (1)
An information processing apparatus comprising: a control means; a data input means for inputting series data; a machine learning model that generates new series data based on the series data input by the data input means; and a series data selection means that, when the machine learning model generates the new series data, selects target series data to be changed and/or context series data to be left unchanged,
wherein the control means
(i) generates new target series data that interpolates at least two pieces of series data already generated by the machine learning model, or
(ii) generates new series data different from series data already generated by the machine learning model.
- (2)
The information processing apparatus according to (1), further comprising a display means that displays, in a specifiable manner, positions in a space that defines the feature amounts of the series data learned by the machine learning model,
wherein the control means generates, as the new series data, series data having the feature amount corresponding to a designated position in the space.
- (3)
An information processing apparatus comprising a generation unit that generates a series including a determined context series and a new target series, using input information, which is information about a series in which a part consists of a target series and the rest consists of a context series and which gives a series of information, and a trained model,
wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- (4)
The information processing apparatus according to (3), wherein the input information includes the determined context series and position information of the determined context series within the series.
- (5)
The information processing apparatus according to (3) or (4), wherein the input information includes information about a series generated by the generation unit, and the generation unit generates, as the new target series, a series including a target series different from the target series of the series generated by the generation unit.
- (6)
The information processing apparatus according to any one of (3) to (5), wherein the input information includes information designating at least one of a plurality of series generated by the generation unit, and the generation unit generates, as the new target series, a series including a target series different from the target series of the designated series.
- (7)
The information processing apparatus according to any one of (3) to (6), wherein the input information includes information designating two of a plurality of series generated by the generation unit, and the generation unit generates, as the new target series, a series including a target series having features between the target series of the two designated series.
- (8)
The information processing apparatus according to any one of (3) to (7), wherein the input information includes information designating a feature of the series, and the generation unit generates a series having the designated feature.
- (9)
The information processing apparatus according to any one of (3) to (8), wherein the data input to the trained model includes the tokens of the determined context series, and the data output by the trained model includes the tokens of the new target series.
- (10)
The information processing apparatus according to any one of (3) to (9), wherein the data input to the trained model includes the tokens of the determined context series and a predetermined token, and the data output by the trained model includes the tokens of the new target series.
- (11)
The information processing apparatus according to (9) or (10), wherein the series of information given by the series is music information indicating the pitch value of sound for each time, and a token indicates at least one of the pitch value of a sound and the generation period of a sound.
- (12)
An information processing apparatus comprising: a generation unit that generates a series including a determined context series and a new target series, using input information, which is information about a series in which a part consists of a target series and the rest consists of a context series and which gives a series of information, and a trained model; and a user interface that accepts the input information and presents the generation result of the generation unit,
wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- (13)
The information processing apparatus according to (12), wherein the user interface accepts, as the input information, the determined context series and position information of the determined context series within the series.
- (14)
The information processing apparatus according to (12) or (13), wherein the user interface accepts, as the input information, information about a series generated by the generation unit, and the generation unit generates, as the new target series, a series including a target series different from the target series of the series generated by the generation unit.
- (15)
The information processing apparatus according to any one of (12) to (14), wherein the user interface accepts, as the input information, information designating at least one of a plurality of series generated by the generation unit, and the generation unit generates, as the new target series, a series including a target series different from the target series of the designated series.
- (16)
The information processing apparatus according to any one of (12) to (15), wherein the user interface accepts, as the input information, information designating two of a plurality of series generated by the generation unit, and the generation unit generates, as the new target series, a series including a target series having features between the target series of the two designated series.
- (17)
The information processing apparatus according to any one of (12) to (16), wherein the user interface accepts, as the input information, information designating a feature of the series, and the generation unit generates a series having the designated feature.
- (18)
The information processing apparatus according to any one of (12) to (17), wherein the data input to the trained model includes the tokens of the determined context series, and the data output by the trained model includes the tokens of the new target series.
- (19)
The information processing apparatus according to any one of (12) to (18), wherein the data input to the trained model includes the tokens of the determined context series and a predetermined token, and the data output by the trained model includes the tokens of the new target series.
- (20)
The information processing apparatus according to (18) or (19), wherein the series of information given by the series is music information indicating the pitch value of sound for each time, and a token indicates at least one of the pitch value of a sound and the generation period of a sound.
- (21)
An information processing method including generating a series including a determined context series and a new target series, using input information, which is information about a series in which a part consists of a target series and the rest consists of a context series and which gives a series of information, and a trained model,
wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- (22)
An information processing program causing a computer to execute generating a series including a determined context series and a new target series, using input information, which is information about a series in which a part consists of a target series and the rest consists of a context series and which gives a series of information, and a trained model,
wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- 1 Information processing apparatus 10 User interface (input means, selection means, display means) 20 Storage unit 21 Trained model (machine learning model) 22 Information processing program 30 Generation unit (control unit) 211 Encoder model 212 Prior model 213 Decoder model ENC Encoder DEC Decoder U User
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
1. Embodiment
2. Example of hardware configuration
3. Example of RNN configuration
4. Modifications
5. Effects
The information to be processed by the information processing apparatus according to the embodiment is a series (series data) that provides a sequence of information. Examples of such series include music (music sequences, audio, etc.) and language (documents, poetry, etc.). The following description mainly takes the case where the series is a music sequence as an example.
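As a concrete illustration of what such a series can look like in practice, here is a minimal sketch assuming a simple token scheme in which each note of a music sequence becomes one pitch token and one duration token; the function name, token spelling, and duration unit are illustrative assumptions, not the encoding actually used by the apparatus.

```python
from typing import List, Tuple

# Hypothetical note representation: (MIDI pitch value, duration in sixteenths).
Note = Tuple[int, int]

def notes_to_tokens(notes: List[Note]) -> List[str]:
    """Flatten notes into a token series, one pitch token and one
    duration token per note, e.g. ['P60', 'D4', 'P64', 'D2']."""
    tokens: List[str] = []
    for pitch, duration in notes:
        tokens.append(f"P{pitch}")     # token indicating the pitch value
        tokens.append(f"D{duration}")  # token indicating the sounding period
    return tokens

melody: List[Note] = [(60, 4), (64, 2), (67, 2)]  # C4, E4, G4
print(notes_to_tokens(melody))  # ['P60', 'D4', 'P64', 'D2', 'P67', 'D2']
```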
FIG. 14 is a diagram illustrating an example of the hardware configuration of the information processing apparatus. In this example, the information processing apparatus 1 is realized by a computer 1000. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The units of the computer 1000 are connected to one another by a bus 1050.
FIG. 15 is a diagram illustrating an example of the schematic configuration of an RNN. The illustrated RNN includes an input layer, an intermediate layer, and an output layer. Some of the neurons included in each layer are schematically depicted as white circles. The input layer receives, for example, tokens such as those described earlier with reference to FIG. 5 and other figures. In this example, the intermediate layer is configured with LSTM blocks; because it learns long-term dependencies, it is well suited to handling series (e.g., time-series data) such as music and documents. The output layer is a fully connected layer and outputs, for example, the tokens described earlier with reference to FIG. 5 and other figures together with their probabilities.
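Purely to make the layer structure above concrete, and not as the patent's actual network, the following is a minimal sketch assuming PyTorch; the class name, vocabulary size, and layer widths are placeholders.

```python
import torch
import torch.nn as nn

class TokenRNN(nn.Module):
    """Embedding input layer -> LSTM intermediate layer -> fully connected
    output layer emitting a probability for each token in the vocabulary."""

    def __init__(self, vocab_size: int = 128, embed_dim: int = 64,
                 hidden_dim: int = 256) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)              # input layer
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # LSTM blocks
        self.out = nn.Linear(hidden_dim, vocab_size)                  # output layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden).softmax(dim=-1)  # tokens with probabilities

model = TokenRNN()
probs = model(torch.randint(0, 128, (1, 16)))  # one series of 16 token ids
print(probs.shape)  # torch.Size([1, 16, 128])
```

The LSTM is chosen here for the reason stated above: its gating lets the intermediate layer retain long-range structure across a time series.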
The disclosed technology is not limited to the embodiment described above. Several modifications are described below.
The information processing apparatus 1 described above is specified, for example, as follows. As described with reference to FIGS. 1 to 5 and other figures, the information processing apparatus 1 includes: a control means (generation unit 30); a data input means (user interface 10) for inputting series data (such as a series x); a machine learning model (trained model 21) that generates new series data (e.g., a series A) based on the series data (e.g., the series x) input by the data input means (user interface 10); and a series data selection means (user interface 10) that, when the new series data (e.g., the series A) is generated by the machine learning model (trained model 21), selects target series data (e.g., a target series xTA) to which a change is applied and/or context series data (a context series xC) to which no change is applied. The control means (generation unit 30) is characterized in that it (i) generates new target series data that interpolates at least two pieces of series data (e.g., the series A and a series B) already generated by the machine learning model (trained model 21), or (ii) generates new series data different from series data (e.g., the series A) already generated by the machine learning model (trained model 21).
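Feature (i), interpolating between two pieces of already generated series data, is commonly realized by blending latent representations; the sketch below treats encode and decode as hypothetical stand-ins for the trained model's encoder and decoder rather than the patent's actual procedure.

```python
import torch

def interpolate_series(encode, decode, series_a, series_b, alpha: float = 0.5):
    """Blend the latent codes of two series and decode the blend; alpha=0
    stays close to series_a, alpha=1 to series_b, values in between yield
    features between the two."""
    z_mix = (1.0 - alpha) * encode(series_a) + alpha * encode(series_b)
    return decode(z_mix)

# Toy stand-ins so the sketch runs: identity-style "encoder" and "decoder".
encode = lambda s: s.float()
decode = lambda z: z.round().long()
series_a = torch.tensor([60, 64, 67])  # illustrative pitch values
series_b = torch.tensor([62, 65, 69])
print(interpolate_series(encode, decode, series_a, series_b))  # tensor([61, 64, 68])
```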
(1)
An information processing apparatus comprising:
a control means;
a data input means for inputting series data;
a machine learning model that generates new series data based on the series data input by the data input means; and
a series data selection means that, when the new series data is generated by the machine learning model, selects target series data to which a change is applied and/or context series data to which no change is applied,
wherein the control means
(i) generates new target series data that interpolates at least two pieces of series data already generated by the machine learning model, or
(ii) generates new series data different from series data already generated by the machine learning model.
(2)
The information processing apparatus according to (1), further comprising a display means that displays, in a designatable manner, positions in a space that defines the feature values of the series data learned by the machine learning model,
wherein the control means generates, as the new series data, series data having a feature value corresponding to the designated position in the space.
(3)
An information processing apparatus comprising a generation unit that generates a series including a determined context series and a new target series by using a trained model and input information, the input information being information about a series that provides a sequence of information and is partly composed of a target series with the remainder composed of a context series,
wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
(4)
The information processing apparatus according to (3), wherein the input information includes:
the determined context series; and
position information of the determined context series within the series.
(5)
The information processing apparatus according to (3) or (4), wherein the input information includes information about a series generated by the generation unit, and
the generation unit generates a series including, as the new target series, a target series different from the target series of the series generated by the generation unit.
(6)
The information processing apparatus according to any one of (3) to (5), wherein the input information includes information designating at least one series among a plurality of series generated by the generation unit, and
the generation unit generates a series including, as the new target series, a target series different from the target series of the designated series.
(7)
The information processing apparatus according to any one of (3) to (6), wherein the input information includes information designating two series among a plurality of series generated by the generation unit, and
the generation unit generates a series including, as the new target series, a target series having features intermediate between the target series of the two designated series.
(8)
The information processing apparatus according to any one of (3) to (7), wherein the input information includes information designating a feature of a series, and
the generation unit generates a series having the designated feature.
(9)
The information processing apparatus according to any one of (3) to (8), wherein the data input to the trained model includes tokens of the determined context series, and
the data output by the trained model includes tokens of the new target series.
(10)
The information processing apparatus according to any one of (3) to (9), wherein the data input to the trained model includes tokens of the determined context series and a predetermined token, and
the data output by the trained model includes tokens of the new target series.
(11)
The information processing apparatus according to (9) or (10), wherein the sequence of information provided by the series is music information indicating the pitch value of a sound for each time, and
the tokens indicate at least one of the pitch value of the sound and the sounding period of the sound.
(12)
An information processing apparatus comprising:
a generation unit that generates a series including a determined context series and a new target series by using a trained model and input information, the input information being information about a series that provides a sequence of information and is partly composed of a target series with the remainder composed of a context series; and
a user interface that accepts the input information and presents a generation result of the generation unit,
wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
(13)
The information processing apparatus according to (12), wherein the user interface accepts, as the input information:
the determined context series; and
position information of the determined context series within the series.
(14)
The information processing apparatus according to (12) or (13), wherein the user interface accepts, as the input information, information about a series generated by the generation unit, and
the generation unit generates a series including, as the new target series, a target series different from the target series of the series generated by the generation unit.
(15)
The information processing apparatus according to any one of (12) to (14), wherein the user interface accepts, as the input information, information designating at least one series among a plurality of series generated by the generation unit, and
the generation unit generates a series including, as the new target series, a target series different from the target series of the designated series.
(16)
The information processing apparatus according to any one of (12) to (15), wherein the user interface accepts, as the input information, information designating two series among a plurality of series generated by the generation unit, and
the generation unit generates a series including, as the new target series, a target series having features intermediate between the target series of the two designated series.
(17)
The information processing apparatus according to any one of (12) to (16), wherein the user interface accepts, as the input information, information designating a feature of a series, and
the generation unit generates a series having the designated feature.
(18)
The information processing apparatus according to any one of (12) to (17), wherein the data input to the trained model includes tokens of the determined context series, and
the data output by the trained model includes tokens of the new target series.
(19)
The information processing apparatus according to any one of (12) to (18), wherein the data input to the trained model includes tokens of the determined context series and a predetermined token, and
the data output by the trained model includes tokens of the new target series.
(20)
The information processing apparatus according to (18) or (19), wherein the sequence of information provided by the series is music information indicating the pitch value of a sound for each time, and
the tokens indicate at least one of the pitch value of the sound and the sounding period of the sound.
(21)
An information processing method including generating a series including a determined context series and a new target series by using a trained model and input information, the input information being information about a series that provides a sequence of information and is partly composed of a target series with the remainder composed of a context series,
wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
(22)
An information processing program causing a computer to execute generating a series including a determined context series and a new target series by using a trained model and input information, the input information being information about a series that provides a sequence of information and is partly composed of a target series with the remainder composed of a context series,
wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
10 User interface (input means, selection means, display means)
20 Storage unit
21 Trained model (machine learning model)
22 Information processing program
30 Generation unit (control unit)
211 Encoder model
212 Prior model
213 Decoder model
ENC Encoder
DEC Decoder
U User
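The list above names an encoder model (211), a prior model (212), and a decoder model (213). Purely as a hedged sketch of how such a trio might yield a new target series for a fixed context series, and with every callable, name, and shape being an assumption rather than the patent's API, one possible generate step looks like this:

```python
import torch

def generate_target(encoder, prior, decoder, context_tokens):
    """Summarize the determined context series, sample a latent code from a
    prior conditioned on it, then decode tokens for the new target series."""
    ctx = encoder(context_tokens)
    z = prior(ctx).sample()
    return decoder(z, context_tokens)

# Dummy stand-ins so the sketch runs end to end.
encoder = lambda toks: toks.float().mean(dim=-1, keepdim=True)
prior = lambda ctx: torch.distributions.Normal(ctx, torch.ones_like(ctx))
decoder = lambda z, toks: z.abs().long().clamp(0, 127).expand_as(toks)
print(generate_target(encoder, prior, decoder, torch.randint(0, 128, (1, 8))))
```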
Claims (22)
- An information processing apparatus comprising: a control means; a data input means for inputting series data; a machine learning model that generates new series data based on the series data input by the data input means; and a series data selection means that, when the new series data is generated by the machine learning model, selects target series data to which a change is applied and/or context series data to which no change is applied, wherein the control means (i) generates new target series data that interpolates at least two pieces of series data already generated by the machine learning model, or (ii) generates new series data different from series data already generated by the machine learning model.
- The information processing apparatus according to claim 1, further comprising a display means that displays, in a designatable manner, positions in a space that defines the feature values of the series data learned by the machine learning model, wherein the control means generates, as the new series data, series data having a feature value corresponding to the designated position in the space.
- An information processing apparatus comprising a generation unit that generates a series including a determined context series and a new target series by using a trained model and input information, the input information being information about a series that provides a sequence of information and is partly composed of a target series with the remainder composed of a context series, wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- The information processing apparatus according to claim 3, wherein the input information includes the determined context series and position information of the determined context series within the series.
- The information processing apparatus according to claim 3, wherein the input information includes information about a series generated by the generation unit, and the generation unit generates a series including, as the new target series, a target series different from the target series of the series generated by the generation unit.
- The information processing apparatus according to claim 3, wherein the input information includes information designating at least one series among a plurality of series generated by the generation unit, and the generation unit generates a series including, as the new target series, a target series different from the target series of the designated series.
- The information processing apparatus according to claim 3, wherein the input information includes information designating two series among a plurality of series generated by the generation unit, and the generation unit generates a series including, as the new target series, a target series having features intermediate between the target series of the two designated series.
- The information processing apparatus according to claim 3, wherein the input information includes information designating a feature of a series, and the generation unit generates a series having the designated feature.
- The information processing apparatus according to claim 3, wherein the data input to the trained model includes tokens of the determined context series, and the data output by the trained model includes tokens of the new target series.
- The information processing apparatus according to claim 3, wherein the data input to the trained model includes tokens of the determined context series and a predetermined token, and the data output by the trained model includes tokens of the new target series.
- The information processing apparatus according to claim 9, wherein the sequence of information provided by the series is music information indicating the pitch value of a sound for each time, and the tokens indicate at least one of the pitch value of the sound and the sounding period of the sound.
- An information processing apparatus comprising: a generation unit that generates a series including a determined context series and a new target series by using a trained model and input information, the input information being information about a series that provides a sequence of information and is partly composed of a target series with the remainder composed of a context series; and a user interface that accepts the input information and presents a generation result of the generation unit, wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- The information processing apparatus according to claim 12, wherein the user interface accepts, as the input information, the determined context series and position information of the determined context series within the series.
- The information processing apparatus according to claim 12, wherein the user interface accepts, as the input information, information about a series generated by the generation unit, and the generation unit generates a series including, as the new target series, a target series different from the target series of the series generated by the generation unit.
- The information processing apparatus according to claim 12, wherein the user interface accepts, as the input information, information designating at least one series among a plurality of series generated by the generation unit, and the generation unit generates a series including, as the new target series, a target series different from the target series of the designated series.
- The information processing apparatus according to claim 12, wherein the user interface accepts, as the input information, information designating two series among a plurality of series generated by the generation unit, and the generation unit generates a series including, as the new target series, a target series having features intermediate between the target series of the two designated series.
- The information processing apparatus according to claim 12, wherein the user interface accepts, as the input information, information designating a feature of a series, and the generation unit generates a series having the designated feature.
- The information processing apparatus according to claim 12, wherein the data input to the trained model includes tokens of the determined context series, and the data output by the trained model includes tokens of the new target series.
- The information processing apparatus according to claim 12, wherein the data input to the trained model includes tokens of the determined context series and a predetermined token, and the data output by the trained model includes tokens of the new target series.
- The information processing apparatus according to claim 18, wherein the sequence of information provided by the series is music information indicating the pitch value of a sound for each time, and the tokens indicate at least one of the pitch value of the sound and the sounding period of the sound.
- An information processing method including generating a series including a determined context series and a new target series by using a trained model and input information, the input information being information about a series that provides a sequence of information and is partly composed of a target series with the remainder composed of a context series, wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
- An information processing program causing a computer to execute generating a series including a determined context series and a new target series by using a trained model and input information, the input information being information about a series that provides a sequence of information and is partly composed of a target series with the remainder composed of a context series, wherein, when data corresponding to the input information is input, the trained model outputs data corresponding to the new target series.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21915006.7A EP4270267A4 (en) | 2020-12-28 | 2021-11-18 | DEVICE, METHOD AND PROGRAM FOR INFORMATION PROCESSING |
CN202180087191.8A CN116685987A (zh) | 2020-12-28 | 2021-11-18 | 信息处理装置、信息处理方法和信息处理程序 |
US18/256,639 US20240095500A1 (en) | 2020-12-28 | 2021-11-18 | Information processing apparatus, information processing method, and information processing program |
JP2022572935A JPWO2022145145A1 (ja) | 2020-12-28 | 2021-11-18 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020-219553 | 2020-12-28 | ||
JP2020219553 | 2020-12-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022145145A1 (ja) | 2022-07-07 |
Family
ID=82260389
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/042384 WO2022145145A1 (ja) | 2020-12-28 | 2021-11-18 | 情報処理装置、情報処理方法及び情報処理プログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US20240095500A1 (ja) |
EP (1) | EP4270267A4 (ja) |
JP (1) | JPWO2022145145A1 (ja) |
CN (1) | CN116685987A (ja) |
WO (1) | WO2022145145A1 (ja) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020080239A1 (ja) | 2018-10-19 | 2020-04-23 | ソニー株式会社 | 情報処理方法、情報処理装置及び情報処理プログラム |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10068556B2 (en) * | 2015-11-18 | 2018-09-04 | Pandora Media, Inc. | Procedurally generating background music for sponsored audio |
JP6747489B2 (ja) * | 2018-11-06 | 2020-08-26 | ヤマハ株式会社 | 情報処理方法、情報処理システムおよびプログラム |
-
2021
- 2021-11-18 CN CN202180087191.8A patent/CN116685987A/zh active Pending
- 2021-11-18 US US18/256,639 patent/US20240095500A1/en active Pending
- 2021-11-18 JP JP2022572935A patent/JPWO2022145145A1/ja active Pending
- 2021-11-18 WO PCT/JP2021/042384 patent/WO2022145145A1/ja active Application Filing
- 2021-11-18 EP EP21915006.7A patent/EP4270267A4/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020080239A1 (ja) | 2018-10-19 | 2020-04-23 | ソニー株式会社 | 情報処理方法、情報処理装置及び情報処理プログラム |
Non-Patent Citations (8)
Title |
---|
ADAM ROBERTS; JESSE ENGEL; COLIN RAFFEL; CURTIS HAWTHORNE; DOUGLAS ECK: "A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music", ARXIV.ORG, 13 March 2018 (2018-03-13), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081561200 * |
KE CHEN; CHENG-I WANG; TAYLOR BERG-KIRKPATRICK; SHLOMO DUBNOV: "Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm", ARXIV.ORG, 4 August 2020 (2020-08-04), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081731260 * |
MANA SASAKI : "Investigation of melody editing method by automatic interpolation of melody", IPSJ SIG TECHNICAL REPORT, vol. 2018-MUS-119, no. 24, 9 June 2018 (2018-06-09), JP , pages 1 - 4, XP009538047, ISSN: 2188-8752 * |
NAKAMURA KOSUKE, NOSE TAKASHI, CHIBA YUYA, ITO AKINORI: "A Symbol-level Melody Completion Based on a Convolutional Neural Network with Generative Adversarial Learning", JOURNAL OF INFORMATION PROCESSING, vol. 28, 15 April 2020 (2020-04-15), pages 248 - 257, XP055948852, DOI: 10.2197/ipsjjip.28.248 * |
NAKAMURA KOSUKE: "A Study on Melody Completion Based on Convolutional Neural Networks and Adversarial Learning", IPSJ SIG TECHNICAL REPORT, vol. 2018-MUS-120, no. 12, 14 August 2018 (2018-08-14), pages 1 - 6, XP055948867 * |
See also references of EP4270267A4 |
UEMURA AIKO, KITAHARA TETSURO: "Preliminary Study on Morphing of Chord Progression", IPSJ SIG TECHNICAL REPORT, vol. 2018-SLP-122, no. 20, 9 June 2018 (2018-06-09), pages 1 - 5, XP055948837 * |
UEMURA AIKO, KITAHARA TETSURO: "Preliminary Study on Morphing of Chord Progression", PROCEEDINGS OF THE 3RD CONFERENCE ON COMPUTER SIMULATION OF MUSICAL CREATIVITY (CSMC 2018), 22 August 2018 (2018-08-22), pages 1 - 8, XP055948832 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022145145A1 (ja) | 2022-07-07 |
EP4270267A1 (en) | 2023-11-01 |
US20240095500A1 (en) | 2024-03-21 |
EP4270267A4 (en) | 2024-06-19 |
CN116685987A (zh) | 2023-09-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11030984B2 (en) | Method of scoring digital media objects using musical experience descriptors to indicate what, where and when musical events should appear in pieces of digital music automatically composed and generated by an automated music composition and generation system | |
US10854180B2 (en) | Method of and system for controlling the qualities of musical energy embodied in and expressed by digital music to be automatically composed and generated by an automated music composition and generation engine | |
US10068557B1 (en) | Generating music with deep neural networks | |
US9208821B2 (en) | Method and system to process digital audio data | |
US9190042B2 (en) | Systems and methods for musical sonification and visualization of data | |
JP5363355B2 (ja) | スタイル要素を用いた画面表示の選択した表示領域をコピーアンドペーストする方法、システム及びプログラム | |
US9082381B2 (en) | Method, system, and computer program for enabling flexible sound composition utilities | |
EP1962241A1 (en) | Content search device, content search system, server device for content search system, content searching method, and computer program and content output apparatus with search function | |
US20150082228A1 (en) | System and method for direct manipulation of a triangular distribution of information using a graphical user interface | |
US20230237980A1 (en) | Hands-on artificial intelligence education service | |
EP2524363A1 (en) | Interactive music notation layout and editing system | |
WO2022145145A1 (ja) | 情報処理装置、情報処理方法及び情報処理プログラム | |
WO2022264461A1 (ja) | 情報処理システム及び情報処理方法 | |
WO2021225008A1 (ja) | 情報処理方法、情報処理装置及び情報処理プログラム | |
US9293124B2 (en) | Tempo-adaptive pattern velocity synthesis | |
Kim-Boyle | Real-time score generation for extensible open forms | |
Schankler et al. | Improvising with digital auto-scaffolding: how mimi changes and enhances the creative process | |
CN105164747A (zh) | 经由链接对乐音设置信息进行设置和编辑 | |
Stoller et al. | Intuitive and efficient computer-aided music rearrangement with optimised processing of audio transitions | |
WO2024042962A1 (ja) | 情報処理装置、情報処理方法及び情報処理プログラム | |
Cheung et al. | An interactive automatic violin fingering recommendation interface | |
US20240144901A1 (en) | Systems and Methods for Sending, Receiving and Manipulating Digital Elements | |
US20230135118A1 (en) | Information processing device, information processing method, and program | |
EP4421658A1 (en) | Information processing device, information processing method, and program | |
US20240296022A1 (en) | Metadata-driven visualization library integration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21915006 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022572935 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18256639 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180087191.8 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2021915006 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2021915006 Country of ref document: EP Effective date: 20230728 |