WO2023170757A1 - Reproduction control method, information processing method, reproduction control system, and program - Google Patents

Reproduction control method, information processing method, reproduction control system, and program

Info

Publication number
WO2023170757A1
Authority
WO
WIPO (PCT)
Prior art keywords
playback
performance
control
information
cost
Prior art date
Application number
PCT/JP2022/009776
Other languages
French (fr)
Japanese (ja)
Inventor
Akira Maezawa (前澤 陽)
Original Assignee
Yamaha Corporation
Priority date
Filing date
Publication date
Application filed by Yamaha Corporation
Priority to PCT/JP2022/009776
Publication of WO2023170757A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10G: REPRESENTATION OF MUSIC; RECORDING MUSIC IN NOTATION FORM; ACCESSORIES FOR MUSIC OR MUSICAL INSTRUMENTS NOT OTHERWISE PROVIDED FOR, e.g. SUPPORTS
    • G10G3/00: Recording music in notation form, e.g. recording the mechanical operation of a musical instrument
    • G10G3/04: Recording music in notation form, e.g. recording the mechanical operation of a musical instrument, using electrical means
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments

Definitions

  • The present disclosure relates to technology for controlling the playback of audio or video.
  • Non-Patent Document 1 discloses a technique for estimating performance positions and performance speeds by integrating information on performances by a plurality of performers, and for controlling the playback of a music piece according to the estimation results.
  • Non-Patent Document 2 discloses a configuration in which the playback of a music piece is synchronized with the performance of a specific performer selected from a plurality of performers.
  • One aspect of the present disclosure aims to appropriately control the playback of a playback part according to a performance by a performer.
  • In one aspect, a playback control method generates control information for at least one playback part of a music piece by model predictive control using a prediction model that predicts, for at least one performer, performance information including a performance position in the music piece, and controls the playback of the playback part of the music piece using the control information generated for the at least one playback part.
  • In another aspect, a playback control system includes a predictive control unit that generates control information for at least one playback part of a music piece by model predictive control using a prediction model that predicts, for at least one performer, performance information including a performance position in the music piece, and a playback control unit that controls the playback of the playback part of the music piece using the control information generated for the at least one playback part.
  • In another aspect, a program causes a computer system to function as a predictive control unit that generates control information for at least one playback part of a music piece by model predictive control using a prediction model that predicts, for at least one performer, performance information including a performance position in the music piece, and as a playback control unit that controls the playback of the playback part of the music piece using the control information generated for the at least one playback part.
  • In another aspect, an information processing method generates control information for at least one playback part of a music piece by model predictive control using a prediction model that predicts, for at least one performer, performance information including a performance position in the music piece, controls the movement of the skeleton and joints represented by motion data according to the control information, generates a virtual demonstrator in a virtual space in a posture corresponding to the controlled skeleton and joints, and displays, on a display device, an image of the virtual space captured by a virtual camera whose position and direction are controlled according to the behavior of the user's head.
  • FIG. 1 is a block diagram illustrating the configuration of a performance system.
  • FIG. 2 is a block diagram illustrating the functional configuration of a playback control system.
  • FIG. 3 is a block diagram illustrating the configuration of a predictive control unit.
  • FIG. 4 is a schematic diagram of a state cost and a control cost.
  • FIG. 5 is a graph showing the relationship between weight values and a feedback gain.
  • FIG. 6 is a flowchart of control processing.
  • FIG. 7 is a schematic diagram of a setting screen in the second embodiment.
  • FIG. 8 is a schematic diagram of a setting screen in the second embodiment.
  • FIG. 9 is an explanatory diagram of state variables and state costs in the third embodiment.
  • FIG. 10 is an explanatory diagram of control information and a control cost in the third embodiment.
  • FIG. 1 is a block diagram illustrating the configuration of a performance system 100 according to a first embodiment.
  • In the first embodiment, a single performer plays a specific part (hereinafter referred to as a "performance part") out of a plurality of parts of a specific music piece (hereinafter referred to as the "target music piece").
  • The performance part is, for example, one or more parts that constitute the melody of the target music piece.
  • The performance system 100 controls the playback of the parts other than the performance part (hereinafter referred to as "playback parts") among the plurality of parts of the target music piece.
  • The playback parts are, for example, one or more parts that constitute the accompaniment of the target music piece.
  • The performance system 100 includes a playback control system 10 and a keyboard instrument 20.
  • The playback control system 10 and the keyboard instrument 20 are interconnected, for example, by wire or wirelessly.
  • The keyboard instrument 20 is an electronic musical instrument equipped with a plurality of keys corresponding to different pitches.
  • The performer plays the performance part by sequentially operating the keys of the keyboard instrument 20.
  • The keyboard instrument 20 reproduces musical tones of the pitches played by the performer.
  • In parallel with the reproduction of musical tones according to the performance by the performer, the keyboard instrument 20 supplies performance data E representing the performance to the playback control system 10.
  • The performance data E specifies the pitch and key-depression intensity corresponding to each key operated by the performer. That is, the performance data E is data representing the time series of notes played by the performer.
  • The performance data E is, for example, event data compliant with the MIDI (Musical Instrument Digital Interface) standard. Note that the instrument played by the performer is not limited to the keyboard instrument 20.
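  • As a concrete illustration, the performance data E can be pictured as a stream of MIDI-style note events. The following is a minimal Python sketch of such a stream; the record type `PerformanceEvent` and its field names are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class PerformanceEvent:
    """One MIDI-style note event supplied by the keyboard instrument 20."""
    time: float    # wall-clock time of the key operation, in seconds
    pitch: int     # MIDI note number (0-127) of the operated key
    velocity: int  # key-depression intensity (0-127)

# A short excerpt of performance data E: the performer plays C4, E4, G4.
performance_data_E = [
    PerformanceEvent(time=0.00, pitch=60, velocity=92),
    PerformanceEvent(time=0.48, pitch=64, velocity=85),
    PerformanceEvent(time=1.02, pitch=67, velocity=88),
]
```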
  • The playback control system 10 includes a control device 11, a storage device 12, a display device 13, an operating device 14, and a sound emitting device 15.
  • The playback control system 10 is realized by a portable information device such as a smartphone or a tablet terminal, or by a portable or stationary information device such as a personal computer. Note that the playback control system 10 may be realized not only as a single device but also as a plurality of devices configured separately from each other. Furthermore, the playback control system 10 may be built into the keyboard instrument 20. The entire performance system 100 including the playback control system 10 and the keyboard instrument 20 may also be interpreted as a "playback control system."
  • The control device 11 is one or more processors that control each element of the playback control system 10. Specifically, the control device 11 is composed of one or more types of processors such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an SPU (Sound Processing Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit).
  • The storage device 12 is one or more memories that store programs executed by the control device 11 and various data used by the control device 11.
  • A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of multiple types of recording media, is used as the storage device 12.
  • A portable recording medium that can be attached to and detached from the playback control system 10, or a recording medium that the control device 11 can access via a communication network (for example, cloud storage), may also be used as the storage device 12.
  • The storage device 12 stores music data D and an acoustic signal Z.
  • The music data D is data that specifies the time series of the plurality of notes constituting the target music piece. That is, the music data D is data representing the musical score of the target music piece.
  • The music data D includes first musical score data D1 and second musical score data D2.
  • The first musical score data D1 specifies the note string of the performance part of the target music piece.
  • The second musical score data D2 specifies the note string of the playback part of the target music piece.
  • The music data D (D1, D2) is, for example, a file in a format compliant with the MIDI standard.
  • The acoustic signal Z is a time-domain signal representing the waveform of the musical tones (i.e., the accompaniment tones) of the playback part.
  • The display device 13 displays various images.
  • The display device 13 is configured with a display panel such as a liquid crystal panel or an organic EL (Electroluminescence) panel.
  • The operating device 14 accepts operations by the user.
  • The operating device 14 is, for example, a plurality of operating elements operated by the user, or a touch panel configured integrally with the display surface of the display device 13.
  • The user who operates the operating device 14 is, for example, the performer of the performance part or an operator other than the performer.
  • The sound emitting device 15 reproduces sound under the control of the control device 11.
  • The sound emitting device 15 reproduces the musical tones of the playback part represented by the acoustic signal Z.
  • The sound emitting device 15 is, for example, a speaker or headphones.
  • A sound emitting device 15 separate from the playback control system 10 may also be connected to the playback control system 10 by wire or wirelessly. Note that illustration of a D/A converter that converts the acoustic signal Z from digital to analog and of an amplifier that amplifies the acoustic signal Z is omitted for convenience.
  • FIG. 2 is a block diagram illustrating the functional configuration of the playback control system 10.
  • The control device 11 realizes a plurality of functions (a predictive control unit 30 and a playback control unit 40) by executing a program stored in the storage device 12.
  • The predictive control unit 30 generates control information U[t] using the performance data E and the music data D. The control information U[t] is generated at each of different times t on the time axis. That is, the predictive control unit 30 generates a time series of the control information U[t].
  • The control information U[t] is data in an arbitrary format for controlling the playback of the playback part.
  • The control information U[t] is a two-dimensional vector including a playback position u1[t] and a playback speed u2[t].
  • The playback position u1[t] is the position (point on the time axis) at which the playback part should be played at time t. Specifically, the playback position u1[t] is a position relative to the playback position (hereinafter referred to as the "reference position") at each time t when the playback part is played back at a predetermined speed (hereinafter referred to as the "reference speed"). That is, the playback position u1[t] is expressed as a difference (amount of change) from the reference position.
  • The playback speed u2[t] is the speed at which the playback part should be played back at time t. Specifically, the playback speed u2[t] is a speed relative to the reference speed. That is, the playback speed u2[t] is expressed as a difference (amount of change) from the reference speed.
  • The playback control unit 40 controls the playback of the musical tones of the playback part according to the control information U[t]. Specifically, the playback control unit 40 controls the playback of the musical tones of the playback part by the sound emitting device 15.
  • The playback control unit 40 generates playback information P[t] from the control information U[t], and causes the sound emitting device 15 to play the playback part according to the playback information P[t]. Specifically, the playback control unit 40 outputs to the sound emitting device 15 the sample sequence of the portion of the acoustic signal Z that corresponds to the playback information P[t].
  • The playback information P[t] is information representing the actual playback of the playback part by the sound emitting device 15.
  • The playback information P[t] is a two-dimensional vector including a playback position p1[t] and a playback speed p2[t].
  • The playback position p1[t] is the position (point on the time axis) of the playback part being played back at time t.
  • The playback position p1[t] is a position measured from the starting point of the target music piece.
  • The playback speed p2[t] is the speed at which the playback part is played back at time t.
  • The playback speed p2[t] is a speed whose reference value (zero) corresponds to the stop of playback.
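  • The relation between the relative quantities in U[t] and the absolute quantities in P[t] follows from the definitions above: each component of P[t] is the corresponding reference value shifted by the component of U[t]. The Python sketch below makes that bookkeeping explicit; it is a minimal reading of the definitions, and the names (`reference_position`, `playback_info`) are illustrative.

```python
def playback_info(t: float,
                  u1: float, u2: float,
                  reference_speed: float = 1.0) -> tuple[float, float]:
    """Derive playback information P[t] = (p1[t], p2[t]) from control
    information U[t] = (u1[t], u2[t]).

    u1 and u2 are differences from the reference position/speed, so the
    absolute playback position and speed are obtained by adding them back.
    """
    reference_position = reference_speed * t  # position when playing at the reference speed
    p1 = reference_position + u1              # playback position, from the start of the piece
    p2 = reference_speed + u2                 # playback speed, 0 means playback is stopped
    return p1, p2

# With U[t] = (0, 0) the playback part simply runs at the reference speed.
assert playback_info(t=2.0, u1=0.0, u2=0.0) == (2.0, 1.0)
```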
  • The predictive control unit 30 generates the control information U[t] using the performance data E corresponding to the performance by the performer, and the playback control unit 40 controls the playback of the playback part by the sound emitting device 15 according to the control information U[t].
  • The predictive control unit 30 of the first embodiment generates the control information U[t] so that the playback of the playback part by the sound emitting device 15 follows the performance of the performance part by the performer.
  • Model predictive control (MPC) is used to generate the control information U[t].
  • FIG. 3 is a block diagram illustrating a specific configuration of the predictive control unit 30.
  • The predictive control unit 30 includes a performance prediction unit 31, an information generation unit 32, an arithmetic processing unit 33, and a variable setting unit 34.
  • The performance prediction unit 31 predicts performance information S[t] using a prediction model.
  • The performance information S[t] is predicted for each time t on the time axis. That is, the performance prediction unit 31 generates a time series of the performance information S[t].
  • The performance information S[t] is information predicted from the performance of the performance part by the performer (that is, from the performance data E). Specifically, the performance information S[t] is a two-dimensional vector including a performance position s1[t] and a performance speed s2[t].
  • The prediction model is a mathematical model for predicting the performance information S[t].
  • The performance position s1[t] is the position (point on the time axis) at which the performer is predicted to be performing at time t in the performance part.
  • The performance position s1[t] is a position measured from the starting point of the target music piece.
  • The performance speed s2[t] is the performance speed predicted at time t.
  • The performance speed s2[t] is a speed whose reference value (zero) corresponds to the stop of the performance.
  • The performance prediction unit 31 includes an analysis unit 311 and a prediction unit 312.
  • The analysis unit 311 estimates a performance time t[k] and a performance position s[k] by analyzing the performance data E (k is a natural number). The performance time t[k] and the performance position s[k] are estimated each time the performer plays a note of the performance part.
  • The performance time t[k] is the time at which the k-th note among the plurality of notes of the performance part is played.
  • The performance position s[k] is the position of the k-th note among the plurality of notes of the performance part.
  • A known performance analysis technique (for example, a score alignment technique) may be used to estimate the performance time t[k] and the performance position s[k].
  • The analysis unit 311 may also estimate the performance time t[k] and the performance position s[k] by using a statistical estimation model such as a deep neural network (DNN) or a hidden Markov model (HMM).
  • The prediction unit 312 generates the performance information S[t] for a time t after (that is, in the future of) the performance time t[k].
  • The prediction model is used by the prediction unit 312 to predict the performance information S[t].
  • The prediction model is, for example, a state space model that assumes that the performance by the performer progresses at a constant speed. Specifically, it is assumed that the performance progresses at a constant speed during the interval between successive notes.
  • The state variable α[k] of the state space model is expressed by equation (1).
  • The symbol ε[k] in equation (1) is a noise component (for example, white noise).
  • The covariance of the noise component ε[k] is calculated from the performance tendency of the performer.
  • The probability that the performance position s[k] is observed under the condition of the state variable α[k] follows a normal distribution with a predetermined variance.
  • The performance information S[t] can be predicted as shown in equation (2).
  • The prediction unit 312 may also calculate the performance information S[t] by calculating formulas (3a) and (3b).
  • The symbol dt in formulas (3a) and (3b) is a predetermined time length.
  • The symbol ν(s1[t]) means the performance speed at the performance position s1[t] of the performance information S[t].
  • The performance speed ν(s1[t]) is calculated in advance using, for example, performance speeds at which the performer played the performance part in the past.
  • For example, the expected value of past performance speeds for the performance part of the target music piece is calculated as the performance speed ν(s1[t]).
  • Alternatively, a statistical estimation model such as a deep neural network or a hidden Markov model may be trained on the relationship between musical scores played by the performer in the past and the performance speeds ν(s1[t]) in those performances.
  • In that case, the prediction unit 312 generates the performance speed ν(s1[t]) by processing the performance data E with the statistical estimation model.
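  • To make the constant-speed assumption concrete, the following Python sketch extrapolates the performance information S[t] from the latest estimated event (t[k], s[k]) and a speed function, in the spirit of formulas (3a) and (3b). The patent's exact equations are given only as images in the original publication, so this is a plausible reconstruction under the stated constant-speed assumption, not the literal formulas; `nu` stands in for the pretrained speed function ν(s1[t]).

```python
def predict_performance(t: float,
                        t_k: float, s_k: float,
                        nu, dt: float = 0.01) -> tuple[float, float]:
    """Predict S[t] = (s1[t], s2[t]) for a future time t > t[k].

    Steps forward from the last estimated note event (t[k], s[k]),
    assuming the performance advances at speed nu(position) between
    successive notes (the constant-speed state space assumption).
    """
    position, time = s_k, t_k
    while time < t:                    # integrate the position in steps of dt
        position += dt * nu(position)  # cf. formula (3a): advance by speed * dt
        time += dt
    return position, nu(position)      # (performance position, performance speed)

# Example: a flat pretrained tempo of one score unit per second.
s1, s2 = predict_performance(t=2.5, t_k=2.0, s_k=10.0, nu=lambda s: 1.0)
```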
  • The information generation unit 32 in FIG. 3 generates the control information U[t] from the performance information S[t].
  • The control information U[t] is generated so that the playback of the playback part by the sound emitting device 15 (the control information U[t]) follows the performance of the performance part by the performer (the performance information S[t]).
  • The information generation unit 32 generates the control information U[t] according to an arithmetic expression (hereinafter referred to as a "control law"), for example a control law based on LQG (Linear-Quadratic-Gaussian) control.
  • The state variable X[t] is a variable that represents the error between the performance information S[t] and the playback information P[t]. That is, the state variable X[t] represents the error between the performance of the performance part by the performer and the playback of the playback part by the sound emitting device 15.
  • The state variable X[t] of the first embodiment is a two-dimensional vector including a position error x1[t] and a speed error x2[t].
  • The control information U[t] includes the playback position u1[t] relative to the reference position and the playback speed u2[t] relative to the reference speed.
  • A state transition expressed by equation (5) can be assumed. Note that the matrix B in equation (5) is a second-order (2 × 2) identity matrix.
  • The symbol Q[s1] in equation (6) is a cost (hereinafter referred to as the "state cost") relating to the state variable X[t] at each performance position s1[t] of the target music piece.
  • The state variable X[t] means the error between the performance information S[t] and the playback information P[t]. Therefore, the state cost Q[s1] means the cost for the error between the performance information S[t] and the playback information P[t] at the performance position s1[t] of the target music piece. That is, the state cost Q[s1] is a cost for the playback of the playback part failing to follow the performance of the performance part. As understood from equation (6), the state cost Q[s1] is a 2 × 2 square matrix.
  • The symbol R[p1] in equation (6) is a cost (hereinafter referred to as the "control cost") for the playback position u1[t] and the playback speed u2[t].
  • As noted above, the playback position u1[t] means the amount of change in the playback position p1[t] with respect to the reference position, and the playback speed u2[t] means the amount of change in the playback speed p2[t] with respect to the reference speed.
  • The control cost R[p1] is therefore expressed as a cost relating to temporal changes in the playback position p1[t] and the playback speed p2[t] represented by the playback information P[t]. That is, the control cost R[p1] is a cost for changes in the playback information P[t]. As understood from equation (6), the control cost R[p1] is a 2 × 2 square matrix.
  • The cost (objective function) J includes the state variable X[t], the control information U[t], the state cost Q[s1], and the control cost R[p1].
  • The symbol O in formula (7d) is a zero matrix. That is, the matrix Y[t] is set to the zero matrix at the terminal time τ of the prediction horizon.
  • The symbol L[t] in equation (7a) is a feedback gain for the state variable X[t], and is expressed as a 2 × 2 square matrix. As understood from equation (7a), the control information U[t] can be assumed to be a linear feedback of the state variable X[t]. Further, the feedback gain L[t] does not depend on either the control information U[t] or the state variable X[t]. On the other hand, the feedback gain L[t] depends on the state cost Q[s1] and the control cost R[p1].
  • The information generation unit 32 in FIG. 3 calculates the control information U[t] of the playback part by the calculations of formulas (7a) to (7d). That is, the control information U[t] is calculated so that the cost J of equation (6) is reduced.
  • The model predictive control by the predictive control unit 30 thus includes a prediction process that predicts the performance information S[t], and an optimization process that generates control information U[t] suitable from the viewpoint of reducing the cost J.
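  • Formulas (7a) to (7d) appear only as images in the original publication, but the surrounding description (a linear feedback of X[t], a gain that depends only on Q[s1] and R[p1], and a matrix Y[t] that equals the zero matrix at the terminal time τ) matches the standard finite-horizon LQR backward recursion. The sketch below is therefore a reconstruction under that assumption, not the patent's literal formulas.

```python
import numpy as np

def feedback_gains(A, B, Q_seq, R_seq):
    """Backward Riccati recursion for a finite-horizon LQR problem.

    A, B        : 2x2 state-transition and input matrices (B is the
                  identity matrix in the first embodiment).
    Q_seq, R_seq: per-step state costs Q[s1] and control costs R[p1]
                  along the prediction horizon.
    Returns the feedback gains L[t]; the control law is U[t] = -L[t] @ X[t].
    """
    horizon = len(Q_seq)
    Y = np.zeros_like(A)                # terminal condition: Y[tau] = O (zero matrix)
    gains = [None] * horizon
    for t in reversed(range(horizon)):  # sweep backward from the horizon end
        # Gain that trades the state cost against the control cost.
        L = np.linalg.solve(R_seq[t] + B.T @ Y @ B, B.T @ Y @ A)
        Y = Q_seq[t] + A.T @ Y @ (A - B @ L)
        gains[t] = L
    return gains

# Example: constant costs over a 100-step horizon.
A, B = np.eye(2), np.eye(2)
L_seq = feedback_gains(A, B, [np.eye(2)] * 100, [0.1 * np.eye(2)] * 100)
U = -L_seq[0] @ np.array([0.05, 0.0])  # control for a small position error X[t]
```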
  • The arithmetic processing unit 33 in FIG. 3 generates the state cost Q[s1] and the control cost R[p1] that are applied to the generation of the control information U[t].
  • The generation of the state cost Q[s1] and the control cost R[p1] is described in detail below.
  • FIG. 4 is a schematic diagram of the state cost Q[s1] and the control cost R[p1].
  • FIG. 4 illustrates the state cost Q[s1] and the control cost R[p1] for a case where the target music piece is expressed by the illustrated musical score.
  • In FIG. 4, the numerical value of the element in the first row and first column of the state cost Q[s1] and the numerical value of the element in the first row and first column of the control cost R[p1] are illustrated for convenience.
  • The state cost Q[s1] is expressed by formula (8).
  • The symbol λ in formula (8) is a small value for stabilizing each value of the state cost Q[s1].
  • The symbol I means a second-order (2 × 2) identity matrix.
  • The symbol Gq in formula (8) is a set of positions at which the notes of the performance part are to be played. That is, the set Gq includes the position (hereinafter referred to as a "sounding position") s' of the starting point of each note specified by the first musical score data D1 of the music data D.
  • Formula (8) represents a time series (hereinafter referred to as a "pulse train") Hq of a plurality of pulses q corresponding to the different sounding positions s'.
  • Each pulse q is centered at a time point a time δ after the sounding position s'. Note that since the variable δ is a small numerical value, each pulse q in FIG. 4 appears substantially centered at the sounding position s'.
  • The symbol γ in formula (8) is a variable that determines the maximum value of the pulse q corresponding to the sounding position s'.
  • The symbol β is a variable that determines the pulse width of each pulse q.
  • The function value of the pulse train Hq at the performance position s1[t] corresponds to the state cost Q[s1].
  • The symbol Cq[s1] in formula (8) is a weight value for weighting the state cost Q[s1]. That is, the larger the weight value Cq[s1] is, the greater the influence of the state cost Q[s1] on the feedback gain L[t] becomes.
  • The arithmetic processing unit 33 specifies each sounding position s' by analyzing the first musical score data D1, and calculates the state cost Q[s1] by executing the calculation of formula (8).
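  • Formula (8) itself is an image in the original publication. Reading the description above, one plausible reconstruction is a sum of bell-shaped pulses of height γ and width β, each centered a small time δ after a sounding position s', stabilized by λ and weighted by Cq[s1]; the sketch below implements exactly that assumed form.

```python
import numpy as np

def state_cost(s1, Gq, Cq=1.0, lam=1e-3, gamma=1.0, beta=0.05, delta=0.01):
    """State cost Q[s1] as a 2x2 matrix, reconstructed from the description
    of formula (8): a pulse train Hq with one Gaussian-like pulse per
    sounding position s' of the performance part, scaled by the weight
    value Cq[s1] and stabilized by the small constant lam."""
    Hq = sum(gamma * np.exp(-((s1 - s_p - delta) ** 2) / (2 * beta ** 2))
             for s_p in Gq)
    return (lam + Cq * Hq) * np.eye(2)

# The cost is large near a sounding position and nearly lam elsewhere.
Gq = [1.0, 1.5, 2.0]               # sounding positions s' from score data D1
print(state_cost(1.01, Gq)[0, 0])  # near a note: close to gamma
print(state_cost(1.25, Gq)[0, 0])  # between notes: close to lam
```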
  • At performance positions s1[t] where the state cost Q[s1] is large, the feedback gain L[t] is set so that the performance of the performance part by the performer and the playback of the playback part by the sound emitting device 15 sufficiently approximate each other.
  • On the other hand, at performance positions s1[t] sufficiently distant from each sounding position s', a difference between the performance information S[t] and the playback information P[t] is allowed.
  • The control cost R[p1] is expressed by formula (9).
  • The symbol λ in formula (9) is a small value for stabilizing each value of the control cost R[p1].
  • The symbol I means a second-order (2 × 2) identity matrix.
  • The symbol Gr in formula (9) is a set of positions at which the notes of the playback part are to be played. That is, the set Gr includes the position (hereinafter referred to as a "sounding position") p' of the starting point of each note specified by the second musical score data D2 of the music data D.
  • Formula (9) represents a time series (hereinafter referred to as a "pulse train") Hr of a plurality of pulses r corresponding to the different sounding positions p'.
  • Each pulse r is set to have a shape that gradually increases from a time point before the sounding position p' and sharply decreases once the sounding position p' has passed.
  • The symbol ω(p) is a window function representing one pulse r, and is expressed, for example, by equation (10).
  • The coefficients c1 and c2 in equation (10) are predetermined positive numbers.
  • The function value of the pulse train Hr at the playback position p1[t] corresponds to the control cost R[p1].
  • The symbol Cr[p1] in formula (9) is a weight value for weighting the control cost R[p1]. That is, the larger the weight value Cr[p1] is, the greater the influence of the control cost R[p1] on the feedback gain L[t] becomes.
  • The arithmetic processing unit 33 specifies each sounding position p' by analyzing the second musical score data D2, and calculates the control cost R[p1] by executing the calculation of formula (9).
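  • Equation (10) is likewise an image in the original publication; the text fixes only the shape of the pulse (a gradual rise before p' and a sharp fall after it, governed by the positive coefficients c1 and c2). A two-sided exponential window has exactly that shape, and is used in the sketch below as an assumed stand-in for ω(p).

```python
import numpy as np

def omega(p, c1=5.0, c2=50.0):
    """Assumed window function for one pulse r: rises gradually as the
    playback position approaches the sounding position (p < 0) and falls
    sharply once it has passed (p >= 0); c1 < c2, both positive."""
    return np.exp(c1 * p) if p < 0 else np.exp(-c2 * p)

def control_cost(p1, Gr, Cr=1.0, lam=1e-3):
    """Control cost R[p1] as a 2x2 matrix per the description of formula (9):
    the pulse train Hr evaluated at p1, stabilized by lam and weighted by
    the weight value Cr[p1]."""
    Hr = sum(omega(p1 - p_p) for p_p in Gr)
    return (lam + Cr * Hr) * np.eye(2)

# The cost rises toward a playback-part note at p' = 2.0 and drops just after it.
Gr = [2.0]
print(control_cost(1.9, Gr)[0, 0])  # approaching the note: moderate cost
print(control_cost(2.1, Gr)[0, 0])  # just past the note: already small
```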
  • The feedback gain L[t] is set so that the control information U[t] is sufficiently reduced at playback positions p1[t] where the control cost R[p1] is large.
  • In the vicinity of each sounding position p' of the playback part, the reference position and the playback position p1[t] are required to approximate each other sufficiently (the playback position u1[t] is sufficiently reduced), and the reference speed and the playback speed p2[t] are required to approximate each other sufficiently (the playback speed u2[t] is sufficiently reduced).
  • That is, the feedback gain L[t] is set so that, in the vicinity of each sounding position p', the playback of the playback part by the sound emitting device 15 closely follows the note sequence represented by the second musical score data D2.
  • On the other hand, changes in the playback information P[t] are allowed at playback positions p1[t] sufficiently distant from each sounding position p'.
  • The variable setting unit 34 in FIG. 3 sets the variables that are applied to the generation of the control information U[t]. Specifically, the variable setting unit 34 sets the variables (λ, δ, γ, β, Cq[s1]) included in formula (8) and the variables (λ, Cr[p1], c1, c2) included in formula (9). For example, the variable setting unit 34 sets each variable included in formula (8) or formula (9) to a numerical value stored in the storage device 12. As described above, the variable setting unit 34 of the first embodiment sets one or more variables included in the cost J of equation (6). The arithmetic processing unit 33 calculates the state cost Q[s1] and the control cost R[p1] by calculation using the variables set by the variable setting unit 34.
  • FIG. 5 is a graph showing the relationship between the weight value Cq[s1], the weight value Cr[p1], and the feedback gain L[t]. Note that in FIG. 5 the numerical value of the element in the first row and first column of the feedback gain L[t] is illustrated for convenience. As understood from equation (7a), there is a tendency that the larger the feedback gain L[t] is, the more strongly the playback information P[t] of the playback part is corrected. For example, the larger the feedback gain L[t] is, the more the playback of the playback part is corrected so as to approximate the performance information S[t] of the performance part.
  • In FIG. 5, the target music piece is divided into an interval σ1, an interval σ2, and an interval σ3 on the time axis.
  • The interval σ1 is an interval in which the performance part is sounded while the playback part remains silent.
  • The interval σ2 is an interval in which the playback part is sounded while the performance part remains silent.
  • The interval σ3 is an interval in which both the performance part and the playback part are sounded.
  • The graph V1 in FIG. 5 shows the feedback gain L[t] when the weight value Cq[s1] and the weight value Cr[p1] are set to equal values (case 1).
  • In case 1, the feedback gain L[t] is set to a large value near each sounding position s' of the performance part within the interval σ1. That is, the playback of the playback part is strongly corrected so that the error between the performance information S[t] and the playback information P[t] is sufficiently reduced.
  • On the other hand, the feedback gain L[t] is maintained at a sufficiently small value within the interval σ2. That is, the playback of the playback part is hardly corrected in the interval σ2.
  • Within the interval σ3, the feedback gain L[t] is maintained at a large value near each sounding position s', although not as large as within the interval σ1. That is, in the interval σ3 in which both the performance part and the playback part are sounded, the playback of the playback part is strongly corrected, although not as strongly as in the interval σ1.
  • The graph V2 in FIG. 5 shows the feedback gain L[t] when the weight value Cq[s1] is sufficiently smaller than the weight value Cr[p1] (case 2). Specifically, the weight value Cq[s1] is set to 0.1 and the weight value Cr[p1] is set to 1.0. In case 2, the feedback gain L[t] is maintained at a small value overall. That is, the correction of the error between the performance information S[t] and the playback information P[t] is suppressed compared with case 1.
  • The graph V3 in FIG. 5 shows the feedback gain L[t] when the weight value Cq[s1] is sufficiently larger than the weight value Cr[p1] (case 3). Specifically, the weight value Cq[s1] is set to 1.0 and the weight value Cr[p1] is set to 0.1. In case 3, the feedback gain L[t] is set to a large value in the vicinity of each sounding position s' of the performance part, regardless of whether or not the playback part is sounding. That is, the playback of the playback part is strongly corrected so that the error between the performance information S[t] and the playback information P[t] is sufficiently reduced, regardless of whether or not the playback part is sounded.
  • As described above, the playback behavior of the playback part with respect to the performance part changes according to the weight value Cq[s1] and the weight value Cr[p1].
  • Specifically, the relationship between the performance part and the playback part changes depending on the magnitude relationship between the weight value Cq[s1] and the weight value Cr[p1].
  • FIG. 6 is a flowchart of the process executed by the control device 11 (hereinafter referred to as the "control process"). The control process is repeated at predetermined intervals.
  • The control device 11 (analysis unit 311) estimates the performance time t[k] and the performance position s[k] by analyzing the performance data E (Sa1). Further, the control device 11 (prediction unit 312) generates the performance information S[t] for a time t after the performance time t[k] by using the prediction model (Sa2: prediction process).
  • The control device 11 sets the variables applied to the generation of the control information U[t] (Sa3).
  • The control device 11 (arithmetic processing unit 33) generates the state cost Q[s1] and the control cost R[p1] (Sa4). Specifically, the control device 11 generates the state cost Q[s1] by analyzing the first musical score data D1, and generates the control cost R[p1] by analyzing the second musical score data D2.
  • The variables set in step Sa3 are applied to the generation of the state cost Q[s1] and the control cost R[p1].
  • The control device 11 calculates the control information U[t] so that the cost J of equation (6) is reduced, by the calculations of formulas (7a) to (7d) to which the state variable X[t], the state cost Q[s1], and the control cost R[p1] are applied (Sa5: optimization process).
  • The control device 11 controls the playback of the playback part by the sound emitting device 15 according to the control information U[t] (Sa6). Specifically, the control device 11 generates the playback information P[t] from the control information U[t], and causes the sound emitting device 15 to play the portion of the acoustic signal Z corresponding to the playback information P[t].
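  • Putting steps Sa1 to Sa6 together, one cycle of the control process can be sketched as below, reusing the functions `predict_performance`, `state_cost`, `control_cost`, and `feedback_gains` from the earlier sketches. The dynamics matrices, horizon length, and sign conventions are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

def control_process(t_now, t_k, s_k, p1, p2, Gq, Gr, nu):
    """One cycle of the control process of FIG. 6 (Sa2 to Sa6).

    (t_k, s_k) is the latest note event estimated in Sa1; (p1, p2) is the
    current playback information P[t] of the playback part.
    """
    # Sa2: prediction process.
    s1, s2 = predict_performance(t_now, t_k, s_k, nu)
    # Sa3/Sa4: variables are left at their defaults; build both costs.
    Q = state_cost(s1, Gq)             # formula (8), from performance-part score D1
    R = control_cost(p1, Gr)           # formula (9), from playback-part score D2
    # Sa5: optimization process over a short horizon.
    L = feedback_gains(np.eye(2), np.eye(2), [Q] * 20, [R] * 20)[0]
    X = np.array([s1 - p1, s2 - p2])   # state variable: position/speed error
    U = -L @ X                         # control information U[t] (linear feedback)
    # Sa6: U[t] is handed to the playback control unit 40 to adjust playback.
    return U

U = control_process(t_now=2.5, t_k=2.0, s_k=10.0, p1=9.95, p2=1.0,
                    Gq=[10.0, 10.5], Gr=[10.25], nu=lambda s: 1.0)
```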
  • As described above, since model predictive control is used to generate the control information U[t], the playback of the playback part can be appropriately controlled according to the performance by the performer.
  • In the first embodiment, the control information U[t] is generated so that the cost J, which includes the state variable X[t] representing the error between the performance information S[t] and the playback information P[t], is reduced. Therefore, the playback of the playback part can be linked to the performance by the performer.
  • Further, the cost J includes the state cost Q[s1] relating to the state variable X[t] and the control cost R[p1] relating to temporal changes in the playback information P[t].
  • By virtue of the state cost Q[s1], the error (state variable X[t]) between the performance information S[t] and the playback information P[t] is effectively reduced.
  • By virtue of the control cost R[p1], excessive changes in the playback information P[t] are suppressed. Therefore, both the error between the performance information S[t] and the playback information P[t] and excessive changes in the playback information P[t] can be effectively reduced.
  • The second embodiment differs from the first embodiment in the operation of the variable setting unit 34.
  • The configuration and operation of the elements other than the variable setting unit 34 are the same as in the first embodiment. Therefore, the second embodiment also achieves the same effects as the first embodiment.
  • The variable setting unit 34 of the first embodiment sets the variables applied to the generation of the control information U[t] to numerical values stored in advance in the storage device 12.
  • By contrast, the variable setting unit 34 of the second embodiment sets the variables applied to the generation of the control information U[t] in response to instructions from the user on the operating device 14 (Sa3). Specifically, the variable setting unit 34 variably sets the variables (λ, δ, γ, β, Cq[s1]) of formula (8) and the variables (λ, Cr[p1], c1, c2) of formula (9) according to instructions from the user.
  • The arithmetic processing unit 33 calculates the state cost Q[s1] and the control cost R[p1] by calculation using the variables set by the variable setting unit 34 (Sa4).
  • In the second embodiment, the variables relating to the cost J of equation (6) are set according to instructions from the user, so that the user's intention can be reflected in the playback of the playback part.
  • In particular, the variable setting unit 34 of the second embodiment sets the weight value Cq[s1] of formula (8) and the weight value Cr[p1] of formula (9).
  • The weight value Cq[s1] is an example of a "first weight value".
  • The weight value Cr[p1] is an example of a "second weight value".
  • FIG. 7 is a schematic diagram of a setting screen 141 on which the user changes the weight value Cq[s1] and the weight value Cr[p1].
  • The variable setting unit 34 displays the setting screen 141 on the display device 13.
  • The setting screen 141 includes a musical score 142 of the target music piece represented by the music data D.
  • The musical score 142 includes a musical score 143 of the performance part represented by the first musical score data D1, and a musical score 144 of the playback part represented by the second musical score data D2.
  • The user can specify an arbitrary section (hereinafter referred to as a "set section") 145 within the musical score 142 by operating the operating device 14.
  • The variable setting unit 34 accepts the designation of a set section 145 by the user. Note that a plurality of set sections 145 may be specified within the musical score 142.
  • The variable setting unit 34 also accepts selections by the user.
  • For the weight value Cq[s1], the variable setting unit 34 displays the change image 146 of FIG. 7 on the display device 13.
  • The change image 146 includes the current value (synchrony) of the weight value Cq[s1].
  • The user can instruct an increase (Increase synchrony) or a decrease (Decrease synchrony) of the weight value Cq[s1] by operating the change image 146.
  • The variable setting unit 34 changes the weight value Cq[s1] within the set section 145 in response to the instruction from the user.
  • That is, the variable setting unit 34 sets a weight value Cq[s1] for each set section 145 specified by the user. Note that the variable setting unit 34 may set the weight value Cq[s1] to a numerical value directly specified by the user.
  • Similarly, for the weight value Cr[p1], the variable setting unit 34 displays the change image 147 of FIG. 8 on the display device 13.
  • The change image 147 includes the current value (rigidity) of the weight value Cr[p1].
  • The user can instruct an increase (Increase rigidity) or a decrease (Decrease rigidity) of the weight value Cr[p1] by operating the change image 147.
  • The variable setting unit 34 changes the weight value Cr[p1] within the set section 145 in response to the instruction from the user.
  • That is, the variable setting unit 34 sets a weight value Cr[p1] for each set section 145 specified by the user. Note that the variable setting unit 34 may set the weight value Cr[p1] to a numerical value directly specified by the user.
  • The arithmetic processing unit 33 of the second embodiment generates the state cost Q[s1] and the control cost R[p1] according to the weight values Cq[s1] and Cr[p1] set by the variable setting unit 34 (Sa4). Specifically, the arithmetic processing unit 33 calculates the state cost Q[s1] for performance positions s1[t] within a set section 145 of the target music piece according to formula (8) to which the weight value Cq[s1] of that set section 145 is applied.
  • Similarly, the arithmetic processing unit 33 calculates the control cost R[p1] for playback positions p1[t] within a set section 145 according to formula (9) to which the weight value Cr[p1] of that set section 145 is applied. For sections other than the set sections 145 of the target music piece, the weight value Cq[s1] and the weight value Cr[p1] are set to predetermined initial values.
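  • A straightforward way to realize such section-dependent weights is a lookup that returns the weight of the set section 145 containing a given position, falling back to the initial value elsewhere. The sketch below shows the idea for Cq[s1] (Cr[p1] is analogous); the names and section boundaries are illustrative.

```python
# Each set section 145 is (start, end, weight); positions are score positions.
SET_SECTIONS = [(8.0, 12.0, 2.0),   # the user raised synchrony here
                (20.0, 24.0, 0.2)]  # the user lowered synchrony here
CQ_INITIAL = 1.0                    # predetermined initial value

def cq(s1: float) -> float:
    """Weight value Cq[s1]: section-specific inside a set section 145,
    the predetermined initial value everywhere else."""
    for start, end, weight in SET_SECTIONS:
        if start <= s1 < end:
            return weight
    return CQ_INITIAL

assert cq(9.0) == 2.0 and cq(15.0) == CQ_INITIAL
```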
  • As described in the first embodiment, the playback behavior of the playback part with respect to the performance part changes according to the weight value Cq[s1] and the weight value Cr[p1].
  • In the second embodiment, the relationship between the performance part and the playback part can be changed according to the setting of the weight value Cq[s1] and the weight value Cr[p1] by the variable setting unit 34.
  • Moreover, each of the weight value Cq[s1] and the weight value Cr[p1] is set according to instructions from the user. Therefore, the user can change the relationship between the performance part and the playback part.
  • The music data D of the third embodiment includes N pieces of first musical score data D1 and M pieces of second musical score data D2.
  • The N pieces of first musical score data D1 correspond to different performance parts of the target music piece.
  • The M pieces of second musical score data D2 correspond to different playback parts of the target music piece.
  • The storage device 12 also stores M acoustic signals Z corresponding to the different playback parts.
  • The acoustic signal Z of each playback part represents the waveform of the musical tones of that playback part.
  • The performance prediction unit 31 predicts performance information S[t] for each of the N performance parts using a prediction model. That is, performance information S[t] is predicted for each of N performers.
  • The process of predicting the performance information S[t] is the same as in the first embodiment.
  • The performance information S[t] of each performance part is predicted from the performance of that performance part (that is, from the performance data E).
  • Note that the performance prediction unit 31 may predict the performance information S[t] of each performance part using a separate prediction model for each performance part, or may predict the performance information S[t] of each performance part using a prediction model common to the N performance parts.
  • FIG. 9 is an explanatory diagram of the state variable X[t] and the state cost Q[s1] in the third embodiment.
  • The state variable X[t] includes state variables Xn,m[t] for all combinations of selecting one of the N performance parts and one of the M playback parts.
  • Each state variable Xn,m[t] corresponds to the state variable X[t] of the first embodiment.
  • The state variable Xn,m[t] is a two-dimensional vector representing the error between the performance information S[t] of the n-th performance part and the playback information P[t] of the m-th playback part. That is, the state variable Xn,m[t] represents the error between the performance of the n-th performance part and the playback of the m-th playback part by the sound emitting device 15.
  • The state cost Q[s1] is a block diagonal matrix whose diagonal components are the N × M submatrices Qn,m[s1]. The elements of the state cost Q[s1] other than the submatrices Qn,m[s1] are set to zero. Specifically, the state cost Q[s1] includes submatrices Qn,m[s1] for all combinations of selecting one of the N performance parts and one of the M playback parts. Each submatrix Qn,m[s1] corresponds to the state cost Q[s1] of the first embodiment.
  • The submatrix Qn,m[s1] is the cost for the error between the performance information S[t] of the n-th performance part and the playback information P[t] of the m-th playback part at the performance position s1[t].
  • The arithmetic processing unit 33 calculates each submatrix Qn,m[s1] using formula (8), similarly to the state cost Q[s1] of the first embodiment.
  • The set Gq of formula (8) applied to the calculation of the submatrix Qn,m[s1] is the set of sounding positions s' of the notes specified by the first musical score data D1 of the n-th performance part of the music data D.
  • The variable setting unit 34 of the third embodiment individually sets the weight value Cq[s1] of formula (8) for each submatrix Qn,m[s1].
  • The storage device 12 stores a plurality of different pieces of setting data. In each piece of setting data, N × M weight values Cq[s1] corresponding to the different combinations of a performance part and a playback part are registered. The numerical value of each weight value Cq[s1] differs for each piece of setting data.
  • The variable setting unit 34 selects one of the plurality of pieces of setting data according to an instruction from the user on the operating device 14. The selection of setting data corresponds to the setting of the weight value Cq[s1] corresponding to each submatrix Qn,m[s1].
  • The arithmetic processing unit 33 calculates each submatrix Qn,m[s1] by the calculation of formula (8) to which the corresponding weight value Cq[s1] registered in the selected setting data is applied. As understood from the above description, the weight value Cq[s1] applied to the generation of each submatrix Qn,m[s1] is changed according to instructions from the user. Note that the variable setting unit 34 may individually set each of the N × M weight values Cq[s1] according to instructions from the user.
  • FIG. 10 is an explanatory diagram of the control information U[t] and the control cost R[p1] in the third embodiment.
  • The control information U[t] of the third embodiment includes M pieces of control information U1[t] to UM[t] corresponding to the different playback parts of the target music piece.
  • Each piece of control information Um[t] corresponds to the control information U[t] of the first embodiment. Therefore, the control information Um[t] is a two-dimensional vector including a playback position u1[t] and a playback speed u2[t].
  • The playback control unit 40 controls the playback of the m-th playback part by the sound emitting device 15 according to the control information Um[t].
  • Specifically, the playback control unit 40 generates playback information Pm[t] from the control information Um[t], and causes the sound emitting device 15 to play the m-th playback part according to the playback information Pm[t]. That is, the playback control unit 40 causes the sound emitting device 15 to play the portion of the acoustic signal Z of the m-th playback part that corresponds to the playback information Pm[t]. The musical tones of the M playback parts of the target music piece are therefore played in parallel.
  • The control cost R[p1] is a block diagonal matrix whose diagonal components are the M submatrices R1[p1] to RM[p1]. The elements of the control cost R[p1] other than the submatrices Rm[p1] are set to zero.
  • Each submatrix Rm[p1] corresponds to the control cost R[p1] of the first embodiment. Specifically, the submatrix Rm[p1] is a cost relating to changes in the playback information Pm[t] at the playback position p1[t] of the m-th playback part.
  • The arithmetic processing unit 33 calculates each submatrix Rm[p1] using formula (9), similarly to the control cost R[p1] of the first embodiment.
  • The set Gr of formula (9) applied to the calculation of the submatrix Rm[p1] is the set of sounding positions p' of the notes specified by the second musical score data D2 of the m-th playback part of the music data D.
  • The variable setting unit 34 of the third embodiment individually sets the weight value Cr[p1] of formula (9) for each submatrix Rm[p1].
  • The storage device 12 stores a plurality of different pieces of setting data.
  • In each piece of setting data, M weight values Cr[p1] corresponding to the different playback parts are registered.
  • The numerical value of each weight value Cr[p1] differs for each piece of setting data.
  • The variable setting unit 34 selects one of the plurality of pieces of setting data according to an instruction from the user on the operating device 14. The selection of setting data corresponds to the setting of the weight value Cr[p1] corresponding to each submatrix Rm[p1].
  • The arithmetic processing unit 33 calculates each submatrix Rm[p1] by the calculation of formula (9) to which the corresponding weight value Cr[p1] registered in the selected setting data is applied. As understood from the above description, the weight value Cr[p1] applied to the generation of each submatrix Rm[p1] is changed according to instructions from the user. Note that the variable setting unit 34 may individually set each of the M weight values Cr[p1] according to instructions from the user.
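  • The block-diagonal structure of Q[s1] and R[p1] can be assembled directly from the per-combination submatrices. The brief sketch below does this with NumPy/SciPy, reusing the `state_cost` and `control_cost` functions from the earlier sketches (so it inherits their assumptions); the part counts and weights are illustrative.

```python
import numpy as np
from scipy.linalg import block_diag

N, M = 2, 3  # N performance parts, M playback parts

# One 2x2 submatrix Qn,m[s1] per combination (performance part n, playback
# part m), each built by formula (8) with its own weight Cq[s1]
# (illustrative stand-in values here).
Q_blocks = [state_cost(s1=1.0, Gq=[1.0], Cq=1.0 + 0.1 * (n * M + m))
            for n in range(N) for m in range(M)]
Q = block_diag(*Q_blocks)  # (2*N*M) x (2*N*M) state cost, zeros off the blocks

# One 2x2 submatrix Rm[p1] per playback part m, built by formula (9).
R_blocks = [control_cost(p1=1.0, Gr=[1.0], Cr=1.0) for m in range(M)]
R = block_diag(*R_blocks)  # (2*M) x (2*M) control cost

print(Q.shape, R.shape)    # (12, 12) (6, 6)
```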
  • The information generation unit 32 calculates the control information U[t] of the playback parts by the calculations of formulas (7a) to (7d) using the state variable X[t], the state cost Q[s1], and the control cost R[p1] (Sa5). That is, the information generation unit 32 generates control information Um[t] (U1[t] to UM[t]) for each of the M playback parts. Therefore, the third embodiment also achieves the same effects as the first embodiment.
  • In the third embodiment, the total number N of performance parts and the total number M of playback parts are generalized to arbitrary numbers.
  • When the total number N of performance parts is 2 or more, performance information S[t] is predicted for each of the plurality of performers (performance parts). Therefore, the playback of the playback parts can be appropriately controlled according to the performances by a plurality of performers.
  • When the total number M of playback parts is 2 or more, control information Um[t] is generated for each of the plurality of playback parts. Therefore, the playback of each of the plurality of playback parts can be controlled according to the performance by the performer.
  • When both the total number N of performance parts and the total number M of playback parts are 2 or more, the playback of each of the plurality of playback parts can be controlled in accordance with the performances by the plurality of performers.
  • The N × M weight values Cq[s1] corresponding to the different combinations of a performance part and a playback part are individually set.
  • The M weight values Cr[p1] corresponding to the different playback parts are individually set.
  • As described in the first embodiment, the relationship between the playback of a playback part and the performance of a performance part depends on the weight value Cq[s1] and the weight value Cr[p1]. Therefore, the relationship between each of the N performance parts and each of the M playback parts can be controlled in detail according to the weight values Cq[s1] and Cr[p1]. That is, the degree to which the playback of each playback part is linked to the performance of each performance part can be controlled individually for each combination of a performance part and a playback part. For example, various types of control can be realized, such as strongly linking the playback of a specific playback part to the performance of a specific performance part while hardly linking the playback of the other playback parts to the performance of that performance part.
  • As described above, the predictive control unit 30 generates the control information U[t] for at least one playback part of the target music piece by model predictive control using a prediction model that predicts, for at least one performer, the performance information S[t] including the performance position s1[t] in the target music piece.
  • In the embodiments described above, the acoustic signal Z stored in the storage device 12 is used to play the playback part, but the method of playing the musical tones of the playback part is not limited to the above examples.
  • For example, the audio signal Z may be generated by the playback control unit 40 sequentially supplying the second musical score data D2 to a sound source unit.
  • The second musical score data D2 is supplied to the sound source unit in parallel with the performance of the performance part by the performer.
  • That is, the playback control unit 40 functions as a sequencer that processes the second musical score data D2.
  • The sound source unit is a hardware sound source or a software sound source.
  • The playback control unit 40 controls the timing of supplying the second musical score data D2 to the sound source unit according to the control information U[t].
  • In the embodiments described above, the musical tones of the playback part are played by the sound emitting device 15, but the form of playing the playback part is not limited to the above examples.
  • For example, the playback control unit 40 may cause an electronic musical instrument capable of automatic performance to play the musical tones of the playback part. That is, the playback control unit 40 causes the electronic musical instrument to automatically perform the playback part by controlling the electronic musical instrument according to the control information U[t].
  • The playback control unit 40 may also, for example, control the playback of a video related to the playback part (hereinafter referred to as the "target video").
  • The target video is a video that shows a specific performer playing the playback part of the target music piece.
  • The target video may be a captured video of a real performer playing the playback part on an instrument, or a composite video, generated by image processing, of a virtual performer playing the playback part. Note that it does not matter whether or not the target video contains sound.
  • Video data representing the target video is stored in the storage device 12.
  • The playback control unit 40 displays the target video on the display device 13 by outputting the video data.
  • The playback control unit 40 controls the playback of the target video according to the control information U[t].
  • Specifically, the playback control unit 40 generates the playback information P[t] from the control information U[t], and displays on the display device 13 the portion of the target video that corresponds to the playback information P[t]. That is, the playback position p1[t] and the playback speed p2[t] of the target video are controlled in conjunction with the performance of the performance part by the performer.
  • The performer represented by the target video (hereinafter referred to as the "virtual performer") is, for example, an avatar existing in a virtual space.
  • The playback control unit 40 displays on the display device 13 the virtual performer and a background image captured by a virtual camera in the virtual space.
  • The display device 13 may be installed in an HMD (Head Mounted Display) worn on the user's head.
  • The position and direction of the virtual camera in the virtual space are dynamically controlled according to the behavior (for example, the position and direction) of the user's head. Therefore, by moving the head appropriately, the user can visually recognize the virtual performer from any position and direction in the virtual space.
  • The video data for displaying the virtual performer in the virtual space includes, for example, motion data representing the movements of the skeleton and joints of the virtual performer.
  • The motion data specifies, for example, the changes over time in the relative angle and position of each element of the skeleton and joints.
  • The playback control unit 40 controls the movement of the skeleton and joints represented by the motion data according to the control information U[t] (or the playback information P[t]).
  • Specifically, the playback control unit 40 generates the virtual performer, in the posture specified by the motion data, as an object in the virtual space.
  • The virtual performer in the virtual space is controlled to take the posture corresponding to the skeleton and joints specified by the portion of the motion data that corresponds to the playback information P[t].
  • Further, the playback control unit 40 changes the speed of the movement of the skeleton and joints specified by the motion data according to the control information U[t]. Therefore, the performance by the virtual performer in the virtual space progresses in conjunction with the performance by the performer in the real space. For example, image processing such as modeling and texturing is used to generate the three-dimensional virtual performer. The playback control unit 40 then generates, by image processing such as rendering, a planar image (the target video) of the virtual performer in the virtual space as captured by the virtual camera. As mentioned above, the position and direction of the virtual camera change depending on the behavior of the user's head. The playback control unit 40 displays the target video generated by the above processing on the display device 13.
  • the user can view the virtual performer playing the playback part from any position and direction in the virtual space.
  • a performer wearing an HMD can check from any position and direction in the virtual space how a virtual performer is playing a playback part in conjunction with the performer's performance of the playback part.
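To make the motion-data control of the preceding items concrete, the following is a minimal Python sketch of selecting and interpolating skeleton/joint keyframes according to the playback position p1[t]. The keyframe representation and all names are assumptions for illustration, not taken from the publication.

```python
import bisect

def pose_at(motion_times, motion_poses, p1):
    """Return the joint pose at playback position p1[t] by linear
    interpolation between keyframes of assumed motion data.

    motion_times: sorted keyframe times (seconds into the playback part).
    motion_poses: list of poses; each pose is a list of joint angles."""
    i = bisect.bisect_right(motion_times, p1)
    if i == 0:
        return motion_poses[0]
    if i >= len(motion_times):
        return motion_poses[-1]
    t0, t1 = motion_times[i - 1], motion_times[i]
    w = (p1 - t0) / (t1 - t0)  # interpolation weight in [0, 1)
    return [a + w * (b - a)
            for a, b in zip(motion_poses[i - 1], motion_poses[i])]
```

Because the playback position advances in conjunction with the performer's performance, driving this lookup with p1[t] makes the virtual performer's motion speed up or slow down with the live performance.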
  • in the above example, a virtual performer who plays the playback part is displayed; however, a virtual dancer who dances in conjunction with the progress of the playback part may instead be displayed on the display device 13.
  • virtual performers and virtual dancers are collectively expressed as virtual demonstrators.
  • in the configuration in which the display device 13 is attached to the user's head, such a virtual demonstrator may be displayed.
  • the playback control unit 40 is comprehensively expressed as an element that controls the playback of the playback part.
  • "Reproduction of the reproduction part” includes reproduction of the musical tone of the reproduction part and reproduction of the moving image (target moving image) of the reproduction part.
  • the display device 13 and the sound emitting device 15 are playback devices that play back the playback part.
  • according to the first to third embodiments, it is possible to control the reproduction of musical tones related to the playback part according to the performance by the performer.
  • according to this modification, it is also possible to control the reproduction of the moving image related to the playback part in accordance with the performance by the performer.
  • in each of the above embodiments, the performance data E representing the performance by the performer is supplied to the playback control system 10; however, the input information corresponding to the performance by the performer is not limited to the performance data E.
  • a signal representing the waveform of a musical tone played by a performer (hereinafter referred to as a "performance signal”) may be supplied to the playback control system 10 instead of the performance data E.
  • the performance signal is a signal generated by collecting musical tones produced by a musical instrument during a performance by a performer using a microphone.
  • the performance prediction unit 31 generates performance information S[t] by analyzing the performance signal. For example, the analysis unit 311 estimates the performance time t[k] and the performance position s[k] by analyzing the performance signal.
  • the prediction unit 312 generates performance information S[t] using a prediction model, as in the first embodiment. The above configuration also achieves the same effects as those of the above-described embodiments.
  • the state space model is exemplified as the prediction model used to predict the performance information S[t], but the form of the prediction model is not limited to the above examples.
  • a statistical model such as a deep neural network or a hidden Markov model may be used as a predictive model.
  • in each of the above embodiments, the performance information S[t] includes the performance position s1[t] and the performance speed s2[t]; however, the format of the performance information S[t] is not limited to this example. For example, the performance speed s2[t] may be omitted. That is, the performance information S[t] is comprehensively expressed as information including the performance position s1[t].
  • the reproduction information P[t] is not limited to information including the reproduction position p1[t] and the reproduction speed p2[t]. For example, the playback speed p2[t] may be omitted. That is, the playback information P[t] is comprehensively expressed as information including the playback position p1[t].
  • the formats of the state variable X[t] and the control information U[t] are not limited to the examples in each of the above-mentioned forms.
  • the speed error x2[t] may be omitted from the state variable X[t].
  • the playback speed u2[t] may be omitted from the control information U[t].
  • in each of the above embodiments, the predictive control unit 30 generates the control information U[t] by model predictive control using a single prediction model; however, a plurality of different prediction models may be used selectively. In that case, the prediction control unit 30 generates control information U[t] for one or more playback parts of the target song using any one of the plurality of prediction models.
  • a prediction model is prepared for each performer.
  • Each performer's prediction model is a state space model that reflects the performance tendency of the performer.
  • the predictive control unit 30 generates control information U[t] for one or more playback parts of the target song by using a prediction model corresponding to the performer of the performance part from among the plurality of prediction models.
  • a prediction model may be prepared for each set of a plurality of performers (for example, for each orchestra).
  • a prediction model may be prepared for each attribute of the target song, for example.
  • the attributes of the target song are, for example, the music genre of the target song (for example, rock, pop, jazz, trance, hip-hop, etc.) or the musical impression (for example, "a song with a bright impression", "a song with a dark impression", etc.).
  • the prediction control unit 30 generates control information U[t] for one or more playback parts of the target song by using a prediction model corresponding to the attribute of the target song among the plurality of prediction models.
  • the reproduction of the reproduction part can be controlled in various ways according to the selection conditions of the prediction model (for example, performer or attribute).
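The per-performer or per-attribute model selection just described can be sketched as a simple lookup, assuming each prediction model exposes a common prediction interface. The registry structure and names below are hypothetical, not from the publication.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry: one prediction model per performer or per song
# attribute (e.g., genre). Each model maps (performance data, time t) to
# predicted performance information S[t] = (s1, s2).
PredictModel = Callable[[bytes, float], Tuple[float, float]]

class ModelSelector:
    def __init__(self,
                 per_performer: Dict[str, PredictModel],
                 per_attribute: Dict[str, PredictModel],
                 default: PredictModel):
        self.per_performer = per_performer
        self.per_attribute = per_attribute
        self.default = default

    def select(self,
               performer_id: Optional[str] = None,
               attribute: Optional[str] = None) -> PredictModel:
        # Prefer a performer-specific model, then a model matching a song
        # attribute such as genre, then a generic fallback.
        if performer_id is not None and performer_id in self.per_performer:
            return self.per_performer[performer_id]
        if attribute is not None and attribute in self.per_attribute:
            return self.per_attribute[attribute]
        return self.default
```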
  • the playback control system 10 may be realized by a server device that communicates with a terminal device such as a mobile phone or a smartphone.
  • the predictive control unit 30 of the playback control system 10 generates the control information U[t] by processing the performance data E (or performance signal) received from the terminal device.
  • the music data D stored in the storage device 12 of the playback control system 10 or the music data D transmitted from the terminal device is used to generate the control information U[t].
  • the playback control unit 40 transmits a portion of the audio signal Z (or video data of the target video) that corresponds to the control information U[t] to the terminal device. Note that in a configuration in which the playback control unit 40 is installed in a terminal device, the control information U[t] may be transmitted from the playback control system 10 to the terminal device.
  • as described above, the functions of the playback control system 10 are realized through cooperation between the one or more processors constituting the control device 11 and the program stored in the storage device 12.
  • the programs exemplified above may be provided in a form stored in a computer-readable recording medium and installed on a computer.
  • the recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any other known form of recording medium, such as a semiconductor recording medium or a magnetic recording medium, is also included.
  • the non-transitory recording medium includes any recording medium excluding transitory, propagating signals, and does not exclude volatile recording media.
  • in a configuration in which a distribution device distributes the program via a communication network, a recording medium that stores the program in the distribution device corresponds to the above-mentioned non-transitory recording medium.
  • a playback control method according to one aspect (aspect 1) of the present disclosure generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer, and controls playback of the playback part in the song using the control information generated for the at least one playback part.
  • in the above aspect, since model predictive control is used to generate the control information, it is possible to appropriately control the reproduction of the playback part according to the performance by the performer.
  • Performance information is data in any format including the performance position.
  • the performance information includes a performance position and a performance speed.
  • the performance position is the position in the song where the performer is playing.
  • the performance speed is the speed (tempo) at which the performer plays the music.
  • control information is data in any format for controlling reproduction of a reproduction part.
  • the control information includes the amount of change in playback position and the amount of change in playback speed.
  • the information and processing used to predict performance information are arbitrary. For example, it is assumed that performance data representing a performance by a performer or a performance signal representing a waveform of a musical tone played by a performer is used for predicting performance information. Further, for example, a video of a user playing a performance may be used for predicting performance information.
  • Various prediction models are used to predict performance information. As the prediction model, for example, a state space model such as a Kalman filter is used.
  • the “playback part” is a music part that is to be controlled by the control information among the plurality of music parts that make up the song.
  • “Reproduction of a playback part” includes not only playback of audio (for example, automatic performance) related to the playback part, but also playback of video related to the playback part.
  • in a specific example (aspect 2) of aspect 1, the at least one performer is a plurality of performers, and in predicting the performance information, the performance information is predicted for each of the plurality of performers. According to the above aspect, it is possible to appropriately control the reproduction of the playback part according to performances by a plurality of performers.
  • in a specific example (aspect 3) of aspect 1 or 2, the at least one playback part is a plurality of playback parts, and in generating the control information, the control information is generated for each of the plurality of playback parts. According to the above aspect, the reproduction of each of the plurality of playback parts can be controlled according to the performance by the performer.
  • in a specific example (aspect 4) of any one of aspects 1 to 3, the playback control includes controlling the playback of musical tones related to the at least one playback part of the music piece. According to the above aspect, it is possible to control the reproduction of musical tones related to the playback part according to the performance by the performer.
  • in a specific example (aspect 5) of any one of aspects 1 to 4, the playback control includes controlling the playback of a video related to the at least one playback part of the song.
  • the moving image is, for example, a moving image in which a virtual performer (for example, a performer or a dancer) in a virtual space performs a reproduction part.
  • in a specific example (aspect 6) of any one of aspects 1 to 5, in the model predictive control, control information is generated for the at least one playback part such that a cost is reduced, the cost including a state variable that represents an error between the performance information predicted for the at least one performer and playback information including a playback position of the at least one playback part. According to the above aspect, since the control information is generated so that the cost including the state variable representing the error between the performance information and the playback information is reduced, the reproduction of the playback part can be linked to the performance by the performer.
  • “Reproduction information” is data in any format including the reproduction position.
  • the playback information includes a playback position and a playback speed.
  • the playback position is the position within the song where the song is being played.
  • the playback speed is the speed at which the song is played.
  • in a specific example (aspect 7) of aspect 6, at least one variable included in the cost is further set in accordance with an instruction from the user. According to the above aspect, since a variable related to the cost is set according to the instruction from the user, the user's intention can be reflected in the reproduction of the playback part.
  • the "variables" of the cost are the various variables applied to calculations related to the cost. Specifically, in an aspect in which the cost includes a state cost and a control cost, a first weight value for the state cost and a second weight value for the control cost are set as the "variables" in accordance with instructions from the user.
  • in a specific example (aspect 8) of aspect 6 or 7, the cost includes a state cost and a control cost: the state cost is a cost related to the state variable, and the control cost is a cost related to the control information, that is, to temporal changes in the playback information. According to the above aspect, the cost includes the state cost related to the state variable and the control cost related to temporal changes in the playback information.
  • in a specific example (aspect 9) of aspect 8, a first weight value and a second weight value are further set; the state cost is a cost weighted by the first weight value, and the control cost is a cost weighted by the second weight value.
  • the state cost is weighted by the first weight value, and the control cost is weighted by the second weight value. Therefore, the relationship between the performance by the performer and the reproduction of the reproduction part can be changed according to the settings of the first weight value and the second weight value.
  • in a specific example (aspect 10) of aspect 9, in setting the first weight value and the second weight value, the first weight value and the second weight value are changed according to instructions from the user.
  • according to the above aspect, each of the first weight value and the second weight value is set according to an instruction from the user. Therefore, the user can change the relationship between the performance by the performer and the reproduction of the playback part.
  • in a specific example (aspect 11) of any one of aspects 1 to 10, in predicting the performance information, the performance information is predicted by using any one of a plurality of prediction models for predicting performance information. According to the above aspect, the playback of the playback part can be controlled in various ways according to the selection conditions of the prediction model.
  • a playback control system according to one aspect (aspect 12) of the present disclosure includes: a predictive control unit that generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and a playback control unit that controls playback of the playback part in the song based on the control information generated for the at least one playback part.
  • a program according to one aspect (aspect 13) of the present disclosure causes a computer system to function as: a predictive control unit that generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and a playback control unit that controls playback of the playback part in the song using the control information generated for the at least one playback part.
  • an information processing method according to one aspect of the present disclosure generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; controls the movement of the skeleton and joints represented by motion data according to the control information; generates a virtual demonstrator in a posture corresponding to the controlled skeleton and joints in a virtual space; and displays, on a display device, an image of the virtual space captured by a virtual camera whose position and direction are controlled according to the behavior of the user's head.
100... Performance system, 10... Playback control system, 11... Control device, 12... Storage device, 13... Display device, 14... Operating device, 15... Sound emitting device, 20... Keyboard instrument, 30... Prediction control unit, 31... Performance prediction section, 311... Analysis section, 312... Prediction section, 32... Information generation section, 33... Arithmetic processing section, 34... Variable setting section, 40... Playback control section

Abstract

This reproduction control system is provided with: a prediction control unit that generates control information pertaining to at least one reproduced part of a musical composition by using model prediction control which uses a prediction model for predicting performance information including the performance position, within the musical composition, of at least one musician; and a reproduction control unit that controls the reproduction of the reproduced part of the musical composition by using the control information generated pertaining to the at least one reproduced part.

Description

Reproduction control method, information processing method, reproduction control system, and program

The present disclosure relates to a technique for controlling audio or video playback.

Techniques for synchronizing the reproduction of a music piece with a performance by a performer have been proposed, for example for situations in which a music piece is played in parallel with performances by a plurality of performers. Non-Patent Document 1 discloses a technique for estimating performance positions and performance speeds by integrating information on performances by a plurality of performers and controlling the playback of music according to the estimation results. Non-Patent Document 2 discloses a configuration in which the reproduction of music is synchronized with the performance of a specific performer selected from a plurality of performers.

Under the conventional control techniques exemplified above, it is in practice difficult to make the playback of music appropriately follow the performance of a musical instrument by a performer, and there is room for further improvement. In consideration of the above circumstances, one aspect of the present disclosure aims to appropriately control the reproduction of a playback part according to a performance by a performer.

In order to solve the above problems, a playback control method according to one aspect of the present disclosure generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer, and controls playback of the playback part of the song using the control information generated for the at least one playback part.

A playback control system according to one aspect of the present disclosure includes: a predictive control unit that generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and a playback control unit that controls playback of the playback part in the song based on the control information generated for the at least one playback part.

A program according to one aspect of the present disclosure causes a computer system to function as: a predictive control unit that generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and a playback control unit that controls playback of the playback part in the song using the control information generated for the at least one playback part.

An information processing method according to one aspect of the present disclosure generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; controls the movement of the skeleton and joints represented by motion data according to the control information; generates a virtual demonstrator in a posture corresponding to the controlled skeleton and joints in a virtual space; and displays, on a display device, an image of the virtual space captured by a virtual camera whose position and direction are controlled according to the behavior of the user's head.
FIG. 1 is a block diagram illustrating the configuration of a performance system. FIG. 2 is a block diagram illustrating the functional configuration of a playback control system. FIG. 3 is a block diagram illustrating the configuration of a predictive control unit. FIG. 4 is a schematic diagram of state costs and control costs. FIG. 5 is a graph showing the relationship between weight values and feedback gain. FIG. 6 is a flowchart of control processing. FIG. 7 is a schematic diagram of a setting screen in the second embodiment. FIG. 8 is a schematic diagram of a setting screen in the second embodiment. FIG. 9 is an explanatory diagram of state variables and state costs in the third embodiment. FIG. 10 is an explanatory diagram of control information and control costs in the third embodiment.
A: First Embodiment

FIG. 1 is a block diagram illustrating the configuration of a performance system 100 according to the first embodiment. In the first embodiment, a case is assumed in which a single performer plays a specific part (hereinafter referred to as the "performance part") of a specific piece of music (hereinafter referred to as the "target song"). The performance part is, for example, one or more parts that constitute the melody of the target song. The performance system 100 controls the reproduction of the parts other than the performance part (hereinafter referred to as the "playback part") among the plurality of parts of the target song. The playback part is, for example, one or more parts that constitute the accompaniment of the target song.
The performance system 100 includes a playback control system 10 and a keyboard instrument 20. The playback control system 10 and the keyboard instrument 20 are interconnected, for example, by wire or wirelessly.

The keyboard instrument 20 is an electronic musical instrument having a plurality of keys corresponding to different pitches. The performer plays the performance part by sequentially operating the keys of the keyboard instrument 20, and the keyboard instrument 20 reproduces musical tones of the pitches played by the performer. In parallel with the reproduction of the musical tones corresponding to the performance, the keyboard instrument 20 supplies performance data E representing that performance to the playback control system 10. The performance data E specifies the pitch corresponding to each key operated by the performer and the intensity of the key depression; that is, the performance data E represents the time series of notes played by the performer. The performance data E is, for example, event data compliant with the MIDI (Musical Instrument Digital Interface) standard. Note that the instrument played by the performer is not limited to the keyboard instrument 20.

The playback control system 10 includes a control device 11, a storage device 12, a display device 13, an operating device 14, and a sound emitting device 15. The playback control system 10 is realized by a portable information device such as a smartphone or tablet terminal, or by a portable or stationary information device such as a personal computer. The playback control system 10 may be realized as a single device or as a plurality of devices configured separately from each other, and it may also be installed in the keyboard instrument 20. The entire performance system 100, including the playback control system 10 and the keyboard instrument 20, may also be interpreted as a "playback control system."

The control device 11 is one or more processors that control the elements of the playback control system 10. Specifically, the control device 11 is configured of one or more types of processors such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an SPU (Sound Processing Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit).

The storage device 12 is one or more memories that store the program executed by the control device 11 and the various data used by the control device 11. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of multiple types of recording media, is used as the storage device 12. A portable recording medium attachable to and detachable from the playback control system 10, or a recording medium that the control device 11 can access via a communication network (for example, cloud storage), may also be used as the storage device 12.

The storage device 12 stores music data D and an acoustic signal Z. The music data D specifies the time series of the plurality of notes constituting the target song; that is, the music data D represents the musical score of the target song. The music data D includes first musical score data D1, which specifies the note sequence of the performance part of the target song, and second musical score data D2, which specifies the note sequence of the playback part. The music data D (D1, D2) is, for example, a file in a format compliant with the MIDI standard. The acoustic signal Z is a time-domain signal representing the waveform of the musical tones of the playback part (that is, the accompaniment tones).

The display device 13 displays various images and is configured of a display panel such as a liquid crystal panel or an organic EL (electroluminescence) panel. The operating device 14 accepts operations by the user; specifically, it is a set of operating elements operated by the user or a touch panel configured integrally with the display surface of the display device 13. The user who operates the operating device 14 is, for example, the performer of the performance part or an operator other than the performer.

The sound emitting device 15, for example a speaker or headphones, reproduces sound under the control of the control device 11; for example, it reproduces the musical tones of the playback part represented by the acoustic signal Z. A sound emitting device 15 separate from the playback control system 10 may instead be connected to the playback control system 10 by wire or wirelessly. Illustration of the D/A converter that converts the acoustic signal Z from digital to analog and of the amplifier that amplifies the acoustic signal Z is omitted for convenience.
FIG. 2 is a block diagram illustrating the functional configuration of the playback control system 10. By executing the program stored in the storage device 12, the control device 11 implements a plurality of functions (a predictive control unit 30 and a playback control unit 40) for reproducing the acoustic signal Z so as to follow the performance of the keyboard instrument 20 by the performer.

The predictive control unit 30 generates control information U[t] using the performance data E and the music data D. The control information U[t] is generated for each of different times t on the time axis; that is, the predictive control unit 30 generates a time series of the control information U[t]. The control information U[t] is data in an arbitrary format for controlling the reproduction of the playback part; specifically, it is a two-dimensional vector including a playback position u1[t] and a playback speed u2[t].

The playback position u1[t] is the position (a point on the time axis) in the playback part that should be reproduced at time t. Specifically, u1[t] is a position relative to the playback position at each time t when the playback part is reproduced at a predetermined speed (hereinafter the "reference speed"); this playback position is hereinafter referred to as the "reference position." That is, the playback position u1[t] is expressed as a difference (an amount of change) from the reference position.

The playback speed u2[t] is the speed at which the playback part should be reproduced at time t. Specifically, u2[t] is a speed relative to the reference speed; that is, it is expressed as a difference (an amount of change) from the reference speed.

The playback control unit 40 controls the reproduction of the musical tones of the playback part according to the control information U[t]; specifically, it controls the reproduction of those musical tones by the sound emitting device 15. The playback control unit 40 generates playback information P[t] from the control information U[t] and causes the sound emitting device 15 to reproduce the playback part according to the playback information P[t]. Specifically, the playback control unit 40 outputs to the sound emitting device 15 the sample sequence of the portion of the acoustic signal Z that corresponds to the playback information P[t].

The playback information P[t] represents the actual reproduction of the playback part by the sound emitting device 15. Specifically, the playback information P[t] is a two-dimensional vector including a playback position p1[t] and a playback speed p2[t]. The playback position p1[t] is the position (a point on the time axis) in the playback part that should be reproduced at time t, measured from the start of the target song. The playback speed p2[t] is the speed at which the playback part should be reproduced at time t, where a stopped playback corresponds to the reference value (zero).

As described above, the predictive control unit 30 generates the control information U[t] using the performance data E corresponding to the performance by the performer, and the playback control unit 40 controls the reproduction of the playback part by the sound emitting device 15 according to the control information U[t]. The predictive control unit 30 of the first embodiment generates the control information U[t] so that the reproduction of the playback part by the sound emitting device 15 follows the performance of the performance part by the performer. Model predictive control (MPC) is used to generate the control information U[t].
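Before turning to the internals of the predictive control unit 30, the relationship between the relative control information U[t] and the absolute playback information P[t] described above can be illustrated with a minimal Python sketch. The class name, the fixed reference speed, and the exact update rule are assumptions for illustration, not taken from the publication.

```python
import numpy as np

class PlaybackController:
    """Hypothetical sketch of the playback control unit 40: converts the
    relative control information U[t] = (u1, u2) into absolute playback
    information P[t] = (p1, p2) and selects the matching samples of Z."""

    def __init__(self, z: np.ndarray, sample_rate: int, ref_speed: float = 1.0):
        self.z = z                  # acoustic signal Z (playback-part waveform)
        self.sr = sample_rate
        self.ref_speed = ref_speed  # reference speed (1.0 = nominal tempo)
        self.ref_pos = 0.0          # reference position in seconds

    def step(self, u1: float, u2: float, dt: float):
        # The reference position advances at the reference speed.
        self.ref_pos += self.ref_speed * dt
        # Per the text, u1 and u2 are differences from the reference
        # position and reference speed, so P[t] is their sum.
        p1 = self.ref_pos + u1      # playback position p1[t]
        p2 = self.ref_speed + u2    # playback speed p2[t]
        return p1, p2

    def samples_for(self, p1: float, dt: float) -> np.ndarray:
        # Output the sample sequence of Z corresponding to P[t].
        start = max(int(p1 * self.sr), 0)
        stop = min(start + int(dt * self.sr), len(self.z))
        return self.z[start:stop]
```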
FIG. 3 is a block diagram illustrating a specific configuration of the predictive control unit 30. The predictive control unit 30 includes a performance prediction section 31, an information generation section 32, an arithmetic processing section 33, and a variable setting section 34.

The performance prediction section 31 predicts performance information S[t] using a prediction model. The performance information S[t] is predicted for each time t on the time axis; that is, the performance prediction section 31 generates a time series of the performance information S[t]. The performance information S[t] is information predicted from the performance of the performance part by the performer (that is, from the performance data E); specifically, it is a two-dimensional vector including a performance position s1[t] and a performance speed s2[t]. The prediction model is a mathematical model for predicting the performance information S[t].

The performance position s1[t] is the position (a point on the time axis) in the performance part that the performer is predicted to be playing at time t, measured from the start of the target song. The performance speed s2[t] is the performance speed predicted for time t, where a stopped performance corresponds to the reference value (zero).

The performance prediction section 31 includes an analysis section 311 and a prediction section 312. The analysis section 311 estimates a performance time t[k] and a performance position s[k] by analyzing the performance data E (k is a natural number); the pair is estimated each time the performer plays a note of the performance part. The performance time t[k] is the time at which the k-th note of the performance part is played, and the performance position s[k] is the position of the k-th note. Any known performance analysis technique (score alignment technique) may be employed for the analysis by the analysis section 311; for example, the analysis technique disclosed in Japanese Patent Application Laid-Open No. 2016-099512 may be used to estimate t[k] and s[k]. The analysis section 311 may also estimate t[k] and s[k] using a statistical estimation model such as a deep neural network (DNN) or a hidden Markov model (HMM).
The prediction section 312 generates the performance information S[t] for a time t after (that is, in the future of) the performance time t[k]. A prediction model is used for the prediction of the performance information S[t] by the prediction section 312. The prediction model is, for example, a state space model that assumes that the performance by the performer progresses at a constant speed; specifically, it is assumed that the performance progresses at a constant speed during the interval between successive notes. Under this assumption, the state variable Λ[k] of the state space model is expressed by Equation (1).

[Equation (1) is rendered as an image in the original publication.]

The symbol ε[k] in Equation (1) is a noise component (for example, white noise), whose covariance is calculated from the performance tendency of the performer. In the prediction model, the probability that the performance position s[k] is observed given the state variable Λ[k] follows a normal distribution with a predetermined variance. Under this premise, the performance information S[t] is predicted as in Equation (2) by updating the state variable Λ[k] through arithmetic processing such as a Kalman filter.

[Equation (2) is rendered as an image in the original publication.]
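Equations (1) and (2) are available only as images in the publication, but the surrounding text describes a textbook constant-speed state space model updated by a Kalman filter. The following Python sketch implements that standard model under stated assumptions (a state Λ[k] = (position, speed), a constant-speed transition between note onsets, and a scalar observation of the score position); the variable names and noise values are illustrative, not taken from the publication.

```python
import numpy as np

def kalman_update(t_k, s_k, state, cov, t_prev, q_var=1e-4, r_var=1e-3):
    """One Kalman update per detected note onset (t[k], s[k]).

    state: current estimate [position, speed]; cov: its 2x2 covariance.
    Assumes the constant-speed model described in the text."""
    dt = t_k - t_prev
    A = np.array([[1.0, dt], [0.0, 1.0]])  # constant-speed transition
    H = np.array([[1.0, 0.0]])             # only the position is observed
    Q = q_var * np.eye(2)                  # process noise (assumed value)
    R = np.array([[r_var]])                # observation noise (assumed value)

    # Predict step.
    state = A @ state
    cov = A @ cov @ A.T + Q

    # Update step with the observed score position s[k].
    y = np.array([s_k]) - H @ state
    S = H @ cov @ H.T + R
    K = cov @ H.T @ np.linalg.inv(S)
    state = state + (K @ y).ravel()
    cov = (np.eye(2) - K @ H) @ cov
    return state, cov

def extrapolate(state, t_k, t):
    """Predict S[t] = (s1[t], s2[t]) for a future time t > t[k]
    (an analogue of Equation (2))."""
    s1 = state[0] + state[1] * (t - t_k)   # predicted performance position
    s2 = state[1]                          # predicted performance speed
    return np.array([s1, s2])
```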
Although the above description assumes that the performance of the performance part progresses at a constant speed, the performance speed may be varied in the prediction of the performance information S[t], in consideration of the tendency of the performance speed when the performer played the performance part in the past. For example, the prediction section 312 may calculate the performance information S[t] by the operations of Equations (3a) and (3b).

[Equations (3a) and (3b) are rendered as images in the original publication.]

The symbol dt in Equations (3a) and (3b) is a predetermined time length, and the symbol τ(s1[t]) denotes the performance speed at the performance position s1[t] of the performance information S[t]. As noted above, the performance speed τ(s1[t]) is calculated in advance, for example from the performance speeds at which the performer played the performance part in the past; for instance, the expected value of past performance speeds for the performance part of the target song may be calculated as τ(s1[t]). Alternatively, a statistical estimation model such as a deep neural network or a hidden Markov model may be trained on the relationship between musical scores the performer played in the past and the performance speed τ(s1[t]) in those performances, and the prediction section 312 may generate τ(s1[t]) by processing the performance data E with that statistical estimation model.
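Equations (3a) and (3b) are likewise only available as images, but the text indicates a stepwise extrapolation in which the predicted position advances by the position-dependent speed τ(s1[t]) over each interval dt. A plausible reading, sketched below in Python with an assumed tempo-curve function, is:

```python
def extrapolate_with_tempo_curve(s1, t_start, t_end, dt, tau):
    """Stepwise prediction of S[t] with a position-dependent speed,
    a plausible reading of Equations (3a)/(3b):
        s1[t + dt] = s1[t] + tau(s1[t]) * dt
        s2[t + dt] = tau(s1[t])
    `tau` maps a score position to an expected performance speed, e.g.
    the expectation of the performer's past speeds at that position."""
    t = t_start
    s2 = tau(s1)
    while t < t_end:
        s2 = tau(s1)
        s1 = s1 + s2 * dt
        t += dt
    return s1, s2

# Example: a tempo curve that slows down near the end of a phrase.
curve = lambda pos: 1.0 if pos < 8.0 else 0.8
print(extrapolate_with_tempo_curve(7.5, 0.0, 1.0, 0.1, curve))
```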
The information generation section 32 in FIG. 3 generates the control information U[t] from the performance information S[t]. As described above, the control information U[t] is generated so that the reproduction of the playback part by the sound emitting device 15 (control information U[t]) follows the performance of the performance part by the performer (performance information S[t]). The derivation of an arithmetic expression for generating the control information U[t] under this premise (hereinafter referred to as the "control law") is discussed below. In the first embodiment, LQG (Linear-Quadratic-Gaussian) control is used to derive the control law.
First, assume a state variable X[t] expressed by Equation (4).

[Equation (4) is rendered as an image in the original publication.]

As understood from Equation (4), the state variable X[t] represents the error between the performance information S[t] and the playback information P[t]; that is, it represents the error between the performance of the performance part by the performer and the reproduction of the playback part by the sound emitting device 15. The state variable X[t] of the first embodiment is a two-dimensional vector including a position error x1[t] and a speed error x2[t]. The position error x1[t] is the error between the performance position s1[t] and the playback position p1[t] (x1[t] = s1[t] - p1[t]), and the speed error x2[t] is the error between the performance speed s2[t] and the playback speed p2[t] (x2[t] = s2[t] - p2[t]).
As described above, the control information U[t] includes the playback position u1[t] relative to the reference position and the playback speed u2[t] relative to the reference speed. Assuming that the playback part is reproduced at a constant speed during a minute time dt, the state transition expressed by Equation (5) can be assumed, where the matrix B in Equation (5) is the second-order identity matrix.

[Equation (5) is rendered as an image in the original publication.]
Now, from the viewpoint of deriving a control law for calculating the control information U[t] at a time (t' + δ), that is, after a time length δ has elapsed from a certain time t', consider reducing (for example, minimizing) the cost J expressed by Equation (6).

[Equation (6) is rendered as an image in the original publication.]

The symbol T denotes the transpose of a matrix, and the symbol Δ in Equation (6) is a time sufficiently after (in the future of) the time (t' + δ) on the time axis.
The symbol Q[s1] in Equation (6) is a cost relating to the state variable X[t] at each performance position s1[t] of the target song (hereinafter referred to as the "state cost"). As described above, the state variable X[t] represents the error between the performance information S[t] and the playback information P[t]; the state cost Q[s1] therefore represents the cost of that error at the performance position s1[t], that is, the cost of the reproduction of the playback part failing to follow the performance of the performance part. As understood from Equation (6), the state cost Q[s1] is a second-order square matrix.

The symbol R[p1] in Equation (6) is a cost relating to the control information U[t] (hereinafter referred to as the "control cost"). Specifically, the control cost R[p1] is the cost for the playback position u1[t] and the playback speed u2[t]. As described above, u1[t] is the amount of change of the playback position p1[t] from the reference position, and u2[t] is the amount of change of the playback speed p2[t] from the reference speed; the control cost R[p1] is therefore a cost relating to temporal changes in the playback position p1[t] and the playback speed p2[t] represented by the playback information P[t], that is, a cost for changes in the playback information P[t]. As understood from Equation (6), the control cost R[p1] is a second-order square matrix.
As understood from Equation (6), the cost (objective function) J includes the state variable X[t], the control information U[t], the state cost Q[s1], and the control cost R[p1]. By using Equation (6), a control law for generating the control information U[t] is derived as in Equations (7a) to (7d).

[Equations (7a) to (7d) are rendered as images in the original publication.]
The symbol O in Equation (7d) is the zero matrix; that is, the matrix Y[t] becomes the zero matrix at time Δ. The symbol L[t] in Equation (7a) is the feedback gain for the state variable X[t] and is expressed as a second-order square matrix. As understood from Equation (7a), the control information U[t] can be assumed to be a linear feedback of the state variable X[t]. The feedback gain L[t] does not depend on either the control information U[t] or the state variable X[t]; on the other hand, it does depend on the state cost Q[s1] and the control cost R[p1].
The information generation section 32 in FIG. 3 calculates the control information U[t] of the playback part by the operations of Equations (7a) to (7d), to which the state variable X[t] (the performance information S[t] and the playback information P[t]), the state cost Q[s1], and the control cost R[p1] are applied. That is, the control information U[t] is calculated so that the cost J of Equation (6) is reduced. As understood from the above description, the model predictive control by the predictive control unit 30 of FIG. 2 includes prediction processing, in which the performance prediction section 31 predicts the performance information S[t] using the prediction model, and optimization processing, in which the information generation section 32 generates control information U[t] that is favorable from the viewpoint of reducing the cost J.
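Equations (7a) to (7d) are available only as images, but the described structure (a linear state feedback, a gain L[t] that depends on Q and R, and a matrix Y[t] that vanishes at the terminal time Δ) matches the standard finite-horizon discrete-time LQR backward (Riccati) recursion. The following Python sketch shows that textbook recursion as an illustration; the transition matrices and the sign conventions are assumptions, not taken from the publication.

```python
import numpy as np

def finite_horizon_lqr_gains(Q_seq, R_seq, A, B):
    """Backward Riccati recursion for a finite-horizon LQ problem.

    Q_seq[t], R_seq[t]: 2x2 state/control cost matrices along the horizon
    (here, Q[s1[t]] and R[p1[t]] evaluated along the predicted trajectory).
    A, B: state transition matrices; per the text, B is the identity.
    Returns the feedback gains L[t], with terminal condition Y = 0."""
    T = len(Q_seq)
    Y = np.zeros_like(Q_seq[0])  # Y[Delta] = O (zero matrix)
    gains = [None] * T
    for t in reversed(range(T)):
        # Standard discrete-time Riccati step (textbook form; assumed here).
        M = R_seq[t] + B.T @ Y @ B
        L = np.linalg.solve(M, B.T @ Y @ A)
        Y = Q_seq[t] + A.T @ Y @ A - A.T @ Y @ B @ L
        gains[t] = L
    return gains

def control(L_t, X_t):
    # Linear state feedback as described for Equation (7a).
    return -L_t @ X_t
```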
The arithmetic processing section 33 in FIG. 3 generates the state cost Q[s1] and the control cost R[p1] applied to the generation of the control information U[t]. The generation of the state cost Q[s1] and the control cost R[p1] is described in detail below.

FIG. 4 is a schematic diagram of the state cost Q[s1] and the control cost R[p1], illustrated for the case in which the target song is expressed by the musical score shown in FIG. 4. For convenience, FIG. 4 shows the numerical value of the element in the first row and first column of the state cost Q[s1] and that of the control cost R[p1].
The state cost Q[s1] is expressed by Equation (8).

[Equation (8) is rendered as an image in the original publication.]

The symbol ε in Equation (8) is a small value for stabilizing each value of the state cost Q[s1], and the symbol I denotes the second-order identity matrix.
The symbol Gq in Equation (8) is the set of positions at which the notes of the performance part are to be played by the performer; that is, the set Gq contains the starting position (hereinafter referred to as the "sound generation position") s' of each note specified by the first musical score data D1 of the music data D. As illustrated in FIG. 4, Equation (8) represents a time series of pulses q (hereinafter referred to as a "pulse train") Hq corresponding to the different sound generation positions s'. Each pulse q is centered at a time point a small time α after the sound generation position s'; since the variable α is a minute value, FIG. 4 shows the sound generation position s' and the center of each pulse q as substantially coinciding. The symbol κs' in Equation (8) is a variable that determines the maximum value of the pulse q corresponding to the sound generation position s', and the symbol γ is a variable that determines the width of each pulse q.

The function value of the pulse train Hq corresponding to the performance position s1[t] on the horizontal axis of FIG. 4 corresponds to the state cost Q[s1]. The symbol Cq[s1] in Equation (8) is a weight value for weighting the state cost Q[s1]; the larger the weight value Cq[s1], the greater the influence of the state cost Q[s1] on the feedback gain L[t].

The arithmetic processing section 33 specifies each sound generation position s' by analyzing the first musical score data D1 and calculates the state cost Q[s1] by executing the operation of Equation (8). As understood from Equation (6), the larger the state cost Q[s1] at a performance position s1[t], the more the feedback gain L[t] is set so that the error between the performance information S[t] and the playback information P[t] (that is, the state variable X[t]) is sufficiently reduced. In other words, at performance positions s1[t] near the sound generation positions s' of the performance part, a sufficiently close match between the performance information S[t] of the performance part and the playback information P[t] of the playback part is required: the feedback gain L[t] is set so that the performance of the performance part by the performer and the reproduction of the playback part by the sound emitting device 15 closely match. On the other hand, at performance positions s1[t] sufficiently distant from the sound generation positions s', differences between the performance information S[t] and the playback information P[t] are tolerated.
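Equation (8) itself is an image, but the text specifies its ingredients: a baseline ε, a weight Cq, and a pulse of height κs' and width γ centered slightly (α) after each note onset s' in Gq. The following Python sketch evaluates such a pulse train under the assumption of a Gaussian pulse shape; the publication does not state the exact shape, so this is an illustrative guess.

```python
import numpy as np

def state_cost(s1, onsets, kappa, gamma, alpha, c_q, eps=1e-6):
    """Scalar state cost Q(s1) as a pulse train over note onsets (Gq).

    Assumed Gaussian pulses: the publication only states that each pulse
    has maximum kappa[s'], a width controlled by gamma, and is centered
    a small time alpha after the onset s'."""
    pulses = sum(
        kappa[i] * np.exp(-((s1 - (s_on + alpha)) / gamma) ** 2)
        for i, s_on in enumerate(onsets)
    )
    return eps + c_q * pulses

# Example: tight tracking is demanded near onsets at 1.0 s and 2.0 s.
onsets = [1.0, 2.0]
print(state_cost(1.0, onsets, kappa=[1.0, 0.5], gamma=0.05,
                 alpha=0.01, c_q=10.0))
```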
The control cost R[p1] is expressed by Equation (9).

[Equation (9) is rendered as an image in the original publication.]

The symbol ε in Equation (9) is a small value for stabilizing each value of the control cost R[p1], and the symbol I denotes the second-order identity matrix.
The symbol Gr in Equation (9) is the set of positions at which the notes of the playback part are to be reproduced; that is, the set Gr contains the starting position (hereinafter referred to as the "sound generation position") p' of each note specified by the second musical score data D2 of the music data D. As illustrated in FIG. 4, Equation (9) represents a time series of pulses r (a "pulse train") Hr corresponding to the different sound generation positions p'. Each pulse r is shaped so that it increases gradually from a time point before the sound generation position p' and decreases steeply after the sound generation position p' has passed. The symbol ω(p) is a window function representing one pulse r and is expressed, for example, by Equation (10), in which the coefficients c1 and c2 are predetermined positive numbers.

[Equation (10) is rendered as an image in the original publication.]
On the horizontal axis of FIG. 4, the function value of the pulse train Hr at the playback position p1[t] corresponds to the control cost R[p1]. The symbol Cr[p1] in Equation (9) is a weight applied to the control cost R[p1]: the larger the weight Cr[p1], the greater the influence of the control cost R[p1] on the feedback gain L[t].
The arithmetic processing unit 33 identifies each sounding position p' by analyzing the second musical score data D2, and calculates the control cost R[p1] by evaluating Equation (9). As understood from Equation (6) above, the larger the control cost R[p1] at a position p1, the more the feedback gain L[t] is set so that the control information U[t] is sufficiently reduced. In the vicinity of each sounding position p' of the playback part, a close match between the reference position and the playback position p1[t] (a sufficient reduction of the position component u1[t]) and between the reference speed and the playback speed p2[t] (a sufficient reduction of the speed component u2[t]) is required. Specifically, the feedback gain L[t] is set so that the reproduction of the playback part by the sound emitting device 15 closely follows the note sequence represented by the second musical score data D2. At playback positions p1[t] sufficiently distant from every sounding position p', on the other hand, changes in the playback information P[t] are tolerated.
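As a concrete illustration of how the two costs behave, the following Python sketch evaluates a weighted pulse train at a given score position. Equations (8) to (10) appear in the source only as images, so the window shape used here (a gradual exponential rise before each onset and a steep exponential fall after it, matching the qualitative description above), the helper names, and all parameter values are illustrative assumptions rather than the patent's actual formulas.

```python
import math

def window(p, onset, c1=4.0, c2=40.0):
    """Asymmetric pulse: gradual rise before the onset, steep fall after it
    (an illustrative stand-in for the window function omega(p) of Eq. (10))."""
    d = p - onset
    return math.exp(c1 * d) if d <= 0.0 else math.exp(-c2 * d)

def pulse_train(p, onsets, c1=4.0, c2=40.0):
    """Superpose one pulse per note onset (the pulse trains Hq and Hr)."""
    return sum(window(p, o, c1, c2) for o in onsets)

# Note onsets read from the score data (illustrative values, in beats).
perf_onsets = [0.0, 1.0, 2.0, 3.0]   # sounding positions s' (first score data D1)
play_onsets = [0.5, 1.5, 2.5, 3.5]   # sounding positions p' (second score data D2)

Cq, Cr, eps = 1.0, 1.0, 1e-6         # weights Cq[s1], Cr[p1] and stabilizer
s1, p1 = 1.02, 1.45                  # current performance / playback positions

Q = Cq * pulse_train(s1, perf_onsets)        # state cost Q[s1]: large near s'
R = Cr * pulse_train(p1, play_onsets) + eps  # control cost R[p1]: large near p'
print(f"Q[s1] = {Q:.3f}   R[p1] = {R:.3f}")
```

Both costs peak just after a note onset and decay between onsets, which is exactly the behavior the feedback gain inherits in FIG. 5.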
The variable setting unit 34 in FIG. 3 sets the variables applied to the generation of the control information U[t]. Specifically, the variable setting unit 34 sets the variables (ε, κs', γ, α, Cq[s1]) contained in Equation (8) and the variables (ε, Cr[p1], c1, c2) contained in Equation (9). For example, the variable setting unit 34 sets each variable in Equation (8) or Equation (9) to a numerical value stored in the storage device 12. As described above, the variable setting unit 34 of the first embodiment sets one or more of the variables contained in the cost J of Equation (6). The arithmetic processing unit 33 calculates the state cost Q[s1] and the control cost R[p1] by applying the variables set by the variable setting unit 34.
FIG. 5 is a graph showing the relationship between the weight Cq[s1], the weight Cr[p1], and the feedback gain L[t]. In FIG. 5, for convenience, only the element in the first row and first column of the feedback gain L[t] is illustrated. As understood from Equation (7a), the larger the feedback gain L[t], the more strongly the playback information P[t] of the playback part tends to be corrected. For example, the larger the feedback gain L[t], the more the reproduction of the playback part is corrected toward the performance information S[t] of the performance part.
FIG. 5 assumes the same musical score as FIG. 4. On the time axis, the target piece is divided into a section σ1, a section σ2, and a section σ3. The section σ1 is a section in which the performance part sounds while the playback part remains silent. The section σ2 is a section in which the playback part sounds while the performance part remains silent. The section σ3 is a section in which both the performance part and the playback part sound.
Graph V1 in FIG. 5 shows the feedback gain L[t] when the weight Cq[s1] and the weight Cr[p1] are set to equal values (Case 1). In Case 1, the feedback gain L[t] is set to a large value near each sounding position s' of the performance part within the section σ1. That is, the reproduction of the playback part is strongly corrected so that the error between the performance information S[t] and the playback information P[t] is sufficiently reduced. Within the section σ2, on the other hand, the feedback gain L[t] is kept at a sufficiently small value, so the reproduction of the playback part is hardly corrected there. Within the section σ3, the feedback gain L[t] is kept at a large value near each sounding position s', although not as large as within the section σ1. That is, in the section σ3, where both the performance part and the playback part sound, the reproduction of the playback part is strongly corrected, though less so than in the section σ1.
Graph V2 in FIG. 5 shows the feedback gain L[t] when the weight Cq[s1] is sufficiently smaller than the weight Cr[p1] (Case 2); specifically, the weight Cq[s1] was set to 0.1 and the weight Cr[p1] to 1.0. In Case 2, the feedback gain L[t] is kept small overall. That is, the correction of the error between the performance information S[t] and the playback information P[t] is suppressed compared with Case 1.
Graph V3 in FIG. 5 shows the feedback gain L[t] when the weight Cq[s1] is sufficiently larger than the weight Cr[p1] (Case 3); specifically, the weight Cq[s1] was set to 1.0 and the weight Cr[p1] to 0.1. In Case 3, the feedback gain L[t] is set to a large value near each sounding position s' of the performance part regardless of whether the playback part is sounding. That is, regardless of whether the playback part is sounding, the reproduction of the playback part is strongly corrected so that the error between the performance information S[t] and the playback information P[t] is sufficiently reduced.
As understood from the above description, the reproduction behavior of the playback part relative to the performance part changes according to the weight Cq[s1] and the weight Cr[p1]. Specifically, the relationship between the performance part and the playback part changes according to the relative magnitudes of the weight Cq[s1] and the weight Cr[p1].
FIG. 6 is a flowchart of the process executed by the control device 11 (hereinafter the "control process"). The control process is repeated at a predetermined period.
When the control process starts, the control device 11 (analysis unit 311) estimates the performance time t[k] and the performance position s[k] by analyzing the performance data E (Sa1). The control device 11 (prediction unit 312) then uses the prediction model to generate the performance information S[t] for a time t after the performance time t[k] (Sa2: prediction processing).
The control device 11 (variable setting unit 34) sets the variables applied to the generation of the control information U[t] (Sa3). The control device 11 (arithmetic processing unit 33) then generates the state cost Q[s1] and the control cost R[p1] (Sa4). Specifically, the control device 11 generates the state cost Q[s1] by analyzing the first musical score data D1, and generates the control cost R[p1] by analyzing the second musical score data D2. The variables set in step Sa3 are applied to the generation of the state cost Q[s1] and the control cost R[p1].
The control device 11 (information generation unit 32) calculates the control information U[t] by evaluating Equations (7a) to (7d) with the state variable X[t], the state cost Q[s1], and the control cost R[p1] applied, so that the cost J of Equation (6) is reduced (Sa5: optimization processing). Model predictive control is realized by the prediction processing (Sa2), which generates the performance information S[t] using the prediction model, and the optimization processing (Sa5), which generates the control information U[t] using the performance information S[t].
The control device 11 (playback control unit 40) controls the reproduction of the playback part by the sound emitting device 15 according to the control information U[t] (Sa6). Specifically, the control device 11 generates the playback information P[t] from the control information U[t], and causes the sound emitting device 15 to reproduce the portion of the acoustic signal Z corresponding to that playback information P[t].
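The loop below strings the steps Sa1 to Sa6 of FIG. 6 together in a runnable, heavily simplified form. The prediction is plain linear extrapolation and the "optimization" is a single proportional correction whose gain merely grows with the state cost and shrinks with the control cost; this mimics the qualitative behavior of the feedback gain L[t] but is not the patent's derivation via Equations (6) to (7d). All names and values are illustrative assumptions.

```python
def predict(s_now, tempo, horizon):
    """Sa2 (simplified): extrapolate the performance position linearly --
    a crude stand-in for the patent's prediction model."""
    return s_now + tempo * horizon

def feedback_gain(Q, R):
    """Sa5 (simplified): a scalar gain that grows with the state cost Q
    and shrinks with the control cost R, mimicking L[t] qualitatively."""
    return Q / (Q + R)

# One cycle of the repeated control process (Sa1-Sa6), illustrative values.
s_now, tempo = 4.0, 2.0   # Sa1: estimated position (beats) and speed (beats/s)
p_now = 3.8               # current playback position p1[t]
s_pred = predict(s_now, tempo, horizon=0.1)   # Sa2: predicted position
Q, R = 1.0, 0.1           # Sa3/Sa4: costs (see the previous sketch)
u1 = feedback_gain(Q, R) * (s_pred - p_now)   # Sa5: control info, position term
p_next = p_now + u1       # Sa6: corrected playback position
print(f"u1 = {u1:.3f}   next playback position = {p_next:.3f}")
```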
As explained above, in the first embodiment, model predictive control is used to generate the control information U[t], so the reproduction of the playback part can be controlled appropriately according to the performance by the performer. In particular, in the first embodiment, the control information U[t] is generated so that the cost J, which includes the state variable X[t] representing the error between the performance information S[t] and the playback information P[t], is reduced. The reproduction of the playback part can therefore be linked to the performance by the performer.
The cost J also includes the state cost Q[s1] relating to the state variable X[t] and the control cost R[p1] relating to temporal changes in the playback information P[t]. Reducing the state cost Q[s1] effectively reduces the error between the performance information S[t] and the playback information P[t] (the state variable X[t]), while reducing the control cost R[p1] suppresses excessive changes in the playback information P[t]. Consequently, both the error between the performance information S[t] and the playback information P[t] and excessive changes in the playback information P[t] can be effectively reduced.
B: Second Embodiment
The second embodiment will now be described. In the aspects illustrated below, elements whose functions are the same as in the first embodiment are denoted by the same reference numerals used in the description of the first embodiment, and their detailed descriptions are omitted as appropriate.
The second embodiment differs from the first embodiment in the operation of the variable setting unit 34. The configuration and operation of the elements other than the variable setting unit 34 are the same as in the first embodiment, so the second embodiment achieves the same effects as the first embodiment.
The variable setting unit 34 of the first embodiment sets the variables applied to the generation of the control information U[t] to numerical values stored in advance in the storage device 12. The variable setting unit 34 of the second embodiment instead sets those variables in response to instructions given by the user through the operating device 14 (Sa3). Specifically, the variable setting unit 34 variably sets the variables (ε, κs', γ, α, Cq[s1]) of Equation (8) and the variables (ε, Cr[p1], c1, c2) of Equation (9) according to the user's instructions. The arithmetic processing unit 33 calculates the state cost Q[s1] and the control cost R[p1] by applying the variables set by the variable setting unit 34 (Sa4). According to the second embodiment, the variables relating to the cost J of Equation (6) are set according to instructions from the user, so the user's intention can be reflected in the reproduction of the playback part.
As explained above, the variable setting unit 34 of the second embodiment sets the weight Cq[s1] of Equation (8) and the weight Cr[p1] of Equation (9). The weight Cq[s1] is an example of a "first weight value", and the weight Cr[p1] is an example of a "second weight value".
FIG. 7 is a schematic diagram of a setting screen 141 with which the user changes the weight Cq[s1] and the weight Cr[p1]. The variable setting unit 34 causes the display device 13 to display the setting screen 141.
The setting screen 141 contains the musical score 142 of the target piece represented by the music data D. The musical score 142 includes the score 143 of the performance part represented by the first musical score data D1 and the score 144 of the playback part represented by the second musical score data D2. By operating the operating device 14, the user can designate an arbitrary section (hereinafter a "setting section") 145 within the musical score 142. The variable setting unit 34 accepts the user's designation of the setting section 145. A plurality of setting sections 145 may be designated within the musical score 142.
By operating the operating device 14, the user selects either the weight Cq[s1] or the weight Cr[p1], and the variable setting unit 34 accepts the selection. When the weight Cq[s1] is selected, the variable setting unit 34 displays the change image 146 of FIG. 7 on the display device 13. The change image 146 shows the current value of the weight Cq[s1] (synchrony). By operating the change image 146, the user can instruct an increase (Increase synchrony) or a decrease (Decrease synchrony) of the weight Cq[s1]. The variable setting unit 34 changes the weight Cq[s1] within the setting section 145 according to the user's instruction, and the change image 146 displays the changed weight Cq[s1] (synchrony=3). The variable setting unit 34 sets the weight Cq[s1] for each setting section 145 designated by the user. Alternatively, the variable setting unit 34 may set the weight Cq[s1] to a numerical value directly specified by the user.
Likewise, when the weight Cr[p1] is selected, the variable setting unit 34 displays the change image 147 of FIG. 8 on the display device 13. The change image 147 shows the current value of the weight Cr[p1] (rigidity). By operating the change image 147, the user can instruct an increase (Increase rigidity) or a decrease (Decrease rigidity) of the weight Cr[p1]. The variable setting unit 34 changes the weight Cr[p1] within the setting section 145 according to the user's instruction, and the change image 147 displays the changed weight Cr[p1] (rigidity=3). The variable setting unit 34 sets the weight Cr[p1] for each setting section 145 designated by the user. Alternatively, the variable setting unit 34 may set the weight Cr[p1] to a numerical value directly specified by the user.
The arithmetic processing unit 33 of the second embodiment generates the state cost Q[s1] and the control cost R[p1] according to the weight Cq[s1] and the weight Cr[p1] set by the variable setting unit 34 (Sa4). Specifically, for performance positions s1[t] within a setting section 145 of the target piece, the arithmetic processing unit 33 calculates the state cost Q[s1] by evaluating Equation (8) with the weight Cq[s1] of that setting section 145 applied. Likewise, for playback positions p1[t] within a setting section 145 of the target piece, the arithmetic processing unit 33 calculates the control cost R[p1] by evaluating Equation (9) with the weight Cr[p1] of that setting section 145 applied. For portions of the target piece outside the setting sections 145, the weight Cq[s1] and the weight Cr[p1] are set to predetermined initial values.
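In implementation terms, the per-section weighting of the second embodiment amounts to a positional lookup: positions inside a user-designated setting section 145 take that section's weight, and everything else falls back to the initial value. A minimal sketch, assuming the sections are stored as (start, end, weight) tuples; this data layout is an assumption, not something specified by the patent.

```python
def weight_at(position, sections, default=1.0):
    """Return the weight Cq[s1] (or Cr[p1]) applying at a score position.
    `sections` holds one (start, end, weight) tuple per setting section 145;
    outside every section the predetermined initial value applies."""
    for start, end, weight in sections:
        if start <= position < end:
            return weight
    return default

# E.g. the user raised the synchrony weight Cq to 3 over beats 8-16.
cq_sections = [(8.0, 16.0, 3.0)]
print(weight_at(4.0, cq_sections))   # -> 1.0 (initial value)
print(weight_at(9.5, cq_sections))   # -> 3.0 (user-set value)
```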
As explained with reference to FIG. 5, the reproduction behavior of the playback part relative to the performance part changes according to the weight Cq[s1] and the weight Cr[p1]. In the second embodiment, the relationship between the performance part and the playback part can therefore be changed through the setting of the weight Cq[s1] and the weight Cr[p1] by the variable setting unit 34. In particular, in the second embodiment each of the weight Cq[s1] and the weight Cr[p1] is set according to instructions from the user, so the user can change the relationship between the performance part and the playback part.
C: Third Embodiment
The first embodiment assumed the performance of one performance part and the reproduction of one playback part. The model predictive control described above, however, has the advantage of being easily extended to multi-input multi-output (MIMO) configurations. In view of this, the third embodiment assumes a case in which the reproduction of M playback parts (M is a natural number) is controlled in conjunction with the performance of N performance parts (N is a natural number). For example, each of N performers plays a different performance part of the target piece, so N sets of performance data E corresponding to the different performers (performance parts) are supplied to the reproduction control system 10 in parallel. The case in which both the total number N of performers and the total number M of playback parts equal 1 corresponds to the first embodiment described above.
The music data D of the third embodiment includes N sets of first musical score data D1 and M sets of second musical score data D2. The N sets of first musical score data D1 correspond to the different performance parts of the target piece, and the M sets of second musical score data D2 correspond to its different playback parts. The storage device 12 also stores M acoustic signals Z corresponding to the different playback parts; the acoustic signal Z of each playback part represents the waveform of the musical tones of that part.
The performance prediction unit 31 predicts the performance information S[t] for each of the N performance parts using a prediction model; that is, the performance information S[t] is predicted for each of the N performers. The process of predicting the performance information S[t] is the same as in the first embodiment, and the performance information S[t] of each performance part is predicted from the performance of that part (that is, from its performance data E). The performance prediction unit 31 may predict the performance information S[t] of each performance part using a separate prediction model for each part, or using a single prediction model common to the N performance parts.
FIG. 9 illustrates the state variable X[t] and the state cost Q[s1] in the third embodiment. The state variable X[t] of the third embodiment comprises N×M state variables Xn,m[t] (n = 1 to N, m = 1 to M). Specifically, the state variable X[t] contains a state variable Xn,m[t] for every combination of one of the N performance parts with one of the M playback parts. Each state variable Xn,m[t] corresponds to the state variable X[t] of the first embodiment: it is a two-dimensional vector representing the error between the performance information S[t] of the n-th performance part and the playback information P[t] of the m-th playback part. That is, the state variable Xn,m[t] represents the error between the performance of the n-th performance part and the reproduction of the m-th playback part by the sound emitting device 15.
The state cost Q[s1] is a block diagonal matrix whose diagonal blocks are the N×M submatrices Qn,m[s1]; all elements of the state cost Q[s1] outside these submatrices are set to zero. Specifically, the state cost Q[s1] contains a submatrix Qn,m[s1] for every combination of one of the N performance parts with one of the M playback parts. Each submatrix Qn,m[s1] corresponds to the state cost Q[s1] of the first embodiment: it is the cost on the error between the performance information S[t] at the performance position s1[t] of the n-th performance part and the playback information P[t] of the m-th playback part. The arithmetic processing unit 33 calculates each submatrix Qn,m[s1] by Equation (8), as with the state cost Q[s1] of the first embodiment. The set Gq in Equation (8) applied to the calculation of the submatrix Qn,m[s1] is the set of sounding positions s' of the notes specified by the first musical score data D1 of the n-th performance part in the music data D.
The variable setting unit 34 of the third embodiment sets the weight Cq[s1] of Equation (8) individually for each submatrix Qn,m[s1]. For example, the storage device 12 stores a plurality of different sets of setting data, each of which registers N×M weights Cq[s1] corresponding to the different combinations of a performance part and a playback part; the value of each weight Cq[s1] differs between the sets of setting data. The variable setting unit 34 selects one of the sets of setting data according to an instruction given by the user through the operating device 14. Selecting a set of setting data amounts to setting the weight Cq[s1] for each submatrix Qn,m[s1]. The arithmetic processing unit 33 calculates each submatrix Qn,m[s1] by evaluating Equation (8) with the corresponding weight Cq[s1] registered in the selected setting data. As understood from the above description, the weight Cq[s1] applied to the generation of each submatrix Qn,m[s1] is changed according to instructions from the user. Alternatively, the variable setting unit 34 may set each of the N×M weights Cq[s1] individually according to instructions from the user.
FIG. 10 illustrates the control information U[t] and the control cost R[p1] in the third embodiment. The control information U[t] of the third embodiment comprises M pieces of control information U1[t] to UM[t] corresponding to the different playback parts of the target piece. Each piece of control information Um[t] corresponds to the control information U[t] of the first embodiment and is therefore a two-dimensional vector containing a playback position u1[t] and a playback speed u2[t]. The playback control unit 40 controls the reproduction of the m-th playback part by the sound emitting device 15 according to the control information Um[t]. Specifically, the playback control unit 40 generates playback information Pm[t] from the control information Um[t] and causes the sound emitting device 15 to reproduce the m-th playback part according to that playback information Pm[t]; that is, it causes the sound emitting device 15 to reproduce the portion of the acoustic signal Z of the m-th playback part corresponding to the playback information Pm[t]. The musical tones of the M playback parts of the target piece are thus reproduced in parallel.
The control cost R[p1] is a block diagonal matrix whose diagonal blocks are the M submatrices R1[p1] to RM[p1]; all elements of the control cost R[p1] outside these submatrices are set to zero. Each submatrix Rm[p1] corresponds to the control cost R[p1] of the first embodiment: it is the cost on changes in the playback information Pm[t] at the playback position p1[t] of the m-th playback part. The arithmetic processing unit 33 calculates each submatrix Rm[p1] by Equation (9), as with the control cost R[p1] of the first embodiment. The set Gr in Equation (9) applied to the calculation of the submatrix Rm[p1] is the set of sounding positions p' of the notes specified by the second musical score data D2 of the m-th playback part in the music data D.
The variable setting unit 34 of the third embodiment sets the weight Cr[p1] of Equation (9) individually for each submatrix Rm[p1]. For example, the storage device 12 stores a plurality of different sets of setting data, each of which registers M weights Cr[p1] corresponding to the different playback parts; the value of each weight Cr[p1] differs between the sets of setting data. The variable setting unit 34 selects one of the sets of setting data according to an instruction given by the user through the operating device 14. Selecting a set of setting data amounts to setting the weight Cr[p1] for each submatrix Rm[p1]. The arithmetic processing unit 33 calculates each submatrix Rm[p1] by evaluating Equation (9) with the corresponding weight Cr[p1] registered in the selected setting data. As understood from the above description, the weight Cr[p1] applied to the generation of each submatrix Rm[p1] is changed according to instructions from the user. Alternatively, the variable setting unit 34 may set each of the M weights Cr[p1] individually according to instructions from the user.
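The block diagonal layout of FIGS. 9 and 10 can be assembled mechanically once the per-pair submatrices are known. The sketch below builds Q[s1] and R[p1] with scipy.linalg.block_diag from 2×2 placeholder submatrices; the numerical values stand in for the results of Equations (8) and (9), which are not reproduced here.

```python
import numpy as np
from scipy.linalg import block_diag

N, M = 2, 3  # N performance parts, M playback parts

# Placeholder 2x2 submatrices Qn,m[s1] (one per performance/playback pair)
# and Rm[p1] (one per playback part); real values come from Eqs. (8)/(9).
Q_blocks = [np.eye(2) * (0.5 + 0.1 * (n * M + m))
            for n in range(N) for m in range(M)]
R_blocks = [np.eye(2) * (0.2 + 0.1 * m) for m in range(M)]

Q = block_diag(*Q_blocks)  # 2NM x 2NM; zero outside the diagonal blocks
R = block_diag(*R_blocks)  # 2M x 2M; zero outside the diagonal blocks
print(Q.shape, R.shape)    # -> (12, 12) (6, 6)
```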
As in the first embodiment, the information generation unit 32 calculates the control information U[t] of the playback parts by evaluating Equations (7a) to (7d) with the state variable X[t], the state cost Q[s1], and the control cost R[p1] applied (Sa5). That is, the information generation unit 32 generates the control information Um[t] (U1[t] to UM[t]) for each of the M playback parts. The third embodiment therefore achieves the same effects as the first embodiment.
As explained above, the third embodiment generalizes the total number N of performance parts and the total number M of playback parts. When the total number N of performance parts is 2 or more, the performance information S[t] is predicted for each of the plurality of performers (performance parts), so the reproduction of the playback parts can be controlled appropriately according to the performances of the plurality of performers. When the total number M of playback parts is 2 or more, the control information Um[t] is generated for each of the plurality of playback parts, so the reproduction of each playback part can be controlled according to the performance by the performers. When both N and M are 2 or more, the reproduction of each of the plurality of playback parts can be controlled according to the performances of the plurality of performers.
In the third embodiment, the N×M weights Cq[s1] corresponding to the different combinations of a performance part and a playback part are controlled, as are the M weights Cr[p1] corresponding to the different playback parts. As described above, the relationship of the reproduction of a playback part to the performance of a performance part depends on the weight Cq[s1] and the weight Cr[p1]. The relationship between each of the N performance parts and each of the M playback parts can therefore be controlled in detail through the weights Cq[s1] and Cr[p1]. That is, the degree to which the reproduction of a playback part follows the performance of a performance part can be controlled individually for every combination of a performance part and a playback part. For example, diverse forms of control are possible, such as linking the reproduction of a particular playback part strongly to the performance of a particular performance part while leaving the reproduction of the other playback parts almost independent of that performance.
As understood from the example of the third embodiment, the predictive control unit 30 generates the control information U[t] for at least one playback part of the target piece by model predictive control using a prediction model that predicts, for at least one performer, the performance information S[t] including the performance position s1[t] in the target piece.
D: Modifications
Specific modifications that may be added to the aspects exemplified above are illustrated below. Two or more aspects arbitrarily selected from the above embodiments and the following modifications may be combined as appropriate insofar as they do not contradict one another.
(1) In each of the above embodiments, the acoustic signal Z stored in the storage device 12 is used to reproduce the playback part, but the method of reproducing the musical tones of the playback part is not limited to this example. For instance, the playback control unit 40 may generate the acoustic signal Z by supplying the second musical score data D2 sequentially to a sound source unit. The second musical score data D2 is supplied to the sound source unit in parallel with the performer's performance of the performance part, and the musical tones of the playback part are reproduced by supplying the acoustic signal Z generated by the sound source unit to the sound emitting device 15. In this case the playback control unit 40 functions as a sequencer that processes the second musical score data D2. The sound source unit may be a hardware sound source or a software sound source. The playback control unit 40 controls the timing at which the second musical score data D2 is supplied to the sound source unit according to the control information U[t].
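Modification (1) can be pictured as a tiny sequencer: on each cycle, any events of the second musical score data D2 whose sounding position was crossed since the previous cycle are forwarded to the sound source unit. A minimal sketch; the event layout and the send callback are illustrative assumptions, not an interface defined by the patent.

```python
def sequencer_step(events, cursor, p_prev, p_now, send):
    """Supply to the sound source the score events whose sounding position
    lies in (p_prev, p_now], i.e. was passed during this control cycle."""
    while cursor < len(events) and p_prev < events[cursor][0] <= p_now:
        send(events[cursor])  # hand the event to the sound source unit
        cursor += 1
    return cursor

# (sounding position in beats, note number) -- illustrative score events.
events = [(0.0, 60), (1.0, 62), (2.0, 64)]
cursor = sequencer_step(events, cursor=0, p_prev=0.5, p_now=1.2,
                        send=lambda ev: print("note on:", ev))  # fires (1.0, 62)
```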
(2) In each of the above embodiments, the musical tones of the playback part are reproduced by the sound emitting device 15, but the form of reproduction of the playback part is not limited to this example. For instance, the playback control unit 40 may cause an electronic musical instrument capable of automatic performance to reproduce the musical tones of the playback part. That is, the playback control unit 40 causes the electronic musical instrument to perform the playback part automatically by controlling the instrument according to the control information U[t].
The playback control unit 40 may also control, for example, the reproduction of a moving image relating to the playback part (hereinafter the "target video"). The target video is a moving image showing a specific performer playing the playback part of the target piece. For example, the target video may be footage of a real performer playing the playback part on an instrument, or a synthetic video generated by image processing that shows a virtual performer playing the playback part. Whether the target video includes audio is immaterial.
Video data representing the target video is stored in the storage device 12. The playback control unit 40 displays the target video on the display device 13 by outputting the video data, and controls the reproduction of the target video according to the control information U[t]. Specifically, the playback control unit 40 generates the playback information P[t] from the control information U[t] and displays on the display device 13 the portion of the target video corresponding to that playback information P[t]. That is, the playback position p1[t] and the playback speed p2[t] of the target video are controlled in conjunction with the performer's performance of the performance part.
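Controlling the target video then reduces to mapping the playback information onto a frame index. A minimal sketch, assuming the video was rendered at a constant reference tempo so that a score position converts linearly to video time; both constants are illustrative assumptions.

```python
def frame_for_position(p1, fps=30.0, ref_tempo_bps=2.0):
    """Map a playback position p1[t] (in beats) to a video frame index,
    assuming the target video was shot at `ref_tempo_bps` beats per second."""
    seconds = p1 / ref_tempo_bps
    return int(seconds * fps)

print(frame_for_position(4.0))  # beat 4 -> frame 60 at 30 fps, 2 beats/s
```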
The virtual performer represented by the target video (hereinafter the "virtual performer") is, for example, an avatar existing in a virtual space. For example, the playback control unit 40 displays on the display device 13 the virtual performer and a background image as captured by a virtual camera in the virtual space. The display device 13 may be mounted in an HMD (Head Mounted Display) worn on the user's head. When the display device 13 is mounted in an HMD, the position and direction of the virtual camera in the virtual space are controlled dynamically according to the behavior (for example, the position and direction) of the user's head. By moving the head appropriately, the user can therefore view the virtual performer from any position and direction in the virtual space.
The video data for displaying the virtual performer in the virtual space includes, for example, motion data representing the movements of the skeleton and joints of the virtual performer. The motion data specifies, for example, temporal changes in the relative angle and position of each bone and joint. The playback control unit 40 controls the movement of the skeleton and joints represented by the motion data according to the control information U[t] (or the playback information P[t]), and generates the virtual performer, in the posture specified by the motion data, as an object in the virtual space. For example, the virtual performer in the virtual space is controlled into the posture corresponding to the skeleton and joints specified by the portion of the motion data corresponding to the playback information P[t]. That is, the playback control unit 40 varies the speed of the skeletal and joint movements specified by the motion data according to the control information U[t], so that the performance by the virtual performer in the virtual space progresses in conjunction with the performance by the performer in the real space. Image processing such as modeling and texturing is used to generate the three-dimensional virtual performer, and the playback control unit 40 then generates a planar image (the target video) of the virtual performer in the virtual space, as captured by the virtual camera, through image processing such as rendering. As described above, the position and direction of the virtual camera change according to the behavior of the user's head. The playback control unit 40 displays the target video generated by the above processing on the display device 13, so that, as described above, the user can view the virtual performer playing the playback part from any position and direction in the virtual space. For example, a performer wearing an HMD can watch, from any position and direction in the virtual space, the virtual performer play the playback part in conjunction with the performer's own performance of the performance part. Although the above description displays a virtual performer who plays the playback part, a virtual dancer who dances in step with the progress of the playback part may instead be displayed on the display device 13; virtual performers and virtual dancers are referred to collectively as virtual entertainers. Moreover, although the above description assumes a display device 13 worn on the user's head, the virtual entertainer in the virtual space may instead be displayed on a display device 13 installed in a fixed position near the user.
As understood from the above examples, the playback control unit 40 is expressed comprehensively as an element that controls the reproduction of the playback part. "Reproduction of the playback part" encompasses both the reproduction of the musical tones of the playback part and the reproduction of a moving image (the target video) relating to the playback part. The display device 13 and the sound emitting device 15 are playback devices that reproduce the playback part.
According to the first through third embodiments, the reproduction of the musical tones relating to the playback part can be controlled according to the performance by the performer. According to this modification, on the other hand, the reproduction of a moving image relating to the playback part can be controlled according to the performance by the performer.
(3) In each of the above embodiments, the performance data E representing the performance by the performer is supplied to the reproduction control system 10, but the input information corresponding to the performance by the performer is not limited to the performance data E. For example, a signal representing the waveform of the musical tones played by the performer (hereinafter the "performance signal") may be supplied to the reproduction control system 10 instead of the performance data E. The performance signal is generated by picking up, with a microphone, the musical tones produced by the instrument during the performance.
The performance prediction unit 31 generates the performance information S[t] by analyzing the performance signal. For example, the analysis unit 311 estimates the performance time t[k] and the performance position s[k] by analyzing the performance signal, and the prediction unit 312 generates the performance information S[t] using the prediction model, as in the first embodiment. This configuration also achieves the same effects as the embodiments described above.
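As a rough idea of what analyzing a performance signal can involve, the sketch below detects note onsets from frame-wise RMS energy. This is a deliberately crude stand-in for the analysis performed by the analysis unit 311, not the patent's method; the frame size and threshold are arbitrary.

```python
import numpy as np

def onset_times(signal, sr, frame=512, thresh=0.1):
    """Crude onset detector: report times where frame-wise RMS energy
    rises above a threshold (a stand-in for performance-signal analysis)."""
    n = len(signal) // frame
    rms = np.sqrt(np.mean(signal[:n * frame].reshape(n, frame) ** 2, axis=1))
    rising = (rms[1:] > thresh) & (rms[:-1] <= thresh)
    return (np.flatnonzero(rising) + 1) * frame / sr

sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t) * (t > 0.5)  # a tone starting at 0.5 s
print(onset_times(sig, sr))                     # -> [0.48] (frame resolution)
```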
(4) In each of the above embodiments, a state space model is used as the prediction model for predicting the performance information S[t], but the form of the prediction model is not limited to this example. For instance, a statistical model such as a deep neural network or a hidden Markov model may be used as the prediction model.
(5) In each of the above embodiments, the performance information S[t] includes the performance position s1[t] and the performance speed s2[t], but the format of the performance information S[t] is not limited to this example. For instance, the performance speed s2[t] may be omitted; that is, the performance information S[t] is expressed comprehensively as information including the performance position s1[t]. Likewise, the playback information P[t] is not limited to information including the playback position p1[t] and the playback speed p2[t]. For instance, the playback speed p2[t] may be omitted; that is, the playback information P[t] is expressed comprehensively as information including the playback position p1[t].
As understood from the above description, the formats of the state variable X[t] and the control information U[t] are likewise not limited to the examples of the above embodiments. For instance, the speed error x2[t] may be omitted from the state variable X[t], and the playback speed u2[t] may be omitted from the control information U[t].
(6) In each of the above embodiments, the predictive control unit 30 generates the control information U[t] by model predictive control using a single prediction model, but a plurality of different prediction models may be used selectively. The predictive control unit 30 generates the control information U[t] for one or more playback parts of the target piece using one of the plurality of prediction models.
For example, a prediction model may be prepared for each performer. The prediction model of each performer is a state space model reflecting that performer's performance tendencies. The predictive control unit 30 generates the control information U[t] for one or more playback parts of the target piece using the prediction model corresponding to the performer of the performance part, out of the plurality of prediction models. A prediction model may also be prepared for each group of performers (for example, for each orchestra).
A prediction model may also be prepared, for example, for each attribute of the target piece. The attribute of the target piece is, for example, its musical genre (for example, rock, pop, jazz, trance, or hip-hop) or its musical impression (for example, a bright-sounding piece or a dark-sounding piece). The predictive control unit 30 generates the control information U[t] for one or more playback parts of the target piece using the prediction model corresponding to the attribute of the piece, out of the plurality of prediction models.
With the above configuration, even when the music data D and the performance data E are the same, the reproduction of the playback part can be controlled in diverse ways according to the selection condition of the prediction model (for example, the performer or the attribute).
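Selecting among prepared prediction models, as in modification (6), comes down to a keyed lookup. A minimal sketch with a plain dictionary; the keys, the precedence rule, and the fallback are assumptions for illustration.

```python
# One prediction model per performer and per genre (placeholder objects).
models_by_performer = {"performer_a": "model_a", "performer_b": "model_b"}
models_by_genre = {"jazz": "model_jazz", "rock": "model_rock"}

def select_model(performer=None, genre=None, default="model_generic"):
    """Pick the prediction model matching the selection condition,
    preferring a performer-specific model over a genre-specific one."""
    if performer in models_by_performer:
        return models_by_performer[performer]
    if genre in models_by_genre:
        return models_by_genre[genre]
    return default

print(select_model(performer="performer_a"))  # -> model_a
print(select_model(genre="jazz"))             # -> model_jazz
```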
(7) The reproduction control system 10 may also be realized by a server device that communicates with a terminal device such as a mobile phone or a smartphone. For example, the predictive control unit 30 of the reproduction control system 10 generates the control information U[t] by processing the performance data E (or the performance signal) received from the terminal device. The music data D stored in the storage device 12 of the reproduction control system 10, or music data D transmitted from the terminal device, is used to generate the control information U[t]. The playback control unit 40 transmits to the terminal device the portion of the acoustic signal Z (or of the video data of the target video) corresponding to the control information U[t]. In a configuration in which the playback control unit 40 is mounted in the terminal device, the control information U[t] may instead be transmitted from the reproduction control system 10 to the terminal device.
(8) As described above, the functions of the reproduction control system 10 according to each of the above embodiments are realized through the cooperation of the single or multiple processors constituting the control device 11 and the program stored in the storage device 12. The program exemplified above may be provided in a form stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any known form of recording medium, such as a semiconductor recording medium or a magnetic recording medium, is also encompassed. A non-transitory recording medium includes any recording medium other than a transitory, propagating signal, and volatile recording media are not excluded. In a configuration in which a distribution device distributes the program via a communication network, the recording medium storing the program in that distribution device corresponds to the above-mentioned non-transitory recording medium.
E: Supplementary Notes
From the forms exemplified above, the following configurations, for example, can be derived.
A reproduction control method according to one aspect of the present disclosure (Aspect 1) generates, by model predictive control using a prediction model that predicts performance information including a performance position in a piece of music for at least one performer, control information for at least one playback part of the piece, and controls the reproduction of that playback part of the piece using the control information generated for the at least one playback part. According to this aspect, model predictive control is used to generate the control information, so the reproduction of the playback part can be controlled appropriately according to the performance by the performer.
 「演奏情報」は、演奏位置を含む任意の形式のデータである。例えば、演奏情報は、演奏位置と演奏速度とを含む。演奏位置は、楽曲内において演奏者が演奏している位置である。演奏速度は、演奏者が楽曲を演奏する速度(テンポ)である。他方、「制御情報」は、再生パートの再生を制御するための任意の形式のデータである。例えば、制御情報は、再生位置の変化量と再生速度の変化量とを含む。 "Performance information" is data in any format including the performance position. For example, the performance information includes a performance position and a performance speed. The performance position is the position in the song where the performer is playing. The performance speed is the speed (tempo) at which the performer plays the music. On the other hand, "control information" is data in any format for controlling reproduction of a reproduction part. For example, the control information includes the amount of change in playback position and the amount of change in playback speed.
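For concreteness, the performance information and control information described above could be held in records like the following. This is only an illustrative sketch; the class and field names (PerformanceInfo, ControlInfo, and so on) are hypothetical rather than taken from the disclosure.

    from dataclasses import dataclass

    @dataclass
    class PerformanceInfo:
        position: float  # performance position within the song (e.g., in beats)
        speed: float     # performance speed (tempo), e.g., beats per second

    @dataclass
    class ControlInfo:
        d_position: float  # amount of change applied to the playback position
        d_speed: float     # amount of change applied to the playback speed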
 演奏情報の予測に利用される情報および処理は任意である。例えば、演奏者による演奏を表す演奏データ、または演奏者が演奏した楽音の波形を表す演奏信号を、演奏情報の予測に利用する形態が想定される。また、例えば利用者が演奏する様子を撮像した動画が、演奏情報の予測に利用されてもよい。演奏情報の予測には各種の予測モデルが利用される。予測モデルとしては、例えば、カルマンフィルタ等の状態空間モデルが利用される。 The information and processing used to predict performance information are arbitrary. For example, it is assumed that performance data representing a performance by a performer or a performance signal representing a waveform of a musical tone played by a performer is used for predicting performance information. Further, for example, a video of a user playing a performance may be used for predicting performance information. Various prediction models are used to predict performance information. As the prediction model, for example, a state space model such as a Kalman filter is used.
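As one example of the state-space prediction mentioned above, a constant-tempo Kalman filter can estimate a performer's position and speed from noisy position observations (for example, the output of a score-following analysis). This is a minimal sketch assuming a two-dimensional state of position and speed; the noise parameters are illustrative, and the filter is not claimed to be the one used in the embodiments.

    import numpy as np

    class ConstantTempoKalman:
        """Kalman filter with state x = [position, speed]^T."""

        def __init__(self, dt: float, q: float = 1e-4, r: float = 1e-2):
            self.F = np.array([[1.0, dt], [0.0, 1.0]])  # position += speed * dt
            self.H = np.array([[1.0, 0.0]])             # only position is observed
            self.Q = q * np.eye(2)                      # process noise covariance
            self.R = np.array([[r]])                    # observation noise covariance
            self.x = np.zeros(2)                        # state estimate
            self.P = np.eye(2)                          # estimate covariance

        def predict(self) -> np.ndarray:
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.x  # predicted [position, speed] = performance information

        def update(self, observed_position: float) -> None:
            y = observed_position - self.H @ self.x   # innovation
            S = self.H @ self.P @ self.H.T + self.R   # innovation covariance
            K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
            self.x = self.x + (K @ y).ravel()
            self.P = (np.eye(2) - K @ self.H) @ self.P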
 「再生パート」は、楽曲を構成する複数の音楽パートのうち制御情報による制御対象となる音楽パートである。「再生パートの再生」は、当該再生パートに関する音響の再生(例えば自動演奏)のほか、当該再生パートに関する映像の再生を含む。 The "playback part" is a music part that is to be controlled by the control information among the plurality of music parts that make up the song. "Reproduction of a playback part" includes not only playback of audio (for example, automatic performance) related to the playback part, but also playback of video related to the playback part.
 態様1の具体例(態様2)において、前記少なくとも1の演奏者は、複数の演奏者であり、前記演奏情報の予測においては、前記複数の演奏者の各々について前記演奏情報を予測する。以上の態様によれば、複数の演奏者による演奏に応じて再生パートの再生を適切に制御できる。 In a specific example of Aspect 1 (Aspect 2), the at least one performer is a plurality of performers, and in predicting the performance information, the performance information is predicted for each of the plurality of performers. According to the above aspect, it is possible to appropriately control the reproduction of the reproduction part according to performances by a plurality of performers.
 態様1または態様2の具体例(態様3)において、前記少なくとも1の再生パートは、複数の再生パートであり、前記制御情報の生成においては、前記複数の再生パートの各々について前記制御情報を生成する。以上の態様によれば、演奏者による演奏に応じて複数の再生パートの各々の再生を制御できる。 In a specific example of aspect 1 or aspect 2 (aspect 3), the at least one playback part is a plurality of playback parts, and in generating the control information, the control information is generated for each of the plurality of playback parts. According to the above aspect, playback of each of the plurality of playback parts can be controlled according to the performance by the performer.
 態様1から態様3の何れかの具体例(態様4)において、前記再生の制御においては、前記楽曲の前記少なくとも1の再生パートに関する楽音の再生を制御する。以上の態様によれば、演奏者による演奏に応じて再生パートに関する楽音の再生を制御できる。 In a specific example of any one of Aspects 1 to 3 (Aspect 4), the playback control includes controlling the playback of musical tones related to the at least one playback part of the music piece. According to the above aspect, it is possible to control the reproduction of musical tones related to the reproduction part according to the performance by the performer.
 態様1から態様3の何れかの具体例(態様5)において、前記再生の制御においては、前記楽曲の前記少なくとも1の再生パートに関する動画の再生を制御する。以上の態様によれば、演奏者による演奏に応じて再生パートに関する動画の再生を制御できる。動画は、例えば仮想空間内の仮想的な実演者(例えば演奏者またはダンサー)が再生パートを演奏する動画である。 In a specific example of any one of Aspects 1 to 3 (Aspect 5), the playback control includes controlling the playback of a video related to the at least one playback part of the song. According to the above aspect, it is possible to control the reproduction of the moving image related to the reproduction part according to the performance by the performer. The moving image is, for example, a moving image in which a virtual performer (for example, a performer or a dancer) in a virtual space performs a reproduction part.
 態様1から態様5の何れかの具体例(態様6)において、前記モデル予測制御においては、前記少なくとも1の演奏者について予測された演奏情報と、前記少なくとも1の再生パートの再生位置を含む再生情報と、の誤差を表す状態変数を含むコストが低減されるように、前記少なくとも1の再生パートについて制御情報を生成する。以上の形態によれば、演奏情報と再生情報との誤差を表す状態変数を含むコストが低減されるように制御情報が生成される。したがって、再生パートの再生を演奏者による演奏に連動させることが可能である。 In a specific example of any one of aspects 1 to 5 (aspect 6), in the model predictive control, control information is generated for the at least one playback part such that a cost including a state variable representing an error between the performance information predicted for the at least one performer and playback information including a playback position of the at least one playback part is reduced. According to the above embodiment, the control information is generated so that a cost including a state variable representing the error between the performance information and the playback information is reduced. Therefore, playback of the playback part can be linked to the performance by the performer.
 「再生情報」は、再生位置を含む任意の形式のデータである。例えば、再生情報は、再生位置と再生速度とを含む。再生位置は、楽曲内において再生されている位置である。再生速度は、楽曲が再生される速度である。 "Reproduction information" is data in any format including the reproduction position. For example, the playback information includes a playback position and a playback speed. The playback position is the position within the song where the song is being played. The playback speed is the speed at which the song is played.
 態様6の具体例(態様7)において、さらに、前記コストに含まれる少なくとも1の変数を、利用者からの指示に応じて設定する。以上の態様によれば、コストに関する変数が利用者からの指示に応じて設定されるから、再生パートの再生に利用者の意図を反映させることができる。 In the specific example of Aspect 6 (Aspect 7), at least one variable included in the cost is further set in accordance with an instruction from the user. According to the above aspect, since the variable related to the cost is set according to the instruction from the user, the user's intention can be reflected in the reproduction of the reproduction part.
 コスト(目的関数)の「変数」は、当該コストに関する演算に適用される各種の変数である。具体的には、目的関数が状態コストと制御コストとを含む形態においては、状態コストに対する第1加重値と、制御コストに対する第2加重値とが、「変数」として利用者からの指示に応じて設定される。 The "variables" of the cost (objective function) are the various variables applied to calculations relating to that cost. Specifically, in a form in which the objective function includes a state cost and a control cost, a first weight value for the state cost and a second weight value for the control cost are set as the "variables" in accordance with instructions from the user.
 態様6の具体例(態様8)において、前記コストは、前記状態変数および前記制御情報と、状態コストおよび制御コストとを含み、前記状態コストは、前記状態変数に関するコストであり、前記制御コストは、前記再生情報の時間的な変化に関するコストである。以上の態様においては、状態変数に関する状態コストと、再生情報の時間的な変化に関する制御コストとがコストに含まれる。状態コストの低減により、演奏情報と再生情報との誤差が有効に低減される。また、制御コストの低減により、再生情報の過度な変化が抑制される。したがって、演奏位置と再生位置との誤差と、再生情報の過度な変化とを有効に低減できる。 In a specific example of aspect 6 (aspect 8), the cost includes the state variable and the control information, and a state cost and a control cost; the state cost is a cost related to the state variable, and the control cost is a cost related to temporal changes in the playback information. In the above aspect, the cost includes a state cost related to the state variable and a control cost related to temporal changes in the playback information. Reducing the state cost effectively reduces the error between the performance information and the playback information, and reducing the control cost suppresses excessive changes in the playback information. Therefore, the error between the performance position and the playback position, as well as excessive changes in the playback information, can be effectively reduced.
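Written out, a quadratic cost of the kind described in aspects 6 and 8 might take the following form over a prediction horizon of H steps, where x[t+k] is the state variable (the error between the predicted performance information and the playback information) and u[t+k] is the control information applied at step k. This formula is an illustrative reconstruction, not a quotation from the disclosure:

    J = \sum_{k=0}^{H-1} \left( w_1 \, \lVert x[t+k] \rVert^2 + w_2 \, \lVert u[t+k] \rVert^2 \right)

The first term is the state cost, which penalizes the synchronization error, and the second term is the control cost, which penalizes changes to the playback information; w_1 and w_2 correspond to the first and second weight values introduced in aspect 9 below.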
 態様8の具体例(態様9)において、さらに、第1加重値および第2加重値を設定し、前記状態コストは、前記第1加重値により加重されたコストであり、前記制御コストは、前記第2加重値により加重されたコストである。以上の態様においては、状態コストが第1加重値により加重され、制御コストが第2加重値により加重される。したがって、第1加重値および第2加重値の設定に応じて、演奏者による演奏と再生パートの再生との関係を変更できる。 In a specific example of aspect 8 (aspect 9), a first weight value and a second weight value are further set; the state cost is a cost weighted by the first weight value, and the control cost is a cost weighted by the second weight value. In the above aspect, the state cost is weighted by the first weight value and the control cost is weighted by the second weight value. Therefore, the relationship between the performance by the performer and the playback of the playback part can be changed according to the settings of the first weight value and the second weight value.
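A minimal receding-horizon sketch of the model predictive control described in these aspects follows. It assumes the constant-tempo prediction and the weighted quadratic cost from the earlier examples, and it uses scipy's general-purpose optimizer in place of a dedicated MPC solver; all names and parameter values are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def mpc_step(perf_pos, perf_speed, play_pos, play_speed,
                 w1=1.0, w2=0.1, horizon=8, dt=0.05):
        """Return the control (d_position, d_speed) to apply in the current frame."""

        def cost(u_flat):
            u = u_flat.reshape(horizon, 2)
            p, v = play_pos, play_speed
            total = 0.0
            for k in range(horizon):
                # Playback model: apply the control, then advance one frame.
                p, v = p + u[k, 0], v + u[k, 1]
                p += v * dt
                # Constant-tempo prediction of the performer at step k + 1.
                pred_pos = perf_pos + perf_speed * (k + 1) * dt
                err = np.array([pred_pos - p, perf_speed - v])  # state variable
                total += w1 * err @ err    # state cost (synchronization error)
                total += w2 * u[k] @ u[k]  # control cost (change in playback info)
            return total

        u0 = np.zeros(horizon * 2)
        result = minimize(cost, u0, method="L-BFGS-B")
        return result.x[:2]  # receding horizon: apply only the first control

Raising w1 makes the playback follow the performer more tightly, while raising w2 yields smoother, more conservative changes in the playback information; this is precisely the trade-off that aspects 9 and 10 expose to the user through the two weight values.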
 態様9の具体例(態様10)において、前記第1加重値および前記第2加重値の設定においては、利用者からの指示に応じて前記第1加重値および前記第2加重値を変更する。以上の態様においては、第1加重値および第2加重値の各々が利用者からの指示に応じて設定される。したがって、演奏者による演奏と再生パートの再生との関係を利用者が変更できる。 In a specific example of aspect 9 (aspect 10), in setting the first weight value and the second weight value, the first weight value and the second weight value are changed according to instructions from the user. In the above aspect, each of the first weight value and the second weight value is set according to an instruction from the user. Therefore, the user can change the relationship between the performance by the performer and the reproduction of the reproduction part.
 態様1から態様10の何れかの具体例(態様11)において、前記演奏情報の予測においては、演奏情報を予測するための複数の予測モデルの何れかを利用して、前記少なくとも1の演奏者について演奏情報を予測する。以上の態様によれば、予測モデルの選択条件に応じて再生パートの再生を多様に制御できる。 In a specific example of any one of aspects 1 to 10 (aspect 11), in predicting the performance information, the performance information for the at least one performer is predicted using any one of a plurality of prediction models for predicting performance information. According to the above aspect, playback of the playback part can be controlled in various ways according to the conditions for selecting the prediction model.
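One simple way to realize the selection among prediction models in this aspect is a registry keyed by a selection condition. The sketch below reuses the hypothetical ConstantTempoKalman class from the earlier example, and the model labels are invented for illustration.

    # Hypothetical registry of prediction models; "responsive" trusts new
    # observations more by assuming larger process noise.
    PREDICTORS = {
        "constant_tempo": lambda dt: ConstantTempoKalman(dt),
        "responsive": lambda dt: ConstantTempoKalman(dt, q=1e-2),
    }

    def make_predictor(name: str, dt: float):
        return PREDICTORS[name](dt)  # KeyError signals an unknown model label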
 本開示のひとつの態様(態様12)に係る再生制御システムは、少なくとも1の演奏者について楽曲内の演奏位置を含む演奏情報を予測する予測モデルを利用したモデル予測制御により、前記楽曲の少なくとも1の再生パートについて制御情報を生成する予測制御部と、前記少なくとも1の再生パートについて生成された制御情報により、前記楽曲における当該再生パートの再生を制御する再生制御部とを具備する。 A playback control system according to one aspect (aspect 12) of the present disclosure includes: a predictive control unit that generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and a playback control unit that controls playback of the playback part in the song based on the control information generated for the at least one playback part.
 本開示のひとつの態様(態様13)に係るプログラムは、少なくとも1の演奏者について楽曲内の演奏位置を含む演奏情報を予測する予測モデルを利用したモデル予測制御により、前記楽曲の少なくとも1の再生パートについて制御情報を生成する予測制御部、および、前記少なくとも1の再生パートについて生成された制御情報により、前記楽曲における当該再生パートの再生を制御する再生制御部、としてコンピュータシステムを機能させる。 A program according to one aspect (aspect 13) of the present disclosure causes a computer system to function as: a predictive control unit that generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and a playback control unit that controls playback of the playback part in the song based on the control information generated for the at least one playback part.
 本開示のひとつの態様(態様14)に係る情報処理方法は、少なくとも1の演奏者について楽曲内の演奏位置を含む演奏情報を予測する予測モデルを利用したモデル予測制御により、前記楽曲の少なくとも1の再生パートについて制御情報を生成し、動作データが表す骨格および関節の移動を前記制御情報に応じて制御し、前記制御された骨格および関節に対応する姿勢の仮想的な実演者を仮想空間内に生成し、利用者の頭部の挙動に応じて位置および方向が制御される仮想カメラにより前記仮想空間を撮像した画像を表示装置に表示する。以上の態様によれば、制御情報の生成にモデル予測制御が利用されるから、演奏者による演奏に応じて仮想的な実演者の動作を適切に制御できる。 An information processing method according to one aspect (aspect 14) of the present disclosure generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer, controls movement of a skeleton and joints represented by motion data in accordance with the control information, generates, in a virtual space, a virtual performer in a posture corresponding to the controlled skeleton and joints, and displays, on a display device, an image of the virtual space captured by a virtual camera whose position and direction are controlled in accordance with the behavior of the user's head. According to the above aspect, since model predictive control is used to generate the control information, the movements of the virtual performer can be appropriately controlled in accordance with the performance by the performer.
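Two of the building blocks of aspect 14 can be sketched concretely: interpolating the skeleton pose from motion data at the controlled playback position, and deriving the virtual-camera pose from the user's head behavior. Both functions below are hypothetical illustrations under simplifying assumptions (keyframed joint angles, yaw-only head rotation), not APIs named in the disclosure.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Keyframe:
        position: float     # song position (e.g., in beats) of this pose
        joints: np.ndarray  # joint angles of the skeleton at this pose

    def pose_at(motion: list, play_pos: float) -> np.ndarray:
        """Interpolate the skeleton's joint angles at the playback position."""
        for a, b in zip(motion, motion[1:]):
            if a.position <= play_pos <= b.position:
                t = (play_pos - a.position) / (b.position - a.position)
                return (1.0 - t) * a.joints + t * b.joints
        return motion[-1].joints  # hold the last pose past the final keyframe

    def view_matrix(head_pos: np.ndarray, yaw: float) -> np.ndarray:
        """Virtual-camera view matrix from the user's head position and yaw."""
        c, s = np.cos(yaw), np.sin(yaw)
        view = np.eye(4)
        view[:3, :3] = np.array([[c, 0.0, s],
                                 [0.0, 1.0, 0.0],
                                 [-s, 0.0, c]])
        view[:3, 3] = view[:3, :3] @ (-head_pos)  # rotate, then translate
        return view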
100…演奏システム、10…再生制御システム、11…制御装置、12…記憶装置、13…表示装置、14…操作装置、15…放音装置、20…鍵盤楽器、30…予測制御部、31…演奏予測部、311…解析部、312…予測部、32…情報生成部、33…演算処理部、34…変数設定部、40…再生制御部。 100... Performance system, 10... Playback control system, 11... Control device, 12... Storage device, 13... Display device, 14... Operating device, 15... Sound emitting device, 20... Keyboard instrument, 30... Prediction control unit, 31... Performance prediction section, 311... Analysis section, 312... Prediction section, 32... Information generation section, 33... Arithmetic processing section, 34... Variable setting section, 40... Playback control section.

Claims (14)

  1.  少なくとも1の演奏者について楽曲内の演奏位置を含む演奏情報を予測する予測モデルを利用したモデル予測制御により、前記楽曲の少なくとも1の再生パートについて制御情報を生成し、
     前記少なくとも1の再生パートについて生成された制御情報により、前記楽曲における当該再生パートの再生を制御する
     コンピュータシステムにより実現される再生制御方法。
    Generating control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and
    A playback control method implemented by a computer system, wherein playback of the playback part in the song is controlled using the control information generated for the at least one playback part.
  2.  前記少なくとも1の演奏者は、複数の演奏者であり、
     前記演奏情報の予測においては、前記複数の演奏者の各々について前記演奏情報を予測する
     請求項1の再生制御方法。
    The at least one performer is a plurality of performers,
    2. The playback control method according to claim 1, wherein in predicting the performance information, the performance information is predicted for each of the plurality of performers.
  3.  前記少なくとも1の再生パートは、複数の再生パートであり、
     前記制御情報の生成においては、前記複数の再生パートの各々について前記制御情報を生成する
     請求項1または請求項2の再生制御方法。
    The at least one playback part is a plurality of playback parts,
    3. The playback control method according to claim 1 or claim 2, wherein in generating the control information, the control information is generated for each of the plurality of playback parts.
  4.  前記再生の制御においては、前記楽曲の前記少なくとも1の再生パートに関する楽音の再生を制御する
     請求項1から請求項3の何れかの再生制御方法。
    4. The playback control method according to any one of claims 1 to 3, wherein the playback control includes controlling playback of musical tones related to the at least one playback part of the song.
  5.  前記再生の制御においては、前記楽曲の前記少なくとも1の再生パートに関する動画の再生を制御する
     請求項1から請求項3の何れかの再生制御方法。
    5. The playback control method according to any one of claims 1 to 3, wherein the playback control includes controlling playback of a video related to the at least one playback part of the song.
  6.  前記モデル予測制御においては、前記少なくとも1の演奏者について予測された演奏情報と、前記少なくとも1の再生パートの再生位置を含む再生情報と、の誤差を表す状態変数を含むコストが低減されるように、前記少なくとも1の再生パートについて制御情報を生成する
     請求項1から請求項5の何れかの再生制御方法。
    6. The playback control method according to any one of claims 1 to 5, wherein in the model predictive control, control information is generated for the at least one playback part such that a cost including a state variable representing an error between performance information predicted for the at least one performer and playback information including a playback position of the at least one playback part is reduced.
  7.  さらに、前記コストに含まれる少なくとも1の変数を、利用者からの指示に応じて設定する
     請求項6の再生制御方法。
    7. The playback control method according to claim 6, further comprising setting at least one variable included in the cost according to an instruction from a user.
  8.  前記コストは、前記状態変数および前記制御情報と、状態コストおよび制御コストとを含み、
     前記状態コストは、前記状態変数に関するコストであり、
     前記制御コストは、前記再生情報の時間的な変化に関するコストである
     請求項6の再生制御方法。
    The cost includes the state variable and the control information, and a state cost and a control cost,
    The state cost is a cost related to the state variable,
    8. The playback control method according to claim 6, wherein the control cost is a cost related to temporal changes in the playback information.
  9.  さらに、第1加重値および第2加重値を設定し、
     前記状態コストは、前記第1加重値により加重されたコストであり、
     前記制御コストは、前記第2加重値により加重されたコストである
     請求項8の再生制御方法。
    Furthermore, setting a first weight value and a second weight value,
    The state cost is a cost weighted by the first weight value,
    9. The playback control method according to claim 8, wherein the control cost is a cost weighted by the second weight value.
  10.  前記第1加重値および前記第2加重値の設定においては、利用者からの指示に応じて前記第1加重値および前記第2加重値を変更する
     請求項9の再生制御方法。
    10. The playback control method according to claim 9, wherein in setting the first weight value and the second weight value, the first weight value and the second weight value are changed according to an instruction from a user.
  11.  前記演奏情報の予測においては、演奏情報を予測するための複数の予測モデルの何れかを利用して、前記少なくとも1の演奏者について演奏情報を予測する
     請求項1から請求項10の何れかの再生制御方法。
    11. The playback control method according to any one of claims 1 to 10, wherein in predicting the performance information, the performance information for the at least one performer is predicted using any one of a plurality of prediction models for predicting performance information.
  12.  少なくとも1の演奏者について楽曲内の演奏位置を含む演奏情報を予測する予測モデルを利用したモデル予測制御により、前記楽曲の少なくとも1の再生パートについて制御情報を生成する予測制御部と、
     前記少なくとも1の再生パートについて生成された制御情報により、前記楽曲における当該再生パートの再生を制御する再生制御部と
     を具備する再生制御システム。
    12. A playback control system comprising: a predictive control unit that generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and a playback control unit that controls playback of the playback part in the song based on the control information generated for the at least one playback part.
  13.  少なくとも1の演奏者について楽曲内の演奏位置を含む演奏情報を予測する予測モデルを利用したモデル予測制御により、前記楽曲の少なくとも1の再生パートについて制御情報を生成する予測制御部、および、
     前記少なくとも1の再生パートについて生成された制御情報により、前記楽曲における当該再生パートの再生を制御する再生制御部、
     としてコンピュータシステムを機能させるプログラム。
    13. A program that causes a computer system to function as: a predictive control unit that generates control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer; and a playback control unit that controls playback of the playback part in the song based on the control information generated for the at least one playback part.
  14.  少なくとも1の演奏者について楽曲内の演奏位置を含む演奏情報を予測する予測モデルを利用したモデル予測制御により、前記楽曲の少なくとも1の再生パートについて制御情報を生成し、
     動作データが表す骨格および関節の移動を前記制御情報に応じて制御し、
     前記制御された骨格および関節に対応する姿勢の仮想的な実演者を仮想空間内に生成し、
     利用者の頭部の挙動に応じて位置および方向が制御される仮想カメラにより前記仮想空間を撮像した画像を表示装置に表示する
     コンピュータシステムにより実現される情報処理方法。
    Generating control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer;
    controlling movement of the skeleton and joints represented by the motion data according to the control information;
    generating a virtual demonstrator in a virtual space in a posture corresponding to the controlled skeleton and joints;
    14. An information processing method implemented by a computer system, wherein an image of the virtual space captured by a virtual camera whose position and direction are controlled according to the behavior of the user's head is displayed on a display device.
PCT/JP2022/009776 2022-03-07 2022-03-07 Reproduction control method, information processing method, reproduction control system, and program WO2023170757A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/009776 WO2023170757A1 (en) 2022-03-07 2022-03-07 Reproduction control method, information processing method, reproduction control system, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/009776 WO2023170757A1 (en) 2022-03-07 2022-03-07 Reproduction control method, information processing method, reproduction control system, and program

Publications (1)

Publication Number Publication Date
WO2023170757A1

Family

ID=87936234

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/009776 WO2023170757A1 (en) 2022-03-07 2022-03-07 Reproduction control method, information processing method, reproduction control system, and program

Country Status (1)

Country Link
WO (1) WO2023170757A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007241181A (en) * 2006-03-13 2007-09-20 Univ Of Tokyo Automatic musical accompaniment system and musical score tracking system
JP2018063295A (en) * 2016-10-11 2018-04-19 ヤマハ株式会社 Performance control method and performance control device
JP2019139295A (en) * 2018-02-06 2019-08-22 ヤマハ株式会社 Information processing method and information processing apparatus
US10643593B1 (en) * 2019-06-04 2020-05-05 Electronic Arts Inc. Prediction-based communication latency elimination in a distributed virtualized orchestra
JP2021043258A (en) * 2019-09-06 2021-03-18 ヤマハ株式会社 Control system and control method
EP3869495A1 (en) * 2020-02-20 2021-08-25 Antescofo Improved synchronization of a pre-recorded music accompaniment on a user's music playing

Similar Documents

Publication Publication Date Title
CN109478399B (en) Performance analysis method, automatic performance method, and automatic performance system
US11557269B2 (en) Information processing method
CN111052223B (en) Playback control method, playback control device, and recording medium
US10504498B2 (en) Real-time jamming assistance for groups of musicians
US8887051B2 (en) Positioning a virtual sound capturing device in a three dimensional interface
JP7432124B2 (en) Information processing method, information processing device and program
US11609736B2 (en) Audio processing system, audio processing method and recording medium
US7504572B2 (en) Sound generating method
WO2023170757A1 (en) Reproduction control method, information processing method, reproduction control system, and program
JP2018032316A (en) Video generation device, video generation model learning device, method for the same, and program
JP3233103B2 (en) Fingering data creation device and fingering display device
JP2018155936A (en) Sound data edition method
JP6838357B2 (en) Acoustic analysis method and acoustic analyzer
JP4238237B2 (en) Music score display method and music score display program
WO2024004564A1 (en) Acoustic analysis system, acoustic analysis method, and program
Lin et al. VocalistMirror: A Singer Support Interface for Avoiding Undesirable Facial Expressions
US20230244646A1 (en) Information processing method and information processing system
WO2023182005A1 (en) Data output method, program, data output device, and electronic musical instrument
WO2023181571A1 (en) Data output method, program, data output device, and electronic musical instrument
JP7458127B2 (en) Processing systems, sound systems and programs
WO2023181570A1 (en) Information processing method, information processing system, and program
WO2024085175A1 (en) Data processing method and program
WO2022074754A1 (en) Information processing method, information processing system, and program
JP2023154236A (en) Information processing system, information processing method, and program
JP2023154288A (en) Control device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22930743

Country of ref document: EP

Kind code of ref document: A1