WO2023170757A1

WO2023170757A1 - Reproduction control method, information processing method, reproduction control system, and program

Info

Publication number: WO2023170757A1
Application number: PCT/JP2022/009776
Authority: WO
Inventors: 陽前澤
Original assignee: ヤマハ株式会社
Priority date: 2022-03-07
Filing date: 2022-03-07
Publication date: 2023-09-14

Abstract

This reproduction control system is provided with: a prediction control unit that generates control information pertaining to at least one reproduced part of a musical composition by using model prediction control which uses a prediction model for predicting performance information including the performance position, within the musical composition, of at least one musician; and a reproduction control unit that controls the reproduction of the reproduced part of the musical composition by using the control information generated pertaining to the at least one reproduced part.

Description

Reproduction control method, information processing method, reproduction control system and program

The present disclosure relates to technology for controlling audio or video playback.

For example, techniques for synchronizing the reproduction of music with the performance by a performer have been proposed in the past. A situation is assumed in which a music piece is played in parallel with performances by a plurality of performers. Non-Patent Document 1 discloses a technique for estimating performance positions and performance speeds by integrating information on performances by a plurality of performers, and controlling playback of music according to the estimation results. Furthermore, Non-Patent Document 2 discloses a configuration in which the reproduction of music is synchronized with the performance of a specific performer selected from a plurality of performers.

Under the conventional control techniques exemplified above, it is actually difficult to make the playback of music appropriately follow the performance of a musical instrument by a performer, and there is room for further improvement. In consideration of the above circumstances, one aspect of the present disclosure aims to appropriately control the reproduction of a reproduction part according to a performance by a performer.

In order to solve the above problems, a playback control method according to one aspect of the present disclosure performs model predictive control using a predictive model that predicts performance information including a performance position in a song for at least one performer. Control information is generated for at least one playback part of the song, and playback of the playback part of the song is controlled using the control information generated for the at least one playback part.

A playback control system according to one aspect of the present disclosure performs model predictive control using a predictive model that predicts performance information including a performance position in a song for at least one performer, for at least one playback part of the song. The present invention includes a predictive control section that generates control information, and a playback control section that controls playback of the playback part in the music piece based on the control information generated for the at least one playback part.

A program according to one aspect of the present disclosure provides control information for at least one playback part of a song by model predictive control using a prediction model that predicts performance information including a performance position in a song for at least one performer. The computer system is made to function as a predictive control unit that generates a prediction control unit that generates the playback part, and a playback control unit that controls playback of the playback part in the music piece using the control information generated for the at least one playback part.

An information processing method according to one aspect of the present disclosure performs model predictive control using a predictive model that predicts performance information including a performance position in a music piece for at least one performer, for at least one playback part of the music piece. generating control information, controlling movement of the skeleton and joints represented by the motion data according to the control information, and generating a virtual demonstrator in a virtual space in a posture corresponding to the controlled skeleton and joints; An image of the virtual space captured by a virtual camera whose position and direction are controlled according to the behavior of the user's head is displayed on a display device.

FIG. 1 is a block diagram illustrating the configuration of a performance system. FIG. 2 is a block diagram illustrating the functional configuration of a playback control system. FIG. 2 is a block diagram illustrating the configuration of a predictive control unit. It is a schematic diagram of state cost and control cost. It is a graph showing the relationship between weight value and feedback gain. It is a flowchart of control processing. It is a schematic diagram of the setting screen in 2nd Embodiment. It is a schematic diagram of the setting screen in 2nd Embodiment. FIG. 7 is an explanatory diagram of state variables and state costs in the third embodiment. It is an explanatory diagram of control information and control cost in a 3rd embodiment.

A: First Embodiment FIG. 1 is a block diagram illustrating the configuration of a performance system 100 according to a first embodiment. In the first embodiment, a case is assumed in which a single performer plays a specific part (hereinafter referred to as a "performance part") out of a plurality of parts of a specific song (hereinafter referred to as a "target song"). The performance parts are, for example, one or more parts that constitute the melody of the target song. The performance system 100 controls the reproduction of parts other than the performance part (hereinafter referred to as "reproduction part") among the plurality of parts of the target music piece. The reproduction parts are, for example, one or more parts that constitute the accompaniment of the target music piece.

The performance system 100 includes a playback control system 10 and a keyboard instrument 20. The reproduction control system 10 and the keyboard instrument 20 are interconnected, for example, by wire or wirelessly.

The keyboard instrument 20 is an electronic musical instrument equipped with a plurality of keys corresponding to different pitches. The performer plays the performance part by sequentially operating each key of the keyboard instrument 20. The keyboard instrument 20 reproduces musical tones of pitches played by a player. In addition, the keyboard instrument 20 supplies performance data E representing the performance to the reproduction control system 10 in parallel with the reproduction of musical tones according to the performance by the player. The performance data E specifies the pitch and key depression intensity corresponding to the key operated by the player. That is, the performance data E is data representing a time series of notes played by the performer. The performance data E is, for example, event data compliant with the MIDI (Musical Instrument Digital Interface) standard. Note that the instrument played by the player is not limited to the keyboard instrument 20.

The playback control system 10 includes a control device 11, a storage device 12, a display device 13, an operating device 14, and a sound emitting device 15. The reproduction control system 10 is realized by a portable information device such as a smartphone or a tablet terminal, or a portable or stationary information device such as a personal computer. Note that the reproduction control system 10 is realized not only as a single device but also as a plurality of devices configured separately from each other. Furthermore, the playback control system 10 may be installed in the keyboard instrument 20. The entire performance system 100 including the playback control system 10 and the keyboard instrument 20 may be interpreted as a "playback control system."

The control device 11 is one or more processors that control each element of the playback control system 10. Specifically, for example, CPU (Central Processing Unit), GPU (Graphics Processing Unit), SPU (Sound Processing Unit), DSP (Digital Signal Processor), FPGA (Field Programmable Gate Array), or ASIC (Application Specific Integrated Circuit). The control device 11 is composed of one or more types of processors such as the following.

The storage device 12 is one or more memories that store programs executed by the control device 11 and various data used by the control device 11. For example, a known recording medium such as a semiconductor recording medium and a magnetic recording medium, or a combination of multiple types of recording media is used as the storage device 12. Note that, for example, a portable recording medium that can be attached to and detached from the playback control system 10 or a recording medium that can be accessed by the control device 11 via a communication network (for example, cloud storage) is used as the storage device 12. Good too.

The storage device 12 stores music data D and audio signals Z. The music data D is data that specifies the time series of a plurality of notes constituting the target music. That is, the music data D is data representing the musical score of the target music. The music data D includes first musical score data D1 and second musical score data D2. The first musical score data D1 specifies the note string of the performance part of the target musical piece. The second musical score data D2 specifies the note string of the reproduction part of the target music piece. The music data D (D1, D2) is, for example, a file in a format compliant with the MIDI (Musical Instrument Digital Interface) standard. The acoustic signal Z is a time domain signal representing the waveform of the musical tone (ie, accompaniment tone) of the reproduction part.

The display device 13 displays various images. The display device 13 is configured with a display panel such as a liquid crystal panel or an organic EL (Electroluminescence) panel. The operating device 14 accepts operations by the user. Specifically, the operating device 14 is a plurality of operating elements operated by a user, or a touch panel configured integrally with the display surface of the display device 13. The user who operates the operating device 14 is, for example, a performer of a performance part or an operator other than the performer.

The sound emitting device 15 reproduces sound under the control of the control device 11. For example, the sound emitting device 15 reproduces the musical tone of the reproduction part represented by the acoustic signal Z. The sound emitting device 15 is, for example, a speaker or headphones. Note that a sound emitting device 15 that is separate from the playback control system 10 may be connected to the playback control system 10 by wire or wirelessly. Note that illustration of a D/A converter that converts the audio signal Z from digital to analog and an amplifier that amplifies the audio signal Z are omitted for convenience.

FIG. 2 is a block diagram illustrating the functional configuration of the playback control system 10. The control device 11 executes a program stored in the storage device 12 to provide a plurality of functions (predictive control unit 30 and playback control Part 40) is realized.

The predictive control unit 30 generates control information U[t] using the performance data E and the music data D. Control information U[t] is generated at different times t on the time axis. That is, the predictive control unit 30 generates a time series of control information U[t]. The control information U[t] is data in an arbitrary format for controlling the reproduction of the reproduction part. The control information U[t] is a two-dimensional vector including a playback position u1[t] and a playback speed u2[t].

The playback position u1[t] is the position (point on the time axis) at which the playback part should be played at time t. Specifically, the playback position u1[t] is based on the playback position (hereinafter referred to as "reference position") at each time t when the playback part is played back at a predetermined speed (hereinafter referred to as "reference speed"). It is a relative position. That is, the playback position u1[t] is expressed as a difference (amount of change) from the reference position.

The playback speed u2[t] is the speed at which the playback part should be played back at time t. Specifically, the playback speed u2[t] is a relative speed with respect to the reference speed. That is, the playback speed u2[t] is expressed as a difference (amount of change) from the reference speed.

The playback control unit 40 controls the playback of the musical tone of the playback part according to the control information U[t]. Specifically, the playback control unit 40 controls the playback of the musical tone of the playback part by the sound emitting device 15. The reproduction control unit 40 generates reproduction information P[t] from the control information U[t], and causes the sound emitting device 15 to reproduce the reproduction part according to the reproduction information P[t]. Specifically, the reproduction control unit 40 outputs a sample sequence of a portion of the acoustic signal Z corresponding to the reproduction information P[t] to the sound emitting device 15.

The reproduction information P[t] is information representing the actual reproduction of the reproduction part by the sound emitting device 15. Specifically, the playback information P[t] is a two-dimensional vector including a playback position p1[t] and a playback speed p2[t]. The reproduction position p1[t] is the position (point on the time axis) of the reproduction part to be reproduced at time t. The playback position p1[t] is a position based on the starting point of the target music piece. On the other hand, the playback speed p2[t] is the speed at which the playback part should be played back at time t. Specifically, the playback speed p2[t] is a speed with the stop of playback as a reference value (zero).

As described above, the predictive control unit 30 generates the control information U[t] using the performance data E corresponding to the performance by the performer, and the playback control unit 40 controls the playback of the playback part by the sound emitting device 15. Control is performed according to control information U[t]. The predictive control unit 30 of the first embodiment generates control information U[t] so that the reproduction of the reproduction part by the sound emitting device 15 follows the performance of the performance part by the performer. Model Predictive Control (MPC) is used to generate the control information U[t].

FIG. 3 is a block diagram illustrating a specific configuration of the predictive control unit 30. The prediction control section 30 includes a performance prediction section 31 , an information generation section 32 , an arithmetic processing section 33 , and a variable setting section 34 .

The performance prediction unit 31 predicts the performance information S[t] using the prediction model. Performance information S[t] is predicted for each time t on the time axis. That is, the performance prediction unit 31 generates a time series of performance information S[t]. The performance information S[t] is information predicted from the performance of the performance part by the performer (that is, the performance data E). Specifically, the performance information S[t] is a two-dimensional vector including a performance position s1[t] and a performance speed s2[t]. The prediction model is a mathematical model for predicting performance information S[t].

The performance position s1[t] is the position (point on the time axis) where the performer is predicted to perform at time t in the performance part. The performance position s1[t] is a position based on the starting point of the target music piece. On the other hand, the performance speed s2[t] is the predicted performance speed at time t. The performance speed s2[t] is a speed with the stop of the performance as a reference value (zero).

The performance prediction section 31 includes an analysis section 311 and a prediction section 312. The analysis unit 311 estimates the performance time t[k] and the performance position s[k] by analyzing the performance data E (k is a natural number). Each time the performer plays each note of the performance part, the performance time t[k] and the performance position s[k] are estimated. The performance time t[k] is the time when the k-th note among the plurality of notes of the performance part is played. The performance position s[k] is the position of the k-th note among the plurality of notes of the performance part. For the analysis by the analysis unit 311, a known performance analysis technique (score alignment technique) is arbitrarily employed. For example, the analysis technique disclosed in Japanese Patent Application Publication No. 2016-099512 is used to estimate the performance time t[k] and the performance position s[k]. The analysis unit 311 uses a statistical estimation model such as a deep neural network (DNN) or a hidden Markov model (HMM) to calculate the performance time t[k] and the performance position s[ k] may be estimated.

The prediction unit 312 generates performance information S[t] for a time t after (that is, in the future) the performance time t[k]. The prediction model is used by the prediction unit 312 to predict the performance information S[t]. The prediction model is, for example, a state space model that assumes that the performance by the performer progresses at a constant speed. Specifically, it is assumed that the performance progresses at a constant speed during the intervals between successive notes. Under the above assumptions, the state variable Λ[k] in the state space model is expressed by the following equation (1).

The symbol ε[k] in Equation (1) is a noise component (eg, white noise). The covariance of the noise component ε[k] is calculated from the performance tendency of the performer. In the prediction model, the probability that the performance position s[k] occurs under the observed condition of the state variable Λ[k] follows a normal distribution with a predetermined variance. Under the above premise, by updating the state variable Λ[k] through arithmetic processing such as a Kalman filter, the performance information S[t] can be predicted as shown in Equation (2) below.

Note that in the above explanation, it is assumed that the performance of the performance part progresses at a constant speed, but the performance information S[t] The performance speed may be changed in the prediction. For example, the prediction unit 312 may calculate the performance information S[t] by calculating the following formulas (3a) and (3b).

The symbol dt in Equations (3a) and (3b) is a predetermined time length. Further, the symbol τ(s1[t]) means the performance speed at the performance position s1[t] of the performance information S[t]. As described above, the performance speed τ(s1[t]) is calculated in advance using, for example, the performance speed at which the performer played the performance part in the past. For example, it is assumed that the expected value of the past performance speed of the performance part of the target music piece is calculated as the performance speed τ(s1[t]). Further, a statistical estimation model such as a deep neural network or a hidden Markov model may be made to learn the relationship between musical scores played by the performer in the past and the performance speed τ(s1[t]) in the performance. The prediction unit 312 generates the performance speed τ(s1[t]) by processing the performance data E using a statistical estimation model.

The information generation unit 32 in FIG. 3 generates control information U[t] from performance information S[t]. As mentioned above, the control information U[t] is set so that the reproduction of the reproduction part by the sound emitting device 15 (control information U[t]) follows the performance of the performance part by the performer (performance information S[t]). generated. The derivation of an arithmetic expression (hereinafter referred to as "control law") for generating control information U[t] under the above premise will be discussed below. In the first embodiment, LQG (Linear-Quadratic-Gaussian) control is used to derive the control law.

First, assume a state variable X[t] expressed by the following equation (4).

As understood from equation (4), the state variable X[t] is a variable that represents the error between the performance information S[t] and the playback information P[t]. That is, the state variable X[t] represents the error between the performance of the performance part by the performer and the reproduction part by the sound emitting device 15. The state variable X[t] in the first embodiment is a two-dimensional vector including a position error x1[t] and a speed error x2[t]. The positional error x1[t] is the error between the performance position s1[t] and the reproduction position p1[t] (x1[t]=s1[t]-p1[t]). Further, the speed error x2[t] is the error between the performance speed s2[t] and the playback speed p2[t] (x2[t]=s2[t]-p2[t]).

As described above, the control information U[t] includes a playback position u1[t] based on the reference position and a playback speed u2[t] based on the reference speed. Now, assuming that the playback part is played back at a constant speed during the minute time dt, a state transition expressed by the following equation (5) can be assumed. Note that matrix B in Equation (5) is a quadratic unit matrix.

Now, from the viewpoint of deriving the control law for calculating the control information U[t] at a time (t'+δ) when a time length δ has elapsed from a certain time t', it is expressed by the following formula (6). Consider reducing (eg, minimizing) the cost J.

The symbol T means transpose of a matrix. Further, the symbol Δ in Equation (6) is a time that is sufficiently behind (in the future) the time (t'+δ) on the time axis.

The symbol Q[s1] in Equation (6) is the cost (hereinafter referred to as "state cost") regarding the state variable X[t] at each performance position s1[t] of the target music piece. As mentioned above, the state variable X[t] means the error between the performance information S[t] and the playback information P[t]. Therefore, the state cost Q[s1] means the cost for the error between the performance information S[t] and the playback information P[t] at the performance position s1[t] of the target music piece. That is, the state cost Q[s1] is a cost for the fact that the reproduction of the reproduction part does not follow the performance of the performance part. As understood from Equation (6), the state cost Q[s1] is a quadratic square matrix.

The symbol R[p1] in formula (6) is the cost related to the control information U[t] (hereinafter referred to as "control cost"). Specifically, the control cost R[p1] means the cost for the playback position u1[t] and the playback speed u2[t]. As mentioned above, the playback position u1[t] means the amount of change in the playback position p1[t] with respect to the reference position, and the playback speed u2[t] means the amount of change in the playback speed p2[t] with respect to the reference speed. do. Therefore, the control cost R[p1] is expressed as a cost related to temporal changes in the playback position p1[t] and the playback speed p2[t] represented by the playback information P[t]. That is, the control cost R[p1] is a cost for a change in the reproduction information P[t]. As understood from Equation (6), the control cost R[p1] is a quadratic square matrix.

As understood from formula (6), cost (objective function) J includes state variable X[t], control information U[t], state cost Q[s1], and control cost R[p1]. By using equation (6), a control law for generating control information U[t] is derived as shown in equations (7a) to (7d) below.

The symbol O in formula (7d) is a zero matrix. That is, the matrix Y[t] is a matrix that becomes a zero matrix at time Δ. The symbol L[t] in Equation (7a) is a feedback gain for the state variable X[t], and is expressed by a quadratic square matrix. As understood from equation (7a), the control information U[t] may assume linear feedback with respect to the state variable X[t]. Further, the feedback gain L[t] does not depend on either the control information U[t] or the state variable X[t]. On the other hand, the feedback gain L[t] depends on the state cost Q[s1] and the control cost R[p1].

The information generation unit 32 in FIG. 3 uses a mathematical formula ( The control information U[t] of the playback part is calculated by the calculations 7a) to (7d). That is, the control information U[t] is calculated so that the cost J in Equation (6) is reduced. As can be understood from the above explanation, the model predictive control by the predictive control unit 30 in FIG. includes an optimization process for generating control information U[t] suitable from the viewpoint of reducing cost J.

The arithmetic processing unit 33 in FIG. 3 generates a state cost Q[s1] and a control cost R[p1] that are applied to the generation of control information U[t]. The generation of the state cost Q[s1] and the control cost R[p1] will be described in detail below.

FIG. 4 is a schematic diagram of the state cost Q[s1] and the control cost R[p1]. In FIG. 4, state cost Q[s1] and control cost R[p1] when the target music piece is expressed by the musical score shown in FIG. 4 are illustrated. In FIG. 4, the numerical value of the element in the first row and first column of the state cost Q[s1] and the numerical value of the element in the first row and first column of the control cost R[p1] are illustrated for convenience. ing.

The state cost Q[s1] is expressed by the following formula (8).

The symbol ε in Equation (8) is a small value for stabilizing each value of the state cost Q[s1]. The symbol I means a quadratic unit matrix.

The symbol Gq in formula (8) is a set of positions at which each note of the performance part is to be played by the performer. That is, the set Gq includes the starting point position (hereinafter referred to as "sounding position") s' of each note specified by the first musical score data D1 of the music data D. As illustrated in FIG. 4, Equation (8) represents a time series (hereinafter referred to as "pulse train") Hq of a plurality of pulses q corresponding to different sound generation positions s'. Each pulse q is centered at a time point a time α after the sound generation position s'. Note that since the variable α is a small numerical value, in FIG. 4, the sound generation position s' and the center of each pulse q are shown to substantially coincide with each other. The symbol κs' in Equation (8) is a variable that determines the maximum value of the pulse q corresponding to the sounding position s'. The symbol γ is a variable that determines the pulse width of each pulse q.

On the horizontal axis of FIG. 4, the function value of the pulse train Hq corresponding to the performance position s1[t] corresponds to the state cost Q[s1]. The symbol Cq[s1] in Equation (8) is a weight value for weighting the state cost Q[s1]. That is, the larger the weight value Cq[s1], the more the influence of the state cost Q[s1] on the feedback gain L[t] increases.

The calculation processing unit 33 specifies each sounding position s' by analyzing the first musical score data D1, and calculates the state cost Q[s1] by executing the calculation of formula (8). As can be understood from the above equation (6), the larger the state cost Q[s1] at each performance position s1[t], the larger the error between the performance information S[t] and the playback information P[t] (i.e., the state variable Feedback gain L[t] is set so that X[t]) is sufficiently reduced. That is, at the performance position s1[t] near each sound generation position s' of the performance part, sufficient approximation between the performance information S[t] of the performance part and the reproduction information P[t] of the reproduction part is required. . Specifically, the feedback gain L[t] is set so that the performance of the performance part by the performer and the reproduction of the reproduction part by the sound emitting device 15 are sufficiently similar. On the other hand, at a performance position s1[t] sufficiently spaced from each sound generation position s', a difference between performance information S[t] and reproduction information P[t] is allowed.

The control cost R[p1] is expressed by the following formula (9).

The symbol ε in Equation (9) is a small value for stabilizing each value of the control cost R[p1]. The symbol I means a quadratic unit matrix.

The symbol Gr in Equation (9) is a set of positions at which each note of the playback part is to be played. That is, the set Gr includes the position (hereinafter referred to as "sounding position") p' of the starting point of each note specified by the second musical score data D2 of the music data D. As illustrated in FIG. 4, Equation (9) represents a time series (hereinafter referred to as "pulse train") Hr of a plurality of pulses r corresponding to different sound generation positions p'. Each pulse r is set to have a shape that gradually increases from a point in time before the sound generation position p' and sharply decreases after the sound generation position p' has passed. The symbol ω(p) is a window function representing one pulse r, and is expressed, for example, by the following equation (10). The coefficient c1 and the coefficient c2 in Equation (10) are predetermined positive numbers.

On the horizontal axis of FIG. 4, the function value of the pulse train Hr corresponding to the reproduction position p1[t] corresponds to the control cost R[p1]. The symbol Cr[p1] in Equation (9) is a weight value for weighting the control cost R[p1]. That is, the larger the weight value Cr[p1], the greater the influence of the control cost R[p1] on the feedback gain L[t].

The calculation processing unit 33 specifies each sounding position p' by analyzing the second musical score data D2, and calculates the control cost R[p1] by executing the calculation of formula (9). As understood from the above equation (6), the feedback gain L[t] is set so that the larger the control cost R[p1] at each position p1, the more the control information U[t] is sufficiently reduced. Ru. In the vicinity of each sounding position p' of the performance part, the reference position and the playback position p1[t] are sufficiently approximated (the playback position u1[t] is sufficiently reduced), and the reference speed and the playback position p1[t] A sufficient approximation (sufficient reduction of the playback speed u2[t]) is required. Specifically, the feedback gain L[t] is set so that the reproduction of the reproduction part by the sound emitting device 15 sufficiently approximates the note sequence represented by the second musical score data D2. On the other hand, changes in the reproduction information P[t] are allowed at reproduction positions p1[t] that are sufficiently distant from each sound generation position p'.

The variable setting unit 34 in FIG. 3 sets variables that are applied to the generation of control information U[t]. Specifically, the variable setting unit 34 sets each variable (ε, κs', γ, α, Cq[s1]) included in formula (8) and each variable (ε, Cr [p1], c1, c2). For example, the variable setting unit 34 sets each variable included in formula (8) or formula (9) to a numerical value stored in the storage device 12. As described above, the variable setting unit 34 of the first embodiment sets one or more variables included in the cost J of formula (6). The arithmetic processing unit 33 calculates the state cost Q[s1] and the control cost R[p1] by calculation using the variables set by the variable setting unit 34.

FIG. 5 is a graph showing the relationship between the weighted value Cq[s1], the weighted value Cr[p1], and the feedback gain L[t]. Note that in FIG. 5, the numerical value of the element in the first row and first column of the feedback gain L[t] is illustrated for convenience. As understood from equation (7a), there is a tendency that the larger the feedback gain L[t] is, the more strongly the reproduction information P[t] of the reproduction part is corrected. For example, the larger the feedback gain L[t] is, the more the playback of the playback part is corrected to approximate the performance information S[t] of the playback part.

In FIG. 5, a musical score similar to that in FIG. 4 is assumed. The target music piece is divided into a section σ1, a section σ2, and a section σ3 on the time axis. The interval σ1 is an interval in which the performance part is sounded and the reproduction part is kept silent. The interval σ2 is an interval in which the playback part is sounded and the performance part is kept silent. The interval σ3 is an interval in which both the performance part and the playback part are sounded.

The graph V1 in FIG. 5 is the feedback gain L[t] when the weighted value Cq[s1] and the weighted value Cr[p1] are set to equal values (case 1). In case 1, the feedback gain L[t] is set to a large value near the sounding position s' of the performance part within the interval σ1. That is, the reproduction of the reproduction part is strongly corrected so that the error between the performance information S[t] and the reproduction information P[t] is sufficiently reduced. On the other hand, the feedback gain L[t] is maintained at a sufficiently small value within the interval σ2. That is, the reproduction of the reproduction part is hardly corrected in the interval σ2. Furthermore, within the interval σ3, the feedback gain L[t] is maintained at a large value near the sound generation position s', although it is not as large as within the interval σ1. That is, in the section σ3 in which both the performance part and the playback part are sounded, the playback of the playback part is strongly corrected, although not as much as in the section σ1.

Graph V2 in FIG. 5 is the feedback gain L[t] when the weighted value Cq[s1] is sufficiently smaller than the weighted value Cr[p1] (case 2). Specifically, the weight value Cq[s1] was set to 0.1, and the weight value Cr[p1] was set to 1.0. In case 2, the feedback gain L[t] is maintained at a small value overall. That is, the correction of the error between the performance information S[t] and the playback information P[t] is suppressed compared to Case 1.

Graph V3 in FIG. 5 is the feedback gain L[t] when the weighted value Cq[s1] is sufficiently larger than the weighted value Cr[p1] (case 3). Specifically, the weight value Cq[s1] was set to 1.0, and the weight value Cr[p1] was set to 0.1. In case 3, the feedback gain L[t] is set to a large value in the vicinity of the sound generation position s' of the performance part, regardless of whether or not the playback part is generating sound. That is, the reproduction of the reproduced part is strongly corrected so that the error between the performance information S[t] and the reproduced information P[t] is sufficiently reduced, regardless of whether or not the reproduced part is sounded.

As can be understood from the above explanation, the reproduction behavior of the reproduction part with respect to the performance part changes according to the weight value Cq[s1] and the weight value Cr[p1]. Specifically, the relationship between the performance part and the playback part changes depending on the magnitude relationship between the weighted value Cq[s1] and the weighted value Cr[p1].

FIG. 6 is a flowchart of the process (hereinafter referred to as "control process") executed by the control device 11. The control process is repeated at predetermined intervals.

When the control process is started, the control device 11 (analysis unit 311) estimates the performance time t[k] and the performance position s[k] by analyzing the performance data E (Sa1). Further, the control device 11 (prediction unit 312) generates performance information S[t] for time t after performance time t[k] by using the prediction model (Sa2: prediction process).

The control device 11 (variable setting unit 34) sets variables applied to the generation of control information U[t] (Sa3). The control device 11 (arithmetic processing unit 33) generates a state cost Q[s1] and a control cost R[p1] (Sa4). Specifically, the control device 11 generates the state cost Q[s1] by analyzing the first musical score data D1, and generates the control cost R[p1] by analyzing the second musical score data D2. The variables set in step Sa3 are applied to generate the state cost Q[s1] and the control cost R[p1].

The control device 11 (information generation unit 32) calculates the formula (6) by calculating the formulas (7a) to (7d) applying the state variable X[t], the state cost Q[s1], and the control cost R[p1]. ) is calculated so that the cost J is reduced (Sa5: optimization process). Through prediction processing (Sa2) that generates performance information S[t] using a prediction model and optimization processing (Sa5) that generates control information U[t] using performance information S[t], Model predictive control is realized.

The control device 11 (playback control unit 40) controls the playback of the playback part by the sound emitting device 15 according to the control information U[t] (Sa6). Specifically, the control device 11 generates reproduction information P[t] from the control information U[t], and reproduces a portion of the acoustic signal Z corresponding to the reproduction information P[t] on the sound emitting device 15. let

As explained above, in the first embodiment, since model predictive control is used to generate the control information U[t], it is possible to appropriately control the reproduction of the reproduction part according to the performance by the performer. In particular, in the first embodiment, the control information U[t] is generated so that the cost J including the state variable X[t] representing the error between the performance information S[t] and the playback information P[t] is reduced. be done. Therefore, it is possible to link the reproduction of the reproduction part to the performance by the performer.

Furthermore, the cost J includes a state cost Q[s1] related to the state variable X[t] and a control cost R[p1] related to temporal changes in the reproduction information P[t]. By reducing the state cost Q[s1], the error (state variable X(t)) between the performance information S[t] and the reproduction information P[t] is effectively reduced. Further, by reducing the control cost R[p1], excessive changes in the reproduction information P[t] are suppressed. Therefore, the error between the performance information S[t] and the reproduction information P[t] and the excessive change in the reproduction information P[t] can be effectively reduced.

B: Second Embodiment The second embodiment will be described. In addition, in each aspect illustrated below, for elements whose functions are similar to those in the first embodiment, the same reference numerals as in the description of the first embodiment are used, and detailed descriptions of each are omitted as appropriate.

The second embodiment differs from the first embodiment in the operation of the variable setting unit 34. The configuration and operation of elements other than the variable setting section 34 are the same as in the first embodiment. Therefore, the second embodiment also achieves the same effects as the first embodiment.

The variable setting unit 34 of the first embodiment sets the variable applied to the generation of the control information U[t] to a numerical value stored in advance in the storage device 12. The variable setting unit 34 of the second embodiment sets variables to be applied to the generation of the control information U[t] in response to a user's instruction to the operating device 14 (Sa3). Specifically, the variable setting unit 34 sets each variable (ε, κs', γ, α, Cq[s1]) in equation (8) and each variable (ε, Cr[p1], c1 , c2) are set variably according to instructions from the user. The calculation processing unit 33 calculates the state cost Q[s1] and the control cost R[p1] by calculation using the variables set by the variable setting unit 34 (Sa4). According to the second embodiment, the variable related to the cost J in Equation (6) is set according to an instruction from the user, so that the user's intention can be reflected in the reproduction of the reproduction part.

As explained above, the variable setting unit 34 of the second embodiment sets the weighted value Cq[s1] of Equation (8) and the weighted value Cr[p1] of Equation (9). The weight value Cq[s1] is an example of a "first weight value", and the weight value Cr[p1] is an example of a "second weight value".

FIG. 7 is a schematic diagram of the setting screen 141 for the user to change the weight value Cq[s1] and the weight value Cr[p1]. The variable setting unit 34 displays a setting screen 141 on the display device 13.

The setting screen 141 includes the musical score 142 of the target song represented by the song data D. The musical score 142 includes a musical score 143 of the performance part represented by the first musical score data D1, and a musical score 144 of the reproduction part represented by the second musical score data D2. The user can specify an arbitrary section (hereinafter referred to as a "set section") 145 within the musical score 142 by operating the operating device 14. The variable setting unit 34 accepts the designation of a setting section 145 by the user. Note that a plurality of setting sections 145 may be specified in the musical score 142.

The user selects either the weighted value Cq[s1] or the weighted value Cr[p1] by operating the operating device 14. The variable setting unit 34 accepts selections by the user. When the weight value Cq[s1] is selected, the variable setting unit 34 displays the changed image 146 in FIG. 7 on the display device 13. The modified image 146 includes the current value (synchrony) of the weight value Cq[s1]. The user can instruct increase (Increase synchrony) or decrease (Decrease synchrony) of weight value Cq[s1] by operating on change image 146. The variable setting unit 34 changes the weight value Cq[s1] within the setting section 145 in response to an instruction from the user. The changed image 146 displays the changed weight value Cq[s1] (synchrony=3). The variable setting unit 34 sets a weight value Cq[s1] for each setting section 145 specified by the user. Note that the variable setting unit 34 may set the weight value Cq[s1] to a numerical value directly specified by the user.

Further, when the weight value Cr[p1] is selected, the variable setting unit 34 displays the changed image 147 in FIG. 8 on the display device 13. The modified image 147 includes the current value (rigidity) of the weight value Cr[p1]. The user can instruct increase (Increase rigidity) or decrease (Decrease rigidity) of weight value Cr[p1] by operating on change image 147. The variable setting unit 34 changes the weight value Cr[p1] within the setting section 145 in response to an instruction from the user. The changed image 147 displays the changed weight value Cr[p1] (rigidity=3). The variable setting unit 34 sets a weight value Cr[p1] for each setting section 145 specified by the user. Note that the variable setting unit 34 may set the weight value Cr[p1] to a numerical value directly designated by the user.

The arithmetic processing unit 33 of the second embodiment generates the state cost Q[s1] and the control cost R[p1] according to the weighted value Cq[s1] and the weighted value Cr[p1] set by the variable setting unit 34. (Sa4). Specifically, the arithmetic processing unit 33 calculates the performance position s1[t] within the set section 145 of the target music according to formula (8) by applying the weighted value Cq[s1] of the set section 145. Calculate the state cost Q[s1]. In addition, the calculation processing unit 33 calculates the control cost R by calculating the playback position p1[t] within the set section 145 of the target music according to formula (9) to which the weighted value Cr[p1] of the set section 145 is applied. Calculate [p1]. For sections other than the set section 145 of the target song, the weight value Cq[s1] and the weight value Cr[p1] are set to predetermined initial values.

As explained with reference to FIG. 5, the reproduction behavior of the reproduction part with respect to the performance part changes according to the weight value Cq[s1] and the weight value Cr[p1]. In the second embodiment, the relationship between the performance part and the playback part can be changed according to the setting of the weight value Cq[s1] and the weight value Cr[p1] by the variable setting section 34. Particularly in the second embodiment, each of the weight value Cq[s1] and the weight value Cr[p1] is set according to instructions from the user. Therefore, the user can change the relationship between the performance parts and the playback parts.

C: Third Embodiment In the first embodiment, it is assumed that one performance part is played and one reproduction part is played. On the other hand, the above-described model predictive control has the advantage of being easily expandable to multi-input-multi-output (MIMO). Considering the above circumstances, in the third embodiment, a case is described in which the playback of M (M is a natural number) playback parts is controlled in conjunction with the performance of N (N is a natural number) playback parts. Suppose. For example, each of the N performers plays a different performance part of the target piece of music. Therefore, N performance data E corresponding to different performers (performance parts) are supplied to the reproduction control system 10 in parallel. The case where both the total number N of performers and the total number M of reproduction parts are set to "1" corresponds to the first embodiment described above.

The music data D of the third embodiment includes N pieces of first score data D1 and M pieces of second score data D2. The N pieces of first musical score data D1 correspond to different performance parts of the target musical piece. The M pieces of second musical score data D2 correspond to different reproduction parts of the target musical piece. The storage device 12 also stores M audio signals Z corresponding to different reproduction parts. The acoustic signal Z of each reproduction part represents the waveform of the musical tone of the reproduction part.

The performance prediction unit 31 predicts performance information S[t] for each of the N performance parts using a prediction model. That is, performance information S[t] is predicted for each of the N performers. The process of predicting performance information S[t] is the same as in the first embodiment. Performance information S[t] of each performance part is predicted from the performance of the performance part (ie, performance data E). Note that the performance prediction unit 31 may predict the performance information S[t] of each performance part using a separate prediction model for each performance part, or may use a prediction model common to N performance parts. The performance information S[t] of each performance part may be predicted by doing so.

FIG. 9 is an explanatory diagram of the state variable X[t] and state cost Q[s1] in the third embodiment. The state variables X[t] of the third embodiment include N×M state variables Xn,m[t] (n=1 to N, m=1 to M). Specifically, the state variable X[t] includes state variables Xn,m[t] for all combinations of selecting one of the N performance parts and one of the M reproduction parts. The state variable Xn,m[t] corresponds to the state variable X[t] of the first embodiment. Therefore, the state variable Xn,m[t] is a two-dimensional vector representing the error between the performance information S[t] of the nth performance part and the reproduction information P[t] of the mth performance part. . That is, the state variable Xn,m[t] represents the error between the performance of the n-th performance part and the m-th reproduction part by the sound emitting device 15.

The state cost Q[s1] is a block diagonal matrix whose diagonal components are N×M submatrices Qn,m[s1]. Elements of the state cost Q[s1] other than the submatrix Qn,m[s1] are set to zero. Specifically, the state cost Q[s1] includes submatrices Qn,m[s1] for all combinations of selecting one of the N performance parts and one of the M reproduction parts. The submatrix Qn,m[s1] corresponds to the state cost Q[s1] of the first embodiment. Specifically, the submatrix Qn,m[s1] is composed of the performance information S[t] at the performance position s1[t] of the nth performance part and the reproduction information P[t] of the mth performance part. is the cost for the error. The arithmetic processing unit 33 calculates the submatrix Qn,m[s1] using Equation (8) similarly to the state cost Q[s1] in the first embodiment. The set Gq of formula (8) applied to the calculation of the submatrix Qn,m[s1] is the sounding position s' of each note specified by the first score data D1 of the n-th performance part of the music data D. It is.

The variable setting unit 34 of the third embodiment individually sets the weight value Cq[s1] of formula (8) for each submatrix Qn,m[s1]. For example, the storage device 12 stores a plurality of different setting data. N×M weight values Cq[s1] corresponding to different combinations of performance parts and playback parts are registered in each of the plurality of setting data. The numerical value of each weight value Cq[s1] differs for each setting data. The variable setting unit 34 selects any one of the plurality of setting data according to a user's instruction to the operating device 14. Selection of setting data corresponds to setting of weight value Cq[s1] corresponding to each submatrix Qn,m[s1]. The calculation processing unit 33 calculates the submatrix Qn,m[s1] by calculating the formula (8) applying each weight value Cq[s1] registered in the setting data. As understood from the above description, the weight value Cq[s1] applied to the generation of each submatrix Qn,m[s1] is changed according to instructions from the user. Note that the variable setting unit 34 may individually set each of the N×M weight values Cq[s1] according to instructions from the user.

FIG. 10 is an explanatory diagram of control information U[t] and control cost R[p1] in the third embodiment. The control information U[t] of the third embodiment includes M pieces of control information U1[t] to UM[t] corresponding to different playback parts of the target song. The control information Um[t] corresponds to the control information U[t] of the first embodiment. Therefore, the control information Um[t] is a two-dimensional vector including the playback position u1[t] and the playback speed u2[t]. The playback control unit 40 controls the playback of the m-th playback part by the sound emitting device 15 according to the control information Um[t]. Specifically, the playback control unit 40 generates playback information Pm[t] from the control information Um[t], and outputs the m-th playback part to the sound emitting device 15 according to the playback information Pm[t]. Let it play. That is, the reproduction control unit 40 causes the sound emitting device 15 to reproduce a portion of the audio signal Z of the m-th reproduction part that corresponds to the reproduction information Pm[t]. Therefore, the musical tones of M reproduction parts of the target music piece are reproduced in parallel.

The control cost R[p1] is a block diagonal matrix whose diagonal components are M submatrices R1[p1] to RM[p1]. Elements of the control cost R[p1] other than the submatrix Rm[p1] are set to zero. The submatrix Rm[p1] corresponds to the control cost R[p1] of the first embodiment. Specifically, it is a cost related to a change in the reproduction information Pm[t] at the reproduction position p1[t] of the m-th reproduction part. The arithmetic processing unit 33 calculates the submatrix Rm[p1] using Equation (9) similarly to the control cost R[p1] of the first embodiment. The set Gr of formula (9) to which the calculation of the submatrix Rm[p1] is applied is the pronunciation position p' of each note specified by the second score data D2 of the m-th playback part of the music data D. .

The variable setting unit 34 of the third embodiment individually sets the weight value Cr[p1] of formula (9) for each submatrix Rm[p1]. For example, the storage device 12 stores a plurality of different setting data. M weight values Cr[p1] corresponding to different playback parts are registered in each of the plurality of setting data. The numerical value of each weight value Cr[p1] differs for each setting data. The variable setting unit 34 selects any one of the plurality of setting data according to a user's instruction to the operating device 14. Selection of setting data corresponds to setting of weight value Cr[p1] corresponding to each submatrix Rm[p1]. The arithmetic processing unit 33 calculates the submatrix Rm[p1] by calculating the formula (8) applying each weight value Cr[p1] registered in the setting data. As understood from the above description, the weight value Cr[p1] applied to the generation of each submatrix Rm[p1] is changed in accordance with instructions from the user. Note that the variable setting unit 34 may individually set each of the M weight values Cr[p1] according to instructions from the user.

Similarly to the first embodiment, the information generation unit 32 calculates the following equations (7a) to (7d) using the state variable X[t], state cost Q[s1], and control cost R[p1]. Calculate the control information U[t] of the playback part (Sa5). That is, the information generation unit 32 generates control information Um (U1[t] to UM[t]) for each of the M reproduction parts. Therefore, the third embodiment also achieves the same effects as the first embodiment.

As explained above, in the third embodiment, the total number N of performance parts and the total number M of reproduction parts are generalized. When the total number N of performance parts is 2 or more, performance information S[t] is predicted for each of the plurality of performers (performance parts). Therefore, the reproduction of the reproduction parts can be appropriately controlled according to the performances by a plurality of performers. Further, when the total number M of reproduction parts is 2 or more, control information Um[t] is generated for each of the plurality of reproduction parts. Therefore, the reproduction of each of the plurality of reproduction parts can be controlled according to the performance by the performer. When both the total number N of performance parts and the total number M of reproduction parts are 2 or more, the reproduction of each of the plurality of reproduction parts can be controlled in accordance with the performances by the plurality of performers.

In the third embodiment, N×M weight values Cq[s1] corresponding to different combinations of performance parts and playback parts are controlled. Furthermore, M weight values Cr[p1] corresponding to different playback parts are controlled. As described above, the relationship between the reproduction of the playback part and the performance of the performance part depends on the weight value Cq[s1] and the weight value Cr[p1]. Therefore, the relationship between each of the N performance parts and each of the M playback parts can be controlled in detail according to the weighting value Cq[s1] and the weighting value Cr[p1]. That is, the degree to which the reproduction of the reproduction part is linked to the performance of the performance part can be individually controlled for each combination of each performance part and each reproduction part. For example, various types of control can be realized, such as strongly linking the playback of a specific playback part with the performance of a specific performance part, while hardly linking the playback of other playback parts with the performance of the performance part.

As understood from the example of the third embodiment, the prediction control unit 30 uses a prediction model that predicts the performance information S[t] including the performance position s1[t] in the target music piece for at least one performer. Control information U[t] is generated for at least one playback part of the target song using model predictive control.

D: Modifications Specific modifications added to each of the embodiments exemplified above will be exemplified below. A plurality of aspects arbitrarily selected from the above-described embodiment and the modified examples illustrated below may be combined as appropriate to the extent that they do not contradict each other.

(1) In each of the above embodiments, the acoustic signal Z stored in the storage device 12 is used to reproduce the reproduction part, but the method of reproducing the musical tone of the reproduction part is not limited to the above examples. For example, the audio signal Z may be generated by the reproduction control section 40 sequentially supplying the second musical score data D2 to the sound source section. The second musical score data D2 is supplied to the sound source section in parallel with the performance of the performance part by the performer. By supplying the acoustic signal Z generated by the sound source section to the sound emitting device 15, the musical tone of the reproduction part is reproduced. That is, the playback control section 40 functions as a sequencer that processes the second musical score data D2. The sound source section is a hardware sound source or a software sound source. The reproduction control unit 40 controls the timing of supplying the second musical score data D2 to the sound source unit according to the control information U[t].

(2) In each of the above-mentioned embodiments, the musical tone of the reproduction part is reproduced by the sound emitting device 15, but the reproduction part is not limited to the above examples. For example, the playback control unit 40 may cause an electronic musical instrument capable of automatic performance to play the musical tone of the playback part. That is, the playback control unit 40 causes the electronic musical instrument to automatically perform the playback part by controlling the electronic musical instrument according to the control information U[t].

Furthermore, the playback control unit 40 may, for example, control the playback of a video related to the playback part (hereinafter referred to as "target video"). The target video is a video that shows a specific performer playing the playback part of the target song. For example, the target video may be a captured video of a real performer playing the playback part on an instrument, or a composite video generated by image processing of a virtual performer playing the playback part. Ru. Note that it does not matter whether or not there is sound in the target video.

Video data representing the target video is stored in the storage device 12. The playback control unit 40 displays the target moving image on the display device 13 by outputting the moving image data. The playback control unit 40 controls playback of the target video according to the control information U[t]. Specifically, the playback control unit 40 generates playback information P[t] from the control information U[t], and displays a portion of the target video that corresponds to the playback information P[t] on the display device 13. . That is, the playback position p1[t] and playback speed p2[t] of the target moving image are controlled in conjunction with the performance of the performance part by the performer.

The virtual performer (hereinafter referred to as "virtual performer") represented by the target video is, for example, an avatar existing in the virtual space. For example, the playback control unit 40 displays on the display device 13 a virtual performer and a background image photographed by a virtual camera in the virtual space. The display device 13 may be installed in an HMD (Head Mounted Display) that is worn on the user's head. When the display device 13 is installed in the HMD, the position and direction of the virtual camera in the virtual space are dynamically controlled according to the behavior (eg, position and direction) of the user's head. Therefore, by moving their head appropriately, the user can visually recognize the virtual performer from any position and direction in the virtual space.

The video data for displaying the virtual performer in the virtual space includes, for example, motion data representing the movements of the skeleton and joints of the virtual performer. The motion data specifies, for example, changes in relative angle and position over time for each of the skeleton and joints. The reproduction control unit 40 controls movement of the skeleton and joints represented by the motion data according to control information U[t] (or reproduction information P[t]). Furthermore, the playback control unit 40 generates a virtual performer in a posture specified by the motion data as an object in the virtual space. For example, the virtual performer in the virtual space is controlled to have a posture corresponding to the skeleton and joints specified by the portion of the motion data that corresponds to the playback information P[t]. That is, the reproduction control unit 40 changes the speed of movement of the skeleton and joints specified by the motion data according to the control information U[t]. Therefore, the performance by the virtual performer in the virtual space progresses in conjunction with the performance by the performer in the real space. For example, image processing such as modeling and texturing is used to generate a three-dimensional virtual performer. Then, the playback control unit 40 generates a planar image (target moving image) of the virtual performer in the virtual space captured by the virtual camera, through image processing such as rendering, for example. As mentioned above, the position and direction of the virtual camera change depending on the behavior of the user's head. The playback control unit 40 displays the target video generated by the above processing on the display device 13. Therefore, as described above, the user can view the virtual performer playing the playback part from any position and direction in the virtual space. For example, a performer wearing an HMD can check from any position and direction in the virtual space how a virtual performer is playing a playback part in conjunction with the performer's performance of the playback part. Note that in the above description, a virtual performer who plays the playback part is displayed, but for example, a virtual dancer who dances in conjunction with the progress of the playback part may be displayed on the display device 13. Virtual performers and virtual dancers are collectively represented as virtual performers. In the above description, the display device 13 is attached to the user's head. person may be displayed.

As understood from the above example, the playback control unit 40 is comprehensively expressed as an element that controls the playback of the playback part. "Reproduction of the reproduction part" includes reproduction of the musical tone of the reproduction part and reproduction of the moving image (target moving image) of the reproduction part. The display device 13 and the sound emitting device 15 are playback devices that play back the playback part.

According to the first to third embodiments, it is possible to control the reproduction of musical tones related to the reproduction part according to the performance by the performer. On the other hand, according to this modification, it is possible to control the reproduction of the moving image related to the reproduction part in accordance with the performance by the performer.

(3) In each of the above embodiments, the performance data E representing the performance by the performer is supplied to the playback control system 10, but the input information corresponding to the performance by the performer is limited to the performance data E. Not done. For example, a signal representing the waveform of a musical tone played by a performer (hereinafter referred to as a "performance signal") may be supplied to the playback control system 10 instead of the performance data E. The performance signal is a signal generated by collecting musical tones produced by a musical instrument during a performance by a performer using a microphone.

The performance prediction unit 31 generates performance information S[t] by analyzing the performance signal. For example, the analysis unit 311 estimates the performance time t[k] and the performance position s[k] by analyzing the performance signal. The prediction unit 312 generates performance information S[t] using a prediction model, as in the first embodiment. The above configuration also achieves the same effects as those of the above-described embodiments.

(4) In each of the above-mentioned embodiments, the state space model is exemplified as the prediction model used to predict the performance information S[t], but the form of the prediction model is not limited to the above examples. For example, a statistical model such as a deep neural network or a hidden Markov model may be used as a predictive model.

(5) In each of the above embodiments, the performance information S[t] includes the performance position s1[t] and the performance speed s2[t], but the format of the performance information S[t] is as follows. Not limited to examples. For example, the performance speed s2[t] may be omitted. That is, the performance information S[t] is comprehensively expressed as information including the performance position s1[t]. Similarly, the reproduction information P[t] is not limited to information including the reproduction position p1[t] and the reproduction speed p2[t]. For example, the playback speed p2[t] may be omitted. That is, the playback information P[t] is comprehensively expressed as information including the playback position p1[t].

As understood from the above description, the formats of the state variable X[t] and the control information U[t] are not limited to the examples in each of the above-mentioned forms. For example, the speed error x2[t] may be omitted from the state variable X[t]. Furthermore, the playback speed u2[t] may be omitted from the control information U[t].

(6) In each of the above embodiments, the predictive control unit 30 generates the control information U[t] by model predictive control using one predictive model, but a plurality of different predictive models are selected. It may be used for The prediction control unit 30 generates control information U[t] for one or more playback parts of the target song using any one of the plurality of prediction models.

For example, a prediction model is prepared for each performer. Each performer's prediction model is a state space model that reflects the performance tendency of the performer. The predictive control unit 30 generates control information U[t] for one or more playback parts of the target song by using a prediction model corresponding to the performer of the performance part from among the plurality of prediction models. Note that a prediction model may be prepared for each set of a plurality of performers (for example, for each orchestra).

Additionally, a prediction model may be prepared for each attribute of the target song, for example. The attributes of the target song are, for example, the music genre of the target song (for example, rock, pop, jazz, trance, hip-hop, etc.) or the musical impression (for example, "a song with a bright impression", "a song with a dark impression", etc.). . The prediction control unit 30 generates control information U[t] for one or more playback parts of the target song by using a prediction model corresponding to the attribute of the target song among the plurality of prediction models.

In the above configuration, even when the music data D and the performance data E are common, the reproduction of the reproduction part can be controlled in various ways according to the selection conditions of the prediction model (for example, performer or attribute).

(7) For example, the playback control system 10 may be realized by a server device that communicates with a terminal device such as a mobile phone or a smartphone. For example, the predictive control unit 30 of the playback control system 10 generates the control information U[t] by processing the performance data E (or performance signal) received from the terminal device. For example, the music data D stored in the storage device 12 of the playback control system 10 or the music data D transmitted from the terminal device is used to generate the control information U[t]. The playback control unit 40 transmits a portion of the audio signal Z (or video data of the target video) that corresponds to the control information U[t] to the terminal device. Note that in a configuration in which the playback control unit 40 is installed in a terminal device, the control information U[t] may be transmitted from the playback control system 10 to the terminal device.

(8) As described above, the functions of the playback control system 10 according to each of the above embodiments are realized through cooperation between one or more processors forming the control device 11 and the program stored in the storage device 12. . The programs exemplified above may be provided in a form stored in a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium, and an optical recording medium (optical disk) such as a CD-ROM is a good example, but any known recording medium such as a semiconductor recording medium or a magnetic recording medium is used. Also included are recording media in the form of. Note that the non-transitory recording medium includes any recording medium excluding transitory, propagating signals, and does not exclude volatile recording media. Furthermore, in a configuration in which a distribution device distributes a program via a communication network, a recording medium that stores a program in the distribution device corresponds to the above-mentioned non-transitory recording medium.

E: Supplementary Note From the configurations exemplified above, for example, the following configurations can be understood.

A playback control method according to one aspect (aspect 1) of the present disclosure includes a method for controlling at least one of the pieces of music through model predictive control using a prediction model that predicts performance information including a performance position in the piece of music for at least one performer. control information is generated for the playback part, and the playback of the playback part in the music piece is controlled by the control information generated for the at least one playback part. According to the aspect described above, since model predictive control is used to generate the control information, it is possible to appropriately control the reproduction of the reproduction part according to the performance by the performer.

"Performance information" is data in any format including the performance position. For example, the performance information includes a performance position and a performance speed. The performance position is the position in the song where the performer is playing. The performance speed is the speed (tempo) at which the performer plays the music. On the other hand, "control information" is data in any format for controlling reproduction of a reproduction part. For example, the control information includes the amount of change in playback position and the amount of change in playback speed.

The information and processing used to predict performance information are arbitrary. For example, it is assumed that performance data representing a performance by a performer or a performance signal representing a waveform of a musical tone played by a performer is used for predicting performance information. Further, for example, a video of a user playing a performance may be used for predicting performance information. Various prediction models are used to predict performance information. As the prediction model, for example, a state space model such as a Kalman filter is used.

The "playback part" is a music part that is to be controlled by the control information among the plurality of music parts that make up the song. "Reproduction of a playback part" includes not only playback of audio (for example, automatic performance) related to the playback part, but also playback of video related to the playback part.

In a specific example of Aspect 1 (Aspect 2), the at least one performer is a plurality of performers, and in predicting the performance information, the performance information is predicted for each of the plurality of performers. According to the above aspect, it is possible to appropriately control the reproduction of the reproduction part according to performances by a plurality of performers.

In a specific example of aspect 1 or aspect 2 (aspect 3), the at least one reproduction part is a plurality of reproduction parts, and in generating the control information, the control information is generated for each of the plurality of reproduction parts. do. According to the above aspect, the reproduction of each of the plurality of reproduction parts can be controlled according to the performance by the performer.

In a specific example of any one of Aspects 1 to 3 (Aspect 4), the playback control includes controlling the playback of musical tones related to the at least one playback part of the music piece. According to the above aspect, it is possible to control the reproduction of musical tones related to the reproduction part according to the performance by the performer.

In a specific example of any one of Aspects 1 to 3 (Aspect 5), the playback control includes controlling the playback of a video related to the at least one playback part of the song. According to the above aspect, it is possible to control the reproduction of the moving image related to the reproduction part according to the performance by the performer. The moving image is, for example, a moving image in which a virtual performer (for example, a performer or a dancer) in a virtual space performs a reproduction part.

In the specific example of any one of aspects 1 to 5 (aspect 6), in the model predictive control, playback including performance information predicted for the at least one performer and a playback position of the at least one playback part. Generating control information for the at least one playback part such that a cost including a state variable representing an error in the information and the at least one playback part is reduced. According to the above embodiment, the control information is generated so that the cost including the state variable representing the error between the performance information and the playback information is reduced. Therefore, it is possible to link the reproduction of the reproduction part to the performance by the performer.

"Reproduction information" is data in any format including the reproduction position. For example, the playback information includes a playback position and a playback speed. The playback position is the position within the song where the song is being played. The playback speed is the speed at which the song is played.

In the specific example of Aspect 6 (Aspect 7), at least one variable included in the cost is further set in accordance with an instruction from the user. According to the above aspect, since the variable related to the cost is set according to the instruction from the user, the user's intention can be reflected in the reproduction of the reproduction part.

The "variables" of a cost (objective function) are various variables that are applied to calculations related to the cost. Specifically, in a form where the objective variable includes a state cost and a control cost, the first weight value for the state cost and the second weight value for the control cost are set as "variables" according to instructions from the user. is set.

In a specific example of aspect 6 (aspect 8), the cost includes the state variable and the control information, a state cost and a control cost, the state cost is a cost related to the state variable, and the control cost is , is a cost related to temporal changes in the reproduction information. In the above aspect, the costs include state costs related to state variables and control costs related to temporal changes in reproduction information. By reducing the state cost, the error between performance information and playback information is effectively reduced. Further, by reducing control costs, excessive changes in reproduction information are suppressed. Therefore, errors between the performance position and the playback position and excessive changes in playback information can be effectively reduced.

In the specific example of aspect 8 (aspect 9), a first weight value and a second weight value are further set, the state cost is a cost weighted by the first weight value, and the control cost is a cost weighted by the first weight value, and the control cost is a cost weighted by the first weight value. This is the cost weighted by the second weight value. In the above aspect, the state cost is weighted by the first weight value, and the control cost is weighted by the second weight value. Therefore, the relationship between the performance by the performer and the reproduction of the reproduction part can be changed according to the settings of the first weight value and the second weight value.

In a specific example of aspect 9 (aspect 10), in setting the first weight value and the second weight value, the first weight value and the second weight value are changed according to instructions from the user. In the above aspect, each of the first weight value and the second weight value is set according to an instruction from the user. Therefore, the user can change the relationship between the performance by the performer and the reproduction of the reproduction part.

In a specific example of any one of aspects 1 to 10 (aspect 11), in predicting the performance information, the performance information is predicted by using any of a plurality of prediction models for predicting the performance information. Predict performance information about. According to the above aspect, the playback of the playback part can be controlled in various ways according to the selection conditions of the prediction model.

A playback control system according to one aspect (aspect 12) of the present disclosure performs model prediction control using a prediction model that predicts performance information including a performance position in a song for at least one performer. a predictive control unit that generates control information for the playback part; and a playback control unit that controls playback of the playback part in the song based on the control information generated for the at least one playback part.

A program according to one aspect (aspect 13) of the present disclosure performs playback of at least one piece of music through model predictive control using a prediction model that predicts performance information including a performance position in a piece of music for at least one performer. The computer system is caused to function as a predictive control unit that generates control information for the part, and a playback control unit that controls playback of the playback part in the music piece using the control information generated for the at least one playback part.

An information processing method according to one aspect (aspect 14) of the present disclosure provides a model predictive control using a predictive model that predicts performance information including a performance position in a song for at least one performer. generates control information for the playback part, controls the movement of the skeleton and joints represented by the motion data according to the control information, and moves a virtual demonstrator in a posture corresponding to the controlled skeleton and joints in a virtual space. An image of the virtual space captured by a virtual camera whose position and direction are controlled according to the behavior of the user's head is displayed on the display device. According to the above aspect, since model predictive control is used to generate the control information, it is possible to appropriately control the movements of the virtual performer in accordance with the performance by the performer.

100... Performance system, 10... Playback control system, 11... Control device, 12... Storage device, 13... Display device, 14... Operating device, 15... Sound emitting device, 20... Keyboard instrument, 30... Prediction control unit, 31... Performance prediction section, 311... Analysis section, 312... Prediction section, 32... Information generation section, 33... Arithmetic processing section, 34... Variable setting section, 40... Playback control section.

Claims

Generating control information for at least one playback part of the song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer;
A playback control method realized by a computer system, wherein the control information generated for the at least one playback part controls the playback of the playback part in the music piece.
The at least one performer is a plurality of performers,
2. The playback control method according to claim 1, wherein in predicting the performance information, the performance information is predicted for each of the plurality of performers.
The at least one playback part is a plurality of playback parts,
3. The playback control method according to claim 1, wherein in generating the control information, the control information is generated for each of the plurality of playback parts.
4. The playback control method according to claim 1, wherein the playback control includes controlling playback of musical tones related to the at least one playback part of the music piece.
The playback control method according to any one of claims 1 to 3, wherein the playback control includes controlling playback of a video related to the at least one playback part of the song.
In the model predictive control, a cost including a state variable representing an error between performance information predicted for the at least one performer and playback information including a playback position of the at least one playback part is reduced. 6. The reproduction control method according to claim 1, further comprising: generating control information for the at least one reproduction part.
7. The playback control method according to claim 6, further comprising setting at least one variable included in the cost according to an instruction from a user.
The cost includes the state variable and the control information, a state cost and a control cost,
The state cost is a cost related to the state variable,
7. The playback control method according to claim 6, wherein the control cost is a cost related to temporal changes in the playback information.
Furthermore, setting a first weight value and a second weight value,
The state cost is a cost weighted by the first weight value,
The playback control method according to claim 8, wherein the control cost is a cost weighted by the second weight value.
10. The reproduction control method according to claim 9, wherein in setting the first weight value and the second weight value, the first weight value and the second weight value are changed according to an instruction from a user.
11. In predicting the performance information, performance information is predicted for the at least one performer using any one of a plurality of prediction models for predicting performance information. Playback control method.
A predictive control unit that generates control information for at least one playback part of the song by model predictive control using a predictive model that predicts performance information including a performance position in the song for at least one performer;
A playback control system, comprising: a playback control section that controls playback of the playback part of the song based on control information generated for the at least one playback part.
A predictive control unit that generates control information for at least one playback part of the song by model predictive control using a predictive model that predicts performance information including a performance position in the song for at least one performer, and
a playback control unit that controls the playback of the playback part in the song based on control information generated for the at least one playback part;
A program that makes a computer system function as a computer.
Generating control information for at least one playback part of the song by model predictive control using a prediction model that predicts performance information including a performance position in the song for at least one performer;
controlling movement of the skeleton and joints represented by the motion data according to the control information;
generating a virtual demonstrator in a virtual space in a posture corresponding to the controlled skeleton and joints;
An information processing method realized by a computer system that displays an image of the virtual space captured by a virtual camera whose position and direction are controlled according to the behavior of a user's head on a display device.