WO2022269796A1 - Device, ensemble system, sound reproduction method, and program
- Publication number: WO2022269796A1
- Application number: PCT/JP2021/023765 (JP2021023765W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- performance
- performance sound
- venue
- estimation
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K15/00—Acoustics not otherwise provided for
- G10K15/02—Synthesis of acoustic waves
Definitions
- the present invention relates to a device, an ensemble system, a sound reproduction method, and a program.
- the present invention has been made in view of such circumstances, and its purpose is to reproduce sound received via a communication line without delay.
- One aspect of the present invention is a device provided at the first venue when a remote ensemble is performed at a first venue and a second venue, the device comprising an estimation unit that inputs the performance sound collected by a device provided at the second venue into a performance sound estimation model and estimates a future estimated performance sound in that performance sound.
- One aspect of the present invention is an ensemble system that realizes a remote ensemble at a first venue and a second venue, comprising a first terminal device provided at the first venue and a second terminal device provided at the second venue.
- the first terminal device comprises: a first acquisition unit that acquires a first performance sound at the first venue; a first transmission unit that transmits the first performance sound to the second terminal device; a first receiving unit that receives a second performance sound at the second venue from the second terminal device; a first estimation unit that estimates a future second estimated performance sound in the second performance sound by inputting the second performance sound received by the first receiving unit into a second performance sound estimation model; and a first sound output unit that outputs the second estimated performance sound.
- the second terminal device comprises: a second acquisition unit that acquires the second performance sound; a second transmission unit that transmits the second performance sound to the first terminal device; a second receiving unit that receives the first performance sound from the first terminal device; a second estimation unit that estimates a future first estimated performance sound in the first performance sound by inputting the first performance sound received by the second receiving unit into a first performance sound estimation model; and a second sound output unit that outputs the first estimated performance sound.
- the first performance sound estimation model is a trained model trained to estimate the first estimated performance sound from the input first performance sound by learning a first sound signal corresponding to the first performance sound, and the second performance sound estimation model is a trained model trained to estimate the second estimated performance sound from the input second performance sound by learning a second sound signal corresponding to the second performance sound.
- one aspect of the present invention is a sound reproduction method performed by a computer device provided at the first venue when a remote ensemble is performed at a first venue and a second venue, the method comprising inputting the performance sound collected by a device provided at the second venue into a performance sound estimation model and estimating a future estimated performance sound in the performance sound.
- the performance sound estimation model is a trained model trained to estimate the estimated performance sound from the input performance sound by learning a sound signal corresponding to the performance sound.
- one aspect of the present invention is a program that causes a computer device provided at the first venue to input the performance sound collected by a device provided at the second venue into a performance sound estimation model and to estimate a future estimated performance sound in the performance sound, wherein the performance sound estimation model is a trained model trained to estimate the estimated performance sound from the input performance sound by learning a sound signal corresponding to the performance sound.
- the sound received via the communication line can be played without delay.
- FIG. 1 is a schematic diagram showing an overview of the ensemble system 1 according to the embodiment.
- FIG. 2 is a block diagram showing an example of the configuration of the ensemble system 1 according to the embodiment.
- FIGS. 3 to 5 are diagrams showing examples of the trained model 120 according to the embodiment.
- FIG. 6 is a sequence diagram illustrating the flow of processing performed by the ensemble system 1 according to the embodiment.
- FIG. 7 is a flowchart explaining the flow of processing performed by the player terminal 10 according to the embodiment.
- the ensemble system 1 according to the embodiment will be described below with reference to the drawings.
- An example of a session (remote ensemble) between remote performers using the ensemble system 1 is described below. The system is not limited to this; the ensemble system 1 according to the present embodiment can also be applied when synthesizing arbitrary content other than sound.
- FIG. 1 is a schematic diagram showing an overview of an ensemble system 1 according to an embodiment.
- the ensemble system 1 is a system that transmits in real time the sound of a performance performed by a performer to another performer who is remotely located.
- a sound (first performance sound) associated with a performance at a venue E1 is picked up by a microphone MC1 and transmitted to the session partner's venue E2 via a communication network NW.
- the first performance sound received via the communication network NW is output from the speaker SP2.
- the performance sound (second performance sound) at the venue E2 is picked up by the microphone MC2 and transmitted to the venue E1 via the communication network NW.
- the second performance sound received via the communication network NW is output from the speaker SP1.
- the first performance sound and the second performance sound are transmitted to the distribution server 20 , mixed, and distributed to the viewer terminal 30 via the distribution server 20 .
- the future performance sound is estimated from the session partner's performance sound received via the communication network NW.
- the future performance sound is a sound to be played at a future performance position (T+ ⁇ t) from the performance position T in the received performance sound of the session partner.
- the second performance sound is received at the venue E1, and the future performance sound of the second performance sound is estimated based on the received second performance sound.
- the first performance sound is received at the venue E2, and the future performance sound of the first performance sound is estimated based on the received first performance sound.
- a trained model is used for estimation.
- a trained model is a model that has learned a sound signal associated with a performance sound, and is trained to estimate the future performance sound from the input performance sound.
- the learned model is created by executing machine learning (for example, deep learning) of the learning model using the sound signal of the performance sound as learning data.
- a learning model is, for example, a model such as a neural network or a multi-tree.
- the sound signal of the learning data is, for example, an acoustic signal obtained by picking up the performance sound of a musical instrument with a microphone.
- the sound signal includes time-series data in which instruction data indicating the content of the performance and time data indicating the time point at which the instruction data is generated are arranged.
- the instruction data designates pitch (note number) and strength (velocity) to instruct various events such as sounding and muting.
- the time data specifies, for example, an interval (delta time) between successive instruction data.
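The instruction data and time data described above can be sketched as a small time series. This is an illustrative, MIDI-like layout; the event names and field layout are assumptions for illustration and are not specified in the patent.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    event: str        # e.g. "note_on" (sounding) or "note_off" (muting)
    note_number: int  # pitch, 0-127 as in MIDI
    velocity: int     # strength, 0-127
    delta_time: int   # interval in ticks since the previous instruction

def total_ticks(sequence):
    """Absolute time of the last instruction, summed from delta times."""
    return sum(i.delta_time for i in sequence)

seq = [
    Instruction("note_on", 60, 100, 0),   # middle C, sounded immediately
    Instruction("note_off", 60, 0, 480),  # muted 480 ticks later
    Instruction("note_on", 62, 96, 0),    # D follows without a gap
]
print(total_ticks(seq))  # 480
```

Arranging events as (instruction, delta time) pairs keeps the sequence compact regardless of how far apart the events fall in absolute time.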
- performance sounds received via the communication network NW are input to the learned model.
- the trained model estimates and outputs future performance sounds for the input performance sounds.
- a future performance sound estimated by the trained model is output from the speaker.
- the second performance sound is received at the venue E1, and the received second performance sound is input to the learned model (second performance sound estimation model).
- the second performance sound estimation model is a model that has learned a sound signal related to the second performance sound.
- the second performance sound estimation model estimates the future performance sound of the input second performance sound.
- the performance sound estimated by the second performance sound estimation model is output from the speaker SP1.
- the first performance sound is received at the venue E2, and the received first performance sound is input to the learned model (first performance sound estimation model).
- the first performance sound estimation model is a model that has learned a sound signal related to the first performance sound.
- the first performance sound estimation model estimates the future performance sound of the input first performance sound.
- the performance sound estimated by the first performance sound estimation model is output from the speaker SP2.
- the ensemble system 1 of the present embodiment can estimate and output the future of performance sounds received via the communication network NW. Therefore, even if, due to transmission delay, the received performance sound corresponds to a performance position T behind the actual performance position (T+Δt), the performance sound at the actual performance position (T+Δt) can be estimated and output. It is therefore possible to reproduce the sound received via the communication line without delay.
- the sound signal of the learning data used for learning may be arbitrarily determined.
- the sound signal of the learning data may be at least a sound signal corresponding to the performance sound to be estimated, but is preferably a sound played in a performance mode similar to the performance sound to be estimated. This is because it is possible to improve the accuracy of estimation by learning performance sounds with similar performance styles.
- the sound signal of the learning data is preferably the sound played by the performer who will actually perform in the actual remote ensemble.
- the sound signal of the learning data is preferably the sound of a musical instrument that is actually played in the actual remote ensemble.
- the sound signal of the learning data is, for example, the performance sound (rehearsal sound source) played in the rehearsal. By using the rehearsal sound source, it is possible to accurately estimate the performance sound in the actual remote ensemble performance.
- FIG. 2 is a block diagram showing an example of the configuration of the ensemble system 1 according to the embodiment.
- the ensemble system 1 is applicable when a plurality of player terminals 10 (player terminals 10-1 to 10-N, where N is a natural number greater than 1) perform a remote performance.
- the ensemble system 1 includes, for example, three player terminals 10-1 to 10-3, a distribution server 20, and a viewer terminal 30. A plurality of viewer terminals 30 may be provided in the ensemble system 1.
- the performer terminal 10-1 is a computer device such as a smart phone, a mobile terminal, a tablet, or a PC (Personal Computer) provided at the venue E1 in FIG.
- the speaker section 15 provided in the player terminal 10-1 corresponds to the speaker SP1 in FIG.
- a microphone section 16 provided in the player terminal 10-1 corresponds to the microphone MC1 in FIG.
- the performer terminal 10-2 is a computer device such as a smart phone, a mobile terminal, a tablet, or a PC provided at the venue E2 in FIG.
- the speaker section 15 included in the player terminal 10-2 corresponds to the speaker SP2 in FIG.
- the microphone section 16 provided in the player terminal 10-2 corresponds to the microphone MC2 in FIG. Although omitted in FIG. 1, the same applies to the player terminal 10-3.
- the player terminals 10-1 to 10-3 are simply referred to as "player terminals 10" when not distinguished.
- the communication network NW is, for example, a wide area network (WAN: Wide Area Network), the Internet, or a combination thereof.
- the player terminal 10 includes, for example, a communication section 11, a storage section 12, a control section 13, a display section 14, a speaker section 15, and a microphone section 16.
- the communication unit 11 communicates with the distribution server 20.
- the storage unit 12 is configured by storage media such as an HDD, flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), RAM (Random Access Memory), ROM (Read Only Memory), or a combination thereof.
- the storage unit 12 stores programs for executing various processes of the player terminal 10 and temporary data used when performing various processes.
- the storage unit 12 stores a trained model 120, for example.
- the trained model 120 is information necessary to construct the trained model.
- the information necessary for constructing the trained model includes the configuration of the trained model, set values of parameters to be used, and the like.
- the trained model has, for example, a CNN (Convolutional Neural Network) configuration comprising an input layer, an intermediate layer, and an output layer.
- the configuration of the trained model is information indicating the number of units in each layer, the number of intermediate layers, the activation function, and the like.
- the parameters to be used are information indicating the coupling coefficients and weights for coupling the nodes in each layer.
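The "information necessary to construct the trained model" might be held as a structure like the following. All field names and values are illustrative assumptions; the patent only states that the configuration (layers, units, activation function) and the parameter values (coupling coefficients, weights) are stored.

```python
# Hypothetical layout of trained model 120 as stored in storage unit 12.
model_info = {
    "configuration": {
        "type": "CNN",
        "input_units": 256,           # samples per input frame (assumed)
        "intermediate_layers": 3,     # number of hidden layers
        "units_per_layer": [128, 64, 128],
        "activation": "relu",
        "output_units": 256,          # samples of the estimated future frame
    },
    "parameters": {
        # In practice: per-layer weight matrices and coupling coefficients,
        # elided here; they would be loaded from the storage unit.
        "layer_0_weights": "...",
    },
}

def hidden_layer_count(info):
    return info["configuration"]["intermediate_layers"]

print(hidden_layer_count(model_info))  # 3
```

Separating configuration from parameters lets the terminal rebuild the same network shape and then restore the learned weights into it.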
- FIG. 3 is a diagram showing an example of a trained model 120-1 stored in the player terminal 10-1.
- FIG. 4 is a diagram showing an example of the trained model 120-2 stored in the player terminal 10-2.
- FIG. 5 is a diagram showing an example of a trained model 120-3 stored in player terminal 10-3.
- the trained models 120-1 to 120-3 are simply referred to as "trained models 120" when they are not distinguished.
- the learned model 120 includes items such as target venue number, performance type, and learned model.
- the target venue No. is identification information such as a number that uniquely identifies the venue where the performance will be performed.
- the performance type is information indicating the type of performance performed at the venue specified by the target venue No., for example, the musical instrument to be played.
- the learned model is a learned model corresponding to the performance sound of the performance performed at the venue specified by the target venue number.
- the example of FIG. 3 shows that the trained model 120-1 stores a second trained model and a third trained model.
- the second learned model is a model for estimating the future performance sound corresponding to the performance sound of the trumpet that will be performed at the venue specified by the target venue No. (2).
- the third trained model is a model for estimating the future performance sound corresponding to the performance sound of the trumpet that will be performed at the venue specified by the target venue No. (3).
- the venue specified by the target venue No. (1) corresponds to the venue where the player terminal 10-1 is provided.
- the venue specified by the target venue No. (2) or the target venue No. (3) corresponds to the venue where the session partner is present.
- the example of FIG. 4 shows that the trained model 120-2 stores the first trained model and the third trained model.
- the first trained model is a model for estimating the future performance sound corresponding to the performance sound of the trumpet performed at the venue specified by the target venue No. (1).
- the third trained model is a model for estimating the future performance sound corresponding to the performance sound of the trumpet that will be performed at the venue specified by the target venue No. (3).
- the venue specified by the target venue No. (2) corresponds to the venue where the player terminal 10-2 is provided.
- the venue specified by the target venue No. (1) or the target venue No. (3) corresponds to the venue where the session partner is present.
- the example of FIG. 5 shows that the trained model 120-3 stores a first trained model and a second trained model.
- the first trained model is a model for estimating the future performance sound corresponding to the performance sound of the trumpet performed at the venue specified by the target venue No. (1).
- the second learned model is a model for estimating the future performance sound corresponding to the performance sound of the trumpet that will be performed at the venue specified by the target venue No. (2).
- the venue specified by the target venue No. (3) corresponds to the venue where the performer terminal 10-3 is provided.
- the venue specified by the target venue No. (1) or the target venue No. (2) corresponds to the venue where the session partner is present.
- the trained model 120 stores a trained model for estimating a performance sound to be a session partner.
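The table of FIGS. 3 to 5 can be sketched as a mapping from target venue No. to (performance type, trained model). The contents below mirror trained model 120-1, which holds models for the two session-partner venues (2 and 3) but none for its own venue (1); the model values are placeholders, since the real entries would be network weights.

```python
# Sketch of trained model 120-1 held by player terminal 10-1.
trained_model_120_1 = {
    2: {"performance_type": "trumpet", "model": "second trained model"},
    3: {"performance_type": "trumpet", "model": "third trained model"},
}

def model_for_venue(table, venue_no):
    """Return the trained model used to estimate that venue's performance."""
    entry = table.get(venue_no)
    return entry["model"] if entry else None

print(model_for_venue(trained_model_120_1, 2))  # second trained model
print(model_for_venue(trained_model_120_1, 1))  # None (own venue: no model)
```

On receiving a partner's performance sound, the terminal looks up the partner's venue number here and feeds the sound into the matching model.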
- control unit 13 is implemented by causing a CPU (Central Processing Unit) provided as hardware in the player terminal 10 to execute a program.
- the control unit 13 controls the player terminal 10 in an integrated manner.
- the control unit 13 controls the communication unit 11, the storage unit 12, the display unit 14, the speaker unit 15, and the microphone unit 16, respectively.
- the control unit 13 includes, for example, an acquisition unit 130, an estimation unit 131, an output unit 132, and a distribution unit 133.
- Acquisition unit 130 acquires the performance sound of the session partner.
- the acquisition unit 130 outputs the acquired performance sound to the estimation unit 131 .
- the estimation unit 131 estimates future performance sounds by inputting the performance sounds acquired from the acquisition unit 130 into the learned model.
- the estimation unit 131 outputs the estimated performance sound to the output unit 132 .
- the output unit 132 causes the speaker unit 15 to output the performance sound acquired from the estimation unit 131 . As a result, the future performance sound of the session partner is emitted from the speaker section 15 .
- the output unit 132 may output sounds obtained by mixing future performance sounds in the performance sounds of the session partners.
- the distribution unit 133 transmits the performance sound picked up by the microphone unit 16 to the session partner player terminal 10 and the distribution server 20 via the communication unit 11 .
- the display unit 14 includes a display device such as a liquid crystal display, and displays an image such as a video of the session partner's performance in accordance with the control of the control unit 13 .
- the speaker unit 15 outputs the performance sound of the session partner according to the control of the control unit 13 .
- the distribution server 20 is a computer device that distributes images and sounds related to performances.
- the distribution server 20 is, for example, a server device, a cloud, a PC, or the like.
- the distribution server 20 includes, for example, a communication unit 21, a storage unit 22, and a control unit 23.
- the communication unit 21 communicates with each player terminal 10 and the viewer terminal 30.
- the storage unit 22 is configured by, for example, a storage medium such as an HDD, flash memory, EEPROM, RAM, ROM, or a combination thereof.
- the storage unit 22 stores programs for executing various processes of the distribution server 20 and temporary data used when performing various processes.
- the storage unit 22 stores distribution information 220, for example.
- the distribution information 220 is information about sounds to be distributed.
- the distribution information 220 is, for example, information indicating a list of the viewer terminals 30 to which the content is distributed and the content to be distributed.
- the control unit 23 is implemented by causing a CPU provided as hardware in the distribution server 20 to execute a program.
- the control unit 23 includes an acquisition unit 230, a synthesis unit 231, and a distribution unit 232, for example.
- the acquisition unit 230 acquires performance sounds from each player terminal 10 .
- the acquiring unit 230 outputs information indicating each acquired performance sound to the synthesizing unit 231 .
- the synthesizing unit 231 generates a synthetic sound (ensemble sound) by mixing the performance sounds acquired from the acquiring unit 230 .
- the synthesizing unit 231 generates a synthesized sound by, for example, compressing each sound source and adding the compressed sound sources.
- the synthesizing unit 231 outputs the generated synthetic sound to the distributing unit 232 .
- the distribution unit 232 distributes the synthesized sound acquired from the synthesis unit 231 to the viewer terminal 30.
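The synthesizing unit 231 described above can be sketched as follows. The patent only states that each sound source is compressed and the compressed sources are added; the soft-clip (tanh) compressor used here is an assumed implementation detail, not the patent's method.

```python
import math

def compress(samples, drive=1.0):
    """Soft-clip each sample into (-1, 1) so the later sum cannot blow up."""
    return [math.tanh(drive * s) for s in samples]

def mix(sources):
    """Compress every source, then sum them sample by sample."""
    compressed = [compress(src) for src in sources]
    return [sum(frame) for frame in zip(*compressed)]

first = [0.0, 0.5, -0.5]    # performance sound from the first venue
second = [0.0, 0.25, 0.25]  # performance sound from the second venue
ensemble = mix([first, second])
print(len(ensemble))  # 3
```

The resulting ensemble sound is what the distribution unit 232 sends on to the viewer terminal 30.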
- the viewer terminal 30 is the viewer's computer device.
- the viewer terminal 30 is, for example, a smart phone, a PC, a tablet terminal, or the like.
- the viewer terminal 30 includes, for example, a communication section 31 , a storage section 32 , a control section 33 , a display section 34 and a speaker section 35 .
- the communication unit 31 communicates with the distribution server 20.
- the storage unit 32 is configured by a storage medium such as HDD, flash memory, EEPROM, RAM, ROM, or a combination thereof.
- the storage unit 32 stores programs for executing various processes of the viewer terminal 30 and temporary data used when performing various processes.
- the control unit 33 is implemented by causing the CPU provided as hardware in the viewer terminal 30 to execute a program.
- the control unit 33 comprehensively controls the viewer terminal 30 .
- the control unit 33 controls the communication unit 31, the storage unit 32, the display unit 34, and the speaker unit 35, respectively.
- the display unit 34 includes a display device such as a liquid crystal display, and displays images such as images of live performances related to the remote ensemble according to the control of the control unit 33 .
- the speaker unit 35 outputs ensemble sounds of the live performance related to the remote ensemble under the control of the control unit 33 .
- FIG. 6 is a sequence diagram explaining the flow of processing performed by the ensemble system 1 according to the embodiment.
- a case in which two player terminals 10-1 and 10-2 perform remote performance will be described as an example.
- the performer terminal 10-1 collects the performance sound at its own venue, and transmits the collected performance sound to the performer terminal 10-2 and the distribution server 20 (step S10).
- the own venue here is the venue where the player terminal 10-1 is provided.
- the performer terminal 10-2 receives the performance sound of the other venue, and performs sound processing of the received performance sound of the other venue (step S11).
- the other venue here is the venue where the player terminal 10-1 is provided. The flow of sound processing will be described later in detail.
- the performer terminal 10-2 picks up the performance sound at its own venue and transmits the picked-up performance sound to the performer terminal 10-1 and the distribution server 20 (step S12).
- the own venue here is the venue where the player terminal 10-2 is provided. The player terminal 10-2 repeatedly executes the processing shown in steps S11 and S12 until the session ends.
- the performer terminal 10-1 receives the performance sound of the other venue, and performs sound processing of the received performance sound of the other venue (step S13).
- the other venue here is the venue where the player terminal 10-2 is provided.
- the player terminal 10-1 repeatedly executes the processes shown in steps S10 and S13 until the session ends.
- the distribution server 20 receives the performance sound of the first venue (step S14).
- the first venue here is the venue where the player terminal 10-1 is provided.
- the distribution server 20 receives the performance sound of the second venue (step S15).
- the second venue here is the venue where the player terminal 10-2 is provided.
- the distribution server 20 mixes the performance sound at the first venue and the performance sound at the second venue (step S16).
- the distribution server 20 transmits the mixed ensemble sound to the viewer terminal 30 (step S17).
- the viewer terminal 30 receives the ensemble sound distributed from the distribution server 20, outputs the received ensemble sound to the speaker unit 35, and reproduces it (step S18).
- FIG. 7 is a flowchart explaining the flow of sound processing performed by the player terminal 10 according to the embodiment.
- the performer terminal 10 receives the performance sound of another venue (step S20).
- the player terminal 10 estimates a performance sound at a performance position (T+ ⁇ t) that is advanced by time ⁇ t from the performance position T of the received performance sound (step S21).
- the player terminal 10 outputs the estimated performance sound from the speaker section 15 (step S22).
- the performer terminal 10 picks up the sound of the performance at the venue by the microphone unit 16 (step S23).
- the performer terminal 10 transmits the performance sound collected at its own venue to the session partner performer terminal 10 and the distribution server 20 (step S24).
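Steps S20 to S24 above can be condensed into one iteration of a hypothetical processing loop. The five callables are placeholders standing in for the communication unit 11, estimation unit 131, speaker unit 15, and microphone unit 16; their signatures are assumptions for illustration.

```python
def sound_processing_step(receive, estimate, play, record, send, dt):
    remote = receive()                # S20: performance sound of other venue
    estimated = estimate(remote, dt)  # S21: sound at position T + dt
    play(estimated)                   # S22: output from the speaker unit
    local = record()                  # S23: pick up own venue's performance
    send(local)                       # S24: send to partner and server
    return estimated, local

played = []
out = sound_processing_step(
    receive=lambda: [1, 2, 3],
    estimate=lambda sound, dt: sound[-1:] * dt,  # toy: repeat last sample
    play=played.append,
    record=lambda: [9],
    send=lambda s: None,
    dt=2,
)
print(out)  # ([3, 3], [9])
```

The player terminal repeats this step until the session ends, matching the loop described for steps S10 to S13 in the sequence diagram.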
- the performer terminal 10 of the embodiment is provided at the venue E1 when performing remote ensemble performances at the venues E1 and E2.
- the player terminal 10 includes an estimation section 131 .
- the estimating unit 131 estimates a future estimated performance sound in the performance sound.
- the performance sound is the sound picked up by a device (for example, the performer terminal 10-2) provided in the venue E2.
- the estimation unit 131 inputs the performance sound to the performance sound estimation model to estimate the performance estimation sound.
- the performance sound estimation model is a trained model for estimating the performance estimation sound from the input performance sound.
- a performance sound estimation model is a trained model that has learned a sound signal corresponding to a performance sound.
- the player terminal 10 is an example of a "device".
- the case where the performer terminal 10 provided at a venue E estimates and outputs the performance sound of another venue has been described as an example. However, the configuration is not limited to this: any device provided at least at the venue E may be configured to estimate and output the performance sound of another venue.
- the devices provided at the venue E are, for example, a distribution server device for distributing ensemble sounds, or a computer device such as a mixer for mixing the sounds of each venue.
- the ensemble system 1 of the embodiment also includes performer terminals 10-1 and 10-2.
- the performer terminal 10-1 is provided at the venue E1.
- the performer terminal 10-2 is provided at the venue E2.
- the player terminal 10 includes an acquisition section 130 , a communication section 11 , an estimation section 131 and an output section 132 .
- Acquisition unit 130 of performer terminal 10-1 acquires the first performance sound at venue E1.
- the communication unit 11 of the player terminal 10-1 transmits the first performance sound to the player terminal 10-2.
- the communication unit 11 of the performer terminal 10-1 receives the second performance sound at the venue E2 from the performer terminal 10-2.
- the estimation unit 131 of the player terminal 10-1 estimates a future performance sound (second performance estimation sound) in the second performance sound received by the communication unit 11.
- the estimation unit 131 performs estimation using a trained model (second performance sound estimation model).
- the output unit 132 of the player terminal 10-1 outputs the estimated sound.
- the acquisition unit 130 of the performer terminal 10-2 acquires the second performance sound at the venue E2.
- the communication unit 11 of the player terminal 10-2 transmits the second performance sound to the player terminal 10-1.
- the communication unit 11 of the player terminal 10-2 receives the first performance sound from the player terminal 10-1.
- the estimation unit 131 of the player terminal 10-2 estimates a future performance sound (first estimated performance sound) in the first performance sound received by the communication unit 11.
- the estimation unit 131 performs estimation using a trained model (first performance sound estimation model).
- the output unit 132 of the player terminal 10-2 outputs the estimated sound.
- the first performance sound estimation model is a model that has learned a sound signal related to the performance sound (first performance sound).
- the second performance sound estimation model is a model that has learned a sound signal related to the performance sound (second performance sound).
- the learned model may be a model that has learned the sound signal related to the rehearsal sound source. As a result, it is possible to accurately estimate the performance sound.
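One plausible way to cut learning data from a rehearsal sound source is sketched below: each training example pairs a window of past samples with the window that follows it by dt, so the model learns to map a received signal to its future continuation. The window size, dt, and the pairing scheme itself are assumptions for illustration, not taken from the patent.

```python
def make_training_pairs(rehearsal, window, dt):
    """Yield (input_window, future_window) pairs from one rehearsal signal."""
    pairs = []
    for start in range(len(rehearsal) - window - dt + 1):
        past = rehearsal[start:start + window]
        future = rehearsal[start + dt:start + dt + window]
        pairs.append((past, future))
    return pairs

rehearsal = list(range(10))  # stand-in for a recorded rehearsal sound signal
pairs = make_training_pairs(rehearsal, window=4, dt=2)
print(pairs[0])   # ([0, 1, 2, 3], [2, 3, 4, 5])
print(len(pairs))
```

Because the pairs come from the same performer, instrument, and piece as the actual session, the trained model sees inputs close to what it will receive live.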
- a program for realizing the functions of the processing unit (control unit 13) in FIG. 2 may be recorded in a computer-readable recording medium, and the processing may be performed by reading the program recorded in this recording medium into a computer system and executing it. The "computer system" referred to here includes an OS and hardware such as peripheral devices.
- the "computer system” also includes the home page providing environment (or display environment) if the WWW system is used.
- the term "computer-readable recording medium” refers to portable media such as flexible discs, magneto-optical discs, ROMs and CD-ROMs, and storage devices such as hard discs incorporated in computer systems.
- the term “computer-readable recording medium” includes media that retain programs for a certain period of time, such as volatile memory inside computer systems that serve as servers and clients.
- the program may be for realizing part of the functions described above, or may be capable of realizing the functions described above in combination with a program already recorded in the computer system.
- the above program may be stored in a predetermined server, and distributed (downloaded, etc.) via a communication line in response to a request from another device.
Landscapes
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrophonic Musical Instruments (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Description
Claims (5)
- A device provided at a first venue when a remote ensemble is performed between the first venue and a second venue, the device comprising:
an estimation unit that inputs a performance sound picked up by a device provided at the second venue into a performance sound estimation model, and estimates a future estimated performance sound for that performance sound,
wherein the performance sound estimation model is a trained model that, by learning a sound signal corresponding to the performance sound, has been trained to estimate the estimated performance sound from the input performance sound.
- The device according to claim 1, wherein the performance sound estimation model learns a rehearsal recording corresponding to the performance sound.
- An ensemble system that realizes a remote ensemble between a first venue and a second venue, comprising a first terminal device provided at the first venue and a second terminal device provided at the second venue, wherein
the first terminal device includes:
a first acquisition unit that acquires a first performance sound at the first venue;
a first transmission unit that transmits the first performance sound to the second terminal device;
a first reception unit that receives a second performance sound at the second venue from the second terminal device;
a first estimation unit that estimates a future second estimated performance sound for the second performance sound by inputting the second performance sound received by the first reception unit into a second performance sound estimation model; and
a first sound output unit that outputs the second estimated performance sound,
the second terminal device includes:
a second acquisition unit that acquires the second performance sound;
a second transmission unit that transmits the second performance sound to the first terminal device;
a second reception unit that receives the first performance sound from the first terminal device;
a second estimation unit that estimates a future first estimated performance sound for the first performance sound by inputting the first performance sound received by the second reception unit into a first performance sound estimation model; and
a second sound output unit that outputs the first estimated performance sound,
the first performance sound estimation model is a trained model that, by learning a first sound signal corresponding to the first performance sound, has been trained to estimate the first estimated performance sound from the input first performance sound, and
the second performance sound estimation model is a trained model that, by learning a second sound signal corresponding to the second performance sound, has been trained to estimate the second estimated performance sound from the input second performance sound.
- A sound reproduction method performed by a computer device provided at a first venue when a remote ensemble is performed between the first venue and a second venue, the method comprising:
inputting a performance sound picked up by a device provided at the second venue into a performance sound estimation model and estimating a future estimated performance sound for that performance sound,
wherein the performance sound estimation model is a trained model that, by learning a sound signal corresponding to the performance sound, has been trained to estimate the estimated performance sound from the input performance sound.
- A program that causes a computer device provided at a first venue, when a remote ensemble is performed between the first venue and a second venue, to:
input a performance sound picked up by a device provided at the second venue into a performance sound estimation model, and estimate a future estimated performance sound for that performance sound,
wherein the performance sound estimation model is a trained model that, by learning a sound signal corresponding to the performance sound, has been trained to estimate the estimated performance sound from the input performance sound.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180099419.5A CN117501360A (zh) | 2021-06-23 | 2021-06-23 | 装置、合奏系统、音播放方法及程序 |
PCT/JP2021/023765 WO2022269796A1 (ja) | 2021-06-23 | 2021-06-23 | 装置、合奏システム、音再生方法、及びプログラム |
JP2023529312A JPWO2022269796A1 (ja) | 2021-06-23 | 2021-06-23 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/023765 WO2022269796A1 (ja) | 2021-06-23 | 2021-06-23 | 装置、合奏システム、音再生方法、及びプログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022269796A1 true WO2022269796A1 (ja) | 2022-12-29 |
Family
ID=84545313
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/023765 WO2022269796A1 (ja) | 2021-06-23 | 2021-06-23 | 装置、合奏システム、音再生方法、及びプログラム |
Country Status (3)
Country | Link |
---|---|
JP (1) | JPWO2022269796A1 (ja) |
CN (1) | CN117501360A (ja) |
WO (1) | WO2022269796A1 (ja) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005077485A (ja) * | 2003-08-28 | 2005-03-24 | National Institute Of Advanced Industrial & Technology | 多拠点におけるデュエット・合唱カラオケ制御方式 |
JP2010091794A (ja) * | 2008-10-08 | 2010-04-22 | Nippon Telegr & Teleph Corp <Ntt> | 遠隔デュエット方法、遠隔デュエットシステム、遠隔デュエットプログラムおよび遠隔デュエットプログラムを記録した記録媒体 |
JP2010112981A (ja) * | 2008-11-04 | 2010-05-20 | Ipix Co | 遠隔実演再生方法、装置 |
JP2011242560A (ja) * | 2010-05-18 | 2011-12-01 | Yamaha Corp | セッション端末及びネットワークセッションシステム |
JP2016206575A (ja) * | 2015-04-28 | 2016-12-08 | 株式会社第一興商 | 歌唱音声の伝送遅延に対応したカラオケシステム |
CN112447155A (zh) * | 2019-09-05 | 2021-03-05 | 中移(苏州)软件技术有限公司 | 一种电子乐谱翻页方法、装置及存储介质 |
- 2021
- 2021-06-23 JP JP2023529312A patent/JPWO2022269796A1/ja active Pending
- 2021-06-23 CN CN202180099419.5A patent/CN117501360A/zh active Pending
- 2021-06-23 WO PCT/JP2021/023765 patent/WO2022269796A1/ja active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JPWO2022269796A1 (ja) | 2022-12-29 |
CN117501360A (zh) | 2024-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11785410B2 (en) | Reproduction apparatus and reproduction method | |
US20070287141A1 (en) | Internet based client server to provide multi-user interactive online Karaoke singing | |
US8779265B1 (en) | Networks of portable electronic devices that collectively generate sound | |
JP2019525571A5 (ja) | ||
US20100095829A1 (en) | Rehearsal mix delivery | |
KR102546398B1 (ko) | 레이턴시 없이 라이브에 가까운 라이브 인터넷 음악을 공연하고 녹음하는 방법 및 시스템 | |
KR102184378B1 (ko) | 인공지능 악기 서비스 제공 시스템 | |
Rossetti et al. | Live Electronics, Audiovisual Compositions, and Telematic Performance: Collaborations During the Pandemic | |
US20240129669A1 (en) | Distribution system, sound outputting method, and non-transitory computer-readable recording medium | |
CN115867902B (zh) | 用于使用音频波形样本表演和录制现场音乐的方法和系统 | |
US20160307551A1 (en) | Multifunctional Media Players | |
WO2022269796A1 (ja) | 装置、合奏システム、音再生方法、及びプログラム | |
JP2008089849A (ja) | リモート演奏システム | |
JP2010002732A (ja) | カラオケ映像録画装置 | |
JP6568351B2 (ja) | カラオケシステム、プログラム及びカラオケ音声再生方法 | |
JP2013156543A (ja) | 投稿再生装置及びプログラム | |
JP2014071226A (ja) | 音楽再生システム、音楽再生方法 | |
JP2013024915A (ja) | ドングル(dongle)装置および再生システム | |
JP6958676B1 (ja) | 制御方法および制御システム | |
JP2009244712A (ja) | 演奏システム及び録音方法 | |
WO2022208609A1 (ja) | 配信システム、配信方法、及びプログラム | |
JP4214908B2 (ja) | 教習用演奏再生表示システム | |
JP6565554B2 (ja) | カラオケシステム、サーバ、カラオケ装置 | |
JP7468111B2 (ja) | 再生制御方法、制御システムおよびプログラム | |
WO2024047815A1 (ja) | 盛り上がり尤度制御方法、盛り上がり尤度制御装置及び盛り上がり尤度制御プログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21947082 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180099419.5 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023529312 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21947082 Country of ref document: EP Kind code of ref document: A1 |