WO2022018786A1 - Sound processing system, sound processing device, sound processing method, and sound processing program

Sound processing system, sound processing device, sound processing method, and sound processing program

Info

Publication number
WO2022018786A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
user
unit
event
performer
Prior art date
Application number
PCT/JP2020/028035
Other languages
English (en)
Japanese (ja)
Inventor
篤 古城
洋介 栗原
Original Assignee
株式会社ウフル
Priority date
Filing date
Publication date
Application filed by 株式会社ウフル
Priority to PCT/JP2020/028035 (WO2022018786A1)
Priority to JP2021549976A (JP6951610B1)
Priority to JP2021155819A (JP2022020625A)
Publication of WO2022018786A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones

Definitions

  • the present invention relates to a voice processing system, a voice processing device, a voice processing method, and a voice processing program.
  • Patent Document 1 proposes a technique that gives a fan the impression that the artist is singing or playing on the spot, without the artist actually appearing in front of the fan.
  • a technology that can provide a new experience is desired.
  • According to one aspect, a sound processing system is provided that includes: a microphone that detects a sound emitted by a user participating in an event; a sound adjustment unit that adjusts the sound detected by the microphone based on the relationship between a performer position, representing the position of the performer of the event in a predetermined area where the event is held, and an audience position, representing the position of the user in the predetermined area; a synthesis unit that synthesizes the sound detected by the microphone corresponding to a first user, who is the user, and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit; and an output unit that outputs data representing the sound synthesized by the synthesis unit.
  • According to another aspect, a voice processing device is provided that includes: a receiving unit that receives position information indicating the relationship between the performer position, representing the position of the performer of the event in the predetermined area where the event is held, and the audience position, representing the position of the user participating in the event in the predetermined area; a sound adjustment unit that adjusts the sound emitted by the user and detected by the microphone based on the position information; and a transmission unit that transmits data representing the sound adjusted by the sound adjustment unit.
  • According to another aspect, a voice processing device is provided that includes: a transmission unit that transmits the position information; a receiving unit that receives, from the destination of the position information, data of the sound emitted by the user and adjusted based on the position information; and a synthesis unit that synthesizes the sound represented by the data received from a destination corresponding to the first user, who is the user, with the sound represented by the data received from a destination corresponding to the second user, who is a user different from the first user.
  • According to another aspect, a sound processing method is provided in which: the sound emitted by the user participating in the event is detected by the microphone; the sound detected by the microphone is adjusted by the sound adjustment unit based on the relationship between the performer position, representing the position of the performer of the event in the predetermined area where the event is held, and the audience position, representing the position of the user in the predetermined area; the sound detected by the microphone corresponding to the first user, who is the user, and adjusted by the sound adjustment unit is synthesized by the synthesis unit with the sound detected by the microphone corresponding to the second user, who is a user different from the first user, and adjusted by the sound adjustment unit; and data representing the sound synthesized by the synthesis unit is output.
  • According to another aspect, a voice processing program is provided that causes a computer to execute: receiving position information indicating the relationship between the performer position, representing the position of the performer of the event in the predetermined area where the event is held, and the audience position, representing the position of the user participating in the event in the predetermined area; adjusting the sound emitted by the user and detected by the microphone based on the position information; and transmitting data representing the adjusted sound.
  • According to another aspect, a voice processing program is provided that causes a computer to execute: transmitting position information indicating the relationship between the performer position, representing the position of the performer of the event in the predetermined area where the event is held, and the audience position, representing the position of the user participating in the event in the predetermined area; receiving, from the destination of the position information, data of the sound emitted by the user and adjusted based on the position information; and synthesizing the sound represented by the data received from a destination corresponding to the first user, who is the user, with the sound represented by the data received from a destination corresponding to the second user, who is a user different from the first user.
  • FIG. 1 is a diagram showing a voice processing system according to the first embodiment.
  • the voice processing system 1 is, for example, an information processing system used for a service for distributing content at various events (appropriately referred to as a content distribution service).
  • the voice processing system 1 delivers the content expressing the performance of the performer P to the user U who is an audience.
  • the voice processing system 1 conveys the reaction of the user U who views the content to the performer P.
  • the performer P can perform a later performance while being aware of the reactions of the plurality of users U.
  • a plurality of users U can, for example, watch a performance that changes depending on their own reaction.
  • a plurality of users U can experience a sense of reality even when participating in an event in a remote environment, for example.
  • the voice processing system 1 can provide the user U with a new experience in the field of distributing the contents of various events.
  • In the following description, an arbitrary user is represented by the symbol U where appropriate; when individual users are distinguished, the letters a, b, ... are appended to the reference numeral, as in user Ua and user Ub.
  • Various events in which the voice processing system 1 is used include, for example, events related to at least one of music, drama, entertainment, sports, e-sports, competitions, lectures, speeches, dialogues, discussions, and demonstration sales.
  • the voice processing system 1 may be used for other events.
  • An event is, for example, a place to show the skills of performer P to the audience.
  • the performer P may be referred to as an artist, performer, singer, actor, entertainer, speaker, or player.
  • The audience may also be referred to as spectators or viewers.
  • the event may be a form in which at least a part of the spectators participates for a fee, or a form in which at least a part of the spectators participates free of charge.
  • the event may be held in a real space, a virtual space (appropriately referred to as a cyber space), or may be held in parallel in the real space and the virtual space.
  • the symbol AE in FIG. 1 is a predetermined area where the event is held (appropriately referred to as an event venue AE).
  • the event venue AE may be an area of real space or an area of virtual space.
  • The event venue AE may be a facility existing in real space that is reproduced or extended in cyberspace.
  • the participants of the event may include an audience who participates in the venue where the performer P performs, and an audience who participates in the venue where this venue is reproduced in cyberspace.
  • Reference numeral P1 in FIG. 1 is a performer position representing the position of the performer in the event venue AE.
  • the performer position P1 is, for example, a position preset with respect to the stage AE1 of the event venue AE.
  • the performer position P1 may be the center of the stage AE1.
  • the reference numeral AP in FIG. 1 is an area where the performer P performs (appropriately referred to as a demonstration area).
  • the demonstration area AP is, for example, outside the event venue AE.
  • the demonstration area AP is an arbitrary area in real space.
  • the demonstration area AP is, for example, a studio.
  • the demonstration area AP may be an area in the residence of the performer P, an area in a commercial facility, or another area.
  • the demonstration area AP may be an indoor area or an outdoor area (eg, a park, the roof of a building).
  • the performer P may be one person or a plurality of people.
  • the performance area AP corresponding to the first performer P may be different from the performance area AP corresponding to the second performer P.
  • the performer position P1 may be determined for each performer P or may be determined at a position representing the plurality of performers P.
  • The representative position of the plurality of performers P may be derived, for example, from the coordinates representing the position of each of the plurality of performers P, such as the average of the coordinates of the plurality of performers P (eg, the center of gravity of the positions of the plurality of performers).
  • the demonstration area AP may be an area within the event venue AE.
  • the demonstration area AP may be a stage in a real space venue.
  • the performer position P1 may be fixed or variable.
  • The voice processing system 1 may include a sensor that detects one or both of the position and movement of the performer P in the demonstration area AP, and may determine the performer position P1 based on the detection result of the sensor.
  • The voice processing system 1 may change the performer position P1 from the center of the stage AE1 to the outside when the performer P moves from the center of the performance area AP to the outside.
  • Reference numeral Q in FIG. 1 is a spectator position representing the position of the spectator in the event venue AE.
  • The spectator position of an arbitrary user U is represented by the reference numeral Q; when spectator positions are distinguished, the letters a, b, ... are appended to the reference numeral Q, as in spectator position Qa and spectator position Qb.
  • the spectator position Qa represents the position of the user Ua who is a spectator.
  • the spectator position Qb represents the position of the user Ub who is a spectator different from the user Ua.
  • the audience position Qb is a position farther from the performer position P1 than the audience position Qa.
  • the distance La between the performer position P1 and the audience position Qa is shorter than the distance Lb between the performer position P1 and the audience position Qb.
  • the reference numeral AU in FIG. 1 is an area (appropriately referred to as a viewing area AU) in which the user U, who is an audience, views the content of the event.
  • the viewing area AU is an arbitrary area in the real space.
  • the viewing area AU is, for example, outside the event venue AE.
  • the viewing area AU is, for example, an area in the residence of the user U.
  • the viewing area AU may be an area different from the area in the residence of the user U.
  • the viewing area AU may be an area of facilities such as a karaoke box, a movie theater, a theater, and a restaurant (eg, a sports bar).
  • The viewing area AU may be an area in a moving body such as a train, an aircraft, a ship, or a vehicle.
  • the viewing area AU may be an area within the event venue AE.
  • the event venue AE may be a drive-in theater
  • the viewing area AU may be an area in a vehicle parked in the event venue AE.
  • the viewing area AU may be indoors or outdoors.
  • When the viewing areas are distinguished, the letters a, b, ... are appended to the reference numeral AU, as in viewing area AUa and viewing area AUb.
  • the viewing area AUa is an area where the user Ua views the content
  • the viewing area AUb is an area where the user Ub views the content.
  • The number of viewers in one viewing area AU may be one (only the user U) or plural (a plurality of viewers including the user U).
  • the voice processing system 1 includes a processing device 2, a terminal 3, a camera 4, a microphone 5, and a speaker 6.
  • the camera 4, the microphone 5, and the speaker 6 are arranged in the demonstration area AP.
  • the camera 4, the microphone 5, and the speaker 6 are connected to the processing device 2.
  • the camera 4, the microphone 5, and the speaker 6 are connected to an information processing device (eg, a computer) including a communication unit by wire or wirelessly.
  • This information processing device is communicably connected to the processing device 2 via, for example, an internet line.
  • the terminal 3 is arranged in the viewing area AU.
  • the terminal 3 is communicably connected to the processing device 2 via, for example, an internet line.
  • the microphone 5 detects the sound emitted by the performer P who performs in the demonstration area AP.
  • the voice emitted by the performer P includes, for example, one or both of the sound emitted by the performer P from the body (eg, voice, clapping) and the sound produced by the performer P by a tool (eg, musical instrument).
  • the video data captured by the camera 4 and the audio data detected by the microphone 5 are provided to the processing device 2.
  • the speaker 6 outputs sound to the performer P.
  • the speaker 6 outputs voice based on the voice data provided by the processing device 2.
  • the processing device 2 generates content using data (eg, video data, audio data) provided by the device (eg, camera 4, microphone 5) installed in the demonstration area AP.
  • the processing device 2 provides the generated content data via, for example, an internet line.
  • the voice processing system 1 does not have to include at least one of a camera 4, a microphone 5, and a speaker 6.
  • at least one of the camera 4, the microphone 5, and the speaker 6 may be an external device of the voice processing system 1.
  • the voice processing system 1 may be in the form of using the above-mentioned external device.
  • the camera according to the embodiment may be referred to as an image pickup device.
  • the microphone according to the embodiment may be referred to as a voice detection device.
  • the speaker according to the embodiment may be referred to as an audio output device.
  • the terminal 3 is arranged for each viewing area AU, for example.
  • When the terminals 3 arranged in the viewing areas AU are distinguished, they are represented by reference numerals to which a letter is appended, such as terminal 3a and terminal 3b.
  • the terminal 3a is a terminal arranged in the viewing area AUa
  • the terminal 3b is a terminal arranged in the viewing area AUb.
  • the terminal 3b executes the same processing as the terminal 3a, for example.
  • For the terminal 3b, the same reference numerals as for the terminal 3a are used for the same configuration, and description of the terminal 3b that overlaps with that of the terminal 3a is omitted or simplified.
  • the terminal 3 acquires the data of the content provided by the processing device 2, for example, via an internet line.
  • the terminal 3 uses the acquired data to provide content to the user.
  • the terminal 3 detects the sound emitted by the user.
  • The sound produced by the user includes, for example, one or both of the sound produced by the user from the body (eg, voice, clapping) and the sound produced by the user with a tool (eg, a noisemaker).
  • the sound emitted by the user is appropriately referred to as an audience sound.
  • the terminal 3 adjusts the audience sound based on the relationship between the performer position P1 and the audience position Q associated with the own device. For example, the terminal 3 adjusts the audience sound so that the longer the distance between the performer position P1 and the audience position Q, the lower the volume of the audience sound.
  • the terminal 3 provides the adjusted audience sound data, for example, via an internet line.
  • the processing device 2 synthesizes the audience sounds supplied from the plurality of terminals 3 and provides the combined audience sound data.
  • the speaker 6 outputs the spectator sound to the performer P based on the spectator sound data supplied from the processing device 2.
  • the audience sound R output from the speaker 6 includes, for example, a component Ra derived from the audience sound from the user Ua and a component Rb derived from the audience sound from the user Ub.
  • Since the audience position Qa corresponding to the user Ua is closer to the performer position P1 than the audience position Qb corresponding to the user Ub, the component Ra derived from the audience sound of the user Ua is louder than the component Rb derived from the audience sound of the user Ub.
  • Since the audience sound output from the speaker 6 contains components adjusted based on the audience positions Q, it contributes to, for example, conveying a live feeling to the performer.
  • the performer P can perform a performance so as to respond to the reaction of a plurality of users U by listening to the audience sound having a live feeling.
  • the plurality of users U can enjoy, for example, live performance content.
  • the user U applies for participation in the event (appropriately referred to as an application for participation) before the event is held.
  • the application destination of the participation application is the event organizer.
  • When the event organizer accepts the participation of the user U, the event organizer sets up the terminal 3 for the user U.
  • the event organizer stores various information (eg, data, program) used for providing the service described with reference to FIG. 1 in the terminal 3.
  • the event organizer provides the set terminal 3 to the user U.
  • the user U can use the terminal 3 provided by the event organizer to participate in the event and watch the content while transmitting the reaction to the performer P.
  • the terminal 3 according to the present embodiment also serves as a ticket indicating the right to participate in the event.
  • FIG. 2 is a diagram showing a process of setting a terminal according to the first embodiment.
  • User Ua applies for participation in the event prior to holding the event.
  • the user Ua operates, for example, a terminal installed in a store such as a convenience store (appropriately referred to as a reservation terminal 7), and inputs information regarding a participation application (appropriately referred to as entry information).
  • The entry information includes, for example, information identifying the user Ua, information specifying the event in which the user Ua wishes to participate, and information on the seat at the event venue AE desired by the user Ua.
  • the information that identifies the user is, for example, the user's name or identification information.
  • In the following description, information that identifies a user is referred to as a user ID where appropriate.
  • the information that identifies the event is, for example, the identification information (eg, number) assigned to each event, the name of the event, or the like.
  • information that identifies an event is appropriately referred to as an event ID.
  • The seat information includes, for example, at least one of information for designating a seat rank (S seat, A seat), information for specifying seat conditions (eg, upstairs seat, arena, aisle side), and information for specifying a seat number.
  • the entry information input to the reservation terminal 7 is provided to the terminal of the application destination (appropriately referred to as the reception terminal 8).
  • the reservation terminal 7 is connected to an internet line and transmits entry information.
  • the reception terminal 8 receives the entry information transmitted by the reservation terminal 7.
  • the reception terminal 8 is, for example, a server that manages user information.
  • the reception terminal 8 includes a storage unit 11 and a communication unit 12.
  • the communication unit 12 is communicably connected to the reservation terminal 7 and receives the entry information transmitted by the reservation terminal 7.
  • the reception terminal 8 includes an application acquisition unit for acquiring a participation application provided by a user.
  • The reception terminal 8 determines a seat for the user Ua at the event venue AE based on the received entry information.
  • the reception terminal 8 determines a seat by, for example, a lottery.
  • the reception terminal 8 may determine the seat by a method different from the lottery, or may determine the seat by a method combining the lottery and other methods. For example, the reception terminal 8 may allocate seats to the user U corresponding to the entry information in the order in which the entry information is received, or may determine the seat based on the seat information shown in the entry information.
  • the storage unit 11 of the reception terminal 8 stores the user information D1.
  • the user information D1 includes, for example, a user ID and identification information for identifying a seat determined as a seat of the user Ua.
  • the identification information for identifying the seat is referred to as a seat ID as appropriate.
  • the user information D1 includes, for example, table data in which a user ID and a seat ID are associated with each of the plurality of users U.
  • the user information D1 may include information different from the user ID and the seat ID.
  • the user information D1 may include a history of events in which the user Ua has participated in the past.
  • the user information D1 may include at least one of the age, gender, and other attribute information of each user U. At least a part of the user information D1 stored in the storage unit 11 is provided to the processing device 2.
  • The reception terminal 8 may determine the seat for the first user's participation in an event held after a past event based on the history of the first user's participation in the past event.
  • the user information D1 stored in the storage unit 11 includes a history of each user participating in a past event (appropriately referred to as a participation history).
  • the reception terminal 8 determines the seat for the current participation application of the first user by using, for example, the participation history included in the user information D1.
  • the user is given an attribute (eg, rank) according to the number of times he / she participates in the event.
  • A user with a higher rank (eg, silver, gold, platinum) may be allocated a preferential seat.
  • The preferential seats are, for example, the seats relatively close to the stage AE1 in the event venue AE shown in FIG. 1.
  • the rank of the user may be determined by a parameter other than the number of participations, and may be determined, for example, according to the amount of money spent to participate in the event.
  • the rank of the user may be determined by one parameter, may be determined by a plurality of parameters, or may be randomly determined by lottery or the like.
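  • As an illustration of the seat-determination logic described above (order of receipt, lottery, and rank-based preferential seating), a minimal sketch follows. The entry structure, the rank ordering, and the rule that higher-rank users are drawn first for seats closer to the stage AE1 are assumptions for the example, not the actual implementation of the reception terminal 8.

```python
import random

# Hypothetical rank ordering; a higher value is drawn earlier for better seats.
RANK_ORDER = {"platinum": 3, "gold": 2, "silver": 1, None: 0}

def allocate_seats(entries, seats):
    """Assign each entry (dict with 'user_id' and optional 'rank') a seat ID.

    'seats' is assumed to be ordered from closest to the stage AE1 to
    farthest, so higher-rank users tend to receive preferential seats.
    """
    groups = {}
    for entry in entries:
        groups.setdefault(RANK_ORDER.get(entry.get("rank"), 0), []).append(entry)
    ordered = []
    for rank in sorted(groups, reverse=True):
        batch = groups[rank]
        random.shuffle(batch)  # lottery among users of the same rank
        ordered.extend(batch)
    # Pair users with seats in order; surplus applicants receive no seat.
    return {entry["user_id"]: seat for entry, seat in zip(ordered, seats)}

# Example: two platinum users compete for the front seats by lottery.
entries = [
    {"user_id": "Ua", "rank": "platinum"},
    {"user_id": "Ub", "rank": "silver"},
    {"user_id": "Uc", "rank": "platinum"},
]
print(allocate_seats(entries, ["S-01", "S-02", "A-15"]))
```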
  • the processing device 2 determines the audience position Q of the user Ua based on the participation application of the user Ua.
  • the processing device 2 includes a processing unit 21, a storage unit 22, and a communication unit 23.
  • the communication unit 23 is communicably connected to the reception terminal 8.
  • the communication unit 23 receives the user information transmitted by the reception terminal 8.
  • the storage unit 22 stores information D2 of the performer position P1 (see FIG. 1) of the event venue AE.
  • the information D2 of the performer position P1 stored in the storage unit 22 is, for example, coordinates in cyberspace.
  • the processing unit 21 includes a position determination unit 24.
  • The position determination unit 24 determines the audience position Q representing the user's position in a predetermined area where the event is held (eg, the event venue AE in FIG. 1) based on the user's application for participation in the event.
  • the position determination unit 24 determines the audience position Q based on the participation application acquired by the reception terminal 8 which is the application acquisition unit, for example.
  • The position determination unit 24 derives the coordinates in cyberspace corresponding to the seat ID determined based on the participation application, and determines the derived coordinates as the spectator position Q.
  • The position determination unit 24 derives the coordinates corresponding to the seat ID in the same coordinate system as the performer position P1.
  • the storage unit 22 stores map data in which the seat ID and the spectator position Q are associated with each other.
  • the position determination unit 24 acquires the seat ID included in the user information received by the communication unit 23, collates the seat ID with the map data, and determines the spectator position Q.
  • the position determination unit 24 determines the relative positions of the plurality of spectator positions Q by, for example, determining the respective spectator positions Q of the plurality of users U.
  • The position determination unit 24 may determine the audience position Q for the first user's participation in an event held after a past event based on the history of the first user (eg, user Ua) participating in the past event. For example, the position determination unit 24 may determine the spectator position Q corresponding to the seat of the user Ua in the current event, which the reception terminal 8 determined based on the participation history of the user Ua.
  • the processing unit 21 generates position information indicating the relationship between the performer position P1 and the audience position Q determined by the position determination unit 24.
  • the position information includes, for example, the distance between the performer position P1 and the audience position Q (eg, the distance La in FIG. 1).
  • the position information may include information different from the distance from the performer position P1 to the audience position Q.
  • the position information may include information that is a set of the performer position P1 and the audience position Q.
  • the position information may include information (eg, a vector) in the direction connecting the performer position P1 and the audience position Q.
  • the position information may include spatial information between the performer position P1 and the audience position Q (eg, information indicating the level of sound attenuation from the audience position Q to the performer position P1, presence or absence of obstacles).
  • The position information need not include at least one of the distance between the performer position P1 and the audience position Q, the information pairing the performer position P1 with the audience position Q, the information on the direction connecting the performer position P1 and the audience position Q, and the spatial information between the performer position P1 and the audience position Q, and may contain information different from any of these pieces of information.
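  • As a concrete sketch of how the position information D3 might be generated, the following example maps a seat ID to cyberspace coordinates via map data and derives the distance and direction to the performer position P1. The map data, coordinate values, and function names are hypothetical; the embodiment only requires that the position information indicate the relationship between the two positions.

```python
import math

# Hypothetical map data: seat ID -> audience position Q (x, y, z) in cyberspace,
# in the same coordinate system as the performer position P1.
SEAT_MAP = {
    "S-01": (0.0, 5.0, 0.0),   # close to the stage AE1
    "A-15": (8.0, 30.0, 0.0),  # farther from the stage
}
PERFORMER_POSITION_P1 = (0.0, 0.0, 0.0)  # e.g., the center of the stage AE1

def make_position_info(seat_id):
    """Build position information D3 for one user from the user's seat ID."""
    q = SEAT_MAP[seat_id]
    p1 = PERFORMER_POSITION_P1
    direction = tuple(qi - pi for qi, pi in zip(q, p1))  # vector from P1 to Q
    distance = math.sqrt(sum(v * v for v in direction))  # e.g., La or Lb
    return {
        "performer_position": p1,
        "audience_position": q,
        "distance": distance,
        "direction": direction,
    }

# Position information D3a that could be stored in the terminal 3a.
print(make_position_info("S-01"))
```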
  • the position information generated by the processing device 2 is supplied to the terminal 3a.
  • the terminal 3 includes a storage unit 31 and a communication unit 32.
  • the communication unit 32 is, for example, communicably connected to the communication unit 23 of the processing device 2.
  • the communication unit 32 receives the position information transmitted by the communication unit 23 of the processing device 2.
  • The terminal 3 stores the position information received by the communication unit 32 in the storage unit 31.
  • the terminal 3a stores the position information D3a indicating the relationship between the performer position P1 shown in FIG. 1 and the audience position Qa determined based on the participation application of the user Ua.
  • the terminal 3a is provided to the user Ua by, for example, mail, and is used when the user Ua participates in the event.
  • the terminal 3b has the same configuration as the terminal 3a.
  • the terminal 3b stores the position information D3b indicating the relationship between the performer position P1 shown in FIG. 1 and the audience position Qb determined based on the participation application of the user Ub.
  • the terminal 3b is delivered to the user Ub by, for example, a courier service, and is used when the user Ub participates in the event.
  • the position information indicating the relationship between the audience position Q corresponding to the arbitrary user U and the performer position P1 is represented by the reference numeral D3 as appropriate.
  • the reservation terminal 7 may be a part of the voice processing system 1 or an external device of the voice processing system 1.
  • the reservation terminal 7 may be a terminal such as a smartphone or a personal computer owned by the user Ua.
  • the reservation terminal 7 that receives the entry information from the user Ub may be the same device as the reservation terminal 7 that receives the entry information from the user Ua, or may be a different device.
  • the process of inputting at least a part of the entry information to the reservation terminal 7 may be executed by a person different from the user Ua.
  • For example, the user Ua may notify a receptionist of the participation application at a store or by telephone, and the receptionist may input the entry information into the reservation terminal 7.
  • the application destination for the participation application does not have to be the event organizer.
  • the application destination of the participation application may be a trustee entrusted by the event organizer, a distributor of content related to the event, or an event manager.
  • the process of determining the seat of the user Ua may be manually executed by the person in charge of the application destination.
  • the user Ua may mail an application form in which the participation application is filled in to the application destination, and the person in charge of the application destination may determine the seat of the user Ua based on the information described in the application form.
  • the person in charge of the application destination may input the seat ID determined as the seat of the user Ua into the reception terminal 8 in association with the user ID.
  • The reception terminal 8 or the person in charge may determine whether or not to grant the user Ua the right to participate in the event (appropriately referred to as the participation right), and then determine the seat of the user Ua. For example, when the number of users U applying for participation exceeds the capacity, or when it is predicted to exceed the capacity, the reception terminal 8 may decide whether or not to grant the participation right by lottery.
  • the reception terminal 8 may be a part of the voice processing system 1 or an external device of the voice processing system 1.
  • the processing device 2 may include a reception terminal 8.
  • the reception terminal 8 may also serve as a reservation terminal 7.
  • the function of the processing device 2 may be realized by being divided into a plurality of devices, and these plurality of devices may be collectively referred to as a processing system.
  • the method of storing the position information in the terminal 3 may be a method that does not use wired or wireless communication.
  • the location information may be stored in the terminal 3 via a recording medium such as a USB memory.
  • the terminal 3 does not have to store the position information in advance before being delivered to the user U.
  • the processing device 2 may store the destination associated with the terminal 3 and transmit the location information to the destination after the user U receives the terminal 3.
  • the processing device 2 may update the position information stored in the terminal 3 by transmitting the position information to the destination.
  • FIG. 3 is a diagram showing a voice processing system according to the first embodiment.
  • the processing device 2 includes, for example, an information processing device such as a server computer.
  • the processing device 2 manages each part of the voice processing system 1.
  • the processing device 2 includes a processing unit 21, a communication unit 23, an input unit 25, and an output unit 26.
  • a camera 4, a microphone 5, and a speaker 6 are connected to the processing device 2.
  • the input unit 25 includes an interface to which a signal is input from the outside of the processing device 2.
  • a camera 4 and a microphone 5 are connected to the input unit 25.
  • the video data captured by the camera 4 and the audio data collected by the microphone 5 are input to the processing device 2.
  • the output unit 26 includes an interface that outputs a signal to the outside of the processing device 2.
  • a speaker 6 is connected to the output unit 26. At least one of the camera 4, the microphone 5, and the speaker 6 may be a part of the voice processing system 1 or a part of the processing device 2. At least one of the camera 4, the microphone 5, and the speaker 6 may be an external device of the audio processing system 1, for example, the equipment of the demonstration area AP shown in FIG.
  • the processing unit 21 includes a synthesis unit 27 and a content generation unit 28.
  • the content generation unit 28 generates content using the data input to the input unit 25.
  • the content generation unit 28 generates, for example, content in which the performance of the performer P is expressed by video and audio.
  • The content generation unit 28 generates content to be delivered to the user U by using the video data input from the camera 4 to the input unit 25 and the audio data input from the microphone 5 to the input unit 25.
  • the content generation unit 28, for example, synchronizes video and audio and generates a streaming moving image as content.
  • the content generation unit 28 may perform at least one of video processing, various sound processing, filtering, and compression processing to generate content.
  • the communication unit 23 transmits the data of the content generated by the content generation unit 28.
  • the terminal 3a includes a storage unit 31, a communication unit 32, and a processing unit 33.
  • the communication unit 32 is communicably connected to the communication unit 23 of the processing device 2.
  • the communication unit 32 receives the data of the content transmitted by the communication unit 23.
  • a microphone 41, a speaker 42, and a display device 43 are connected to the terminal 3a.
  • the microphone 41 and the speaker 42 may be a headset.
  • The display device 43 is a device that displays video, such as a television set, a PC monitor, or a projector.
  • the processing unit 33 of the terminal 3a outputs the sound of the content to the speaker 42 based on the data of the content received by the communication unit 32.
  • the processing unit 33 of the terminal 3a causes the display device 43 to display the video of the content based on the data of the content received by the communication unit 32.
  • the user Ua can view the content expressing the performance of the performer P by the sound output from the speaker 42 and the video displayed on the display device 43.
  • the microphone 41 detects a sound emitted by a user who watches the performer P of the event. For example, the microphone 41 detects the cheers and applause of the user Ua who views the content.
  • the audience sound data detected by the microphone 41 is output to the terminal 3a.
  • the processing unit 33 of the terminal 3a includes a sound adjusting unit 34.
  • the sound adjustment unit 34 adjusts the sound detected by the microphone 41 based on the relationship between the performer position P1 and the audience position Qa.
  • FIG. 4 is a diagram showing an example of a voice adjustment method according to the first embodiment.
  • Reference numeral D5 in FIG. 4 denotes the audio data input to the sound adjustment unit 34, and reference numeral D6 denotes the audio data output from the sound adjustment unit 34.
  • the horizontal axis is the frequency and the vertical axis is the amplitude.
  • the frequency corresponds to the pitch of the sound
  • the amplitude corresponds to the volume.
  • Reference numeral D7 is a function (eg, a filter) used to convert data D5 to data D6.
  • the function D7 is a function showing the relationship between the distance and the gain.
  • the gain corresponds to the rate of attenuation of the sound with respect to the distance.
  • the sound adjusting unit 34 adjusts the loudness of the sound detected by the microphone 41 by using the distance La between the audience position Qa and the performer position P1 shown in FIG.
  • the gain is constant in the frequency band of audible sound (eg, 20 Hz or more and 20 kHz or less).
  • Gain is a value that is uniquely determined for the distance. For example, for the distance La shown in FIG. 1, the gain is determined to be G1.
  • the sound adjustment unit 34 generates the data D6 by applying the gain G1 to the amplitude of the data D5 representing the audience sound of the user Ua. For example, the sound adjustment unit 34 adjusts the first sound detected by the microphone 41 so as to be closer to the second sound that reaches the performer position P1 when the first sound is emitted at the audience position Q.
  • The distance between the microphone 41 and the user Ua is largely determined by, for example, the type of viewing area AUa shown in FIG. 1 (eg, home, movie theater), the type of device used for viewing the content (eg, smartphone, personal computer, television set), and the layout of the equipment.
  • Assume that the distance La between the performer position P1 and the audience position Qa shown in FIG. 1 is, for example, 20 meters, that the terminal 3a is arranged in the living room of the user Ua and connected to the television set, and that the distance between the microphone 41 and the user Ua is 2 meters. In this case, the distance La (eg, 20 meters) between the performer position P1 and the audience position Qa is longer than the distance between the microphone 41 and the user Ua (eg, 2 meters).
  • The sound adjustment unit 34 adjusts the volume so that the audience sound detected by the microphone 41, located 2 meters from the user Ua, approaches the sound that would be heard at the performer position P1, 20 meters from the audience position Qa, if that sound were emitted at the audience position Qa.
  • the data D6 corresponds to the data of the audience sound heard at the performer position P1 when the audience sound is emitted at the audience position Qa.
  • the gain G1 corresponding to the distance La is less than 1, and the amplitude of the data D6 is smaller than that of the data D5.
  • the sound represented by the data D6 has a lower volume than the sound represented by the data D5, and corresponds to data representing the audience sound heard at the performer position P1 when the audience sound is emitted at the audience position Qa.
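  • A minimal sketch of this adjustment, assuming a simple inverse-distance law as a stand-in for the function D7: the gain is uniquely determined from the distance La and applied uniformly to the samples of the detected audience sound (data D5) to produce the adjusted data D6. The 1 m reference distance and the attenuation law are illustrative assumptions.

```python
import numpy as np

def gain_for_distance(distance_m, reference_m=1.0):
    """Stand-in for the function D7: the gain decreases as the distance grows.

    Assumes inverse-distance attenuation relative to a 1 m reference, so for
    the distance La (e.g., 20 m) the gain G1 is well below 1.
    """
    return reference_m / max(distance_m, reference_m)

def adjust_audience_sound(data_d5, distance_la_m):
    """Apply the gain G1 to the amplitude of data D5 to obtain data D6."""
    g1 = gain_for_distance(distance_la_m)
    return np.asarray(data_d5, dtype=np.float32) * g1

# Example: 0.1 s of a detected audience sound at 48 kHz, adjusted for La = 20 m.
t = np.linspace(0.0, 0.1, 4800, endpoint=False)
data_d5 = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # sound detected by microphone 41
data_d6 = adjust_audience_sound(data_d5, 20.0)  # quieter, as heard at position P1
print(float(data_d5.max()), float(data_d6.max()))
```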
  • The sound adjustment unit 34 may adjust the waveform representing the amplitude with respect to frequency. For example, the sound adjustment unit 34 may adjust the audience sound by using different gains for different frequency bands; for example, it may attenuate relatively high-pitched components of the audience sound more than relatively low-pitched components. The sound adjustment unit 34 does not have to attenuate the audience sound and may instead amplify it. The sound adjustment unit 34 may also adjust the audience sound detected by the microphone 41 so that its volume is louder than the audience sound that would reach the performer position P1 when the audience sound is emitted at the audience position Q.
  • When the volume of the audience sound detected by the microphone 41 exceeds a predetermined value, the sound adjustment unit 34 may cut the volume exceeding the predetermined value and then adjust the volume-cut audience sound based on the relationship between the audience position Q and the performer position P1.
  • the sound adjustment unit 34 may adjust the audience sound so that the volume of the adjusted audience sound does not exceed a predetermined value.
  • the sound adjustment unit 34 may adjust the audience sound by using the information indicating the relationship between the position of the microphone 41 and the position of the user U and the information indicating the relationship between the audience position Q and the performer position P1.
  • the information representing the relationship between the position of the microphone 41 and the position of the user U is, for example, the distance between the microphone 41 and the user Ua.
  • the sound adjusting unit 34 may use a value detected by the sensor as the distance between the microphone 41 and the user Ua.
  • the sound adjusting unit 34 may use a predetermined value as the distance between the microphone 41 and the user Ua.
  • the distance between the microphone 41 and the user Ua may be, for example, a value (eg, a recommended value) within the range defined in the terms of use of the service provided by the voice processing system 1.
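  • Where the relationship between the microphone 41 and the user U is also taken into account, one possible reading is to refer the detected level back to a reference distance from the user using the microphone-to-user distance, and then attenuate it over the distance La to the performer position P1. The sketch below assumes 1/r attenuation for both steps; the 2 m value reuses the recommended-value example above.

```python
import numpy as np

def adjust_with_mic_distance(samples, mic_to_user_m, audience_to_performer_m):
    """Adjust an audience sound using both distance relationships.

    Assuming 1/r attenuation from the user as a sound source, the level
    detected by the microphone 41 at mic_to_user_m is referred back to a
    1 m reference and then attenuated over the distance La
    (audience_to_performer_m); the net gain is mic_to_user_m / La.
    """
    samples = np.asarray(samples, dtype=np.float32)
    return samples * (mic_to_user_m / max(audience_to_performer_m, 1e-6))

# Example: microphone 2 m from the user, audience position 20 m from P1,
# giving a net gain of 0.1 relative to the signal detected by the microphone.
detected = np.array([0.2, -0.3, 0.25], dtype=np.float32)
print(adjust_with_mic_distance(detected, 2.0, 20.0))
```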
  • the sound adjusting unit 34 adjusts the audience sound by using the position information D3a stored in the storage unit 31.
  • the position information D3a includes, for example, the distance La between the audience position Qa and the performer position P1 shown in FIG.
  • the sound adjusting unit 34 acquires the gain G1 with respect to the distance La by using, for example, the function D7 shown in FIG.
  • the function D7 is stored in the storage unit 31, and the sound adjustment unit 34 reads the function D7 and the position information D3a (eg, distance La) from the storage unit 31 to derive the gain G1.
  • the sound adjustment unit 34 generates data D6 (see FIG. 4) by applying the derived gain G1 to the data D5 (see FIG. 4) indicating the audience sound detected by the microphone 41.
  • the audience sound adjusted by the sound adjustment unit 34 is referred to as an adjusted audience sound.
  • the data D6 is data representing the adjusted audience sound.
  • the position information D3a may include information that is a set of the audience position Qa and the performer position P1 represented by coordinates in cyberspace, for example.
  • the sound adjustment unit 34 may calculate the distance La between the audience position Qa and the performer position P1 by using the audience position Qa and the performer position P1.
  • the function D7 may be stored outside the terminal 3, and the sound adjusting unit 34 may acquire information representing the function D7 from the outside of the terminal 3 via the communication unit 32.
  • the function D7 may be represented by a mathematical formula or table data.
  • The position information D3a may include information (eg, gain G1) representing a change in sound transmitted from the audience position Qa to the performer position P1. In this case, the sound adjustment unit 34 may adjust the audience sound without using the function D7 shown in FIG. 4.
  • the terminal 3 provides the adjusted audience sound data to the outside of the terminal 3.
  • the processing unit 33 of the terminal 3 controls the communication unit 32 to transmit the adjusted audience sound data.
  • the processing device 2 acquires the adjusted audience sound data provided by the terminal 3.
  • the communication unit 23 of the processing device 2 receives the adjusted audience sound data transmitted by the communication unit 32 of the terminal 3.
  • the processing device 2 acquires the adjusted audience sound data of the user Ua from the terminal 3a corresponding to the user Ua.
  • the processing device 2 acquires the adjusted audience sound data of the user Ub from the terminal 3b corresponding to the user Ub.
  • The synthesis unit 27 synthesizes the audience sound detected by the microphone 41 corresponding to the user Ua and adjusted by the sound adjustment unit 34 of the terminal 3a with the audience sound detected by the microphone 41 corresponding to the user Ub and adjusted by the sound adjustment unit 34 of the terminal 3b.
  • the synthesizing unit 27 superimposes the spectator sound data acquired from the terminal 3a and the spectator sound data acquired from the terminal 3b, and generates data representing the synthesized spectator sound.
  • the adjusted spectator sound synthesized by the synthesizing unit 27 is referred to as a synthesized spectator sound.
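  • The superposition performed by the synthesis unit 27 can be sketched as a simple sum of the adjusted audience sounds received from the terminals, clipped to a valid output range. Buffering, sample-rate alignment, and network transport are omitted, and the sum-and-clip approach is an assumption; the embodiment only states that the data are superimposed.

```python
import numpy as np

def synthesize_audience_sounds(adjusted_sounds):
    """Superimpose adjusted audience sounds from a plurality of terminals 3.

    'adjusted_sounds' is a list of equal-length float arrays (data such as D6
    from terminal 3a, terminal 3b, ...). The sum is clipped to [-1, 1] so the
    synthesized audience sound stays within the range of the speaker 6 input.
    """
    stacked = np.stack([np.asarray(s, dtype=np.float32) for s in adjusted_sounds])
    return np.clip(stacked.sum(axis=0), -1.0, 1.0)

# Example: component Ra (user Ua, closer, louder) and component Rb (user Ub, quieter).
ra = np.array([0.30, -0.20, 0.25], dtype=np.float32)  # adjusted sound from terminal 3a
rb = np.array([0.05, -0.03, 0.04], dtype=np.float32)  # adjusted sound from terminal 3b
print(synthesize_audience_sounds([ra, rb]))
```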
  • the output unit 26 outputs data representing the audience sound synthesized by the synthesis unit 27.
  • the output unit 26 outputs digital data of spectator sound, and this digital data is DA-converted into analog data (eg, audio signal) by an amplifier and input to the speaker 6.
  • the speaker 6 converts analog data into vibration and outputs the synthesized audience sound.
  • The amplifier may be provided in the processing device 2, in the speaker 6, or in a device connected between the processing device 2 and the speaker 6.
  • the synthesized audience sound data provided by the processing device 2 to the outside may be digital data or analog data (eg, audio signal).
  • the processing device 2 may output audience sound data from the communication unit 23 by communication.
  • the communication unit 23 may be an output unit that outputs data representing the sound synthesized by the synthesis unit 27.
  • the performer P can perform while listening to the audience sound output from the speaker 6, for example.
  • the microphone 5 detects the sound in synchronization with the output of the audience sound by the speaker 6, for example.
  • the microphone 5 detects the audience sound output from the speaker 6 and the sound emitted by the performer P together.
  • The camera 4 captures images at a predetermined timing with respect to the timing at which the audience sound is output from the speaker 6.
  • The camera 4 captures images of the performer P in synchronization with the sound detection by the microphone 5. At least two of the following timings may be controlled by the processing device 2 or another device: the timing at which the sound is output from the speaker 6, the timing at which the microphone 5 detects the sound, and the timing at which the camera 4 captures the performer P.
  • the processing device 2 acquires the sound detected by the microphone 5 and the image captured by the camera 4.
  • the content generation unit 28 generates content data using the sound detected by the microphone 5 and the video image taken by the camera 4.
  • the processing device 2 distributes the content generated by the content generation unit 28.
  • the communication unit 23 transmits data of the content generated by the content generation unit 28.
  • the sound included in the content generated by the content generation unit 28 includes the synthesized audience sound output from the speaker 6 and the sound emitted by the performer P.
  • the user U can, for example, watch the video of the performer P while listening to the sound including the synthesized audience sound and the sound emitted by the performer P.
  • the communication unit 23 may be an output unit that outputs sound data in which the sound emitted by the performer P and the sound adjusted by the sound adjustment unit 34 are combined.
  • the microphone 5 does not have to detect the audience sound output from the speaker 6.
  • the speaker 6 may be earphones or headphones.
  • the microphone 5 and the speaker 6 may be a headset.
  • The content generation unit 28 of the processing device 2 may synthesize the sound detected by the microphone 5 (eg, the sound emitted by the performer P) with the audience sound synthesized by the synthesis unit 27 to generate the content.
  • the content delivered by the processing device 2 does not have to include the audience sound after synthesis.
  • The performer P can perform, for example, according to the audience sound after synthesis, and the user can experience a sense of reality by, for example, viewing the performance of the performer P who reacts to the audience.
  • Although the content generation unit 28 is provided in the same device as the synthesis unit 27 in FIG. 3, it may be provided in a device different from the synthesis unit 27.
  • the synthesis unit 27 may be provided in a first streaming server that processes audience sound data provided by a plurality of terminals 3.
  • the content generation unit 28 may be provided in a second streaming server that processes data provided by devices (eg, camera 4, microphone 5) arranged in the demonstration area AP.
  • the terminal 3 may or may not be returned from the user to the event organizer after the event ends.
  • the terminal 3 may be reused in the events after this event.
  • the user Ua applies for participation in the second event after the first event.
  • the processing device 2 determines the spectator position Qa in the second event based on the application for participation in the second event.
  • the processing device 2 provides the position information D3a corresponding to the spectator position Qa in the second event to the terminal 3a.
  • the sound adjustment unit 34 of the terminal 3a may adjust the audience sound detected by the microphone 41 by using the position information D3a corresponding to the second event.
  • In FIGS. 1 to 3 and the like, two users, the user Ua and the user Ub, are representatively shown as the user U.
  • the number of users U is two in FIGS. 1 to 3, but is arbitrary.
  • The number of users U may be, for example, 100, 10,000, or more.
  • FIG. 5 is a diagram showing a voice processing method according to the first embodiment. Refer to FIGS. 1 to 3 as appropriate for the configuration of the voice processing system 1. Each process shown in FIG. 5 is executed by each part of the voice processing system 1 while, for example, the event is being held. As described with reference to FIG. 2, before the event is held, the position determination unit 24 determines the spectator position Q based on the participation application of the user U, and the terminal 3 storing the position information corresponding to the spectator position Q is delivered to the user.
  • the first user corresponds to the user Ua of FIGS. 1 to 3, and the first user terminal corresponds to the terminal 3a corresponding to the user Ua.
  • the second user corresponds to the user Ub of FIGS. 1 to 3, and the second user terminal corresponds to the terminal 3b corresponding to the user Ub.
  • the processing device transmits the content in step S1.
  • the communication unit 23 of the processing device 2 of FIG. 3 transmits content data.
  • the first user terminal receives the data of the content transmitted by the processing device.
  • the first user terminal reproduces the content based on the received content data.
  • the first user terminal detects the audience sound from the first user in step S2.
  • In step S3, the first user terminal adjusts the audience sound by using the position information.
  • the sound adjusting unit 34 of the terminal 3a of FIG. 3 adjusts the audience sound detected in step S2 by using the position information D3a stored in the storage unit 31.
  • In step S4, the first user terminal transmits the adjusted audience sound data.
  • the communication unit 32 of the terminal 3a of FIG. 3 transmits the adjusted audience sound data.
  • the processing device receives the adjusted audience sound data transmitted by the first user terminal in step S4.
  • the processing from step S5 to step S7 is the same as the processing from step S2 to step S4.
  • the second user terminal receives the content data transmitted by the processing device in step S1.
  • the second user terminal detects the audience sound from the second user in step S5.
  • In step S6, the second user terminal adjusts the audience sound by using the position information.
  • In step S7, the second user terminal transmits the adjusted audience sound data.
  • the processing device receives the adjusted audience sound data transmitted by the second user terminal in step S7.
  • the processing device synthesizes the adjusted audience sound in step S8.
  • the synthesizing unit 27 of the processing device 2 of FIG. 3 synthesizes the spectator sound data transmitted from the terminal 3a and the spectator sound data transmitted from the terminal 3b.
  • the processing device outputs the synthesized audience sound to the performer.
  • the processing device 2 outputs the synthesized spectator sound data to the speaker 6, and outputs the spectator sound to the performer P by the speaker 6.
  • the processing device transmits the content including the synthesized audience sound.
  • the microphone 5 in FIG. 3 detects a sound including a sound emitted by the performer P and a sound output from the speaker 6.
  • the content generation unit 28 of the processing device 2 generates content using the sound detected by the microphone 5.
  • the communication unit 23 of the processing device 2 transmits the data of the content generated by the content generation unit 28.
  • the first user terminal and the second user terminal each receive the content data transmitted by the processing device.
  • the first user terminal and the second user terminal reproduce the content based on the data of the received content.
  • the processes of steps S2 to S10 are repeatedly executed, for example, during the holding of an event.
  • the cycle in which the series of processes from step S2 to step S10 is repeated is arbitrarily set.
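  • Putting the steps of FIG. 5 together, one processing cycle on the terminal side and on the processing-device side might look like the sketch below. The function names, block size, and the simple inverse-distance gain are placeholders, and the digital mix of the performer sound with the synthesized audience sound corresponds to the variant in which the content generation unit 28 combines them, rather than the microphone 5 picking up the output of the speaker 6 acoustically. Only the ordering (detect, adjust, transmit, synthesize, output to the performer, redistribute as content) follows the description above.

```python
import numpy as np

def play_to_performer(block):
    # Placeholder for output of the synthesized audience sound to the speaker 6.
    pass

def terminal_cycle(mic_block, position_info):
    """Steps S2-S4 (or S5-S7) on a terminal 3: detect, adjust, transmit."""
    gain = 1.0 / max(position_info["distance"], 1.0)  # assumed stand-in for D7
    adjusted = np.asarray(mic_block, dtype=np.float32) * gain
    return adjusted  # data transmitted to the processing device 2

def processing_device_cycle(adjusted_blocks, performer_block):
    """Steps S8 onward on the processing device 2."""
    audience = np.clip(np.sum(adjusted_blocks, axis=0), -1.0, 1.0)  # S8: synthesize
    play_to_performer(audience)                  # output to the performer via speaker 6
    content_audio = np.clip(performer_block + audience, -1.0, 1.0)  # content audio
    return content_audio                         # distributed to the terminals 3

# One cycle with two users and a silent performer block (0.01 s at 48 kHz).
block = np.zeros(480, dtype=np.float32)
a = terminal_cycle(block + 0.2, {"distance": 20.0})  # user Ua, La = 20 m
b = terminal_cycle(block + 0.2, {"distance": 60.0})  # user Ub, assumed Lb = 60 m
content = processing_device_cycle([a, b], block)
print(float(content.max()))
```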
  • The position determination unit 24 determines the audience position Q representing the position of the user U in a predetermined area (eg, event venue AE) where the event is held, based on the application for participation in the event from the user U.
  • the microphone 41 detects the sound emitted by the user U who is viewing the performer P (eg, content) of the event.
  • the sound adjustment unit 34 adjusts the sound detected by the microphone 41 based on the relationship between the performer position P1 representing the position of the performer of the event in the predetermined region and the audience position Q.
  • The synthesis unit 27 synthesizes the sound detected by the microphone 41 corresponding to the first user (eg, user Ua), who is the user U, and adjusted by the sound adjustment unit 34 with the sound detected by the microphone 41 corresponding to the second user (eg, user Ub), who is a user U different from the first user, and adjusted by the sound adjustment unit 34.
  • the output unit 26 outputs data representing the sound synthesized by the synthesis unit 27.
  • This voice processing system 1 can convey the reaction of the user U to the performer P.
  • the performer P can perform a performance conscious of the reaction of the user U, for example.
  • the user U can, for example, watch a performance influenced by his / her reaction.
  • the voice processing system 1 according to the embodiment can provide a new experience.
  • the application acquisition unit (eg, the reception terminal 8) acquires the participation application provided by the user U.
  • the position determination unit 24 determines the audience position Q based on the participation application acquired by the application acquisition unit. In this case, for example, the process from the acquisition of the participation application to the determination of the audience position Q can be performed automatically.
  • the voice processing system 1 does not have to include an application acquisition unit (eg, a reception terminal 8). For example, information on a user participating in an event may be input to the processing device 2, and the position determination unit 24 may use this information to determine the audience position.
  • At least some of the users participating in the event (eg, the first user) view the performer P from outside the predetermined area (eg, the event venue AE), for example.
  • the microphone 41 corresponding to the first user (eg, user Ua) is arranged at a position (eg, viewing area AUa) for detecting the sound emitted by the first user.
  • the number of users who can participate in the event is less likely to be restricted by the size of the event venue as compared with the form in which the event is held only in the real space.
  • In this form, in addition to participating in an event at a venue in real space, it is possible to participate in the event from outside the venue in real space, so that a number of spectators exceeding the capacity of the venue in real space can participate in the event. According to this form, for example, a larger audience can enjoy the event. According to this form, for example, it is possible to create an opportunity for a user who lives far from the venue in real space to participate from outside the venue, improving the convenience of the user.
  • Performer P performs, for example, outside a predetermined area (eg, event venue AE).
  • the sound data synthesized by the synthesis unit 27 is provided to a voice output device (eg, speaker 6) arranged at a position where the sound reaches the performer P (eg, the demonstration area AP).
  • the position where the performer P actually exists is less likely to be restricted by the position of the event venue, as compared with the form in which the performer P performs at the event venue in the real space. For example, an event can be performed even when there are a plurality of performers P and some performers P are separated from other performers P.
  • the terminal 3 corresponds to a first voice processing device including a sound adjusting unit 34 and a transmitting unit (eg, a communication unit 32) for transmitting sound data adjusted by the sound adjusting unit 34.
  • the processing device 2 includes a receiving unit (eg, a communication unit 23) for receiving the data transmitted by the transmitting unit of the first voice processing device, and a synthesizing unit 27 for synthesizing the sound represented by the data received by the receiving unit, and corresponds to a second voice processing device. According to this embodiment, for example, the process of adjusting the audience sounds of a plurality of users is distributed to the plurality of first voice processing devices, and the load on the second voice processing device can be reduced.
  • the transmission unit of the second voice processing device may transmit position information indicating the relationship between the performer position and the audience position determined by the position determination unit.
  • the receiving unit of the first voice processing device may receive the position information transmitted by the transmitting unit of the second voice processing device.
  • the sound adjusting unit of the first voice processing device may adjust the sound detected by the microphone of the first voice processing device based on the position information received by the receiving unit of the first voice processing device.
  • the voice processing system 1 may generate content by variously combining virtual information and real information.
  • the event venue AE may include an actual facility or place (concert hall, stadium, event stage, etc.).
  • the event venue AE may include virtual facilities and environments (stages constructed in virtual space, sea, mountains, sky, etc.).
  • the audio processing system 1 may generate content using artificial audience sounds and sound effects generated for the event, in addition to the audience sounds emitted by the user.
  • one or both of the terminal 3 and the processing device 2 includes a computer.
  • the computer includes at least one of a portable device such as a smartphone, a tablet, or a laptop computer, or a stationary device such as a server device, a desktop personal computer, or a tower personal computer.
  • the computer includes, for example, a processing unit, a storage unit, a communication unit, and an input / output interface.
  • the processing unit includes, for example, a general-purpose processor (eg, CPU).
  • the storage unit includes, for example, one or both of a non-volatile storage device and a volatile storage device.
  • the non-volatile storage device is, for example, a hard disk, a solid state drive, or the like.
  • Volatile storage devices include, for example, random access memory, cache memory, work memory, and the like.
  • the communication unit performs wired or wireless communication in accordance with a predetermined communication standard.
  • the communication unit may perform communication by combining two or more types of communication having different communication standards.
  • the predetermined communication standard may include a LAN, a short-range wireless communication standard such as Bluetooth®, an infrared communication standard, or another standard.
  • the terminal 3 may be referred to by a processing device, an information processing device, an electronic device, or another name. At least a part of the processing of the terminal 3 may be executed by a digital signal processor or an integrated circuit for a specific application.
  • the processing device 2 may be referred to by a terminal, an information processing device, an electronic device, or other names. At least a part of the processing of the processing device 2 may be executed by a digital signal processor or may be executed by an integrated circuit for a specific application.
  • the processing unit reads a program stored in the storage unit and executes various processes according to this program.
  • This sound processing program causes a computer to receive position information indicating the relationship between the audience position, which represents the user's position in a predetermined area where the event is held and is determined based on the user's application for participation in the event, and the performer position, which represents the position of the performer of the event in the predetermined area; to adjust, based on the position information, the sound emitted by the user and detected by the microphone; and to transmit data representing the adjusted sound to a processing device including a synthesis unit that synthesizes the adjusted sounds.
  • the program may be recorded and provided on a computer-readable storage medium.
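  • A minimal sketch of such a terminal-side program is shown below; the helper functions (receive_position_info, capture_microphone_frame, send_to_processing_device) are hypothetical placeholders standing in for the communication unit 32 and the microphone 41, and the gain-based adjustment is only one possible form of the adjustment.

```python
# Hypothetical terminal-side loop: receive position information, adjust the
# detected audience sound with it, and transmit the adjusted sound.
def receive_position_info():
    # placeholder: in practice the communication unit receives this from the
    # processing device; a fixed gain stands in for the relationship between
    # the performer position and the audience position
    return {"gain": 0.4}

def capture_microphone_frame():
    # placeholder for one frame of samples detected by the microphone 41
    return [0.2, 0.1, 0.3]

def send_to_processing_device(frame):
    print("sending adjusted frame:", frame)   # placeholder for transmission

def run_terminal(num_frames=3):
    position_info = receive_position_info()
    for _ in range(num_frames):
        frame = capture_microphone_frame()
        adjusted = [s * position_info["gain"] for s in frame]   # sound adjustment
        send_to_processing_device(adjusted)

run_terminal()
```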
  • When the processing device 2 is a computer, the processing unit reads a program stored in the storage unit and executes various processes according to this program.
  • This sound processing program causes a computer, for example, to determine an audience position representing the position of a user in a predetermined area where an event is held, based on an application for participation in the event from the user; to transmit position information indicating the relationship between the performer position, which represents the position of the performer of the event in the predetermined area, and the audience position to a device provided with a sound adjustment unit that adjusts, based on the position information, the sound emitted by the user viewing the performer of the event and detected by the microphone; to synthesize the sound detected by the microphone corresponding to the first user and adjusted by the sound adjustment unit with the sound detected by the microphone corresponding to the second user, who is a user different from the first user, and adjusted by the sound adjustment unit; and to output data representing the synthesized sound.
  • the program according to the embodiment may be a program (eg, difference file, difference program) that can realize the function according to the embodiment in combination with the program recorded in the computer system.
  • the program according to the embodiment may be recorded and provided on a computer-readable storage medium.
  • FIG. 6 is a diagram showing a voice processing system according to the second embodiment.
  • the terminal 3 is a terminal associated with the user U who uses the terminal 3.
  • the terminal 3a is a terminal owned by the user Ua.
  • the terminal 3a is, for example, a smartphone, a tablet, a personal computer, or other information processing device.
  • the terminal 3a stores, for example, information that identifies the terminal 3a and information that identifies the user Ua in the storage unit 31.
  • the information that identifies the terminal 3 is referred to as a terminal ID as appropriate.
  • the information that identifies the user includes, for example, the information of the account in the content distribution service.
  • Account information includes, for example, a user ID and password.
  • the terminal 3a includes a microphone 41, a speaker 42, a display unit 45, and an input unit 46.
  • the display unit 45 corresponds to the display device 43 shown in FIG.
  • the input unit 46 accepts input of various information.
  • the input unit 46 includes, for example, at least one of a keyboard, a mouse, and a touchpad.
  • the input unit 46 and the display unit 45 may be a touch panel.
  • the user Ua inputs various information to the terminal 3a by operating the input unit 46.
  • the terminal 3a also serves as the reservation terminal 7 described with reference to FIG.
  • the user Ua can input the entry information and apply for participation by operating the input unit 46 of the terminal 3a.
  • the entry information includes, for example, a user ID and information that identifies an event.
  • the communication unit 32 of the terminal 3a transmits the entry information.
  • the processing device 2 also serves as the reception terminal 8 described with reference to FIG.
  • the processing device 2 acquires the entry information transmitted by the communication unit 32 of the terminal 3a.
  • the communication unit 23 of the processing device 2 receives the entry information transmitted by the terminal 3a.
  • the storage unit 22 of the processing device 2 stores the user information D1 described with reference to FIG.
  • the processing unit 21 of the processing device 2 includes a reception unit 29.
  • the reception unit 29 executes the process by the reception terminal 8 described with reference to FIG.
  • the reception unit 29 determines the user's seat at the event. For example, the reception unit 29 acquires the event ID included in the entry information transmitted by the terminal 3a.
  • the reception unit 29 refers to a database that stores the event ID and the event information in association with each other, and acquires the event seat information corresponding to the event ID.
  • the reception unit 29 acquires the user ID included in the entry information transmitted by the terminal 3a.
  • the reception unit 29 acquires user Ua information (eg, user rank) corresponding to the user ID from the user information D1 stored in the storage unit 22.
  • the reception unit 29 determines the seat of the user Ua (eg, the seat number) in the event by using the event seat information (eg, the rank of each seat, information on vacant seats) and the information on the user Ua (eg, the rank of the user).
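  • For illustration only, the seat decision could look like the following sketch; the rank table, the rule that a user may be given a seat of the user's rank or of a lower-priority rank, and the seat numbers are assumptions, since the embodiment only states that the seat is decided from the event's seat information and the user's information.

```python
from typing import Optional

EVENT_SEATS = {                 # seat rank -> vacant seat numbers (assumed layout)
    "S": ["S-01", "S-02"],
    "A": ["A-10", "A-11", "A-12"],
}
RANK_ORDER = ["S", "A"]         # "S" is the higher-priority rank

def decide_seat(user_rank: str) -> Optional[str]:
    # Give the user the first vacant seat at the user's rank, or at a
    # lower-priority rank if that rank is full (illustrative rule).
    for rank in RANK_ORDER[RANK_ORDER.index(user_rank):]:
        if EVENT_SEATS[rank]:
            return EVENT_SEATS[rank].pop(0)   # the seat is no longer vacant
    return None

seat_ua = decide_seat("S")      # eg, the user Ua with rank "S" is given "S-01"
```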
  • the position determination unit 24 of the processing device 2 determines the audience position Qa of the user Ua using the information on the seat of the user Ua determined by the reception unit 29.
  • the processing unit 21 generates position information indicating the relationship between the performer position P1 and the audience position Qa determined by the position determination unit 24.
  • the communication unit 23 transmits the position information generated by the processing unit 21.
  • the communication unit 32 of the terminal 3a corresponding to the user Ua receives the position information transmitted by the communication unit 23 of the processing device 2.
  • the processing unit 33 of the terminal 3a stores the position information received by the communication unit 32 in the storage unit 31.
  • the communication unit 32 of the terminal 3a receives the data of the content transmitted by the processing device 2.
  • the terminal 3a reproduces this content based on the data of the content received by the communication unit 32.
  • the processing unit 33 causes the display unit 45 to display the video represented by the content data.
  • the processing unit 33 causes the speaker 42 to output the sound represented by the content data.
  • the microphone 41 detects the audience sound emitted by the user Ua.
  • the sound adjusting unit 34 adjusts the audience sound detected by the microphone 41 by using the position information D3a stored in the storage unit 31.
  • the communication unit 32 transmits the adjusted audience sound data.
  • the communication unit 23 of the processing device 2 receives the adjusted audience sound data transmitted by the communication unit 32 of the terminal 3a.
  • the synthesizing unit 27 synthesizes the adjusted spectator sound corresponding to the user Ua and the adjusted spectator sound corresponding to the user Ub.
  • the output unit 26 outputs the data of the synthesized spectator sound, and causes the speaker 6 to output the synthesized spectator sound to the performer P.
  • the user U can change the audience position Q.
  • the user Ua can apply for a change in the audience position Qa during the event.
  • the user Ua can transmit information requesting a change of the spectator position Qa (appropriately referred to as a change application) by the terminal 3a.
  • the input unit 46 of the terminal 3a accepts input of a change application or of information on which the change application is based (appropriately referred to as a change application, etc.).
  • the change application or the like includes information on the seat of the change destination desired by the user U (appropriately referred to as change destination seat information).
  • the communication unit 32 of the terminal 3a transmits a change application based on the change application or the like input to the input unit 46.
  • the change application includes, for example, one or both of the information specifying the user Ua (eg, user ID) and the information specifying the terminal 3a associated with the user Ua (eg, terminal ID), and the change destination seat information.
  • the storage unit 31 may store one or both of the user ID and the terminal ID in advance. In this case, when applying for a seat change, the user Ua may input the change destination seat information into the terminal 3a without having to input the user ID or the terminal ID.
  • the processing unit 33 of the terminal 3a may generate a change application using one or both of the user ID and the terminal ID stored in the storage unit 31 and the change destination seat information input to the input unit 46.
  • the communication unit 32 may transmit the change application generated by the processing unit 33.
  • the communication unit 23 of the processing device 2 receives the change application transmitted by the communication unit 32 of the terminal 3a.
  • the position determination unit 24 changes the spectator position Qa corresponding to the user Ua based on the application from the user Ua for changing the spectator position Q.
  • the position determination unit 24 determines the change destination of the audience position Qa of the user Ua based on the change application received by the communication unit 23.
  • the position determination unit 24 determines whether or not the seat indicated in the change destination seat information included in the change application is available. For example, if the seat indicated in the change destination seat information is not assigned to any user U, the position determination unit 24 determines that this seat can be used.
  • the position determination unit 24 derives, for example, the coordinates in the cyberspace of the event venue AE corresponding to the seat indicated in the change destination seat information, and determines the derived coordinates as the changed audience position Q (appropriately referred to as the updated audience position).
  • the communication unit 23 transmits the position information corresponding to the updated spectator position Q.
  • the processing unit 21 specifies the terminal 3a as the transmission destination of the location information based on one or both of the user ID and the terminal ID included in the change application transmitted by the terminal 3a.
  • the communication unit 23 transmits the position information to the destination specified by the processing unit 21.
  • the communication unit 32 of the terminal 3a receives the updated position information transmitted by the processing device 2.
  • the processing unit 33 of the terminal 3a updates the position information D3a stored in the storage unit 31 to the updated position information received by the communication unit 32.
  • the sound adjustment unit 34 adjusts the sound detected by the microphone 41 corresponding to the user Ua by using the audience position Qa changed by the position determination unit 24. For example, after the position information D3a stored in the storage unit 31 is updated, the sound adjustment unit 34 adjusts the audience sound detected by the microphone 41 using the updated position information D3a.
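  • The sketch below illustrates one way the change of the spectator position could be handled; the seat-to-coordinate table, the availability rule, and the function names are assumptions, and only the overall flow (check availability, derive cyberspace coordinates, hand back the updated position) follows the description above.

```python
SEAT_COORDINATES = {"A-10": (4.0, 2.0), "A-11": (4.0, 3.0)}   # assumed cyberspace coords
assigned_seats = {"A-10": "user_Ua"}                          # seat number -> user ID

def change_audience_position(user_id, destination_seat):
    # A seat not assigned to any user is judged usable (illustrative criterion).
    if destination_seat in assigned_seats:
        return None                                   # change is refused
    # free the seat the user currently occupies, if any
    for seat, occupant in list(assigned_seats.items()):
        if occupant == user_id:
            del assigned_seats[seat]
    assigned_seats[destination_seat] = user_id
    return SEAT_COORDINATES[destination_seat]         # updated audience position Q

new_q = change_audience_position("user_Ua", "A-11")   # eg, (4.0, 3.0)
# The updated position information is then transmitted to the terminal, which
# overwrites its stored position information and adjusts subsequent sounds with it.
```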
  • the process of changing the audience position may be executed by consuming tokens.
  • the token may include, for example, electronic money compatible with the currency of each country, or may include cryptographic assets.
  • the token may be, for example, a point that can be used by the user in the content distribution service, or may be a point that can be used by the user in a service affiliated with the content distribution service.
  • the information of the token owned by the user U may be included in the user information D1.
  • the value of the token to be consumed may be, for example, a predetermined fixed value, a value specified by the user, or a value determined by the seat to be changed.
  • the processing device 2 may determine the seat to be changed by, for example, an auction using the value of the token specified by the user.
  • the timing at which the tokens are consumed is arbitrary.
  • the token may be consumed when the processing device 2 accepts the change application, or may be consumed when the spectator position is changed by the change application.
  • the position determination unit 24 determines whether or not a seat is available, for example, depending on whether or not the seat designated by the user is in use, but may determine whether or not the seat can be used according to other criteria or conditions. For example, the position determination unit 24 may determine that a seat can be used when the value of the tokens held by the user is equal to or greater than the value of the tokens consumed in the process of changing the spectator position. A seat in the cyberspace may be assigned to only one user or to multiple users. For example, the spectator positions of a plurality of users may be the same, and the spectator position of any user may be the same as the spectator position of another user.
  • the voice processing system 1 may be provided with an action reception unit that receives, from the user U, information specifying an action on one or both of the performer P and the user U, and a processing unit that executes a process associated in advance with the information received by the action reception unit.
  • the above actions include actions (eg, video, audio) on the content in a form different from the audience sound.
  • the above actions include, for example, actions relating to one or both of video and audio in the content.
  • the above action includes, for example, the action of expressing the user's input by one or both of the video and the audio on the content.
  • the information that specifies the action is referred to as an action request, and the process associated in advance with the information received by the action reception unit is referred to as an action addition process.
  • the action reception unit includes, for example, an input unit 46 of the terminal 3a.
  • the processing unit includes, for example, a content generation unit 28 of the processing device 2.
  • the user Ua inputs a comment about the event in the input unit 46 in a text format as an action request.
  • the communication unit 32 of the terminal 3a transmits the text information input to the input unit 46.
  • the communication unit 23 of the processing device 2 receives the text information transmitted by the communication unit 32 of the terminal 3a.
  • the content generation unit 28 generates, for example, content including the text included in the text information received by the communication unit 23 as a video.
  • the text input by the user may be displayed while moving on the video of the content, or may be displayed in other forms.
  • the action reception unit may include the microphone 41 of the terminal 3a.
  • the user may input voice instead of inputting text, and the microphone 41 may accept voice emitted by the user as an action request.
  • the processing unit 33 of the terminal 3a may convert the voice input to the microphone 41 into text information by voice recognition.
  • the communication unit 32 of the terminal 3a may transmit the text information generated by the processing unit 33 by voice recognition.
  • the action request information may be input in a format different from text or voice.
  • the terminal 3a causes the display unit 45 to display an icon associated with a predetermined action addition process, and, when a user's operation on this icon is detected, may request execution of the action addition process associated with the icon.
  • the terminal 3a displays an icon indicating the addition of the applause sound on the display unit 45.
  • the communication unit 32 transmits a request to add the sound of applause to the content.
  • the processing device 2 receives the request transmitted by the communication unit 32, and the content generation unit 28 generates, for example, content to which a pre-sampled applause sound is added.
  • the action addition process may include a process of making additional adjustments to the audience sound adjusted by the sound adjustment unit 34.
  • the action addition process may include a process of changing the volume of the adjusted audience sound, or may include a process of adding an effect such as making an echo.
  • the action addition process may be a process of adding a video or a process of adding a video effect.
  • the action addition process may include a process of adding a fireworks image to the content image.
  • the action addition process may include a process of transferring a value equivalent to a tip or a thrown money to the performer.
  • the processing unit 21 of the processing device 2 may execute a process of transferring tokens having a value specified by the user Ua from the account of the user Ua to the account of the performer P, or may cause a device that manages the tokens to execute the process of transferring the tokens.
  • the content generation unit 28 may add a video showing the transfer of the token to the content when the process of transferring the token is executed. For example, the content generation unit 28 may add the tossed money expressed in CG to the video of the content.
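  • A minimal sketch of such a token transfer, assuming an in-memory ledger of account balances (in practice an external token-managing device may perform this), is:

```python
ledger = {"user_Ua": 120, "performer_P": 0}   # assumed token balances per account

def transfer_tokens(sender, receiver, value):
    # Move the value specified by the user from the user's account to the
    # performer's account; insufficient balance aborts the transfer.
    if ledger[sender] < value:
        raise ValueError("insufficient tokens")
    ledger[sender] -= value
    ledger[receiver] += value

transfer_tokens("user_Ua", "performer_P", 30)
# The content generation unit may then add CG of the thrown money to the video.
```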
  • the above icon may be one type of icon, or may be a plurality of types of icons with different action addition processes.
  • the process associated with the action request may be executed by consuming the token.
  • the value of the token to be consumed may be, for example, a predetermined fixed value or a value specified by the user.
  • the value of the token to be consumed may be a value determined by the type of action addition processing.
  • the value of the token to be consumed may be different between the process of adding text to the video of the content and the process of adding the sound effect to the sound of the content.
  • the value of the token to be consumed may be a value determined by the level of the effect by the action addition process.
  • the action addition process includes a process of adding an applause sound effect.
  • When the token to be consumed has a first value, the content generation unit 28 adds the sound effect of applause to the content at a first volume; when the token to be consumed has a second value larger than the first value, the applause sound effect may be added to the content at a second volume higher than the first volume.
  • the action addition process includes a process of adding text to the video.
  • When the token to be consumed has the first value, the content generation unit 28 adds text to the content with a first font size; when the token to be consumed has the second value larger than the first value, the content generation unit 28 may add the text to the content with a second font size larger than the first font size.
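  • As a sketch of how the level of an effect might follow the value of the consumed token, with the concrete thresholds, volumes, and font sizes invented here purely for illustration:

```python
def applause_volume(token_value):
    first_value, second_value = 10, 50        # assumed first/second token values
    if token_value >= second_value:
        return 1.0                            # second, higher volume
    if token_value >= first_value:
        return 0.5                            # first volume
    return 0.0                                # token too small: no sound effect added

def comment_font_size(token_value):
    return 32 if token_value >= 50 else 16    # second vs first font size (assumed)
```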
  • the sound adjustment unit 34 of the terminal 3a or the sound adjustment unit 34 of the terminal 3b may adjust the sound detected by the microphone corresponding to the user Ub, based on the relationship between the audience position Qa corresponding to the user Ua and the audience position Qb corresponding to the user Ub.
  • the audience position Qb of the user Ub is assumed to be within a range in which, when the audience sound of the user Ub is emitted at the audience position Qb, the sound reaches the audience position Qa of the user Ua.
  • the terminal 3a can execute a process of emphasizing the spectator sound emitted at the spectator position Q around the spectator position Qa (appropriately referred to as a peripheral emphasis process).
  • the terminal 3 can switch between, for example, a first operation mode in which the peripheral emphasis processing is not executed and a second operation mode in which the peripheral enhancement processing is executed. For example, the terminal 3 receives an input for mode switching from the user U, and when it is determined that this input has been received, the terminal 3 switches between the first operation mode and the second operation mode.
  • the adjusted audience sound of the user Ub is synthesized by the synthesis unit 27 of the processing device 2.
  • the synthesized audience sound is output from the speaker 6 to the performer P.
  • the audience sound output from the speaker 6 is detected by the microphone 5 together with the sound emitted by the performer P.
  • the content generation unit 28 generates content using the sound detected by the microphone 5, and the audience sound of the user Ub is included in the voice of the content.
  • the communication unit 23 of the processing device 2 transmits position information indicating the relationship between the spectator position Qa of the user Ua and the spectator position Qb of the user Ub (appropriately referred to as second position information or inter-audience position information).
  • the inter-audience position information includes, for example, a gain for converting the audience sound detected by the microphone 41 corresponding to the user Ub and adjusted by the sound adjustment unit 34 of the terminal 3b into the sound that reaches the audience position Qa of the user Ua when that audience sound is emitted at the audience position Qb.
  • the communication unit 23 of the processing device 2 transmits the adjusted audience sound data corresponding to the user Ub.
  • the communication unit 32 of the terminal 3a receives the adjusted audience sound data transmitted by the communication unit 23 of the processing device 2.
  • the processing unit 33 of the terminal 3a uses the inter-audience position information to convert the adjusted spectator sound corresponding to the user Ub into the sound that reaches the spectator position Qa of the user Ua when the spectator sound is emitted at the spectator position Qb.
  • the processing unit 33 synthesizes the converted audience sound of the user Ub with the sound of the content and outputs it from the speaker 42.
  • With the voice processing system 1 of such a form, for example, the audience sound of a second user who is around the first user in the cyberspace is realistically conveyed to the first user, and the sense of presence of the event is enhanced.
  • the terminal 3a notifies the processing device 2 that the operation is switched when switching from the first operation mode to the second operation mode.
  • the processing unit 21 of the processing device 2 identifies the audience position Qa of the user Ua corresponding to the terminal 3a.
  • the processing unit 21 extracts at least one spectator position Q included in a predetermined range around the spectator position Qa.
  • the processing unit 21 generates inter-audience position information for each of the extracted spectator positions Q.
  • the processing device 2 provides the generated inter-audience position information to the terminal 3a.
  • the processing device 2 provides the adjusted spectator sound data corresponding to the spectator-to-audience position information to the terminal 3a in association with the spectator-to-audience position information.
  • the processing unit 21 of the processing device 2 allocates identification information to each piece of inter-audience position information, and transmits the adjusted audience sound data as a set with the identification information.
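  • The following sketch shows one way the peripheral emphasis processing could work on the terminal side, using an inverse-distance gain between the audience positions Qa and Qb as the inter-audience position information; the attenuation model and the function names are assumptions.

```python
import math

def inter_audience_gain(q_a, q_b, ref_distance=1.0):
    # Gain converting the adjusted audience sound of a neighbouring user into
    # the sound that would reach the audience position Qa (assumed model).
    return ref_distance / max(math.dist(q_a, q_b), ref_distance)

def emphasize_neighbors(content_audio, neighbor_sounds, q_a):
    # neighbor_sounds: list of (audience position Qb, adjusted samples) pairs
    out = list(content_audio)
    for q_b, samples in neighbor_sounds:
        g = inter_audience_gain(q_a, q_b)
        for i, s in enumerate(samples):
            out[i] += g * s        # mix the converted sound into the content audio
    return out

mixed = emphasize_neighbors([0.1, 0.1, 0.1], [((3.0, 4.0), [0.2, 0.0, 0.2])], (0.0, 0.0))
```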
  • FIG. 7 is a diagram showing a voice processing method according to the second embodiment.
  • the same processing as in FIG. 5 is appropriately designated with the same reference numerals as those in FIG. 5, and the description thereof will be omitted or simplified.
  • the first user terminal accepts the input of information regarding the participation application from the user Ua shown in FIG.
  • the first user terminal transmits the entry information in step S21.
  • the processing unit 33 of the terminal 3a of FIG. 6 causes the communication unit 32 to transmit the entry information based on the information input to the input unit 46.
  • the processing device receives the entry information transmitted in step S21.
  • the processing device determines the audience position of the first user in step S22.
  • the position-determining unit 24 of the processing device 2 of FIG. 6 determines the audience position Qa of the user Ua based on the entry information corresponding to the user Ua received by the communication unit 23.
  • the processing device transmits the position information in step S23.
  • the processing unit 21 of the processing device 2 of FIG. 6 uses the spectator position Qa determined by the position determining unit 24 to generate position information indicating the relationship between the spectator position Qa and the performer position P1.
  • the communication unit 23 transmits the position information generated by the processing unit 21.
  • The process from step S24 to step S26 is the same as the process from step S21 to step S23, and the description thereof will be simplified.
  • the second user terminal accepts input of information regarding the participation application from the user Ub shown in FIG.
  • the second user terminal transmits the entry information in step S24.
  • the processing device receives the entry information transmitted in step S24.
  • the processing device determines the audience position of the second user (eg, the user Ub in FIG. 6) in step S25.
  • the processing device transmits the position information in step S26.
  • steps S21 to S26 are executed, for example, before the start of the event. It should be noted that at least a part of the processing of steps S21 to S26 may be executed after the start of the event. For example, the user may apply for participation in the event and participate in the event after the distribution of the content is started.
  • the processing from step S1 to step S11 after step S26 is the same as in FIG.
  • the processes of steps S1 to S11 are repeatedly executed, and the process of changing the spectator position and the action addition process are executed during the period in which the processes of steps S1 to S11 are repeatedly executed, respectively.
  • the process of changing the spectator position may be executed before the process of step S1 is executed.
  • the user U may operate the terminal 3 to execute the spectator position change process after the participation application is made and before the distribution of the content is started.
  • FIG. 8 is a diagram showing a voice processing system according to the third embodiment.
  • the voice processing system 1 includes a plurality of first voice processing devices divided into a plurality of groups, a first synthesis unit provided corresponding to each group of the plurality of groups and synthesizing the sounds adjusted by the sound adjustment units of the first voice processing devices in that group, and a second synthesis unit that synthesizes the sounds synthesized by the first synthesis units of the plurality of groups.
  • the plurality of terminals 3 are divided into a plurality of groups.
  • any group is appropriately represented by the reference numeral G, and when the groups G are distinguished, they are represented by reference numerals obtained by adding the alphabets a, b, ... to the reference numeral G, such as group Ga and group Gb.
  • the voice processing system 1 includes a plurality of processing devices 51 and a processing device 52.
  • the processing device 51 is provided for each group G.
  • When the processing devices 51 are distinguished, they are represented by reference numerals in which the alphabets a, b, ... are added to the reference numeral 51, as in the processing device 51a and the processing device 51b.
  • the processing device 51 includes a synthesis unit 27 described with reference to FIG. 3 and the like.
  • the processing device 51 acquires the adjusted audience sound data from two or more terminals 3 included in the corresponding group.
  • the processing device 51a receives the adjusted audience sound data transmitted by each of the terminals 3 included in the group Ga.
  • the synthesizing unit 27 of the processing device 51a synthesizes the adjusted spectator sound acquired from each of the terminals 3 included in the group Ga by using the received spectator sound data.
  • the adjusted spectator sound synthesized by the processing device 51 is referred to as the spectator sound after the primary synthesis.
  • the processing device 52 is provided corresponding to two or more processing devices 51.
  • the processing device 52 corresponds to the processing device 51a and the processing device 51b.
  • the processing device 52 acquires the data of the audience sound after the primary synthesis from each of the two or more processing devices 51 corresponding to each of the processing devices 51.
  • the processing device 52 includes a synthesis unit 53 and a content generation unit 28.
  • the synthesizing unit 53 synthesizes the spectator sounds represented by the spectator sound data by using the spectator sound data after the primary synthesis acquired from each of the two or more processing devices 51.
  • the processing of the synthesizing unit 53 may be, for example, the same processing as the processing of the synthesizing unit 27.
  • the processing device 52 outputs the data of the audience sound synthesized by the synthesis unit 53.
  • the processing device 52 causes the speaker 6 to output the audience sound synthesized by the synthesis unit 53.
  • the content generation unit 28 generates content using the video captured by the camera 4 and the voice detected by the microphone 5.
  • the processing device 52 provides data of the generated content.
  • the processing device 51 acquires data on the content provided by the processing device 52.
  • the processing device 51 provides content data to the terminal 3 included in the corresponding group G.
  • the processing device 51a provides content data to each of the two or more terminals 3 included in the corresponding group Ga.
  • Each of the two or more terminals 3 included in the group Ga reproduces the content by using the data of the content provided by the processing device 51a.
  • In the voice processing system 1, a plurality of terminals 3 are divided into a plurality of groups, and a processing device 51 that synthesizes the adjusted audience sounds provided by the two or more terminals 3 included in each group is provided.
  • Since the voice processing system 1 distributes the process of synthesizing the adjusted audience sounds among a plurality of devices, the processing load on each device can be reduced, and, for example, the occurrence of processing delays can be reduced.
  • the voice processing system 1 may include a plurality of processing devices (appropriately referred to as higher-level processing devices) that synthesize the audience sounds after the primary synthesis provided by two or more processing devices 51, and the processing device 52 may acquire, from the plurality of higher-level processing devices, the audience sounds synthesized by each higher-level processing device (appropriately referred to as audience sounds after the secondary synthesis), synthesize these audience sounds after the secondary synthesis, and output the data.
  • the layer of the device that generates intermediate data between each terminal 3 and the processing device 52 is one layer of the processing device 51 in FIG. 8, but may be two layers of the processing device 51 and the higher-level processing device.
  • the number of layers of the device for generating the intermediate data is arbitrary, and may be 0 layer as shown in FIG. 3, 1 layer as shown in FIG. 8, or a plurality of layers as shown in FIG.
  • the content generation unit 28 may be provided in a device different from the processing device 52, and may be provided in, for example, a device (eg, the processing device 51) that generates intermediate data between the terminal 3 and the processing device 52.
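  • The primary/secondary synthesis can be pictured with the short sketch below; the group membership, the frame contents, and the simple sample-wise summation are assumptions used only to show the two-stage structure.

```python
def mix(frames):
    # Sample-wise sum of equally long audio frames.
    return [sum(samples) for samples in zip(*frames)]

group_ga = [[0.1, 0.2], [0.0, 0.1]]          # adjusted audience sounds from group Ga terminals
group_gb = [[0.3, 0.0], [0.1, 0.1]]          # adjusted audience sounds from group Gb terminals

primary_ga = mix(group_ga)                   # primary synthesis in processing device 51a
primary_gb = mix(group_gb)                   # primary synthesis in processing device 51b
secondary = mix([primary_ga, primary_gb])    # secondary synthesis in processing device 52
```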
  • FIG. 9 is a diagram showing an example of a group according to the third embodiment.
  • Reference numerals EX1 to EX3 in FIG. 9 represent examples of groups, respectively.
  • each of the plurality of groups G is a set of terminals corresponding to the audience position Q in which the distance to the performer position P1 in the event venue AE is within a predetermined range.
  • the group Ga is a set of terminals corresponding to the audience position where the distance to the performer position P1 is less than L1.
  • the group Gb is a set of terminals corresponding to the audience positions where the distance to the performer position P1 is L1 or more and less than L2.
  • each of the plurality of groups G is a set of terminals corresponding to the audience position Q whose orientation from the performer position P1 in the event venue AE is within a predetermined range.
  • the group Ga is a group of terminals corresponding to the audience position Q whose azimuth angle with respect to the performer position P1 is in the range ⁇ 1.
  • the group Gb is a group of terminals corresponding to the audience position Q whose azimuth angle with respect to the performer position P1 is in the range ⁇ 2.
  • the rules for grouping are not limited to the first example or the second example, and are arbitrarily determined.
  • the third example EX3 is an example of grouping in which the grouping rule of the first example and the grouping rule of the second example are combined.
  • the plurality of groups G are determined based on the distance from the performer position P1 and the azimuth angle with respect to the performer position P1.
  • the plurality of groups G may be determined based on the order in which the audience position Q is determined.
  • the plurality of terminals 3 may be grouped based on the order in which the audience position Q is determined.
  • the terminal 3 included in the group Ga may be a terminal corresponding to the spectator position Q in which the spectator position Q is determined from the first to the 100th.
  • the terminal 3 included in the group Gb may be a terminal corresponding to the spectator position Q from 101 to 200 in the order in which the spectator position Q is determined.
  • the plurality of groups G may be determined based on the priority of seats (eg, seat rank) at the event venue AE.
  • the terminal 3 included in the group Ga may be a terminal corresponding to the audience position Q corresponding to the first rank seat (eg, S seat) in the event venue AE.
  • the terminal 3 included in the group Gb may be a terminal corresponding to the audience position Q corresponding to the second rank seat (eg, the A seat) having a lower priority than the first rank in the event venue AE.
  • the first-ranked seat may be a seat in which the participation fee paid by the user is higher than that of the second-ranked seat, or may be a seat assigned to a preselected user (eg, an invited guest).
  • the plurality of groups G may be determined by randomly selecting the terminals 3 included in each group G from the plurality of terminals 3.
  • the above-mentioned grouping rule may be a rule that combines two or more types of rules (eg, conditions).
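  • A possible grouping rule combining the first example (distance bands) and the second example (azimuth ranges), as in the third example EX3, is sketched below; the band boundary and the four 90-degree sectors are assumed values.

```python
import math

def assign_group(audience_pos, performer_pos=(0.0, 0.0), l1=10.0):
    dx = audience_pos[0] - performer_pos[0]
    dy = audience_pos[1] - performer_pos[1]
    distance_band = 0 if math.hypot(dx, dy) < l1 else 1           # first-example rule
    sector = int(math.degrees(math.atan2(dy, dx)) % 360 // 90)    # second-example rule
    return (distance_band, sector)            # each (band, sector) pair is one group G

group_of_qa = assign_group((3.0, 4.0))        # eg, (0, 0): near band, first azimuth range
```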
  • the number of terminals 3 included in each group may be the same or different.
  • the number of terminals 3 included in each group G may be, for example, the number of terminals 3 in which the processing device 51 corresponding to each group G is in charge of processing.
  • the terminal 3 included in the group Ga is a terminal corresponding to the spectator position Q corresponding to the seat of the first rank.
  • the terminal 3 included in the group Gb is a terminal corresponding to the audience position Q corresponding to the second rank seat.
  • the number of terminals 3 included in the group Ga may be smaller than the number of terminals 3 included in the group Gb.
  • the processing device 51a corresponding to the group Ga has a reduced processing load as compared with the processing device 51b corresponding to the group Gb, and the possibility of delay in processing, for example, is reduced.
  • the performance of each processing device 51 may be the same or different.
  • the terminal 3 included in the group Ga is a terminal corresponding to the spectator position Q corresponding to the seat of the first rank.
  • the terminal 3 included in the group Gb is a terminal corresponding to the audience position Q corresponding to the second rank seat.
  • the processing device 51a corresponding to the group Ga may have higher hardware performance (eg, CPU processing speed, storage unit read / write speed, communication speed) than the processing device 51b corresponding to the group Gb. In this case, the processing device 51a corresponding to the group Ga is less likely to cause a processing delay than the processing device 51b corresponding to the group Gb.
  • the process of dividing the plurality of terminals 3 into a plurality of groups is executed by, for example, the reception terminal 8 of FIG.
  • the grouping process may be executed by a device different from the reception terminal 8.
  • the grouping process may be executed by the processing device 2 or may be executed by another device.
  • the grouping process may be executed by an external device of the speech processing system 1.
  • FIG. 10 is a diagram showing a voice processing system according to the fourth embodiment.
  • the predetermined area (eg, event venue AE) includes a plurality of partial areas.
  • each group corresponds to the audience positions included in the partial area corresponding to that group.
  • each of the plurality of partial areas corresponds to one of the plurality of groups.
  • the plurality of terminals 3 are divided into a plurality of groups G.
  • the plurality of groups G are determined based on the orientation of the performer position with respect to the audience position Q, as described in the second example EX2 of FIG.
  • the voice processing system 1 of FIG. 10 includes a plurality of processing devices 2.
  • any processing device is appropriately represented by the reference numeral 2, and when the processing devices 2 are distinguished, they are represented by reference numerals in which the alphabets a, b, ... are added to the reference numeral 2, such as the processing device 2a and the processing device 2b.
  • Each of the plurality of processing devices 2 corresponds to one group G.
  • the processing device 2a corresponds to the group Ga, and the processing device 2b corresponds to the group Gb.
  • Each processing device 2 provides content data to the terminal 3 included in the corresponding group G.
  • the processing device 2a provides content data to each terminal 3 included in the group Ga.
  • Each processing device 2 acquires the adjusted audience sound data from the terminal 3 included in the corresponding group G.
  • the processing device 2a receives the adjusted audience sound data transmitted by each terminal 3 included in the group Ga.
  • Each processing device 2 synthesizes the adjusted audience sound acquired from the terminal 3 included in the corresponding group G, and outputs the data of the synthesized audience sound.
  • the processing device 2a synthesizes the adjusted audience sound acquired from each terminal 3 included in the group Ga, and outputs the data of the synthesized audience sound.
  • the voice processing system 1 includes a plurality of voice output devices (eg, speaker 6).
  • any speaker is appropriately represented by the reference numeral 6, and when the speakers 6 are distinguished, they are represented by reference numerals in which the alphabets a, b, ... are added to the reference numeral 6, such as the speaker 6a and the speaker 6b.
  • the plurality of speakers 6 are arranged in the demonstration area AP. Each of the plurality of speakers 6 has a corresponding relationship with the processing device 2.
  • the speaker 6 is provided in a one-to-one relationship with the processing device 2.
  • the speaker 6a corresponds to the processing device 2a
  • the speaker 6b corresponds to the processing device 2b.
  • Each of the plurality of speakers 6 corresponds to the group G in charge of the corresponding processing device 2.
  • the speaker 6a corresponds to the group Ga
  • the speaker 6b corresponds to the group Gb.
  • the area where the audience position Q corresponding to the terminal 3 included in each group G is distributed is referred to as a group area.
  • the arrangement of the performer P and the plurality of speakers 6 in the demonstration area AP is set so as to correspond to the arrangement of the performer position P1 and the plurality of group areas in the event venue AE.
  • Consider the rotation direction about the vertical axis centered on the performer P.
  • This rotation direction corresponds to the rotation direction about the vertical axis centered on the performer position P1 in the event venue AE.
  • the plurality of speakers 6 are arranged in the order of speaker 6a, speaker 6b, speaker 6c, speaker 6d, and speaker 6e in the clockwise direction in the rotation direction.
  • the regions of the plurality of groups G are arranged in the order of the region of the group Ga, the region of the group Gb, the region of the group Gc, the region of the group Gd, and the region of the group Ge in the clockwise direction in the rotation direction.
  • the plurality of speakers 6 are arranged so that the azimuth relationship with respect to the performer P maintains the mutual relationship between the regions of the plurality of groups G with respect to the performer position P1.
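  • One way to place one speaker 6 per group G so that its azimuth as seen from the performer P matches the azimuth of that group's region with respect to the performer position P1 is sketched below; the radius and the per-group azimuth angles are assumed values.

```python
import math

GROUP_AZIMUTHS = {"Ga": 36.0, "Gb": 108.0, "Gc": 180.0, "Gd": 252.0, "Ge": 324.0}  # assumed

def speaker_positions(radius=2.0, performer=(0.0, 0.0)):
    positions = {}
    for group, azimuth_deg in GROUP_AZIMUTHS.items():
        theta = math.radians(azimuth_deg)
        positions[group] = (performer[0] + radius * math.cos(theta),
                            performer[1] + radius * math.sin(theta))
    return positions

placement = speaker_positions()   # eg, the speaker for group Ga is placed at azimuth 36 degrees
```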
  • the voice processing system 1 includes a plurality of cameras 4.
  • any camera is appropriately represented by the reference numeral 4, and when the cameras 4 are distinguished, they are represented by reference numerals in which the alphabets a, b, ... are added to the reference numeral 4, such as the camera 4a and the camera 4b.
  • the plurality of cameras 4 are arranged in the demonstration area AP.
  • Each of the plurality of cameras 4 has a corresponding relationship with the processing device 2.
  • the camera 4 is provided in a one-to-one relationship with the processing device 2.
  • the camera 4a corresponds to the processing device 2a
  • the camera 4b corresponds to the processing device 2b.
  • Each of the plurality of cameras 4 corresponds to the group G in charge of the corresponding processing device 2.
  • the camera 4a corresponds to the group Ga and the camera 4b corresponds to the group Gb.
  • the arrangement of the performer P and the plurality of cameras 4 in the demonstration area AP is set so as to correspond to the arrangement of the performer position P1 and the areas of the plurality of groups G in the event venue AE.
  • Consider the rotation direction about the vertical axis centered on the performer P.
  • This rotation direction corresponds to the rotation direction about the vertical axis centered on the performer position P1 in the event venue AE.
  • the plurality of cameras 4 are arranged in the order of the camera 4a, the camera 4b, the camera 4c, the camera 4d, and the camera 4e in the clockwise direction in the rotation direction.
  • the regions of the plurality of groups G are arranged in the order of the region of the group Ga, the region of the group Gb, the region of the group Gc, the region of the group Gd, and the region of the group Ge in the clockwise direction in the rotation direction.
  • the plurality of cameras 4 are arranged so that the relationship of the azimuth angle with respect to the performer P maintains the positional relationship of the plurality of group G regions with respect to the performer position P1.
  • each camera 4 is oriented with respect to the performer P so as to capture an image corresponding to the state in which the performer position P1 is viewed from the region of the corresponding group G.
  • the processing device 2 generates content by using the sound detected by the microphone 5 and the image taken by the camera 4 corresponding to the processing device 2.
  • the processing device 2a generates content using the sound detected by the microphone 5 and the image captured by the camera 4a.
  • the processing device 2a provides the generated content to the terminal 3 included in the corresponding group Ga.
  • the performer P can hear the synthesized audience sound corresponding to the group G from the same direction as the azimuth angle of the group G with respect to the performer position P1.
  • For example, it is assumed that the region of the group Ga is the area on the front left side of the performer position P1, and that the performer P, in the content, faces the region of the group Ga and calls for a reaction such as applause from the users.
  • the user U who views the content on the terminal 3 included in the group Ga sees the performer P turning toward him or her and calling for a reaction.
  • the audience sound corresponding to the reaction is adjusted by the terminal 3, synthesized by the processing device 2a, and output from the speaker 6a.
  • the performer P can hear the audience sound corresponding to the reaction coming from, for example, the direction of the users U from whom the reaction was requested.
  • the user U can experience a sense of realism even when participating in an event at a place different from the venue of the event in the real space, and the voice processing system 1 according to the embodiment can provide a new experience.
  • the speaker 6 is provided in a one-to-one correspondence with the processing device 2, but the number of speakers 6 corresponding to one processing device 2 may be one or a plurality.
  • the voice processing system 1 does not have to include at least one speaker 6.
  • the plurality of speakers 6 may be devices provided in a system different from the voice processing system 1 (eg, an acoustic system), and the voice processing system 1 may be a system that outputs the synthesized audience sound data to the plurality of speakers 6 arranged so as to correspond to the arrangement of the regions of the plurality of groups G.
  • the camera 4 is provided in a one-to-one correspondence with the processing device 2, but the number of cameras 4 corresponding to one processing device 2 may be one or a plurality.
  • the voice processing system 1 does not have to include at least one camera 4.
  • the plurality of cameras 4 may be devices provided in a system different from the voice processing system 1 (eg, a photographing system), and the voice processing system 1 may be a system that acquires the video of the performer P from the plurality of cameras 4 arranged so as to correspond to the arrangement of the regions of the plurality of groups G.
  • the number of cameras 4 may be one, and for example, the voice processing system 1 may be a system that distributes common contents among a plurality of groups G.
  • the voice processing system 1 may generate content by using the computer graphics of the performer P (appropriately referred to as CG) created in advance.
  • the performer P may be represented by three-dimensional CG such as polygons, and the voice processing system 1 may generate the content by moving the three-dimensional CG using the motion of the performer P detected by motion capture or the like.
  • the audio processing system 1 may generate content by combining the video acquired by the camera 4 and the CG.
  • the voice processing system 1 does not have to include a position determination unit that determines the audience position representing the position of the user in the predetermined area where the event is held, based on the application for participation in the event from the user.
  • the spectator position may be determined independently of the application for participation or may be determined by the organizer of the event.
  • the voice processing system 1 can convey the reaction of the user U to the performer P, and, for example, the reaction of the user U influences the performance of the performer P, so that a new experience can be provided to the user U.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a sound processing system comprising: microphones for detecting sounds emitted by users participating in an event; a sound adjustment unit for adjusting the sounds detected by the microphones on the basis of relationships between a performer position, which indicates the position of a performer of the event in a predetermined area where the event is held, and audience positions, which indicate the positions of the users in the predetermined area; a synthesis unit for synthesizing the sound that is detected by the microphone corresponding to a first user, who is one of the users, and adjusted by the sound adjustment unit, and the sound that is detected by a microphone corresponding to a second user, who is a user different from the first user, and adjusted by the sound adjustment unit; and an output unit for outputting data representing the sounds synthesized by the synthesis unit.
PCT/JP2020/028035 2020-07-20 2020-07-20 Système de traitement sonore, dispositif de traitement sonore, procédé de traitement sonore et programme de traitement sonore WO2022018786A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/JP2020/028035 WO2022018786A1 (fr) 2020-07-20 2020-07-20 Système de traitement sonore, dispositif de traitement sonore, procédé de traitement sonore et programme de traitement sonore
JP2021549976A JP6951610B1 (ja) 2020-07-20 2020-07-20 音声処理システム、音声処理装置、音声処理方法、及び音声処理プログラム
JP2021155819A JP2022020625A (ja) 2020-07-20 2021-09-24 音声処理システム、音声処理装置、音声処理方法、及び音声処理プログラム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/028035 WO2022018786A1 (fr) 2020-07-20 2020-07-20 Système de traitement sonore, dispositif de traitement sonore, procédé de traitement sonore et programme de traitement sonore

Publications (1)

Publication Number Publication Date
WO2022018786A1 true WO2022018786A1 (fr) 2022-01-27

Family

ID=78114183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/028035 WO2022018786A1 (fr) 2020-07-20 2020-07-20 Système de traitement sonore, dispositif de traitement sonore, procédé de traitement sonore et programme de traitement sonore

Country Status (2)

Country Link
JP (2) JP6951610B1 (fr)
WO (1) WO2022018786A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015162947A1 (fr) * 2014-04-22 2015-10-29 ソニー株式会社 Dispositif de reproduction d'informations, procédé de reproduction d'informations, dispositif d'enregistrement d'informations, et procédé d'enregistrement d'informations
JP2017033297A (ja) * 2015-07-31 2017-02-09 富士通株式会社 空席位置通知方法、及び、サーバ
WO2017090096A1 (fr) * 2015-11-24 2017-06-01 株式会社ハイスピードボーイズ Système de commande de production créative
JP2018094326A (ja) * 2016-12-16 2018-06-21 株式会社バンダイナムコエンターテインメント イベント制御システム、イベント通知システム及びプログラム

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114745653A (zh) * 2022-02-21 2022-07-12 上海卓越睿新数码科技股份有限公司 一种基于多声道环绕音效实现全景实境教学的方法
CN116437282A (zh) * 2023-03-23 2023-07-14 合众新能源汽车股份有限公司 虚拟演唱会的声感处理方法及存储介质、电子设备
WO2024193165A1 (fr) * 2023-03-23 2024-09-26 合众新能源汽车股份有限公司 Procédé de traitement acoustique pour concert virtuel, et support de stockage et dispositif électronique

Also Published As

Publication number Publication date
JP6951610B1 (ja) 2021-10-20
JPWO2022018786A1 (fr) 2022-01-27
JP2022020625A (ja) 2022-02-01

Similar Documents

Publication Publication Date Title
US7725203B2 (en) Enhancing perceptions of the sensory content of audio and audio-visual media
JP2022020625A (ja) 音声処理システム、音声処理装置、音声処理方法、及び音声処理プログラム
US20240187553A1 (en) Integration of remote audio into a performance venue
JP7229146B2 (ja) 情報処理装置、情報処理方法および情報処理プログラム
JP2021197614A (ja) 映像配信システム、それに用いるコンピュータプログラム、及び制御方法
WO2022163137A1 (fr) Dispositif de traitement d'informations, procédé de traitement d'informations et programme
WO2018008434A1 (fr) Dispositif de présentation des performances musicales
WO2021246104A1 (fr) Procédé de commande et système de commande
WO2022024898A1 (fr) Dispositif de traitement d'informations, procédé de traitement d'informations et programme informatique
JP7191146B2 (ja) 配信サーバ、配信方法、及びプログラム
JP2020008752A (ja) 生バンドカラオケライブ配信システム
JP6220576B2 (ja) 複数人による通信デュエットに特徴を有する通信カラオケシステム
WO2023022004A1 (fr) Procédé de fonctionnement d'un système de commande, système de commande et programme
JP2016213667A (ja) 感覚提示装置
US20240015368A1 (en) Distribution system, distribution method, and non-transitory computer-readable recording medium
JP2006047753A (ja) カラオケ情報配信システム、プログラム、情報記憶媒体およびカラオケ情報配信方法
WO2023084933A1 (fr) Dispositif de traitement d'informations, procédé de traitement d'informations et programme
WO2023281820A1 (fr) Dispositif de traitement d'informations, procédé de traitement d'informations et support de stockage
WO2022113289A1 (fr) Procédé de diffusion de données en direct, système de diffusion de données en direct, dispositif de diffusion de données en direct, dispositif de reproduction de données en direct et procédé de reproduction de données en direct
JP7137278B2 (ja) 再生制御方法、制御システム、端末装置およびプログラム
WO2022113288A1 (fr) Procédé de diffusion de données en direct, système de diffusion de données en direct, dispositif de diffusion de données en direct, dispositif de reproduction de données en direct et procédé de reproduction de données en direct
WO2023238637A1 (fr) Dispositif, procédé et programme de traitement d'informations
US20230042477A1 (en) Reproduction control method, control system, and program
US20210320959A1 (en) System and method for real-time massive multiplayer online interaction on remote events
JP2007134808A (ja) 音声配信装置、音声配信方法、音声配信プログラム、および記録媒体

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021549976

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20946201

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20946201

Country of ref document: EP

Kind code of ref document: A1