WO2022196135A1 - Information processing method, information processing device, and program - Google Patents
Information processing method, information processing device, and program
- Publication number
- WO2022196135A1 (PCT/JP2022/003588)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- spatial resolution
- user
- information processing
- sound source
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
- H04S7/304—For headphones
- H04S1/00—Two-channel systems
- H04S1/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
- H04S2400/13—Aspects of volume control, not necessarily automatic, in stereophonic sound systems
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Definitions
- the present invention relates to an information processing method, an information processing device, and a program.
- stereophonic processing requires a relatively large scale of computation, and there is a problem that the output sound may be delayed depending on the time required for the computation.
- the present invention provides a device such as an information processing method that suppresses delay that may occur in output sound.
- an information processing method acquires a stream including first position/orientation information indicating the position and orientation of a sound source and a sound signal indicating the sound output by the sound source, acquires second position/orientation information indicating the position and orientation of the user's head, and, using the first position/orientation information and the second position/orientation information, sets the spatial resolution in the stereophonic processing applied to the sound signal according to the positional relationship between the user's head and the sound source.
- the information processing method of the present invention can suppress delays that may occur in output sounds.
- FIG. 1 is an explanatory diagram showing an example of the positional relationship between the user and the sound source in the embodiment.
- FIG. 2 is a block diagram showing the functional configuration of the information processing device according to the embodiment.
- FIG. 3 is a first explanatory diagram of spatial resolution of stereophonic processing in the embodiment.
- FIG. 4 is a second explanatory diagram of the spatial resolution of the stereophonic sound processing in the embodiment.
- FIG. 5 is a third explanatory diagram of the spatial resolution of the stereophonic sound processing in the embodiment.
- FIG. 6 is an explanatory diagram of the response time length of the stereophonic sound processing in the embodiment.
- FIG. 7 is an explanatory diagram showing a first example of parameters for stereophonic sound processing according to the embodiment.
- FIG. 8 is an explanatory diagram showing a second example of parameters for stereophonic sound processing according to the embodiment.
- FIG. 9 is an explanatory diagram showing a third example of parameters for stereophonic sound processing according to the embodiment.
- FIG. 10 is a flow diagram showing processing of the information processing apparatus according to the embodiment.
- the above stereophonic sound processing technology is effective only when the change in the user's posture is relatively small or regular. When this is not the case, the predicted posture information does not match the user's actual posture information, so the position of the sound image presented to the user may be inappropriate, or the position of the sound image may change abruptly.
- the technique of Patent Document 1 may not solve the problem that the output sound may be delayed due to the time required for the computation of stereophonic processing.
- an information processing method according to one aspect of the present invention acquires a stream including first position/orientation information indicating the position and orientation of a sound source and a sound signal indicating the sound output by the sound source, acquires second position/orientation information indicating the position and orientation of the user's head, and, using the first position/orientation information and the second position/orientation information, sets the spatial resolution in the stereophonic processing applied to the sound signal according to the positional relationship between the user's head and the sound source.
- since the spatial resolution in the stereophonic processing is set according to the positional relationship between the user's head and the sound source, the scale of computation required for the stereophonic processing can be adjusted. Therefore, when the scale of computation required for the stereophonic processing is relatively large, lowering the spatial resolution reduces the scale of computation and shortens the time required for the processing, so the delay that can occur in the output sound is suppressed. Thus, according to the above information processing method, it is possible to suppress the delay that may occur in the output sound.
- the larger the distance between the user's head and the sound source, the lower the spatial resolution may be set.
- the spatial resolution in the stereophonic processing is set lower as the distance between the user's head and the sound source increases, thereby reducing the scale of computation required for the stereophonic processing, so delays that may occur in the output sound can be suppressed.
- thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
- the stream further includes type information indicating whether or not the sound indicated by the sound signal is human speech, and in the setting of the spatial resolution, when the type information indicates that the sound indicated by the sound signal is human speech, the spatial resolution may be set higher.
- the stream further includes type information indicating whether or not the sound indicated by the sound signal is human speech, and in the setting of the spatial resolution, when the type information indicates that the sound indicated by the sound signal is not human speech, the spatial resolution may be set lower.
- the scale of computation required for stereophonic processing of sounds that are not human speech is reduced, so delays that may occur in the output sound can be suppressed. Compared with human speech, the sound image position of non-speech sounds may not require high accuracy, so lowering their spatial resolution can contribute to suppressing possible delays. Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
- the stream includes the first position/orientation information and the sound signal for each of one or more sound sources, and in the setting of the spatial resolution, the greater the number of the one or more sound sources, the lower the spatial resolution may be set.
- the spatial resolution is set lower as the number of sound sources included in the stream increases, thereby reducing the scale of computation required for the stereophonic processing; as a result, the delay that may occur in the output sound can be suppressed.
- thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
- the time response length in the stereophonic processing may be further set according to the positional relationship.
- since the time response length in the stereophonic processing is set according to the positional relationship between the user's head and the sound source, the user can appropriately perceive the distance from the user to the sound source.
- thus, according to the information processing method, it is possible to suppress the delay that may occur in the output sound while allowing the user to appropriately perceive the distance from the user to the sound source.
- the greater the distance between the user's head and the sound source, the greater the time response length may be set.
- the user can appropriately perceive the distance from the user to the sound source.
- thus, according to the above information processing method, it is possible to suppress the delay that may occur in the output sound while allowing the user to appropriately perceive the distance from the user to the sound source.
- further, an output signal indicating the sound to be output by a speaker may be generated by the stereophonic processing, and the generated output signal may be transmitted to the speaker to cause the speaker to output the sound indicated by the output signal.
- by outputting sound based on the output signal generated by the stereophonic processing using the set spatial resolution, the user can listen to output sound in which delay is suppressed.
- thus, according to the information processing method, it is possible to suppress the delay that may occur in the output sound and to allow the user to listen to the output sound with the suppressed delay.
- the stereophonic processing may be rendering processing that uses the first position/orientation information and the second position/orientation information to render sound in a space in which the sound source is arranged, according to the positional relationship between the user's head and the sound source, and the spatial resolution may be the spatial resolution in the rendering processing.
- in this way, the spatial resolution is set in the rendering processing serving as the stereophonic processing. Therefore, according to the information processing method, it is possible to suppress the delay that may occur in the output sound.
- an information processing apparatus includes: a decoding unit that acquires a stream including first position/orientation information indicating the position and orientation of a sound source and a sound signal indicating the sound output by the sound source; an acquisition unit that acquires second position/orientation information indicating the position and orientation of the user's head; and a setting unit that, using the first position/orientation information and the second position/orientation information, sets the spatial resolution in the stereophonic processing to be applied to the sound signal according to the positional relationship between the user's head and the sound source.
- a program according to one aspect of the present invention is a program that causes a computer to execute the above information processing method.
- these general or specific aspects may be realized by a system, a device, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or by any combination of systems, devices, integrated circuits, computer programs, and recording media.
- FIG. 1 is an explanatory diagram showing an example of the positional relationship between the user U and the sound source 5 in this embodiment.
- FIG. 1 shows a user U existing in a space S and a sound source 5 recognized by the user U.
- the space S is represented as a plane containing the x-axis and the y-axis, but also has an extension in the z-axis direction. The same applies hereafter.
- Walls or objects may be placed in the space S. Walls also include ceilings or floors.
- the information processing device 10 generates a sound signal for the user U to listen to by performing stereophonic processing, which is digital sound processing, based on a stream including the sound signal output by the sound source 5 .
- the stream further includes position and orientation information indicating the position and orientation of the sound source 5 in the space S.
- the sound signal generated by the information processing device 10 is output as a sound by a speaker, and the user U listens to the sound.
- the speaker is assumed to be a speaker included in earphones or headphones worn by the user U, but is not limited to this.
- the sound source 5 is a virtual sound source (generally referred to as a sound image) that the user U, listening to the sound generated based on the stream, recognizes as a sound source; it is not a physical source of sound. Although a human is shown as the sound source 5 in FIG. 1, the sound source 5 is not limited to a human and may be any sound source.
- the user U listens to the sound based on the sound signal generated by the information processing device 10 and output from the speaker.
- the sound output from the speaker based on the sound signal generated by the information processing device 10 is heard by the left and right ears of the user U, respectively.
- An appropriate time difference or phase difference (also referred to as a time difference or the like) is provided by the information processing device 10 to the sounds heard by the left and right ears of the user U, respectively.
- the user U perceives the direction of the sound source 5 for the user U based on the time difference between the sounds heard by the left and right ears.
- the sounds heard by the left and right ears of the user U include sounds corresponding to sound that arrives directly from the sound source 5 (described as direct sound) and sounds corresponding to sound that is output from the sound source 5 and arrives after being reflected by a wall surface (described as reflected sound), both of which are included by the information processing device 10.
- the user U perceives the distance from the user U to the sound source 5 based on the time interval between the direct sound and the reflected sound included in the heard sound.
- based on the sound signal included in the stream, the timing at which the direct sound and the reflected sound arrive at the user U, as well as the amplitude and phase of the direct sound and the reflected sound, are determined, and a sound signal (described as an output signal) indicating the sound to be output from the speaker is generated.
- Stereophonic processing can involve relatively large scale computations.
- when the number of sound signals included in the stream is relatively large, or when the spatial resolution of the stereophonic processing is relatively high, the information processing apparatus 10 requires a relatively long time for arithmetic processing, and generation and output of the output signal can be delayed.
- one measure to suppress the delay that may occur in the output signal is to lower the spatial resolution of the stereophonic processing; however, lowering the spatial resolution can degrade the quality of the sound heard by the user U. In this way, there is a trade-off between the quality of the sound heard by the user U and the amount of arithmetic processing included in the stereophonic processing.
- the information processing device 10 uses the distance between the user U and the sound source 5 to adjust the parameters of the stereophonic processing, thereby contributing to the reduction of the processing load of the stereophonic processing. For example, the information processing apparatus 10 reduces the processing load of the stereophonic sound processing by lowering the spatial resolution, which is a parameter of the stereophonic sound processing.
- FIG. 2 is a block diagram showing the functional configuration of the information processing device 10 according to this embodiment.
- the information processing device 10 includes a decoding unit 11, an acquisition unit 12, an adjustment unit 13, a processing unit 14, and a setting unit 15 as functional units.
- the functional units included in the information processing apparatus 10 can be implemented by a processor (e.g., a CPU (Central Processing Unit), not shown) included in the information processing apparatus 10 executing a predetermined program using a memory (not shown).
- the decoding unit 11 is a functional unit that decodes the stream.
- the stream specifically includes position and orientation information (corresponding to first position and orientation information) indicating the position and orientation of the sound source 5 in the space S, and a sound signal indicating the sound output by the sound source 5 .
- the stream may include type information indicating whether or not the sound output by the sound source 5 is human speech.
- voice means human voice.
- the decoding unit 11 provides the sound signal obtained by decoding the stream to the processing unit 14, and provides the position and orientation information obtained by decoding the stream to the adjusting unit 13.
- the stream may be obtained by the information processing device 10 from an external device, or may be stored in advance in a storage device of the information processing device 10 .
- the stream is a stream encoded in a predetermined format.
- the position and orientation information indicating the position and orientation of the sound source 5 is 6-degrees-of-freedom information including the coordinates (x, y, and z) of the sound source 5 along the three axes and the angles around the three axes (yaw angle, pitch angle, and roll angle).
- the position and orientation information of the sound source 5 can specify the position and orientation of the sound source 5 .
- the coordinates are coordinates in an appropriately set coordinate system.
- the posture is an angle around three axes indicating a predetermined direction (referred to as a reference direction) for the sound source 5 .
- the reference direction may be the direction in which the sound source 5 outputs sound, or any other direction that is uniquely determined for the sound source 5 .
- a stream may include, for each of one or more sound sources 5, position and orientation information indicating the position and orientation of the sound source 5, and a sound signal indicating the sound output by the sound source 5.
- the acquisition unit 12 is a functional unit that acquires the position and posture of the user's U head in the space S.
- the acquisition unit 12 acquires position and orientation information (second position and orientation information) including information indicating the position of the head of the user U (described as position information) and information indicating the orientation (described as orientation information) from a sensor or the like.
- the position and orientation information of the head of the user U is 6-degrees-of-freedom information including the coordinates (x, y, and z) of the head of the user U along the three axes and the angles around the three axes (yaw angle, pitch angle, and roll angle).
- the position and orientation of the user U's head can be identified by the position and orientation information of the user's U head.
- the coordinates are coordinates in a coordinate system common to the coordinate system defined for the sound source 5 .
- a position can be defined as a position having a predetermined positional relationship from a predetermined position (eg, origin) in a coordinate system.
- the posture is an angle around three axes indicating the direction in which the user U's head is facing.
- the sensors may be, for example, inertial measurement units (IMUs), accelerometers, gyroscopes, magnetic sensors, or combinations thereof.
- the sensor or the like is assumed to be worn on the head of the user U, and may be fixed to an earphone or headphone worn by the user U.
- the adjustment unit 13 is a functional unit that adjusts the position and orientation information of the user U in the space S using the parameters in the stereophonic processing performed by the processing unit 14 .
- the adjustment unit 13 acquires the spatial resolution, which is a parameter in the stereophonic processing, from the setting unit 15. The adjustment unit 13 then adjusts the position information of the user U's head acquired by the acquisition unit 12 by changing it to a value that is an integral multiple of the spatial resolution. When changing it, the adjustment unit 13 may adopt, from among the values that are integral multiples of the spatial resolution, the value closest to the position information of the user U's head acquired by the acquisition unit 12. The adjustment unit 13 provides the adjusted position information and posture information of the user U's head to the processing unit 14.
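- the adjustment described above amounts to snapping a value to the nearest integral multiple of the resolution step. A minimal sketch (the function name, and applying it per component, are illustrative assumptions, not taken from the disclosure):

```python
def snap_to_resolution(value: float, resolution: float) -> float:
    """Change a position/orientation component to the integral multiple
    of the resolution step that is closest to the original value."""
    return round(value / resolution) * resolution

# e.g., with a 30-degree step, 37 degrees is adjusted to 30 degrees
# and 50 degrees to 60 degrees
```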
- the processing unit 14 is a functional unit that performs stereophonic processing, which is digital acoustic processing, on the sound signal acquired by the decoding unit 11 .
- the processing unit 14 has a plurality of filters used for stereophonic processing. Filters are used, for example, in calculations that adjust the amplitude and phase of sound signals for each frequency.
- the processing unit 14 acquires parameters (that is, spatial resolution and time response length) used for stereophonic processing from the adjusting unit 13, and performs stereophonic processing using the acquired parameters.
- the processing unit 14 calculates the propagation paths of the direct sound and the reflected sound arriving at the user U from the sound source 5, and also calculates the timing at which the direct sound and the reflected sound reach the user. Also, for each range of angular directions around the head of the user U, the processing unit 14 applies a filter corresponding to that range to the signal indicating the sound (direct sound and reflected sound) arriving at the user U from that range, thereby calculating the amplitude and phase of the sound arriving at the user U.
- the setting unit 15 is a functional unit that sets parameters for stereophonic processing executed by the processing unit 14 .
- the parameters of stereophonic processing may include spatial resolution and temporal response length in stereophonic processing.
- the setting unit 15 uses the position/orientation information of the sound source 5 in the space S and the position/orientation information of the user U acquired by the acquisition unit 12 to set the spatial resolution, which is a parameter of the stereophonic processing, according to the positional relationship between the head of the user U and the sound source 5.
- the setting unit 15 may further set a time response length, which is a parameter of stereophonic processing, according to the positional relationship.
- the setting unit 15 provides the set parameters to the adjusting unit 13 .
- the distance D between the user U and the sound source 5 can be used for parameter setting.
- the distance D is calculated from the position indicated by the position/orientation information of the sound source 5 and the position indicated by the position/orientation information of the user U's head.
- the setting unit 15 may set the spatial resolution lower as the distance D between the head of the user U and the sound source 5 in the space S increases.
- the setting unit 15 may set the time response length larger as the distance D between the head of the user U and the sound source 5 in the space S increases.
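- as a sketch, the distance D used in the settings above can be computed from the two positions in the shared coordinate system (extracting the coordinate triples from the 6-degrees-of-freedom information is an assumption):

```python
import math

def distance_d(source_xyz: tuple[float, float, float],
               head_xyz: tuple[float, float, float]) -> float:
    """Euclidean distance D between the sound source position and the
    position of the user's head, both in the common coordinate system."""
    return math.dist(source_xyz, head_xyz)

# a source at (3, 4, 0) and a head at the origin are 5 m apart
```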
- the spatial resolution of the stereophonic processing is the resolution of the angular range centered on the user U.
- when the spatial resolution is high, the processing unit 14 applies a filter to sound signals arriving at the user U from each relatively narrow angular range (for example, angular range 30).
- when the spatial resolution is low, the processing unit 14 applies a filter to sound signals arriving at the user U from each relatively wide angular range (for example, angular range 40).
- a high spatial resolution corresponds to a narrow angular range
- a low spatial resolution corresponds to a wide angular range.
- the angular range corresponds to the units to which the same filter is applied.
- the processing unit 14 applies a filter corresponding to each angular range 31, 32, 33, ... to calculate the sound signals representing the sounds arriving at the user U from each of the angular ranges 31, 32, 33, ... (see FIG. 4). The sounds arriving at the user U from each of the angular ranges 31, 32, 33, ... are then synthesized.
- the processing unit 14 applies a filter corresponding to each angular range 41, 42, 43, ... to calculate the sound signals representing the sounds arriving at the user U from each of the angular ranges 41, 42, 43, ... (see FIG. 5). The sounds arriving at the user U from each of the angular ranges 41, 42, 43, ... are then synthesized.
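- the grouping of arrival directions into equal angular ranges around the user's head can be sketched as follows (the zero-based index convention and the function name are illustrative assumptions):

```python
def angular_range_index(azimuth_deg: float, resolution_deg: float) -> int:
    """Index of the angular range that contains the given arrival
    direction; all directions within one range share the same filter."""
    return int((azimuth_deg % 360.0) // resolution_deg)

# with a 30-degree resolution there are 12 ranges: directions of
# 5 and 25 degrees share one filter, while 95 degrees falls in another
```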
- FIG. 6 is an explanatory diagram of the response time length of stereophonic sound processing in this embodiment.
- FIG. 6 shows sound signals generated by stereophonic processing.
- the sound signal includes a waveform 51 corresponding to direct sound arriving at the user U from the sound source 5 and waveforms 52 , 53 , 54 , 55 and 56 corresponding to reflected sounds arriving at the user U from the sound source 5 .
- each of the waveforms 52, 53, 54, 55, and 56 corresponding to the reflected sounds is delayed from the direct sound by a delay time determined by the positional relationship among the sound source 5, the user U, and the wall surfaces in the space S, and its amplitude is reduced by the reflection from the wall surface.
- the delay time is determined within a range of approximately 10 msec to 100 msec.
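- for a first-order wall reflection, the delay relative to the direct sound follows from the difference in propagation path lengths. A sketch (the speed-of-sound constant and the example path lengths are illustrative assumptions, not values from the disclosure):

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # dry air at roughly 20 degrees Celsius

def reflection_delay_s(direct_path_m: float, reflected_path_m: float) -> float:
    """Delay of a wall-reflected sound relative to the direct sound,
    given the lengths of the two propagation paths in meters."""
    return (reflected_path_m - direct_path_m) / SPEED_OF_SOUND_M_PER_S

# a 2 m direct path and an 8 m reflected path give about 17.5 ms,
# inside the approximately 10 msec to 100 msec range mentioned above
```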
- the time response length is an index that indicates the magnitude of the delay time. The longer the time response length, the longer the delay time, and the shorter the time response length, the shorter the delay time.
- the time response length is only an index of the magnitude of the delay time, and does not indicate the delay time itself of the waveform corresponding to the reflected sound.
- the time width from waveform 51 to waveform 55 and the time response length are substantially equal, but this is not limiting, and there are cases where the time width from waveform 51 to waveform 54 and the time response length are substantially equal.
- the time width from waveform 51 to waveform 56 may be approximately equal to the time response length.
- FIG. 7 is an explanatory diagram showing a first example of parameters for stereophonic processing in this embodiment.
- FIG. 7 shows a correspondence table in which spatial resolution and time response length, which are parameters of stereophonic processing, are associated with each of a plurality of ranges of distance D between user U and sound source 5 .
- a distance D of less than 1 m is associated with a spatial resolution of 10 degrees and a time response length of 10 msec.
- distances D of 1 m or more and less than 3 m, 3 m or more and less than 20 m, and 20 m or more are associated with spatial resolutions of 30 degrees, 45 degrees, and 90 degrees, and with time response lengths of 50 msec, 200 msec, and 1 sec, respectively.
- the setting unit 15 has a correspondence table between the distance D and the spatial resolution shown in FIG. 7, and provides the adjustment unit 13 with the correspondence table.
- the adjustment unit 13 refers to the provided correspondence table and acquires the spatial resolution and the time response length associated with the distance D between the head of the user U and the sound source 5 acquired from the acquisition unit 12 .
- the setting unit 15 sets the spatial resolution lower as the distance D between the head of the user U and the sound source 5 in the space S increases; in other words, it sets a value indicating a lower spatial resolution. In addition, the setting unit 15 sets the time response length larger as the distance D increases; in other words, it sets a value indicating a longer time response length.
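- the correspondence table of FIG. 7 can be read as a simple threshold lookup. A sketch (the data layout and names are assumptions; the values are the ones given above):

```python
# FIG. 7: (exclusive upper bound of distance D in meters,
#          spatial resolution in degrees, time response length in seconds)
FIG7_TABLE = [
    (1.0, 10.0, 0.010),
    (3.0, 30.0, 0.050),
    (20.0, 45.0, 0.200),
    (float("inf"), 90.0, 1.000),
]

def parameters_for_distance(d: float) -> tuple[float, float]:
    """Spatial resolution and time response length for a distance D."""
    for upper_bound, resolution_deg, response_s in FIG7_TABLE:
        if d < upper_bound:
            return resolution_deg, response_s
    raise ValueError("distance must be non-negative and finite")

# a source 5 m away gets a 45-degree resolution and a 200 msec response
```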
- the setting unit 15 may change the spatial resolution according to whether or not the sound indicated by the sound signal is human speech.
- the information processing apparatus 10 changes the spatial resolution according to whether or not the sound indicated by the sound signal is human speech, thereby contributing to more accurate stereophonic processing of human speech.
- the setting unit 15 may set the spatial resolution higher when the type information indicates that the sound indicated by the sound signal is human speech; in other words, it may set a value indicating a higher spatial resolution.
- the setting unit 15 may correct the value to indicate a higher spatial resolution than the already set spatial resolution.
- the setting unit 15 may set the spatial resolution lower when the type information indicates that the sound indicated by the sound signal is not human speech; in other words, it may set a value indicating a lower spatial resolution.
- the setting unit 15 may correct the value to indicate a lower spatial resolution than the already set spatial resolution.
- the setting unit 15 may change the spatial resolution according to the number of sound sources included in the stream.
- for example, the setting unit 15 may set the spatial resolution lower as the number of sound sources included in the stream increases. When setting the spatial resolution, if a spatial resolution has already been set, the setting unit 15 may correct the value to indicate a lower spatial resolution than the one already set.
- FIG. 8 is an explanatory diagram showing a second example of parameters for stereophonic processing in this embodiment.
- FIG. 8 shows a correspondence table in which a spatial resolution is associated with each of a plurality of ranges of the distance D between the user U and the sound source 5; it is an example of the parameters obtained when the setting unit 15 corrects the parameters shown in FIG. 7.
- a distance D of less than 1 m is associated with a spatial resolution of 5 degrees.
- distances D of 1 m or more and less than 3 m, 3 m or more and less than 20 m, and 20 m or more are associated with spatial resolutions of 15 degrees, 22.5 degrees, and 45 degrees, respectively.
- for each value of the distance D, the spatial resolution values (angles) shown in FIG. 8 are half the spatial resolution values shown in FIG. 7.
- in other words, for each value of the distance D, the table in FIG. 8 provides twice the spatial resolution of the table in FIG. 7, since a smaller angle corresponds to a finer spatial resolution.
- the setting unit 15 corrects the correspondence table used for stereophonic processing from the correspondence table shown in FIG. 7 to the correspondence table shown in FIG. 8. This allows the setting unit 15 to set a higher spatial resolution when the type information indicates that the sound indicated by the sound signal is human speech.
- FIG. 9 is an explanatory diagram showing a third example of parameters for stereophonic processing in this embodiment.
- FIG. 9 shows a correspondence table in which a spatial resolution is associated with each of a plurality of ranges of the distance D between the user U and the sound source 5; it is an example of the parameters obtained when the setting unit 15 corrects the parameters shown in FIG. 7.
- a distance D of less than 1 m is associated with a spatial resolution of 20 degrees.
- distances D of 1 m or more and less than 3 m, 3 m or more and less than 20 m, and 20 m or more are associated with spatial resolutions of 60 degrees, 90 degrees, and 180 degrees, respectively.
- for each value of the distance D, the spatial resolution values (angles) shown in FIG. 9 are twice the spatial resolution values shown in FIG. 7.
- in other words, for each value of the distance D, the table in FIG. 9 provides half the spatial resolution of the table in FIG. 7.
- the setting unit 15 corrects the correspondence table used for stereophonic processing from the correspondence table shown in FIG. 7 to the correspondence table shown in FIG. 9. Accordingly, the setting unit 15 can set the spatial resolution lower when the type information indicates that the sound indicated by the sound signal is not human speech.
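The halving/doubling relationship between the correspondence tables of FIG. 7, FIG. 8, and FIG. 9 can be written out explicitly. Note that the FIG. 7 baseline values (10, 30, 45, and 90 degrees) do not appear in this excerpt; they are inferred here from the statements that FIG. 8 holds half, and FIG. 9 twice, the FIG. 7 values.

```python
# The three correspondence tables, reconstructed under the inference above.
RANGES = ["< 1 m", "1-3 m", "3-20 m", ">= 20 m"]
FIG7 = [10.0, 30.0, 45.0, 90.0]   # baseline table (inferred)
FIG8 = [v / 2 for v in FIG7]      # human speech: half the angle = finer resolution
FIG9 = [v * 2 for v in FIG7]      # non-speech: twice the angle = coarser resolution

for rng, a, b, c in zip(RANGES, FIG7, FIG8, FIG9):
    print(f"{rng:>8}: FIG7={a:g} deg  FIG8={b:g} deg  FIG9={c:g} deg")
```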
- FIG. 10 is a flowchart showing processing of the information processing device 10 according to the present embodiment.
- in step S101, the decoding unit 11 acquires a stream.
- the stream includes information indicating the position and orientation of the sound source 5 (corresponding to first position/posture information) and a sound signal indicating the sound output by the sound source 5.
- in step S102, the acquisition unit 12 acquires information indicating the position and orientation of the head of the user U (corresponding to second position/posture information).
- in step S103, the setting unit 15 uses the first position/posture information and the second position/posture information to set the spatial resolution in the stereophonic processing applied to the sound signal, according to the positional relationship between the head of the user U and the sound source 5.
- in step S104, the processing unit 14 performs the stereophonic processing at the spatial resolution set in step S103, thereby generating and outputting a sound signal for the speaker to output. The output sound signal is transmitted to the speaker, output as sound, and listened to by the user U.
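Steps S101 through S104 can be sketched as a single pipeline. This is an illustrative sketch under assumptions: the function name, the 2-D positions, the FIG. 7-style thresholds, and the azimuth quantization standing in for the actual stereophonic (rendering) processing are all hypothetical.

```python
import math

def process_stream(source_pos, head_pos, signal):
    """Sketch of steps S101-S104 for one sound source.

    source_pos, head_pos: 2-D (x, y) positions in meters.
    Returns the azimuth snapped to the resolution grid, plus the signal.
    """
    # S101/S102: positions acquired from the stream and the head tracker.
    distance = math.dist(head_pos, source_pos)
    # S103: resolution from the positional relationship (FIG. 7-style table).
    if distance < 1:
        resolution = 10.0
    elif distance < 3:
        resolution = 30.0
    elif distance < 20:
        resolution = 45.0
    else:
        resolution = 90.0
    # S104: stereophonic processing at that resolution, stubbed here as
    # quantizing the source azimuth onto the angular grid before rendering.
    azimuth = math.degrees(math.atan2(source_pos[1] - head_pos[1],
                                      source_pos[0] - head_pos[0]))
    snapped = round(azimuth / resolution) * resolution
    return snapped, signal
```

A coarser grid means fewer candidate directions (e.g. fewer head-related transfer functions to interpolate), which is how lowering the resolution can shorten the processing time.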
- the information processing device 10 can suppress delays that may occur in the output sound.
- because the information processing apparatus 10 sets the spatial resolution in stereophonic processing according to the positional relationship between the user's head and the sound source, the scale of computation required for the stereophonic processing can be adjusted. Therefore, when that scale of computation is relatively large, reducing the spatial resolution reduces the computation and shortens the time required for the stereophonic processing, which suppresses the delay that can occur in the output sound. Thus, according to the above information processing method, it is possible to suppress the delay that may occur in the output sound.
- the information processing apparatus 10 sets the spatial resolution in the stereophonic processing lower as the distance between the user's head and the sound source increases, thereby reducing the scale of computation required for the stereophonic processing and suppressing the delay that may occur in the output sound. Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
- the information processing apparatus 10 sets a high spatial resolution in the stereophonic processing of human speech, so that the user can hear human speech at higher quality than non-speech sounds. Since the sound image position of human speech sometimes requires relatively high accuracy compared to sounds that are not human speech, this can contribute to improving the accuracy of the sound image position of human speech. Thus, according to the information processing method, it is possible to suppress the delay that may occur in the output sound while improving the quality of the human speech included in the output sound.
- the information processing apparatus 10 reduces the scale of computation required for stereophonic processing of non-speech sounds by setting the spatial resolution low in that processing, and thereby suppresses the delay that may occur in the output sound. Compared to human speech, the sound image position of non-speech sounds may not require a high degree of accuracy, so setting a low spatial resolution for such sounds can contribute to suppressing possible delays. Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
- the information processing apparatus 10 sets the spatial resolution to be lower as the number of sound sources included in the stream increases, thereby reducing the scale of computation required for stereophonic processing. Delay can be suppressed. Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
- because the information processing apparatus 10 sets the time response length in the stereophonic processing according to the positional relationship between the user's head and the sound source, the user can appropriately perceive the distance from the user to the sound source. Thus, according to the information processing method, it is possible to suppress the delay that may occur in the output sound while allowing the user to appropriately perceive the distance from the user to the sound source.
- the information processing apparatus 10 sets the time response length in stereophonic processing larger as the distance between the user's head and the sound source increases, thereby allowing the user to appropriately perceive the distance from the user to the sound source.
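Setting the time response length larger with distance could look like the following; the linear mapping and its constants are assumptions for illustration only (a real implementation might, for example, lengthen the impulse-response tail used for convolution as the source recedes).

```python
# Hypothetical mapping from source distance to time-response length.
# base_ms and per_meter_ms are illustrative constants, not from the publication.

def time_response_length_ms(distance_m, base_ms=50.0, per_meter_ms=10.0):
    """Return a time-response length in milliseconds that grows with distance."""
    return base_ms + per_meter_ms * distance_m
```

A longer response (a longer reverberant tail) is one cue listeners use to judge that a source is far away, which is why the length grows with the distance D.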
- the information processing apparatus 10 outputs sound based on the output signal generated by the stereophonic processing using the set spatial resolution, and can thus have the user listen to the output sound with the delay suppressed. Thus, according to the information processing method, it is possible to suppress the delay that may occur in the output sound and allow the user to listen to the output sound with the delay suppressed.
- the information processing device 10 also sets the spatial resolution in rendering processing performed as the stereophonic processing. Therefore, according to the information processing method, it is possible to suppress the delay that may occur in the output sound.
- each component may be configured by dedicated hardware or implemented by executing a software program suitable for each component.
- each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory.
- the software that implements the information processing apparatus and the like of the above embodiment is the following program.
- this program causes a computer to execute an information processing method of: acquiring a stream including first position/posture information indicating the position and posture of a sound source and a sound signal indicating the sound output by the sound source; acquiring second position/posture information indicating the position and posture of the user's head; and using the first position/posture information and the second position/posture information to set, according to the positional relationship between the user's head and the sound source, the spatial resolution in stereophonic processing applied to the sound signal.
- the present invention can be used for information processing devices that perform stereophonic sound processing.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Stereophonic System (AREA)
Abstract
Description
The inventor found that the following problems arise with respect to the stereophonic processing described in the "Background Art" section.
In the present embodiment, an information processing method, an information processing device, and the like that suppress the delay that may occur in the output sound are described.
10 information processing device
11 decoding unit
12 acquisition unit
13 adjustment unit
14 processing unit
15 setting unit
30, 31, 32, 33, 40, 41, 42, 43 angle range
51, 52, 53, 54, 55, 56 waveform
S space
U user
Claims (11)
- acquiring a stream including first position/posture information indicating the position and posture of a sound source, and a sound signal indicating the sound output by the sound source;
acquiring second position/posture information indicating the position and posture of a user's head; and
using the first position/posture information and the second position/posture information to set, according to the positional relationship between the user's head and the sound source, the spatial resolution in stereophonic processing applied to the sound signal:
an information processing method. - In the setting of the spatial resolution,
the spatial resolution is set lower as the distance between the user's head and the sound source increases:
the information processing method according to claim 1. - The stream further includes type information indicating whether or not the sound indicated by the sound signal is human speech, and
in the setting of the spatial resolution,
the spatial resolution is set higher when the type information indicates that the sound indicated by the sound signal is human speech:
the information processing method according to claim 1 or 2. - The stream further includes type information indicating whether or not the sound indicated by the sound signal is human speech, and
in the setting of the spatial resolution,
the spatial resolution is set lower when the type information indicates that the sound indicated by the sound signal is not human speech:
the information processing method according to any one of claims 1 to 3. - The stream includes the first position/posture information and the sound signal for one or more of the sound sources, and
in the setting of the spatial resolution,
the spatial resolution is set lower as the number of the one or more sound sources increases:
the information processing method according to any one of claims 1 to 4. - Further, a time response length in the stereophonic processing is set according to the positional relationship:
the information processing method according to any one of claims 1 to 5. - In the setting of the time response length,
the time response length is set larger as the distance between the user's head and the sound source increases:
the information processing method according to claim 6. - Further,
an output signal indicating the sound to be output by a speaker is generated by applying the stereophonic processing to the sound signal using the set spatial resolution, and
the generated output signal is provided to the speaker to cause the speaker to output the sound indicated by the output signal:
the information processing method according to any one of claims 1 to 7. - The stereophonic processing includes
rendering processing, which uses the first position/posture information and the second position/posture information to generate, according to the positional relationship between the user's head and the sound source, the sound that the user should hear in the space in which the sound source is placed, and
the spatial resolution is the spatial resolution in the rendering processing:
the information processing method according to any one of claims 1 to 8. - An information processing device comprising: a decoding unit that acquires a stream including first position/posture information indicating the position and posture of a sound source, and a sound signal indicating the sound output by the sound source;
an acquisition unit that acquires second position/posture information indicating the position and posture of a user's head; and
a setting unit that uses the first position/posture information and the second position/posture information to set, according to the positional relationship between the user's head and the sound source, the spatial resolution in stereophonic processing applied to the sound signal:
an information processing device. - A program causing a computer to execute the information processing method according to any one of claims 1 to 9.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023506833A JPWO2022196135A1 (ja) | 2021-03-16 | 2022-01-31 | |
KR1020237030572A KR20230157331A (ko) | 2021-03-16 | 2022-01-31 | 정보 처리 방법, 정보 처리 장치, 및, 프로그램 |
EP22770897.1A EP4311272A1 (en) | 2021-03-16 | 2022-01-31 | Information processing method, information processing device, and program |
CN202280020492.3A CN116965064A (zh) | 2021-03-16 | 2022-01-31 | 信息处理方法、信息处理装置、以及程序 |
US18/243,199 US20230421988A1 (en) | 2021-03-16 | 2023-09-07 | Information processing method, information processing device, and recording medium |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163161499P | 2021-03-16 | 2021-03-16 | |
US63/161,499 | 2021-03-16 | ||
JP2021-194053 | 2021-11-30 | ||
JP2021194053 | 2021-11-30 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/243,199 Continuation US20230421988A1 (en) | 2021-03-16 | 2023-09-07 | Information processing method, information processing device, and recording medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022196135A1 true WO2022196135A1 (ja) | 2022-09-22 |
Family
ID=83320333
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/003588 WO2022196135A1 (ja) | 2021-03-16 | 2022-01-31 | 情報処理方法、情報処理装置、および、プログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US20230421988A1 (ja) |
EP (1) | EP4311272A1 (ja) |
JP (1) | JPWO2022196135A1 (ja) |
KR (1) | KR20230157331A (ja) |
WO (1) | WO2022196135A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050271212A1 (en) * | 2002-07-02 | 2005-12-08 | Thales | Sound source spatialization system |
JP2017175356A (ja) * | 2016-03-23 | 2017-09-28 | ヤマハ株式会社 | 音響処理装置およびプログラム |
KR20190060464A (ko) * | 2017-11-24 | 2019-06-03 | 주식회사 윌러스표준기술연구소 | 오디오 신호 처리 방법 및 장치 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110313187B (zh) | 2017-06-15 | 2022-06-07 | 杜比国际公司 | 处理媒体内容以供第一装置再现的方法、系统和装置 |
- 2022
- 2022-01-31 WO PCT/JP2022/003588 patent/WO2022196135A1/ja active Application Filing
- 2022-01-31 KR KR1020237030572A patent/KR20230157331A/ko unknown
- 2022-01-31 EP EP22770897.1A patent/EP4311272A1/en active Pending
- 2022-01-31 JP JP2023506833A patent/JPWO2022196135A1/ja active Pending
- 2023
- 2023-09-07 US US18/243,199 patent/US20230421988A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050271212A1 (en) * | 2002-07-02 | 2005-12-08 | Thales | Sound source spatialization system |
JP2017175356A (ja) * | 2016-03-23 | 2017-09-28 | ヤマハ株式会社 | 音響処理装置およびプログラム |
KR20190060464A (ko) * | 2017-11-24 | 2019-06-03 | 주식회사 윌러스표준기술연구소 | 오디오 신호 처리 방법 및 장치 |
Also Published As
Publication number | Publication date |
---|---|
KR20230157331A (ko) | 2023-11-16 |
EP4311272A1 (en) | 2024-01-24 |
US20230421988A1 (en) | 2023-12-28 |
JPWO2022196135A1 (ja) | 2022-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10812925B2 (en) | Audio processing device and method therefor | |
US10972856B2 (en) | Audio processing method and audio processing apparatus | |
EP4214535A2 (en) | Methods and systems for determining position and orientation of a device using acoustic beacons | |
WO2022196135A1 (ja) | 情報処理方法、情報処理装置、および、プログラム | |
WO2022219881A1 (ja) | 情報処理方法、情報処理装置、および、プログラム | |
CN116965064A (zh) | 信息处理方法、信息处理装置、以及程序 | |
JP6303519B2 (ja) | 音響再生装置および音場補正プログラム | |
JP2011188444A (ja) | ヘッドトラッキング装置および制御プログラム | |
CN117121511A (zh) | 信息处理方法、信息处理装置、以及程序 | |
JP2006086756A (ja) | 両耳インパルス応答推定装置、両耳インパルス応答推定方法、移動音生成装置、移動音生成方法 | |
JP2019068123A (ja) | 音声処理用コンピュータプログラム、音声処理装置及び音声処理方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22770897 Country of ref document: EP Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase |
Ref document number: 2023506833 Country of ref document: JP |
WWE | Wipo information: entry into national phase |
Ref document number: 202280020492.3 Country of ref document: CN |
WWE | Wipo information: entry into national phase |
Ref document number: 2022770897 Country of ref document: EP |
NENP | Non-entry into the national phase |
Ref country code: DE |
ENP | Entry into the national phase |
Ref document number: 2022770897 Country of ref document: EP Effective date: 20231016 |