WO2022196135A1 - Information processing method, information processing device, and program - Google Patents

Information processing method, information processing device, and program

Info

Publication number
WO2022196135A1
WO2022196135A1 PCT/JP2022/003588 JP2022003588W WO2022196135A1 WO 2022196135 A1 WO2022196135 A1 WO 2022196135A1 JP 2022003588 W JP2022003588 W JP 2022003588W WO 2022196135 A1 WO2022196135 A1 WO 2022196135A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
spatial resolution
user
information processing
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/003588
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
耕 水野
智一 石川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Intellectual Property Corp of America
Original Assignee
Panasonic Intellectual Property Corp of America
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Intellectual Property Corp of America
Priority to EP22770897.1A (EP4311272A4)
Priority to KR1020237030572A (KR20230157331A)
Priority to JP2023506833A (JPWO2022196135A1)
Priority to CN202280020492.3A (CN116965064A)
Publication of WO2022196135A1
Priority to US18/243,199 (US12581265B2)
Anticipated expiration
Ceased

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/304 For headphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H04S1/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/13 Aspects of volume control, not necessarily automatic, in stereophonic sound systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S7/306 For headphones

Definitions

  • the present invention relates to an information processing method, an information processing device, and a program.
  • stereophonic processing requires a relatively large scale of computation, and there is a problem that the output sound may be delayed depending on the time required for the computation.
  • The present invention provides an information processing method and the like that suppress delay that may occur in output sound.
  • An information processing method acquires a stream including first position and orientation information indicating the position and orientation of a sound source and a sound signal indicating the sound output by the sound source, acquires second position and orientation information indicating the position and orientation of a user's head, and, using the first position and orientation information and the second position and orientation information, sets the spatial resolution in the stereophonic processing applied to the sound signal according to the positional relationship between the user's head and the sound source.
  • the information processing method of the present invention can suppress delays that may occur in output sounds.
  • FIG. 1 is an explanatory diagram showing an example of the positional relationship between the user and the sound source in the embodiment.
  • FIG. 2 is a block diagram showing the functional configuration of the information processing device according to the embodiment.
  • FIG. 3 is a first explanatory diagram of spatial resolution of stereophonic processing in the embodiment.
  • FIG. 4 is a second explanatory diagram of the spatial resolution of the stereophonic sound processing in the embodiment.
  • FIG. 5 is a third explanatory diagram of the spatial resolution of the stereophonic sound processing in the embodiment.
  • FIG. 6 is an explanatory diagram of the time response length of the stereophonic sound processing in the embodiment.
  • FIG. 7 is an explanatory diagram showing a first example of parameters for stereophonic sound processing according to the embodiment.
  • FIG. 8 is an explanatory diagram showing a second example of parameters for stereophonic sound processing according to the embodiment.
  • FIG. 9 is an explanatory diagram showing a third example of parameters for stereophonic sound processing according to the embodiment.
  • FIG. 10 is a flow diagram showing processing of the information processing apparatus according to the embodiment.
  • The above stereophonic sound processing technology is effective only when the change in the user's posture is relatively small or regular. Outside those cases, the predicted posture information does not match the actual posture information of the user, so the position of the sound image for the user may be inappropriate or may change abruptly.
  • Patent Document 1 may not solve the problem that the output sound may be delayed due to the time required for computation of stereophonic processing.
  • An information processing method according to one aspect of the present invention acquires a stream including first position and orientation information indicating the position and orientation of a sound source and a sound signal indicating the sound output by the sound source, acquires second position and orientation information indicating the position and orientation of the user's head, and, using the first and second position and orientation information, sets the spatial resolution in stereophonic processing applied to the sound signal according to the positional relationship between the user's head and the sound source.
  • Because the spatial resolution in the stereophonic processing is set according to the positional relationship between the user's head and the sound source, the scale of computation required for the stereophonic processing can be adjusted. Therefore, when the scale of computation required for stereophonic processing is relatively large, lowering the spatial resolution reduces the scale of computation and shortens the time required for stereophonic processing, so that the delay that can occur in the output sound is suppressed. Thus, according to the above information processing method, it is possible to suppress the delay that may occur in the output sound.
  • the larger the distance between the user's head and the sound source, the lower the spatial resolution may be set.
  • According to the above aspect, the spatial resolution in stereophonic processing is set lower as the distance between the user's head and the sound source increases, which reduces the scale of computation required for stereophonic processing and, as a result, suppresses the delay that may occur in the output sound.
  • Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
  • The stream may further include type information indicating whether the sound indicated by the sound signal is human speech, and in setting the spatial resolution, the spatial resolution may be set higher when the type information indicates that the sound indicated by the sound signal is human speech.
  • The stream may further include type information indicating whether the sound indicated by the sound signal is human speech, and in setting the spatial resolution, the spatial resolution may be set lower when the type information indicates that the sound indicated by the sound signal is not human speech.
  • According to the above aspect, the scale of computation required for stereophonic processing of sounds that are not human speech is reduced, so the delay that may occur in the output sound can be suppressed. Compared with human speech, the sound image position of non-speech sounds may not require a high degree of accuracy, so lowering the spatial resolution for such sounds can contribute to suppressing the delay. Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
  • The stream may include the first position and orientation information and the sound signal for each of one or more sound sources, and in setting the spatial resolution, the greater the number of the one or more sound sources, the lower the spatial resolution may be set.
  • According to the above aspect, the spatial resolution is set lower as the number of sound sources included in the stream increases, which reduces the scale of computation required for stereophonic sound processing and, as a result, suppresses the delay that may occur in the output sound.
  • Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
  • the time response length in the stereophonic processing may be further set according to the positional relationship.
  • Because the time response length in the stereophonic processing is set according to the positional relationship between the user's head and the sound source, the user can appropriately perceive the distance from the user to the sound source.
  • Thus, according to the information processing method, it is possible to suppress the delay that may occur in the output sound while allowing the user to appropriately perceive the distance from the user to the sound source.
  • In setting the time response length, the greater the distance between the user's head and the sound source, the greater the time response length may be set.
  • According to the above aspect, the user can appropriately perceive the distance from the user to the sound source.
  • Thus, according to the above information processing method, it is possible to suppress the delay that may occur in the output sound while allowing the user to appropriately perceive the distance from the user to the sound source.
  • Further, stereophonic processing using the set spatial resolution may be applied to the sound signal to generate an output signal indicating a sound to be output by a speaker, and the generated output signal may be transmitted to the speaker to cause the speaker to output the sound indicated by the output signal.
  • According to the above aspect, the sound based on the output signal generated by stereophonic processing using the set spatial resolution is output, so the user can listen to an output sound with suppressed delay.
  • Thus, according to the information processing method, it is possible to suppress the delay that may occur in the output sound and to allow the user to listen to the output sound with the suppressed delay.
  • The stereophonic processing may include rendering processing that uses the first position and orientation information and the second position and orientation information to generate the sound heard by the user in the space in which the sound source is arranged, according to the positional relationship between the user's head and the sound source, and the spatial resolution may be the spatial resolution in the rendering processing.
  • the spatial resolution is set in rendering processing as stereophonic processing. Therefore, according to the information processing method, it is possible to suppress the delay that may occur in the output sound.
  • An information processing apparatus according to one aspect of the present invention includes: a decoding unit that acquires a stream including first position and orientation information indicating the position and orientation of a sound source and a sound signal indicating the sound output by the sound source; an acquisition unit that acquires second position and orientation information indicating the position and orientation of the head of the user; and a setting unit that, using the first position and orientation information and the second position and orientation information, sets the spatial resolution in stereophonic processing to be applied to the sound signal according to the positional relationship between the user's head and the sound source.
  • a program according to one aspect of the present invention is a program that causes a computer to execute the above information processing method.
  • These general or specific aspects may be realized by a system, a device, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or by any combination thereof.
  • FIG. 1 is an explanatory diagram showing an example of the positional relationship between the user U and the sound source 5 in this embodiment.
  • FIG. 1 shows a user U existing in a space S and a sound source 5 recognized by the user U.
  • the space S is represented as a plane containing the x-axis and the y-axis, but also has an extension in the z-axis direction. The same applies hereafter.
  • Walls or objects may be placed in the space S. Walls also include ceilings or floors.
  • The information processing device 10 generates a sound signal for the user U to listen to by performing stereophonic processing, which is digital sound processing, based on a stream including the sound signal output by the sound source 5.
  • the stream further includes position and orientation information indicating the position and orientation of the sound source 5 in the space S.
  • the sound signal generated by the information processing device 10 is output as a sound by a speaker, and the user U listens to the sound.
  • the speaker is assumed to be a speaker included in earphones or headphones worn by the user U, but is not limited to this.
  • The sound source 5 is a virtual sound source (generally referred to as a sound image) that the user U, listening to the sound signal generated based on the stream, perceives as a sound source; it is not an actual source that generates sound. Although a human is shown as the sound source 5 in FIG. 1, the sound source 5 is not limited to a human and may be any sound source.
  • the user U listens to the sound based on the sound signal generated by the information processing device 10 and output from the speaker.
  • the sound output from the speaker based on the sound signal generated by the information processing device 10 is heard by the left and right ears of the user U, respectively.
  • An appropriate time difference or phase difference (also referred to as a time difference or the like) is provided by the information processing device 10 to the sounds heard by the left and right ears of the user U, respectively.
  • the user U perceives the direction of the sound source 5 for the user U based on the time difference between the sounds heard by the left and right ears.
  • The information processing device 10 includes, in the sounds heard by the left and right ears of the user U, a sound corresponding to the sound that arrives directly from the sound source 5 (referred to as direct sound) and a sound corresponding to the sound that is output from the sound source 5 and arrives after being reflected by a wall surface (referred to as reflected sound).
  • the user U perceives the distance from the user U to the sound source 5 based on the time interval between the direct sound and the reflected sound included in the heard sound.
  • In the stereophonic processing, the timing at which the direct sound and the reflected sound arrive at the user U, and the amplitude and phase of the direct sound and the reflected sound, are determined based on the sound signal included in the stream, and a sound signal (referred to as an output signal) indicating the sound to be output from the speaker is generated.
  • Stereophonic processing can involve relatively large scale computations.
  • When the number of sound signals included in the stream is relatively large, or when the spatial resolution of the stereophonic sound processing is relatively high, the information processing device 10 requires a relatively long time for arithmetic processing, and the generation and output of the output signal can be delayed.
  • One measure to suppress the delay that may occur in the output signal is to lower the spatial resolution of the stereophonic processing. Lowering the spatial resolution, however, can degrade the quality of the sound heard by the user U. There is thus a trade-off between the quality of the sound heard by the user U and the amount of arithmetic processing included in stereophonic processing.
  • the information processing device 10 uses the distance between the user U and the sound source 5 to adjust the parameters of the stereophonic processing, thereby contributing to the reduction of the processing load of the stereophonic processing. For example, the information processing apparatus 10 reduces the processing load of the stereophonic sound processing by lowering the spatial resolution, which is a parameter of the stereophonic sound processing.
  • FIG. 2 is a block diagram showing the functional configuration of the information processing device 10 according to this embodiment.
  • the information processing device 10 includes a decoding unit 11, an acquisition unit 12, an adjustment unit 13, a processing unit 14, and a setting unit 15 as functional units.
  • The functional units included in the information processing device 10 can be implemented by a processor (such as a CPU (Central Processing Unit)) (not shown) included in the information processing device 10 executing a predetermined program using a memory (not shown).
  • the decoding unit 11 is a functional unit that decodes the stream.
  • The stream specifically includes position and orientation information (corresponding to first position and orientation information) indicating the position and orientation of the sound source 5 in the space S, and a sound signal indicating the sound output by the sound source 5.
  • the stream may include type information indicating whether or not the sound output by the sound source 5 is human speech.
  • voice means human voice.
  • The decoding unit 11 provides the sound signal obtained by decoding the stream to the processing unit 14, and provides the position and orientation information obtained by decoding the stream to the adjustment unit 13.
  • the stream may be obtained by the information processing device 10 from an external device, or may be stored in advance in a storage device of the information processing device 10 .
  • The stream is encoded in a predetermined format.
  • The position and orientation information indicating the position and orientation of the sound source 5 is six-degree-of-freedom information comprising the coordinates (x, y and z) of the sound source 5 along three axes and the angles around those three axes (yaw angle, pitch angle and roll angle).
  • the position and orientation information of the sound source 5 can specify the position and orientation of the sound source 5 .
  • the coordinates are coordinates in an appropriately set coordinate system.
  • the posture is an angle around three axes indicating a predetermined direction (referred to as a reference direction) for the sound source 5 .
  • the reference direction may be the direction in which the sound source 5 outputs sound, or any other direction that is uniquely determined for the sound source 5 .
  • a stream may include, for each of one or more sound sources 5, position and orientation information indicating the position and orientation of the sound source 5, and a sound signal indicating the sound output by the sound source 5.
  • the acquisition unit 12 is a functional unit that acquires the position and posture of the user's U head in the space S.
  • The acquisition unit 12 acquires, from a sensor or the like, position and orientation information (second position and orientation information) including information indicating the position of the head of the user U (referred to as position information) and information indicating its orientation (referred to as orientation information).
  • The position and orientation information of the head of the user U is six-degree-of-freedom information comprising the coordinates (x, y and z) of the head of the user U along three axes and the angles around those three axes (yaw angle, pitch angle and roll angle).
  • the position and orientation of the user U's head can be identified by the position and orientation information of the user's U head.
  • the coordinates are coordinates in a coordinate system common to the coordinate system defined for the sound source 5 .
  • a position can be defined as a position having a predetermined positional relationship from a predetermined position (eg, origin) in a coordinate system.
  • the posture is an angle around three axes indicating the direction in which the user U's head is facing.
  • The sensors may be, for example, inertial measurement units (IMUs), accelerometers, gyroscopes, magnetic sensors, or combinations thereof.
  • the sensor or the like is assumed to be worn on the head of the user U, and may be fixed to an earphone or headphone worn by the user U.
  • the adjustment unit 13 is a functional unit that adjusts the position and orientation information of the user U in the space S using the parameters in the stereophonic processing performed by the processing unit 14 .
  • the adjustment unit 13 acquires the spatial resolution, which is a parameter in stereophonic processing, from the setting unit 15 . Then, the adjustment unit 13 adjusts the position information of the head of the user U acquired by the acquisition unit 12 by changing it to any value that is an integral multiple of the spatial resolution. When changing, the adjustment unit 13 may adopt the value closest to the position information of the user U's head acquired by the acquisition unit 12 from among a plurality of values that are integral multiples of the spatial resolution. The adjustment unit 13 provides the adjusted position information of the user U's head and the adjusted posture information of the user's U head to the processing unit 14 .
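The adjustment described above amounts to snapping each value to the nearest integral multiple of the spatial resolution. A minimal sketch follows; the function names `snap_to_resolution` and `snap_position` are hypothetical, and the code is unit-agnostic because the text does not state the unit in which the position information is quantized.

```python
import math


def snap_to_resolution(value: float, resolution: float) -> float:
    """Return the integral multiple of `resolution` closest to `value`."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    # floor(x + 0.5) rounds exact halves upward, which avoids the
    # round-half-to-even behaviour of Python's built-in round().
    return math.floor(value / resolution + 0.5) * resolution


def snap_position(position, resolution):
    """Snap every coordinate of a position tuple (hypothetical helper)."""
    return tuple(snap_to_resolution(c, resolution) for c in position)
```

For example, `snap_to_resolution(37.0, 10.0)` picks 40.0 because 40 is the multiple of 10 nearest to 37.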
  • the processing unit 14 is a functional unit that performs stereophonic processing, which is digital acoustic processing, on the sound signal acquired by the decoding unit 11 .
  • the processing unit 14 has a plurality of filters used for stereophonic processing. Filters are used, for example, in calculations that adjust the amplitude and phase of sound signals for each frequency.
  • The processing unit 14 acquires parameters (that is, spatial resolution and time response length) used for stereophonic processing from the adjustment unit 13, and performs stereophonic processing using the acquired parameters.
  • The processing unit 14 calculates the propagation paths of the direct sound and the reflected sound arriving at the user U from the sound source 5, and also calculates the timing at which the direct sound and the reflected sound reach the user. In addition, for each range of angular directions around the head of the user U, the processing unit 14 calculates the amplitude and phase of the sound arriving at the user U by applying the filter for that range to the signal indicating the sound (direct sound and reflected sound) arriving at the user U from that range.
  • the setting unit 15 is a functional unit that sets parameters for stereophonic processing executed by the processing unit 14 .
  • the parameters of stereophonic processing may include spatial resolution and temporal response length in stereophonic processing.
  • The setting unit 15 uses the position and orientation information of the sound source 5 in the space S and the position and orientation information of the user U acquired by the acquisition unit 12 to set the spatial resolution, which is a parameter of stereophonic processing, according to the positional relationship between the head of the user U and the sound source 5.
  • The setting unit 15 may further set a time response length, which is a parameter of stereophonic processing, according to the positional relationship.
  • The setting unit 15 provides the set parameters to the adjustment unit 13.
  • the distance D between the user U and the sound source 5 can be used for parameter setting.
  • Distance D can be computed from the vector indicating the position of the sound source 5 and the vector indicating the position of the head of the user U.
  • the setting unit 15 may set the spatial resolution lower as the distance D between the head of the user U and the sound source 5 in the space S increases.
  • the setting unit 15 may set the time response length larger as the distance D between the head of the user U and the sound source 5 in the space S increases.
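As an illustration of the distance D used for parameter setting, the sketch below computes the Euclidean distance between two six-degree-of-freedom poses. The `Pose` class and `distance` function are hypothetical names, and treating D as the Euclidean distance between the two positions is an assumption; the orientation angles are carried along but do not enter the calculation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical 6DoF pose: coordinates plus yaw, pitch and roll angles."""
    x: float
    y: float
    z: float
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0


def distance(a: Pose, b: Pose) -> float:
    """Euclidean distance between the positions of two poses (assumed metric)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


user_head = Pose(0.0, 0.0, 0.0)
sound_source = Pose(3.0, 4.0, 0.0, yaw=90.0)
d = distance(user_head, sound_source)
```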
  • the spatial resolution of the stereophonic processing is the resolution of the angular range centered on the user U.
  • When the spatial resolution is relatively high, the processing unit 14 applies a filter to sound signals arriving at the user U from each relatively narrow angular range (for example, angular range 30).
  • When the spatial resolution is relatively low, the processing unit 14 applies a filter to sound signals arriving at the user U from each relatively wide angular range (for example, angular range 40).
  • a high spatial resolution corresponds to a narrow angular range
  • a low spatial resolution corresponds to a wide angular range.
  • the angular range corresponds to the units to which the same filter is applied.
  • When the spatial resolution is relatively high, the processing unit 14 applies the filter corresponding to each of the angular ranges 31, 32, 33, ... and thereby calculates sound signals representing the sounds arriving at the user U from each of the angular ranges 31, 32, 33, ... (see FIG. 4).
  • When the spatial resolution is relatively low, the processing unit 14 applies the filter corresponding to each of the angular ranges 41, 42, 43, ... and thereby calculates sound signals representing the sounds arriving at the user U from each of the angular ranges 41, 42, 43, ... (see FIG. 5).
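To make the link between spatial resolution and computation concrete, the following sketch maps an arrival direction to the angular range whose filter would be applied. It assumes, for simplicity, that only the horizontal (azimuth) angle is partitioned and that the ranges start at 0 degrees; `sector_count` and `sector_index` are hypothetical names.

```python
import math


def sector_count(resolution_deg: float) -> int:
    """Number of angular ranges needed to cover 360 degrees."""
    if resolution_deg <= 0:
        raise ValueError("resolution must be positive")
    return math.ceil(360.0 / resolution_deg)


def sector_index(azimuth_deg: float, resolution_deg: float) -> int:
    """Index of the angular range that contains the given arrival azimuth."""
    if resolution_deg <= 0:
        raise ValueError("resolution must be positive")
    # The modulo wraps negative or >360 degree angles into [0, 360).
    return int((azimuth_deg % 360.0) // resolution_deg)
```

Under these assumptions a 10 degree resolution needs 36 ranges (and filters) while a 90 degree resolution needs only 4, which is why a lower spatial resolution reduces the processing load.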
  • FIG. 6 is an explanatory diagram of the time response length of stereophonic sound processing in this embodiment.
  • FIG. 6 shows sound signals generated by stereophonic processing.
  • the sound signal includes a waveform 51 corresponding to direct sound arriving at the user U from the sound source 5 and waveforms 52 , 53 , 54 , 55 and 56 corresponding to reflected sounds arriving at the user U from the sound source 5 .
  • Each of the waveforms 52, 53, 54, 55 and 56 corresponding to the reflected sound is delayed from the direct sound by a delay time determined by the positional relationship between the sound source 5, the user U and the wall surface in the space S, and its amplitude is reduced by the reflection from the wall surface.
  • the delay time is determined within a range of approximately 10 msec to 100 msec.
  • the time response length is an index that indicates the magnitude of the delay time. The longer the time response length, the longer the delay time, and the shorter the time response length, the shorter the delay time.
  • the time response length is only an index of the magnitude of the delay time, and does not indicate the delay time itself of the waveform corresponding to the reflected sound.
  • the time width from waveform 51 to waveform 55 and the time response length are substantially equal, but this is not limiting, and there are cases where the time width from waveform 51 to waveform 54 and the time response length are substantially equal.
  • the time width from waveform 51 to waveform 56 may be approximately equal to the time response length.
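The delay of a reflected sound relative to the direct sound follows from the extra distance the reflected sound travels. The sketch below is a worked example under stated assumptions: a speed of sound of 343 m/s (a typical room-temperature value that the source does not specify) and path lengths that are already known; `reflection_delay_ms` is a hypothetical name.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value, not taken from the source


def reflection_delay_ms(direct_path_m: float, reflected_path_m: float,
                        speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Delay of a reflected sound relative to the direct sound, in msec."""
    if reflected_path_m < direct_path_m:
        raise ValueError("a reflected path cannot be shorter than the direct path")
    return (reflected_path_m - direct_path_m) / speed_m_s * 1000.0
```

With these assumptions, an extra path of about 3.4 m gives a delay near 10 msec and an extra path of about 34 m gives a delay near 100 msec, which matches the range of delay times mentioned above.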
  • FIG. 7 is an explanatory diagram showing a first example of parameters for stereophonic processing in this embodiment.
  • FIG. 7 shows a correspondence table in which spatial resolution and time response length, which are parameters of stereophonic processing, are associated with each of a plurality of ranges of distance D between user U and sound source 5.
  • a distance D of less than 1 m is associated with a spatial resolution of 10 degrees and a time response length of 10 msec.
  • Similarly, distances D of 1 m or more and less than 3 m, 3 m or more and less than 20 m, and 20 m or more are associated with spatial resolutions of 30 degrees, 45 degrees, and 90 degrees and with time response lengths of 50 msec, 200 msec, and 1 sec, respectively.
  • the setting unit 15 has a correspondence table between the distance D and the spatial resolution shown in FIG. 7, and provides the adjustment unit 13 with the correspondence table.
  • the adjustment unit 13 refers to the provided correspondence table and acquires the spatial resolution and the time response length associated with the distance D between the head of the user U and the sound source 5 acquired from the acquisition unit 12 .
  • The setting unit 15 sets the spatial resolution lower as the distance D between the head of the user U and the sound source 5 in the space S increases; in other words, it sets a value indicating a lower spatial resolution (a wider angular range). In addition, the setting unit 15 sets the time response length larger as the distance D between the head of the user U and the sound source 5 in the space S increases; in other words, it sets a value indicating a longer time response length.
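The correspondence table of FIG. 7 can be expressed as a simple range lookup. The sketch below uses the values stated in the text; the names `DISTANCE_UPPER_BOUNDS_M`, `FIG7_PARAMETERS` and `lookup_parameters`, and the use of a binary search, are illustrative assumptions rather than the method of the source.

```python
from bisect import bisect_right

# Upper bounds (exclusive) of the first three distance ranges, in metres.
DISTANCE_UPPER_BOUNDS_M = (1.0, 3.0, 20.0)

# (spatial resolution in degrees, time response length in msec) per range,
# taken from the FIG. 7 description: <1 m, 1-3 m, 3-20 m, >=20 m.
FIG7_PARAMETERS = (
    (10.0, 10.0),
    (30.0, 50.0),
    (45.0, 200.0),
    (90.0, 1000.0),
)


def lookup_parameters(distance_m: float, table=FIG7_PARAMETERS):
    """Return (spatial resolution, time response length) for distance D."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right places a boundary value such as exactly 1 m in the
    # "1 m or more" range, matching the wording of the table.
    return table[bisect_right(DISTANCE_UPPER_BOUNDS_M, distance_m)]
```

A larger angle value means a lower spatial resolution, so the returned angle grows with distance, as does the time response length.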
  • the setting unit 15 may change the spatial resolution according to whether or not the sound indicated by the sound signal is human speech.
  • the information processing apparatus 10 changes the spatial resolution according to whether or not the sound indicated by the sound signal is human speech, thereby contributing to more accurate stereophonic processing of human speech.
  • the setting unit 15 may set a higher spatial resolution when the type information indicates that the sound indicated by the sound signal is human speech. In other words, a value indicating higher spatial resolution may be set.
  • the setting unit 15 may correct the value to indicate a higher spatial resolution than the already set spatial resolution.
  • the setting unit 15 may set the spatial resolution lower when the type information indicates that the sound indicated by the sound signal is not human speech. In other words, a value indicating a lower spatial resolution may be set.
  • the setting unit 15 may correct the value to indicate a lower spatial resolution than the already set spatial resolution.
  • the setting unit 15 may change the spatial resolution according to the number of sound sources included in the stream.
  • the setting unit 15 may set the spatial resolution lower as the number of sound sources included in the stream increases. In other words, a value indicating a lower spatial resolution may be set. When setting the spatial resolution, if the spatial resolution has already been set, the setting unit 15 may correct the value to indicate a lower spatial resolution than the already set spatial resolution.
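The text states only that the spatial resolution becomes lower as the number of sound sources grows; it gives no formula. The sketch below is therefore purely illustrative: the function name `coarsen_for_source_count`, the doubling rule, the step of four sources, and the 180 degree cap are all assumptions chosen to show a correction that never makes the resolution finer as sources are added.

```python
def coarsen_for_source_count(resolution_deg, num_sources,
                             sources_per_step=4, max_deg=180.0):
    """Return a resolution (in degrees) that grows coarser with source count.

    Hypothetical rule: the angular step doubles for every additional
    `sources_per_step` sound sources, and is capped at `max_deg`.
    """
    if num_sources < 1:
        raise ValueError("at least one sound source is required")
    steps = (num_sources - 1) // sources_per_step
    return min(resolution_deg * (2 ** steps), max_deg)
```

With these assumed defaults, a 10 degree setting stays at 10 degrees for up to four sources and becomes 20 degrees from the fifth source onward.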
  • FIG. 8 is an explanatory diagram showing a second example of parameters for stereophonic processing in this embodiment.
  • FIG. 8 shows a correspondence table in which a spatial resolution is associated with each of a plurality of ranges of the distance D between the user U and the sound source 5, and is an example of the parameters obtained when the setting unit 15 corrects the parameters shown in FIG. 7.
  • a distance D of less than 1 m is associated with a spatial resolution of 5 degrees.
  • distances D of 1 m or more and less than 3 m, 3 m or more and less than 20 m, and 20 m or more are associated with spatial resolutions of 15 degrees, 22.5 degrees, and 45 degrees, respectively.
  • the spatial resolution values shown in FIG. 8 are half the spatial resolution values shown in FIG. 7 for each value of the distance D.
  • in other words, because a smaller angle means a finer resolution, FIG. 8 specifies twice the spatial resolution of FIG. 7 for each value of the distance D.
  • the setting unit 15 corrects the correspondence table used for stereophonic processing from the correspondence table shown in FIG. 7 to the correspondence table shown in FIG. 8. This allows the setting unit 15 to set a higher spatial resolution when the type information indicates that the sound indicated by the sound signal is human speech.
  • FIG. 9 is an explanatory diagram showing a third example of parameters for stereophonic processing in this embodiment.
  • FIG. 9 shows a correspondence table in which a spatial resolution is associated with each of a plurality of ranges of the distance D between the user U and the sound source 5, and is an example of the parameters obtained when the setting unit 15 corrects the parameters shown in FIG. 7.
  • a distance D of less than 1 m is associated with a spatial resolution of 20 degrees.
  • distances D of 1 m or more and less than 3 m, 3 m or more and less than 20 m, and 20 m or more are associated with spatial resolutions of 60 degrees, 90 degrees, and 180 degrees, respectively.
  • the spatial resolution values shown in FIG. 9 are twice the spatial resolution values shown in FIG. 7 for each value of the distance D.
  • in other words, because a larger angle means a coarser resolution, FIG. 9 specifies half the spatial resolution of FIG. 7 for each value of the distance D.
  • the setting unit 15 corrects the correspondence table used for stereophonic processing from the correspondence table shown in FIG. 7 to the correspondence table shown in FIG. 9. Accordingly, the setting unit 15 can set the spatial resolution lower when the type information indicates that the sound indicated by the sound signal is not human speech.
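The relationship between the three tables can be checked with a short sketch. The names `FIG7_RESOLUTION_DEG` and `correct_table` and the string keys are hypothetical; the angles and the factors of one half (human speech) and two (not human speech) are the ones stated in the text for FIG. 8 and FIG. 9.

```python
# Spatial resolution in degrees per distance range, as described for FIG. 7.
FIG7_RESOLUTION_DEG = {
    "D < 1 m": 10.0,
    "1 m <= D < 3 m": 30.0,
    "3 m <= D < 20 m": 45.0,
    "D >= 20 m": 90.0,
}


def correct_table(table, is_human_speech):
    """Scale every angle: halve for human speech, double otherwise.

    A smaller angle means a finer (higher) spatial resolution.
    """
    factor = 0.5 if is_human_speech else 2.0
    return {distance_range: angle * factor
            for distance_range, angle in table.items()}


fig8 = correct_table(FIG7_RESOLUTION_DEG, is_human_speech=True)
fig9 = correct_table(FIG7_RESOLUTION_DEG, is_human_speech=False)
```

Here `fig8` holds 5, 15, 22.5, and 45 degrees and `fig9` holds 20, 60, 90, and 180 degrees, matching the values listed for FIG. 8 and FIG. 9.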
  • FIG. 10 is a flowchart showing processing of the information processing device 10 according to the present embodiment.
  • in step S101, the decoding unit 11 acquires a stream.
  • the stream includes information indicating the position and orientation of the sound source 5 (corresponding to first position and orientation information) and a sound signal indicating the sound output by the sound source 5.
  • in step S102, the acquisition unit 12 acquires information indicating the position and orientation of the head of the user U (corresponding to second position and orientation information).
  • in step S103, the setting unit 15 uses the first position/posture information and the second position/posture information to set the spatial resolution in the stereophonic processing applied to the sound signal, according to the positional relationship between the head of the user U and the sound source 5.
  • in step S104, the processing unit 14 performs stereophonic processing with the spatial resolution set in step S103, thereby generating and outputting a sound signal to be output by the speaker. It is assumed that the output sound signal is transmitted to a speaker, output as sound, and listened to by the user U.
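The flow of steps S101 to S104 can be sketched as below. This is an illustration under several assumptions rather than the device's actual processing: the `Pose` and `Stream` classes and both functions are hypothetical, orientation is reduced to a yaw angle, and the stereophonic rendering of step S104 is replaced by a placeholder that only snaps the source azimuth to the grid implied by the spatial resolution. The distance ranges are those described for FIG. 7.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    position: tuple          # (x, y, z) in metres
    yaw_deg: float = 0.0     # orientation simplified to yaw (assumption)


@dataclass
class Stream:
    source_pose: Pose        # first position and orientation information
    samples: list            # sound signal of the sound source


def set_spatial_resolution(source, head):
    """S103: choose a resolution from the head-to-source distance D."""
    distance = math.dist(source.position, head.position)
    for upper_bound, resolution_deg in ((1.0, 10.0), (3.0, 30.0), (20.0, 45.0)):
        if distance < upper_bound:
            return resolution_deg
    return 90.0


def process(stream, head):
    """S101 and S102 supply `stream` and `head`; S103 and S104 run here."""
    resolution = set_spatial_resolution(stream.source_pose, head)
    # Placeholder for S104: quantise the azimuth of the source relative
    # to the head onto the grid defined by the spatial resolution.
    dx = stream.source_pose.position[0] - head.position[0]
    dy = stream.source_pose.position[1] - head.position[1]
    azimuth = (math.degrees(math.atan2(dy, dx)) - head.yaw_deg) % 360.0
    quantised = (round(azimuth / resolution) * resolution) % 360.0
    return {"resolution_deg": resolution,
            "azimuth_deg": quantised,
            "samples": stream.samples}
```

A real renderer would apply direction-dependent filtering in place of the quantisation step; the sketch only shows where the resolution chosen in S103 enters S104.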
  • the information processing device 10 can suppress delays that may occur in the output sound.
  • because the information processing apparatus 10 sets the spatial resolution in stereophonic processing according to the positional relationship between the user's head and the sound source, the spatial resolution can be adjusted. Therefore, when the scale of computation required for stereophonic processing is relatively large, lowering the spatial resolution reduces the scale of computation and shortens the time required for stereophonic processing, so that the delay that can occur in the output sound can be suppressed. Thus, according to the above information processing method, it is possible to suppress the delay that may occur in the output sound.
  • the information processing apparatus 10 sets the spatial resolution in the stereophonic sound processing to be lower as the distance between the user's head and the sound source increases, thereby reducing the scale of calculation required for the stereophonic sound processing. It is possible to suppress the delay that may occur in the output sound. Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
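One way to see why a coarser resolution reduces the scale of computation is to count the directions a renderer would have to distinguish. The sketch below assumes, purely for illustration, that the cost is proportional to the number of azimuth steps covering a full circle; the text does not state how the computation scales, and `azimuth_direction_count` is a hypothetical helper.

```python
import math


def azimuth_direction_count(resolution_deg):
    """Number of azimuth steps needed to cover 360 degrees."""
    if not 0 < resolution_deg <= 360:
        raise ValueError("resolution must be in (0, 360] degrees")
    return math.ceil(360.0 / resolution_deg)
```

Under this assumption, the 10, 30, 45, and 90 degree settings described for FIG. 7 correspond to 36, 12, 8, and 4 directions, so a distant sound source would need about one ninth of the directions of a nearby one.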
  • the information processing apparatus 10 sets a high spatial resolution in the stereophonic processing of human speech, so that the user can hear human speech with higher quality than non-human speech. Since relatively high accuracy is sometimes required for the sound image position of human speech compared to sound that is not human speech, this can contribute to improving the accuracy of the sound image position of human speech. As described above, according to the information processing method, it is possible to suppress the delay that may occur in the output sound while improving the quality of the human voice included in the output sound.
  • the information processing apparatus 10 sets a low spatial resolution in the stereophonic processing of sounds that are not human speech, thereby reducing the scale of computation required for that processing, so that the delay that may occur in the output sound can be suppressed. Compared to human speech, the sound image position of sounds that are not human speech may not require a high degree of accuracy, so this can contribute to suppressing possible delays. Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
  • the information processing apparatus 10 sets the spatial resolution to be lower as the number of sound sources included in the stream increases, thereby reducing the scale of computation required for stereophonic processing, so that the delay that may occur in the output sound can be suppressed. Thus, according to the information processing method, it is possible to more easily suppress delays that may occur in the output sound.
  • because the information processing apparatus 10 sets the time response length in the stereophonic processing according to the positional relationship between the user's head and the sound source, the user can appropriately perceive the distance from the user to the sound source. As described above, according to the information processing method, it is possible to suppress the delay that may occur in the output sound while allowing the user to appropriately perceive the distance from the user to the sound source.
  • the information processing apparatus 10 sets the time response length in stereophonic processing to be larger as the distance between the user's head and the sound source increases, thereby allowing the user to appropriately perceive the distance from the user to the sound source. can be done.
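To give a sense of scale for the time response lengths, the sketch below converts them to sample counts. It assumes a 48 kHz sampling rate and treats the time response length as the duration of a filter response; neither assumption comes from the text, and `response_length_samples` is a hypothetical helper.

```python
def response_length_samples(time_response_msec, sample_rate_hz=48000):
    """Convert a time response length in msec to a number of samples."""
    if time_response_msec < 0:
        raise ValueError("time response length must be non-negative")
    return round(time_response_msec * sample_rate_hz / 1000)
```

At the assumed rate, the 10 msec, 50 msec, 200 msec, and 1 sec values described for FIG. 7 correspond to 480, 2400, 9600, and 48000 samples.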
  • the information processing apparatus 10 outputs sound based on the output signal generated by the stereophonic processing using the set spatial resolution, so that the user can listen to the output sound with reduced delay. As described above, according to the information processing method, it is possible to suppress the delay that may occur in the output sound, and allow the user to listen to the output sound with the suppressed delay.
  • the information processing device 10 also sets the spatial resolution in rendering processing as stereophonic processing. Therefore, according to the information processing method, it is possible to suppress the delay that may occur in the output sound.
  • each component may be configured by dedicated hardware or implemented by executing a software program suitable for each component.
  • Each component may be realized by reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory by a program execution unit such as a CPU or processor.
  • the software that implements the information processing apparatus and the like of the above embodiment is the following program.
  • this program causes a computer to execute an information processing method that acquires a stream including first position/posture information indicating the position and orientation of a sound source and a sound signal indicating the sound output by the sound source, acquires second position/posture information indicating the position and orientation of the user's head, and uses the first position/posture information and the second position/posture information to set the spatial resolution in the stereophonic sound processing applied to the sound signal, according to the positional relationship between the user's head and the sound source.
  • the present invention can be used for information processing devices that perform stereophonic sound processing.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
PCT/JP2022/003588 2021-03-16 2022-01-31 Information processing method, information processing device, and program Ceased WO2022196135A1 (ja)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP22770897.1A EP4311272A4 (en) 2021-03-16 2022-01-31 INFORMATION PROCESSING METHOD, INFORMATION PROCESSING DEVICE, AND PROGRAM
KR1020237030572A KR20230157331A (ko) 2021-03-16 2022-01-31 Information processing method, information processing device, and program
JP2023506833A JPWO2022196135A1 2021-03-16 2022-01-31
CN202280020492.3A CN116965064A (zh) 2021-03-16 2022-01-31 Information processing method, information processing device, and program
US18/243,199 US12581265B2 (en) 2021-03-16 2023-09-07 Information processing method, information processing device, and recording medium

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163161499P 2021-03-16 2021-03-16
US63/161,499 2021-03-16
JP2021-194053 2021-11-30
JP2021194053 2021-11-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/243,199 Continuation US12581265B2 (en) 2021-03-16 2023-09-07 Information processing method, information processing device, and recording medium

Publications (1)

Publication Number Publication Date
WO2022196135A1 (ja) 2022-09-22

Family

ID=83320333

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/003588 Ceased WO2022196135A1 (ja) 2021-03-16 2022-01-31 Information processing method, information processing device, and program

Country Status (5)

Country Link
US (1) US12581265B2
EP (1) EP4311272A4
JP (1) JPWO2022196135A1
KR (1) KR20230157331A
WO (1) WO2022196135A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050271212A1 (en) * 2002-07-02 2005-12-08 Thales Sound source spatialization system
JP2017175356A (ja) * 2016-03-23 2017-09-28 ヤマハ株式会社 音響処理装置およびプログラム
KR20190060464A (ko) * 2017-11-24 2019-06-03 주식회사 윌러스표준기술연구소 오디오 신호 처리 방법 및 장치

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101901593B1 (ko) * 2012-03-28 2018-09-28 삼성전자주식회사 가상 입체 음향 생성 방법 및 장치
WO2017035281A2 (en) * 2015-08-25 2017-03-02 Dolby International Ab Audio encoding and decoding using presentation transform parameters
US10074012B2 (en) * 2016-06-17 2018-09-11 Dolby Laboratories Licensing Corporation Sound and video object tracking
CN110313187B (zh) 2017-06-15 2022-06-07 杜比国际公司 处理媒体内容以供第一装置再现的方法、系统和装置
WO2018232327A1 (en) 2017-06-15 2018-12-20 Dolby International Ab Methods, apparatus and systems for optimizing communication between sender(s) and receiver(s) in computer-mediated reality applications
KR101919508B1 (ko) * 2018-01-19 2018-11-16 주식회사 킨트 가상 공간에서의 사운드 신호 생성을 통한 입체음향 공급방법 및 장치

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4311272A4

Also Published As

Publication number Publication date
US12581265B2 (en) 2026-03-17
EP4311272A1 (en) 2024-01-24
EP4311272A4 (en) 2024-10-09
JPWO2022196135A1 2022-09-22
KR20230157331A (ko) 2023-11-16
US20230421988A1 (en) 2023-12-28

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22770897; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 2023506833; Country of ref document: JP
WWE WIPO information: entry into national phase. Ref document number: 202280020492.3; Country of ref document: CN
WWE WIPO information: entry into national phase. Ref document number: 202347061538; Country of ref document: IN
WWE WIPO information: entry into national phase. Ref document number: 2022770897; Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2022770897; Country of ref document: EP; Effective date: 20231016