CN113453126B - Information processing method and information processing apparatus - Google Patents

Information processing method and information processing apparatus

Info

Publication number
CN113453126B
CN113453126B
Authority
CN
China
Prior art keywords
information
audio
space
video
localization
Prior art date
Legal status
Active
Application number
CN202110290568.1A
Other languages
Chinese (zh)
Other versions
CN113453126A
Inventor
白木原太
森川直
纳户健太郎
三轮明宏
Current Assignee
Yamaha Corp
Original Assignee
Yamaha Corp
Priority date
Filing date
Publication date
Application filed by Yamaha Corp filed Critical Yamaha Corp
Publication of CN113453126A
Application granted
Publication of CN113453126B


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00: Stereophonic arrangements
    • H04R5/02: Spatial or constructional arrangements of loudspeakers
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2205/00: Details of stereophonic arrangements covered by H04R5/00 but not provided for in any of its subgroups
    • H04R2205/024: Positioning of loudspeaker enclosures for spatial sound reproduction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15: Aspects of sound capture and related signal processing for recording or reproduction

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides an information processing method and an information processing apparatus that realize audio/video localization taking the shape of a physical space into consideration. In the information processing method, setting of 1 st spatial information and 2 nd spatial information is received, 1 st audio/video positioning information indicating a position at which an audio/video is to be localized is received in 1 st coordinates in the 1 st spatial information, and the 1 st audio/video positioning information is converted into 2 nd audio/video positioning information corresponding to 2 nd coordinates in the 2 nd spatial information.

Description

Information processing method and information processing apparatus
Technical Field
One embodiment of the present invention relates to an information processing method and an information processing apparatus.
Background
The sound adjustment console of patent document 1 receives coordinates of an audio/video anchor point in a rectangular parallelepiped space, and calculates the volume of the sound output from a plurality of speakers arranged in the space so that the audio/video is localized at the received coordinates.
Patent document 1: japanese patent application laid-open No. 2018-74280
However, the physical space of a concert hall or the like is not limited to a rectangular parallelepiped shape. Therefore, even if the coordinates of the audio/video anchor point are received in a rectangular parallelepiped space, a device such as the sound adjustment console of patent document 1 may not be able to localize the audio/video at the position intended by the user, because the shape of the physical space is not taken into consideration.
Disclosure of Invention
It is therefore an object of one embodiment of the present invention to provide an information processing method and an information processing apparatus that realize audio/video localization in consideration of the shape of a physical space.
In the information processing method according to one embodiment of the present invention, setting of 1 st spatial information and 2 nd spatial information is received, 1 st audio/video positioning information indicating a position for positioning an audio/video is received in 1 st coordinates in the 1 st spatial information, and the 1 st audio/video positioning information is converted into 2 nd audio/video positioning information corresponding to 2 nd coordinates in the 2 nd spatial information.
Advantageous Effects of Invention
According to an embodiment of the present invention, audio/video localization in which the shape of a physical space is considered can be realized.
Drawings
Fig. 1 is a block diagram showing the structure of an information processing apparatus 1.
Fig. 2 is a diagram showing an example of a setting screen for audio/video positioning to be displayed on the display 15.
Fig. 3 is a flowchart showing the operation of the processor 12.
Fig. 4 is a diagram illustrating the concept of coordinate transformation.
Fig. 5 is a diagram illustrating the concept of coordinate transformation.
Fig. 6 is a block diagram showing the structure of an information processing apparatus 1A according to modification 1.
Fig. 7 is a flowchart showing the operation of the information processing apparatus 1A.
Fig. 8 is a diagram showing a concept of a layer.
Fig. 9 is a diagram illustrating a modification of the coordinate conversion.
Fig. 10 is a diagram showing an example of a setting screen of audio/video localization displayed on the display 15 when editing audio/video localization position information of a plurality of audio sources 55A and 55B.
Fig. 11 (a) and 11 (B) are diagrams showing an example of a setting screen of audio/video localization displayed on the display 15 when audio/video localization position information of a plurality of audio sources 55A and 55B is edited.
Fig. 12 is a diagram showing an example of a setting screen of audio/video localization displayed on the display 15.
Fig. 13 (a) and 13 (B) are diagrams showing an example of a setting screen for audio/video positioning displayed on the display 15.
Fig. 14 (a) and 14 (B) are diagrams showing an example of a setting screen for audio/video positioning displayed on the display 15.
Fig. 15 (a) and 15 (B) are diagrams showing an example of a setting screen for audio/video positioning displayed on the display 15.
Fig. 16 is a flowchart showing the operation of the information processing apparatus 1 or the information processing apparatus 1A.
Detailed Description
Fig. 1 is a block diagram showing the structure of an information processing apparatus 1. The information processing apparatus 1 includes a communication unit 11, a processor 12, a RAM 13, a flash memory 14, a display 15, and a user I/F16.
The information processing apparatus 1 is constituted by a personal computer, a smart phone, a tablet computer, or the like. In addition, acoustic devices such as mixers are also examples of information processing devices.
The communication unit 11 communicates with another device such as a server. The communication unit 11 has, for example, a wireless communication function such as Bluetooth (registered trademark) or Wi-Fi (registered trademark), and a wired communication function such as USB or LAN. The communication unit 11 acquires, for example, space information indicating the shape of a physical space such as a concert hall. The spatial information is information indicating 2-dimensional or 3-dimensional coordinates with a certain position as a reference point (origin). The space information is information containing 2-dimensional or 3-dimensional coordinates of CAD data representing the shape of a physical space such as a concert hall.
The processor 12 is constituted by a CPU, DSP, SoC (System on a Chip), or the like. The processor 12 reads out a program from the flash memory 14 serving as a storage medium, temporarily stores it in the RAM 13, and thereby performs various operations. The processor 12 realizes the functional configurations of the space setting unit 141, the audio/video positioning information receiving unit 142, the converting unit 143, and the like by the read program. The program need not be stored in the flash memory 14; for example, it may be downloaded from another device such as a server when necessary and temporarily stored in the RAM 13.
The display 15 is constituted by an LCD or the like. The display 15 displays, for example, a setting screen for audio/video localization as shown in fig. 2.
The user I/F16 is an example of an operation section. The user I/F16 is constituted by a mouse, a keyboard, a touch panel, or the like. The user I/F16 accepts the operation of the user. The touch panel may be laminated on the display 15.
Setting screens for audio/video localization will be described with reference to fig. 2 and 3. Fig. 2 is a diagram showing an example of a setting screen for audio/video positioning to be displayed on the display 15. Fig. 3 is a flowchart showing the operation of the processor 12. The setting screen of audio/video localization shown in fig. 2 is an example of an editing screen of content (content). The user edits the audio/video localization position of the audio source included in the content on the audio/video localization setting screen.
The display 15 displays a logical space image 151 of a logical coordinate system and a physical space image 152 of a physical coordinate system. In this example, the display 15 displays the logical space image 151 on the upper left of the screen, and displays the physical space image 152 on the upper right of the screen. The display 15 displays the logical plane image 153 on the lower left side of the screen, and displays the physical plane image 154 on the lower right side of the screen.
The logical space image 151 has, as an example, a rectangular parallelepiped shape. The logical plane image 153 corresponds to a top view of the logical space image 151. The physical space image 152 has, as an example, an octagonal prism shape. The physical plane image 154 corresponds to a top view of the physical space image 152.
First, the space setting unit 141 of the processor 12 receives the 1 st space information which is information corresponding to the logical space and the 2 nd space information which is information corresponding to the physical space of the concert hall or the like (S11).
The 1 st spatial information is a logical coordinate. The logical coordinates are normalized coordinates of, for example, 0 to 1. In the present embodiment, the space setting unit 141 receives setting of the space information of the rectangular parallelepiped as the 1 st space information, but may receive various kinds of space information such as pyramid, square column, polyhedron, cylinder, cone, sphere, and the like in addition to this. The space setting unit 141 may receive information in a 2-dimensional space. The 2-dimensional space includes, for example, a polygon formed of a straight line, a circle formed of a curved line, or a composite shape formed of a straight line and a curved line.
The 2 nd spatial information is physical coordinates. The physical coordinates are 2-dimensional or 3-dimensional coordinates included in CAD data or the like representing the shape of a physical space such as a concert hall. The space setting unit 141 of the processor 12 reads information including coordinates of 2-dimensional or 3-dimensional such as CAD data from the flash memory 14, for example, and thereby receives the setting of the 2 nd space information.
Next, the space setting unit 141 generates a logical space image 151, a physical space image 152, a logical plane image 153, and a physical plane image 154, and displays them on the display 15 (S12). In the example of fig. 2, the logical space image 151 is an image of an elevation view of a cube shape, and the logical plane image 153 is an image of a square shape. The physical space image 152 and the physical plane image 154 are images simulating the actual space of a concert hall or the like. The space setting unit 141 generates a physical space image 152 and a physical plane image 154 based on information including coordinates of 2-dimensional or 3-dimensional such as CAD data.
Next, the audio/video localization information receiving unit 142 of the processor 12 receives the speaker arrangement information or audio/video localization information (S13). The speaker arrangement information and the audio/video positioning information are coordinates of a logical coordinate system, respectively, and are 1 st audio/video positioning information.
The user operates the user I/F16, and edits the speaker arrangement information or audio/video positioning information in the logical space image 151 or the logical plane image 153 shown in fig. 2. For example, in the example of fig. 2, the user arranges a center speaker 50C, a left speaker 50L, a right speaker 50R, a left rear speaker 50SL, and a right rear speaker 50SR in the logical space image 151 and the logical plane image 153. The center speaker 50C, the left speaker 50L, the right speaker 50R, the left rear speaker 50SL, and the right rear speaker 50SR are arranged in the middle in the height direction.
If the position of the top-left vertex in the logical plane image 153 is set as the origin, the coordinates of the left speaker 50L are (x, y) = (0, 0). The coordinates of the right speaker 50R are (x, y) = (1, 0). The coordinates of the center speaker 50C are (x, y) = (0.5, 0). The coordinates of the rear left speaker 50SL are (x, y) = (0, 1). The coordinates of the rear right speaker 50SR are (x, y) = (1, 1).
In the example of fig. 2, the user arranges the audio/video localization position of the sound source 55 on the left side (between the left end and the center) of the logical space image 151 and the logical plane image 153 with respect to the center. That is, the coordinates of the sound source 55 are (x, y) = (0.25, 0.5).
In the example of fig. 2, the coordinates of the center speaker 50C, the left speaker 50L, the right speaker 50R, the left rear speaker 50SL, the right rear speaker 50SR, and the sound source in the height direction are all z=0.5.
As shown in fig. 2, for example, the audio/video positioning information receiving unit 142 receives the speaker arrangement information or the audio/video positioning information of the audio source by the user editing the speaker arrangement information or the audio source position information (S13: yes).
The conversion unit 143 performs coordinate conversion based on the received speaker arrangement information or sound source position information (S14).
Fig. 4 and 5 are diagrams for explaining the concept of coordinate transformation. The conversion unit 143 converts the speaker arrangement information and the sound source position information from the 1 st coordinates of the 1 st spatial information in the logical coordinate system to the 2 nd coordinates of the 2 nd spatial information in the physical coordinate system. In the example of fig. 4, 8 reference points 70A (0, 0), 70B (0.25, 0), 70C (0.75, 0), 70D (1, 0), 70E (0, 1), 70F (0.25, 1), 70G (0.75, 1), 70H (1, 1) exist in the logical coordinate system before conversion, and 8 corresponding reference points 70A (x1, y1), 70B (x2, y2), 70C (x3, y3), 70D (x4, y4), 70E (x5, y5), 70F (x6, y6), 70G (x7, y7), 70H (x8, y8) exist in the physical coordinate system after conversion. The transformation unit 143 obtains the barycenter G of the 8 reference points in the logical coordinate system before transformation and the barycenter G' of the 8 reference points in the physical coordinate system after transformation, and generates a triangle mesh centered on each barycenter. The transformation unit 143 maps the interior of each triangle of the logical coordinate system to the interior of the corresponding triangle of the physical coordinate system by a predetermined coordinate transformation, for example an affine transformation. An affine transformation is an example of a geometric transformation; it expresses the transformed x-coordinate (x') and y-coordinate (y') as functions of the x-coordinate (x) and y-coordinate (y) before transformation. That is, the affine transformation performs coordinate transformation by the formulas x' = ax + by + c and y' = dx + ey + f. The coefficients a to f can be uniquely obtained from the coordinates of the three vertices of the triangle before transformation and the coordinates of the three vertices of the triangle after transformation.
The transformation unit 143 obtains affine transformation coefficients similarly for all triangles, thereby transforming from the 1 st coordinate of the logical coordinate system to the 2 nd coordinate of the 2 nd spatial information of the physical coordinate system. The coefficients a to f may be obtained by a least square method.
The conversion unit 143 converts the coordinates of the speaker arrangement information and the sound source position information using the obtained coefficients a to f. In fig. 5, the conversion unit 143 converts the coordinates (x, y) of the logical coordinate system of the sound source 55 into the coordinates (x ', y') of the physical coordinate system using the above formula.
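To make the per-triangle step above concrete, the following is a minimal sketch (pure Python, hypothetical function names) that solves for the coefficients a to f of one triangle correspondence with Cramer's rule and applies them to a point; a real implementation would repeat this for every triangle of the mesh:

```python
def affine_from_triangles(src, dst):
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f for (a..f) from the
    three vertex correspondences src[i] -> dst[i] of one mesh triangle,
    using Cramer's rule on the two independent 3x3 systems."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)

    def solve(v1, v2, v3):
        # Cramer's rule for p*x + q*y + r = v at the three vertices.
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve(dst[0][0], dst[1][0], dst[2][0])
    d, e, f = solve(dst[0][1], dst[1][1], dst[2][1])
    return a, b, c, d, e, f


def apply_affine(coeffs, point):
    """Apply x' = a*x + b*y + c, y' = d*x + e*y + f to one point."""
    a, b, c, d, e, f = coeffs
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

A sound source at logical coordinates (0.25, 0.5) inside a given logical triangle would thus be mapped to physical coordinates by the coefficients of that triangle's correspondence.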
Thus, the coordinates of the speaker arrangement information and the sound source position information are converted into 2 nd audio/video localization information matching the shape of the physical space. The processor 12 stores the 2 nd audio/video localization information, for example, in the flash memory 14. Alternatively, the processor 12 transmits the 2 nd audio/video localization information to another device such as an acoustic device via the communication unit 11. The acoustic device performs processing for localizing the audio image based on the received 2 nd audio/video localization information: it calculates the level balance of the audio signals output to the plurality of speakers based on the speaker arrangement information and the sound source position information included in the 2 nd audio/video localization information, and adjusts the levels of the audio signals so that the audio image of the sound source is localized at the specified position. Thus, the information processing apparatus 1 according to the present embodiment can realize audio/video localization in consideration of the shape of the physical space.
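The patent describes computing a level balance from the speaker arrangement and the sound source position but does not specify the formula. As a purely illustrative stand-in (all names hypothetical), a simple inverse-distance weighting normalized to constant power might look like this:

```python
import math

def speaker_gains(speakers, source):
    """Toy level-balance calculation: weight each speaker by the inverse of
    its distance to the sound source position (physical coordinates), then
    normalize so that the squared gains sum to 1 (constant total power).
    This is an illustrative sketch, not the patent's actual method."""
    weights = [1.0 / (math.hypot(sx - source[0], sy - source[1]) + 1e-6)
               for sx, sy in speakers]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]
```

With the five speakers of fig. 2, a source placed near the left speaker yields a gain vector dominated by that speaker, so the audio image is pulled toward the intended position.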
The mesh is not limited to triangles; a mesh of other polygons, or a combination thereof, may be used. For example, as shown in fig. 9, the transformation unit 143 may generate a quadrilateral mesh and perform the coordinate transformation. The transformation method is not limited to the affine transformation described above. For example, the conversion unit 143 may transform a quadrilateral cell based on the following formulas, converting the coordinates (x, y) of the logical coordinate system of the sound source 55 into the coordinates (x', y') of the physical coordinate system (where (x0, y0), (x1, y1), (x2, y2), (x3, y3) are the coordinates of the transformed corner points):
x' = x0 + (x1 - x0)x + (x3 - x0)y + (x0 - x1 + x2 - x3)xy
y' = y0 + (y1 - y0)x + (y3 - y0)y + (y0 - y1 + y2 - y3)xy
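This quadrilateral (bilinear) mapping can be sketched directly from the two formulas, assuming the corner ordering (0,0) to (x0, y0), (1,0) to (x1, y1), (1,1) to (x2, y2), (0,1) to (x3, y3); the function name is hypothetical:

```python
def bilinear_map(quad, x, y):
    """Map normalized logical coordinates (x, y) in the unit square to a
    point inside the quadrilateral quad = [(x0,y0),(x1,y1),(x2,y2),(x3,y3)],
    whose corners correspond to (0,0), (1,0), (1,1), (0,1) in that order."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    xp = x0 + (x1 - x0) * x + (x3 - x0) * y + (x0 - x1 + x2 - x3) * x * y
    yp = y0 + (y1 - y0) * x + (y3 - y0) * y + (y0 - y1 + y2 - y3) * x * y
    return xp, yp
```

When the quadrilateral is itself the unit square, the mapping reduces to the identity, which is a convenient sanity check.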
Other geometric transformations, such as an isometric transformation, a similarity transformation, or a projective transformation, may also be used. For example, a projective transformation is represented by the formulas x' = (ax + by + c)/(gx + hy + 1) and y' = (dx + ey + f)/(gx + hy + 1). The coefficients are obtained in the same manner as for the affine transformation described above. For example, the 8 coefficients (a to h) of the projective transformation of a quadrilateral can be uniquely obtained by solving a system of 8 simultaneous equations. Alternatively, the coefficients may be obtained by, for example, the least squares method.
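The 8-equation solve can be sketched as follows: each of the four corner correspondences contributes two linear equations in the 8 unknowns, and the system is solved here by Gauss-Jordan elimination with partial pivoting (a sketch with hypothetical names, not the patent's implementation):

```python
def projective_from_quad(src, dst):
    """Solve the 8 coefficients (a..h) of
    x' = (a*x + b*y + c) / (g*x + h*y + 1)
    y' = (d*x + e*y + f) / (g*x + h*y + 1)
    from four point correspondences src[i] -> dst[i]."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Multiply through by the denominator to get linear equations.
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); rhs.append(xp)
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); rhs.append(yp)
    n = 8
    M = [row + [r] for row, r in zip(rows, rhs)]
    for i in range(n):
        # Partial pivoting, then eliminate column i from all other rows.
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i and M[r][i] != 0.0:
                f = M[r][i] / M[i][i]
                M[r] = [mr - f * mi for mr, mi in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]


def apply_projective(coeffs, point):
    """Apply the projective transformation to one point."""
    a, b, c, d, e, f, g, h = coeffs
    x, y = point
    w = g * x + h * y + 1.0
    return (a * x + b * y + c) / w, (d * x + e * y + f) / w
```

Mapping the unit square onto itself recovers the identity transformation, which checks the solver.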
Fig. 6 is a block diagram showing the structure of an information processing apparatus 1A according to modification 1. Fig. 7 is a flowchart showing the operation of the information processing apparatus 1A. The same components, functions and operations as those of the information processing apparatus 1 are denoted by the same reference numerals, and description thereof is omitted.
The information processing apparatus 1A also has an audio I/F17. The audio I/F17 is constituted by an analog audio terminal, a digital audio terminal, or the like. The processor 12 obtains the sound signal of the sound source via the audio I/F17. Thus, the processor 12 functions as a sound signal acquisition unit. The audio signal may be acquired from an external device via the communication unit 11. In addition, the audio signal may be stored in the flash memory 14.
The audio I/F17 is connected to a center speaker 50C, a left speaker 50L, a right speaker 50R, a left rear speaker 50SL, and a right rear speaker 50SR, which are provided in an actual space such as a concert hall.
The processor 12 has a DSP. The processor 12 performs predetermined signal processing on the sound signal. The processor 12 outputs the signal-processed sound signals to the center speaker 50C, the left speaker 50L, the right speaker 50R, the left rear speaker 50SL, and the right rear speaker 50SR via the audio I/F17.
The processor 12 reads out the program stored in the flash memory 14 to the RAM 13, thereby also realizing the functional configuration of the localization processing unit 144. The localization processing unit 144 of the processor 12 performs processing for localizing the audio image of the sound signal at the position corresponding to the 2 nd audio/video localization information, based on the speaker arrangement information and the sound source position information (2 nd audio/video localization information) converted by the conversion unit 143 (S15). That is, the localization processing unit 144 calculates the level balance of the audio signals output to the center speaker 50C, the left speaker 50L, the right speaker 50R, the left rear speaker 50SL, and the right rear speaker 50SR based on the speaker arrangement information and the sound source position information included in the 2 nd audio/video localization information, and adjusts the levels of the audio signals so that the audio image of the sound source is localized at the specified position. The information processing apparatus may perform audio/video localization processing in this manner.
In fig. 2 to 5, coordinate transformation in a 2-dimensional space (plane) is shown. However, the information processing apparatus may perform coordinate transformation in a 3-dimensional space. In this case, x ', y ', z ' of the transformed coordinates are expressed by functions of x, y, z, respectively. The conversion unit 143 converts the speaker arrangement information and the sound source position information based on the function.
The information in the 3-dimensional space may be information including plane coordinates (x, y) and information indicating a plurality of layers (layers) in the height direction.
Fig. 8 is a diagram showing a concept of a Layer (Layer). The user operates the user I/F16 to edit the layer, speaker arrangement information, or audio/video localization information of the sound source. In the example of fig. 8, the user designates the height of 3 layers arranged in the height direction. In addition, the user designates the arrangement of speakers or the audio/video of the sound source at any of the designated layers. In the example of fig. 8, the user designates layers 151L1, 151L2, 151L3, 152L1, 152L2, and 152L3, and designates the arrangement of speakers or the audio/video of the sound source from these layers.
The transformation unit 143 transforms the plane coordinates (x', y') of the physical coordinate system by affine transformation as described above. The coordinates in the height direction are specified by the user. In the example of fig. 8, the layer 151L1 of the logical coordinate system is at z = 1.0, the layer 151L2 at z = 0.5, and the layer 151L3 at z = 0. The layer 152L1 of the physical coordinate system corresponds to the highest position in the actual space, i.e., the coordinates of the ceiling surface. The layer 152L3 of the physical coordinate system corresponds to the lowest position in the actual space, i.e., the coordinates of the floor. The layer 152L2 lies between the coordinates of the ceiling surface and those of the floor. For example, when the sound source 55 is arranged on the layer 151L3 of the logical coordinate system, the conversion unit 143 obtains the coordinate z3 in the height direction of the layer 152L3 of the physical coordinate system as the height information of the 2 nd audio/video localization information.
In addition, either the speaker arrangement information or the sound source position information may specify coordinates between the plurality of layers. For example, the speaker arrangement information may specify layers, while the sound source position information may specify an arbitrary position within the 3-dimensional space. In this case, the conversion unit 143 obtains the sound source position information based on the height information of the plurality of layers, for example by linear interpolation. For example, when the sound source is located between the layers 151L1 and 151L2, the conversion unit 143 obtains the transformed sound source coordinate z' from the coordinate z of the sound source before transformation as follows.
z' = (z - z1) * (z'2 - z'1) / (z2 - z1) + z'1
Of course, the number of layers is not limited to 3; it may be 2, or 4 or more.
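The linear interpolation of the height coordinate above amounts to a one-line helper; as a sketch with hypothetical names, where z1 and z2 are the heights of the two enclosing logical layers and zp1 and zp2 the heights of the corresponding physical layers:

```python
def interp_height(z, z1, z2, zp1, zp2):
    """Linearly map a logical height z in [z1, z2] to the physical height
    in [zp1, zp2], following z' = (z - z1) * (z'2 - z'1) / (z2 - z1) + z'1."""
    return (z - z1) * (zp2 - zp1) / (z2 - z1) + zp1
```

A source halfway between two logical layers thus lands halfway between the corresponding physical layers, e.g. between a stage floor and a ceiling height.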
The description of the present embodiment is to be considered in all respects as illustrative and not restrictive. The scope of the invention is indicated not by the embodiments described above but by the claims, and encompasses all modifications within the scope equivalent to the claims.
For example, the user can edit the speaker arrangement information and the audio/video localization information of the sound source in the physical space image 152 or the physical plane image 154. In this case, the space setting unit 141 receives the space information of the physical coordinate system as the 1 st space information and receives the space information of the logical coordinate system as the 2 nd space information. The conversion unit 143 converts the arrangement information of the speakers in the physical coordinate system (1 st audio/video localization information) into the arrangement information of the speakers in the logical coordinate system (2 nd audio/video localization information).
The number of sound sources is not limited to 1. Fig. 10 is a diagram showing an example of a setting screen of audio/video localization displayed on the display 15 when editing the audio/video localization position information of a plurality of sound sources 55A and 55B. In the example of fig. 10, the information processing apparatus 1 or the information processing apparatus 1A displays the logical plane image 153 and the physical plane image 154, but in practice it also displays the logical space image 151 and the physical space image 152. The operation of the information processing apparatus 1 is the same as the flowchart shown in fig. 3, and the operation of the information processing apparatus 1A is the same as the flowchart shown in fig. 7.
In this example, the user arranges the audio/video localization positions of the sound sources 55A and 55B in the logical plane image 153 and the physical plane image 154. The coordinates of the sound source 55A are (x 1, y 1) = (0.25, 0.5). The coordinates of the sound source 55B are (x 2, y 2) = (0.25, 0.25).
The user edits the sound sources 55A and 55B arranged in the logical plane image 153 or the physical plane image 154, respectively. For example, the user changes the sound source 55A and the sound source 55B arranged in the physical plane image 154 to different positions. The conversion unit 143 converts the sound source position coordinates (1 st sound image localization information) of the physical coordinate system of the sound source 55A and the sound source 55B after the change into sound source position information (2 nd sound image localization information) of the logical coordinate system.
The 1 st audio/video localization information may be defined as a group including a plurality of audio/video images. Fig. 11 (a) and 11 (B) are diagrams showing an example of a setting screen of audio/video localization displayed on the display 15 when audio/video localization position information of a plurality of audio sources 55A and 55B is edited.
In this case, the sound source 55A and the sound source 55B are defined as the same group. The 1 st audio/video localization information is defined as a group including the sound source 55A and the sound source 55B, and the 2 nd audio/video localization information is likewise defined as a group including the sound source 55A and the sound source 55B. The user edits either of the sound sources 55A and 55B arranged in the logical plane image 153 or the physical plane image 154. The audio/video localization information receiving unit 142 changes the 1 st audio/video localization information while holding the relative positional relationship between the plurality of audio images included in the same group. For example, as shown in fig. 11 (A), the user changes the coordinates of the sound source 55A arranged in the logical plane image 153 from (x 1, y 1) = (0.25, 0.5) to (x 1, y 1) = (0.75, 0.75). The audio/video localization information receiving unit 142 changes the coordinates of the sound source 55B while holding the relative positional relationship between the sound sources 55A and 55B included in the same group. Before the change, the coordinates of the sound source 55A are (x 1, y 1) = (0.25, 0.5) and the coordinates of the sound source 55B are (x 2, y 2) = (0.25, 0.25), so the relative position is (x 1 - x 2, y 1 - y 2) = (0, 0.25). Therefore, the audio/video localization information receiving unit 142 changes the coordinates of the sound source 55B to (x 2, y 2) = (0.75, 0.5). The display 15 displays the sound source 55A and the sound source 55B on the logical plane image 153 in correspondence with the changed coordinates of the sound source 55A and the sound source 55B.
The conversion unit 143 then changes the sound source position coordinates (1 st sound image localization information) of the changed sound source 55A and sound source 55B in the logical coordinate system to sound source position information (2 nd sound image localization information) of the physical coordinate system. Alternatively, the conversion unit 143 may convert the coordinates of the sound source 55A and the relative position after the change in the logical coordinate system into the physical coordinate system. In this case, the conversion unit 143 may calculate the position of the sound source 55B in the physical coordinate system based on the coordinates of the sound source 55A in the physical coordinate system and the coordinates of the relative position. Then, as shown in fig. 11 (B), the display 15 changes the positions of the sound source 55A and the sound source 55B in the physical plane image 154.
The user may instead change either of the sound sources 55A and 55B arranged in the physical plane image 154. For example, if the user changes the position of the sound source 55A arranged in the physical plane image 154, the audio/video localization information receiving unit 142 changes the coordinates of the sound source 55B while holding the relative positional relationship between the sound sources 55A and 55B included in the same group. The display 15 displays the sound source 55A and the sound source 55B on the physical plane image 154 in accordance with their changed coordinates. The conversion unit 143 converts the changed sound source position coordinates (1st audio/video localization information) of the sound sources 55A and 55B in the physical coordinate system into sound source position coordinates (2nd audio/video localization information) in the logical coordinate system. Then, the display 15 changes the positions of the sound sources 55A and 55B in the logical plane image 153.
The display 15 may also display, for example, a representative point of the group. The user changes the position of the representative point to move the sound sources 55A and 55B together. In this case as well, the audio/video localization information receiving unit 142 changes the coordinates of the sound sources while holding the relative positional relationship between the sound sources 55A and 55B included in the same group.
Next, an example will be described in which the specification of spatial information of another physical coordinate system (3rd spatial information) is further received within the physical plane image 154. Fig. 12 is a diagram showing an example of the audio/video localization setting screen displayed on the display 15 of the information processing apparatus 1 or the information processing apparatus 1A. For ease of explanation, the speakers are omitted in fig. 12. In this case too, the operation of the information processing apparatus 1 follows the flowchart shown in fig. 3, and the operation of the information processing apparatus 1A follows the flowchart shown in fig. 7.
In this example, the display 15 further displays a physical plane image 155 within the physical plane image 154. The physical plane image 155 corresponds to the 3rd spatial information, which differs from the 1st spatial information corresponding to the physical plane image 154. The 3rd spatial information is also expressed in physical coordinates. In the operation of S11 shown in figs. 3 and 7, the space setting unit 141 accepts the setting of the 3rd spatial information by, for example, reading out information including 2-dimensional or 3-dimensional coordinates, such as CAD data, from the flash memory 14. In this manner, the space setting unit 141 receives the specification of the 3rd spatial information within the 1st spatial information displayed on the display 15. The audio/video localization information receiving unit 142 receives the audio/video position in the logical plane image 153 or the physical plane image 154. The conversion unit 143 converts sound source position coordinates in the physical coordinate system into sound source position coordinates in the logical coordinate system, or vice versa.
The audio/video localization information receiving unit 142 receives a change of the audio/video position in the logical plane image 153 or the physical plane image 154. For example, as shown in fig. 13 (A), the user moves the sound source 55 arranged in the physical plane image 154 to the upper-right end of the physical plane image 155.
The display 15 displays the changed position of the sound source 55 on the physical plane image 154. In the example of fig. 13 (A), the display 15 displays the sound source 55 at the upper-right end of the physical plane image 155 displayed within the physical plane image 154.
The conversion unit 143 converts the sound source position coordinates in the physical coordinate system (1st audio/video localization information) into sound source position coordinates in the logical coordinate system (2nd audio/video localization information) by affine transformation or the like. In the above embodiment, the conversion unit 143 performed conversion between the physical coordinates corresponding to the physical plane image 154 and the logical coordinates corresponding to the logical plane image 153. In the example of fig. 13 (A), on the other hand, the 3rd spatial information of the physical coordinate system corresponds to the 2nd spatial information of the logical coordinate system. For example, the lower-left end of the physical plane image 155 corresponds to the coordinates (x, y) = (0, 0) of the logical coordinate system, the upper-left end corresponds to (x, y) = (0, 1), the lower-right end corresponds to (x, y) = (1, 0), and the upper-right end corresponds to (x, y) = (1, 1).
The conversion unit 143 obtains sound source position coordinates in the logical coordinate system based on the 3 rd spatial information of the physical plane image 155 and the 2 nd spatial information of the logical plane image 153. That is, the conversion unit 143 converts the physical coordinates corresponding to the physical plane image 155 into logical coordinates corresponding to the logical plane image 153.
In the example of fig. 13 (A), the sound source 55 is located at the upper-right end of the physical plane image 155. Therefore, the sound source position coordinates of the sound source 55 in the logical coordinate system are (x, y) = (1, 1). The display 15 displays the sound source 55 at the sound source position coordinates in the logical coordinate system obtained by the conversion unit 143.
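The mapping from the 3rd spatial information onto the logical unit square can be sketched as below for the simplest case of an axis-aligned rectangle (the patent mentions affine transformation in general; the function name and the concrete bounds of the physical plane image 155 are illustrative assumptions).

```python
def physical_to_logical(p, rect):
    """Map a physical-coordinate point onto the logical unit square,
    given the (x_min, y_min, x_max, y_max) bounds of the 3rd space."""
    x_min, y_min, x_max, y_max = rect
    return ((p[0] - x_min) / (x_max - x_min),
            (p[1] - y_min) / (y_max - y_min))

# Assume the physical plane image 155 spans (2, 1) to (6, 4) in metres
# (illustrative values; the patent gives no concrete bounds).
rect_155 = (2.0, 1.0, 6.0, 4.0)
# The upper-right end of image 155 maps to the logical corner (1, 1),
# matching the fig. 13 (A) example.
print(physical_to_logical((6.0, 4.0), rect_155))  # → (1.0, 1.0)
```

A rotated or sheared 3rd space would need a full affine transform (corner-to-corner matrix), but the corner correspondence stays the same.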
As described above, the conversion unit 143 converts the 1st audio/video localization information in the physical coordinate system into the 2nd audio/video localization information in the logical coordinate system based on the 3rd spatial information and the 2nd spatial information.
As shown in fig. 14 (A), the user can also specify a position of the sound source 55 that is within the physical plane image 154 but outside the physical plane image 155. The coordinates of the ends of the physical plane image 155 correspond to the coordinates of the ends of the logical plane image 153. Therefore, when the sound source 55 is located outside the physical plane image 155, the conversion unit 143 clamps the x-coordinate and/or the y-coordinate of the sound source 55 in the logical coordinate system to 0 or 1, the values corresponding to the ends of the logical plane image 153. In the example of fig. 14 (B), both the x-coordinate and the y-coordinate of the sound source 55 are specified outside the physical plane image 155; even if the user moves the sound source 55 further rightward from the upper-right end of the physical plane image 155, the position of the sound source 55 in the logical plane image 153 is maintained at (x, y) = (1, 1).
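The clamping behavior above amounts to limiting each logical coordinate to the [0, 1] range. A minimal sketch (the function name is illustrative, not from the patent):

```python
def clamp_to_logical(p):
    """Clamp logical coordinates to [0, 1] so that a sound source
    specified outside the physical plane image 155 is held at the
    corresponding end of the logical plane image 153."""
    return (min(max(p[0], 0.0), 1.0), min(max(p[1], 0.0), 1.0))

# Fig. 14 (B): moving further right/up past the upper-right end of
# image 155 keeps the logical position at (1, 1).
print(clamp_to_logical((1.3, 1.2)))  # → (1.0, 1.0)
```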
The same applies to a plurality of sound sources defined as the same group. When at least one sound source among the plurality of sound sources in the same group is specified outside the physical plane image 155, the conversion unit 143 clamps the x-coordinate and/or the y-coordinate of that sound source in the logical coordinate system to 0 or 1. For the other sound sources of the same group, the coordinates in the physical coordinate system are changed so as to hold the relative positional relationship within the group with respect to the specified physical coordinates of the moved sound source.
As described above, the user edits the sound source 55 arranged in the logical plane image 153 or the physical plane image 154. As shown in fig. 15 (A), the user may change the position of the sound source 55 in the logical plane image 153 while the sound source 55 is located outside the physical plane image 155. As described above, the physical plane image 155 of the physical coordinate system corresponds to the logical plane image 153 of the logical coordinate system. Suppose that, in this state, the conversion unit 143 simply converted the coordinates in the logical coordinate system into coordinates in the physical coordinate system; the position of the sound source 55 in the physical coordinate system would then jump instantaneously from outside the physical plane image 155 to inside it. Since the acoustic device localizes the audio/video based on the position of the sound source 55 in the physical coordinate system, such an instantaneous shift would cause the localization of the audio/video to change abruptly.
To avoid this, the information processing apparatus 1 or the information processing apparatus 1A performs the operation shown in the flowchart of fig. 16. The information processing apparatus 1 or 1A performs this operation when a change of the position of the sound source 55 is received in the logical coordinate system while the sound source 55 is located outside the physical plane image 155. First, the conversion unit 143 obtains the position of the sound source 55 in the physical coordinate system (the sound source position coordinates in the physical coordinate system, i.e., the 1st audio/video localization information) (S31). Specifically, the conversion unit 143 obtains the relative position of the sound source 55 before and after the movement in the logical coordinate system. The conversion unit 143 converts this relative position into a relative position in the physical coordinate system and obtains the moved position of the sound source 55 in the physical coordinate system. In this case, the conversion unit 143 may associate the physical coordinate system corresponding to the physical plane image 155 with the logical coordinate system and convert the relative position of the logical coordinate system into the relative position of the physical coordinate system. Alternatively, the conversion unit 143 may associate the physical coordinate system corresponding to the physical plane image 154 with the logical coordinate system and perform the same conversion.
The conversion unit 143 may also associate the physical coordinate system corresponding to the physical plane image 154 with the logical coordinate system, convert the moved coordinates of the sound source 55 in the logical coordinate system into coordinates in the physical coordinate system, and thereby determine the moved position of the sound source 55 in the physical coordinate system.
As shown in fig. 15 (A), the display 15 first displays the moved position of the sound source 55 on the physical plane image 154 of the physical coordinate system (S32). At this time, the display 15 preferably displays the moved position of the sound source 55 in the logical coordinate system in a light color or with a broken line, as a temporary position.
Then, the conversion unit 143 converts the moved coordinates of the sound source 55 in the physical coordinate system into coordinates in the logical coordinate system (the sound source position coordinates in the logical coordinate system, i.e., the 2nd audio/video localization information) (S33). When the coordinates of the sound source 55 in the physical coordinate system are within the physical plane image 155, the coordinates of the sound source 55 in the logical coordinate system are within the logical plane image 153. On the other hand, when the coordinates of the sound source 55 in the physical coordinate system are outside the physical plane image 155, the conversion unit 143 clamps the coordinates of the sound source 55 in the logical coordinate system to 0 or 1, the coordinates of the ends of the logical coordinate system. In this case, as shown in fig. 15 (B), the sound source 55 of the logical coordinate system remains positioned at the end of the logical plane image 153.
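The S31-S33 flow can be sketched as follows under simplifying assumptions: the physical plane image 155 is taken to span (0, 0) to (10, 10), so the logical unit square maps onto it with a scale factor of 10. These bounds, the scale constant, and the function name are illustrative, not from the patent.

```python
SCALE = 10.0  # assumed physical extent of image 155 per logical unit

def move_in_logical(phys_pos, logical_delta):
    """Move a sound source by a displacement given in logical
    coordinates, working on its physical position (S31), then convert
    back to clamped logical coordinates (S33)."""
    # S31: convert the logical displacement into a physical one and
    # obtain the moved physical position of the sound source.
    moved_phys = (phys_pos[0] + logical_delta[0] * SCALE,
                  phys_pos[1] + logical_delta[1] * SCALE)
    # S33: convert back to logical coordinates, clamping to the ends
    # of the logical plane image 153 when outside image 155.
    logical = (min(max(moved_phys[0] / SCALE, 0.0), 1.0),
               min(max(moved_phys[1] / SCALE, 0.0), 1.0))
    return moved_phys, logical

# The source starts outside image 155 (x = 14 > 10); a small leftward
# move in the logical plane shifts it gradually instead of making it
# jump inside image 155.
phys, logical = move_in_logical((14.0, 5.0), (-0.1, 0.0))
print(phys)     # → (13.0, 5.0): still outside, no abrupt jump
print(logical)  # → (1.0, 0.5): still held at the right end
```

Because the physical position is updated by a relative displacement rather than recomputed from the clamped logical position, the localization handed to the acoustic device changes continuously.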
As described above, when the sound source 55 is located outside the physical plane image 155, the position of the sound source 55 in the physical coordinate system does not change abruptly even when its position in the logical plane image 153 is changed, and the localization position of the audio/video therefore does not change abruptly either. The operation shown in fig. 16 is not limited to the case where the sound source 55 is located outside the physical plane image 155; it may be performed whenever the position of the sound source 55 is changed in the logical coordinate system.
(Other examples)
The images of the logical coordinate system (logical space image 151 and logical plane image 153) and the images of the physical coordinate system (physical space image 152 and physical plane image 154) may be displayed on different devices. For example, an image of a logical coordinate system may be displayed on the information processing apparatus 1, and an image of a physical coordinate system may be displayed on the information processing apparatus 1A. In this case, the information processing apparatus 1 and the information processing apparatus 1A may transmit and receive spatial information and information indicating coordinates of the audio source to and from each other.
Figs. 10 to 15 show the display of sound sources and coordinate transformation in a 2-dimensional space (plane). However, the display of sound sources and coordinate transformation may also be performed in a 3-dimensional space. As shown in fig. 8, information in the 3-dimensional space may include plane coordinates (x, y) and information indicating a plurality of layers in the height direction.
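The layered 3-dimensional representation can be sketched as below: a position is plane coordinates (x, y) plus a layer selected from height information, as in claims 5 and 6. The layer heights and the function name are assumed values for illustration; the patent gives no concrete figures.

```python
LAYER_HEIGHTS = [0.0, 2.5, 5.0]  # assumed base heights of layers L1-L3

def layer_for_height(z):
    """Pick the index of the highest layer whose base height <= z."""
    idx = 0
    for i, base in enumerate(LAYER_HEIGHTS):
        if z >= base:
            idx = i
    return idx

print(layer_for_height(3.0))  # → 1: z falls in the 2nd layer's range
```

Conversion of the 1st audio/video localization information into the 2nd would then apply the planar transform per layer, using the height information to match layers between the two spaces.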
Description of the reference numerals
1. 1A … information processing device
11 … communication part
12 … processor
13…RAM
14 … flash memory
15 … display
16 … user I/F
17 … Audio I/F
50C … central loudspeaker
50L … left loudspeaker
50R … right loudspeaker
50SL … left rear loudspeaker
50SR … right rear loudspeaker
55 … sound source
70A, 70B, 70C, 70D, 70E, 70F, 70G, 70H … datum point
141 … space setting unit
142 … audio/video localization information receiving unit
143 … conversion unit
144 … localization processing unit
151 … logical space image
152 … physical space image
151L1, 151L2, 151L3, 152L1, 152L2, 152L3 … layers
153 … logical plane image
154 … physical plane image

Claims (23)

1. An information processing method for accepting a setting of 1 st space information which is information corresponding to one of a logical space and a physical space and 2 nd space information which is information corresponding to the other of the logical space and the physical space,
the logical space and the physical space are different in shape,
receiving 1 st audio/video positioning information in the 1 st coordinates of the 1 st spatial information, the 1 st audio/video positioning information indicating a position for positioning an audio/video of an audio source included in the content,
and converting the 1 st audio-video positioning information into 2 nd audio-video positioning information corresponding to the 2 nd coordinate of the 2 nd spatial information.
2. The information processing method according to claim 1, wherein,
a sound signal is acquired,
and positioning the audio signal at a position corresponding to the 2 nd audio-video positioning information.
3. The information processing method according to claim 1 or 2, wherein,
the 1 st spatial information and the 2 nd spatial information include information of a 2-dimensional space.
4. The information processing method according to claim 1 or 2, wherein,
the 1 st spatial information and the 2 nd spatial information include information of a 3-dimensional space.
5. The information processing method according to claim 4, wherein,
the information of the 3-dimensional space includes plane coordinates and information representing a plurality of layers in a height direction.
6. The information processing method according to claim 5, wherein,
the 1 st audio-visual localization information is transformed into the 2 nd audio-visual localization information based on the height information of the plurality of layers.
7. The information processing method according to claim 1 or 2, wherein,
the 1 st audio/video localization information is defined as a group including a plurality of audio/video images,
when the position of at least any one of the plurality of audio/video images is changed, the 1 st audio/video localization information is changed while maintaining the relative positional relationship of each of the plurality of audio/video images included in the group,
and converting the 1 st audio image localization information of each of the plurality of audio images included in the group into the 2 nd audio image localization information.
8. The information processing method according to claim 1 or 2, wherein,
the 1 st space information is information corresponding to a logical space,
the 2 nd space information is information corresponding to a physical space.
9. The information processing method according to claim 1 or 2, wherein,
the 1 st space information is information corresponding to a physical space,
the 2 nd space information is information corresponding to a logical space.
10. The information processing method according to claim 9, wherein,
displaying the 1 st spatial information, the 2 nd spatial information, the 1 st audio-video positioning information and the 2 nd audio-video positioning information,
further, the specification of the 3 rd space information corresponding to the physical space is accepted among the 1 st space information displayed.
11. The information processing method according to claim 10, wherein,
the change of the position of the audio/video is accepted in the 2 nd spatial information,
the 1 st audio/video localization information is obtained based on the received change, and then the 1 st audio/video localization information is converted into the 2 nd audio/video localization information based on the 3 rd spatial information and the 2 nd spatial information.
12. An information processing apparatus, comprising:
a space setting unit that receives a setting of 1 st space information, which is information corresponding to one of a logical space and a physical space, and 2 nd space information, which is information corresponding to the other of the logical space and the physical space, the logical space and the physical space having different shapes;
an audio/video positioning information receiving unit that receives 1 st audio/video positioning information indicating a position at which an audio/video of an audio source included in the content is positioned, in the 1 st coordinates of the 1 st spatial information; and
and a conversion unit configured to convert the 1 st audio/video localization information into 2 nd audio/video localization information corresponding to the 2 nd coordinate of the 2 nd spatial information.
13. The information processing apparatus according to claim 12, wherein,
the device comprises:
a sound signal acquisition unit that acquires a sound signal; and
and a localization processing unit that localizes the audio/video at a position corresponding to the 2 nd audio/video localization information.
14. The information processing apparatus according to claim 12 or 13, wherein,
the 1 st spatial information and the 2 nd spatial information include information of a 2-dimensional space.
15. The information processing apparatus according to claim 12 or 13, wherein,
the 1 st spatial information and the 2 nd spatial information include information of a 3-dimensional space.
16. The information processing apparatus according to claim 15, wherein,
the information of the 3-dimensional space includes plane coordinates and information representing a plurality of layers in a height direction.
17. The information processing apparatus according to claim 16, wherein,
the conversion unit converts the 1 st audio/video localization information into the 2 nd audio/video localization information based on the height information of the plurality of layers.
18. The information processing apparatus according to claim 12 or 13, wherein,
the 1 st audio/video localization information is defined as a group including a plurality of audio/video images,
when receiving a change in the position of at least one of the plurality of audio images, the audio image localization information receiving unit holds the relative positional relationship between the plurality of audio images included in the group and changes the 1 st audio image localization information,
the conversion unit converts the 1 st audio image localization information of each of the plurality of audio images included in the group into the 2 nd audio image localization information.
19. The information processing apparatus according to claim 12 or 13, wherein,
the 1 st space information is information corresponding to a logical space,
the 2 nd space information is information corresponding to a physical space.
20. The information processing apparatus according to claim 12 or 13, wherein,
the 1 st space information is information corresponding to a physical space,
the 2 nd space information is information corresponding to a logical space.
21. The information processing apparatus according to claim 12 or 13, wherein,
the device comprises:
a display that displays the 1 st spatial information, the 2 nd spatial information, the 1 st audio/video positioning information, and the 2 nd audio/video positioning information; and
an operation unit that receives an operation of a user who designates the 1 st audio/video positioning information.
22. The information processing apparatus according to claim 21, wherein,
the device comprises:
a display that displays the 1 st spatial information, the 2 nd spatial information, the 1 st audio/video positioning information, and the 2 nd audio/video positioning information; and
an operation unit for receiving an operation of a user who designates the 1 st audio/video positioning information,
the operation unit receives specification of 3 rd space information among the 1 st space information displayed on the display.
23. The information processing apparatus according to claim 22, wherein,
the operation unit receives the change of the position of the audio/video in the 2 nd spatial information,
the conversion unit obtains the 1 st audio/video localization information based on the received change, and then converts the 1 st audio/video localization information into the 2 nd audio/video localization information based on the 3 rd spatial information and the 2 nd spatial information.
CN202110290568.1A 2020-03-24 2021-03-18 Information processing method and information processing apparatus Active CN113453126B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-052801 2020-03-24
JP2020052801 2020-03-24

Publications (2)

Publication Number Publication Date
CN113453126A CN113453126A (en) 2021-09-28
CN113453126B true CN113453126B (en) 2023-04-28

Family

ID=77809021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110290568.1A Active CN113453126B (en) 2020-03-24 2021-03-18 Information processing method and information processing apparatus

Country Status (2)

Country Link
JP (1) JP2021153292A (en)
CN (1) CN113453126B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023141461A (en) * 2022-03-24 2023-10-05 ヤマハ株式会社 Video processing method and video processing device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1419796A (en) * 2000-12-25 2003-05-21 索尼株式会社 Virtual sound image localizing device, virtual sound image localizing, and storage medium
CN1901755A (en) * 2005-07-19 2007-01-24 雅马哈株式会社 Acoustic design support apparatus
JP2015179986A (en) * 2014-03-19 2015-10-08 ヤマハ株式会社 Audio localization setting apparatus, method, and program
CN106961647A (en) * 2013-06-10 2017-07-18 株式会社索思未来 Audio playback and method
CN108429998A (en) * 2018-03-29 2018-08-21 广州视源电子科技股份有限公司 Source of sound localization method and system, sound box system localization method and sound box system


Also Published As

Publication number Publication date
JP2021153292A (en) 2021-09-30
CN113453126A (en) 2021-09-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant