CN110120229A - The processing method and relevant device of Virtual Reality audio signal - Google Patents


Info

Publication number
CN110120229A
CN110120229A (application CN201810114171.5A)
Authority
CN
China
Prior art keywords
redundant
audio signal
motion information
audio
processing
Prior art date
Legal status
Granted
Application number
CN201810114171.5A
Other languages
Chinese (zh)
Other versions
CN110120229B (en)
Inventor
杨磊
高巧展
王立众
Current Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd filed Critical Beijing Samsung Telecommunications Technology Research Co Ltd
Priority to CN201810114171.5A priority Critical patent/CN110120229B/en
Publication of CN110120229A publication Critical patent/CN110120229A/en
Application granted granted Critical
Publication of CN110120229B publication Critical patent/CN110120229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S 7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30: Control circuits for electronic adaptation of the sound field
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00: Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/12: Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The present invention provides a processing method and related device for a virtual reality (VR) audio signal. The method comprises: obtaining redundant motion information corresponding to a VR audio signal; and adjusting the VR audio signal according to the redundant motion information. Compared with the prior art, the present invention can adjust a VR audio signal according to the obtained redundant motion information corresponding to that signal, thereby eliminating the influence of redundant motion on the VR audio signal and improving the stability and accuracy of the VR audio signal.

Description

Virtual reality VR audio signal processing method and corresponding equipment
Technical Field
The present invention relates to the technical field of virtual reality (VR) audio, and in particular to a method and corresponding device for processing VR audio signals.
Background
With growing interest in VR (Virtual Reality) products, many companies and organizations are focusing on developing VR technology. VR audio is a key technology in the VR field that provides users with auditory content having spatial resolution. As shown in fig. 1, when the audio applied in VR fuses seamlessly with the real scene, users can have an immersive VR experience.
With the continuous development of VR technology, the recording and sharing of VR audio and video have also become an important part of the VR market. The advent of VR cameras has made more realistic audio and video capture and reproduction possible. A VR camera adopts a three-dimensional panoramic technique, and a virtual reality scene can be generated through image stitching.
Unlike traditional two-dimensional imaging, VR introduces position information in a third dimension, so the consistency of sound and picture has a great influence on VR audio and video. To better preserve the sense of immersion during playback of VR audio and video, synchronization of VR audio and images is particularly important.
In traditional two-dimensional audio and video recording, if meaningless jitter occurs, a user watching the recording later will see the jittered picture and may feel uncomfortable, so image anti-jitter is an important function of a two-dimensional camera. However, most existing two-dimensional recording devices capture monaural or stereo audio with no third-dimension position information, so the sound-position distortion caused by jitter has little effect on users, and anti-jitter processing is generally not considered for two-dimensional audio. Since jitter during recording is meaningless to the recorded content, such jitter may be referred to as redundant motion.
However, when recording three-dimensional audio and video content (for example, a user holding a VR recording device to record VR content), if jitter occurs, as shown in fig. 2, the jittered audio may cause severe discomfort to the user during playback, because VR introduces position information in a third dimension; anti-jitter processing of three-dimensional audio therefore needs to be considered. Yet no effective way of handling jitter when recording three-dimensional audio has been proposed, which makes the user experience poor when listening to recorded three-dimensional audio content.
In addition to jitter when recording three-dimensional audio, a user viewing VR content on a VR playback device may move in the real scene, for example while running or walking, or while riding in a moving vehicle (such as a running car or a boat). The VR playback device will treat this movement as a VR playback control instruction and adjust the played content accordingly, even though the movement was not actually intended to control VR playback. For the purpose of playing VR content such movement is meaningless and may also be referred to as redundant motion.
Due to the redundant motion of the user, the played VR audio will jitter continuously, which may cause serious discomfort to the user.
Disclosure of Invention
In order to overcome the above technical problems or at least partially solve the above technical problems, the following technical solutions are proposed:
the embodiment of the invention provides a method for processing a Virtual Reality (VR) audio signal, which comprises the following steps:
acquiring redundant motion information corresponding to the VR audio signal;
and adjusting the VR audio signal according to the redundant motion information.
Wherein the redundant motion information comprises: redundant motion information during VR audio acquisition, or redundant motion information during VR audio playback;
according to the redundant motion information, adjusting the VR audio signal, including:
adjusting the acquired VR audio signal according to the redundant motion information during VR audio acquisition; or,
adjusting the VR audio signal to be played according to the redundant motion information during VR audio playback.
Specifically, obtaining redundant motion information corresponding to a VR audio signal includes:
determining, through a deep learning network or through pattern recognition, the redundant motion information corresponding to the VR audio signal according to the acquired scene video signal and/or sensor signal.
Optionally, the redundant motion information is redundant motion information when a VR audio is played;
determining, through the deep learning network, the redundant motion information corresponding to the VR audio signal includes:
determining redundant motion information corresponding to the VR audio signal through a deep learning network according to at least one of the following items:
acceleration information of the VR playing device;
collected sensor signals;
a captured scene audio signal;
a user motion pattern.
Optionally, the redundant motion information is redundant motion information when a VR audio is played;
before redundant motion information corresponding to the VR audio signal is obtained, the method further includes:
matching a redundant motion model determined from the VR video signal to be played against a redundant motion model determined from the acquired sensor signal;
and when the two models do not match, executing the step of obtaining the redundant motion information corresponding to the VR audio signal.
Optionally, the redundant motion information is redundant motion information during VR audio acquisition;
before the adjustment processing is performed on the VR audio signal, the method further includes:
determining relative movement parameters of VR audio acquisition equipment corresponding to VR video acquisition equipment;
and carrying out relative movement correction processing on the VR audio signal according to the relative movement parameter.
Optionally, the determining a relative movement parameter of the VR audio acquisition device corresponding to the VR video acquisition device includes:
acquiring the rotation angle of the acquisition equipment according to the acquired sensor signal;
and determining the relative movement parameters of the VR audio acquisition equipment corresponding to the VR video acquisition equipment according to the rotation angle of the acquisition equipment and the pre-stored position relationship between the VR audio acquisition equipment and the VR video acquisition equipment.
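As a concrete illustration of this determination (a sketch under assumptions: the microphone offset from the camera is pre-stored, the rotation is simplified to a single axis, and the function names are ours, not the patent's), the relative movement parameter can be computed by rotating the stored offset and taking the difference:

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the vertical axis (yaw), angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def relative_movement(mic_offset, yaw):
    """Displacement of the VR audio acquisition device relative to the VR
    video acquisition device when the rig rotates by `yaw` about the camera
    centre. `mic_offset` is the pre-stored position of the microphone
    relative to the camera; the return value is the extra translation the
    microphone experiences that the camera does not."""
    rotated = rotation_z(yaw) @ np.asarray(mic_offset, float)
    return rotated - np.asarray(mic_offset, float)
```

For example, a microphone mounted 1 m in front of the camera centre, under a 90-degree yaw of the rig, ends up 1 m to the side, and the difference of the two positions is the relative movement to correct for.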
Optionally, obtaining redundant motion information corresponding to the VR audio signal includes:
and acquiring effective motion information and redundant motion information corresponding to the VR audio signal.
Before the adjustment processing is performed on the VR audio signal, the method further includes:
and smoothing the redundant motion information based on the effective motion information.
Optionally, obtaining valid motion information and redundant motion information corresponding to the VR audio signal includes:
determining the valid motion information and the redundant motion information corresponding to the VR audio signal through a deep learning network or pattern recognition, according to the acquired scene video signal and/or sensor signal.
Optionally, determining, by a deep learning network, valid motion information and redundant motion information corresponding to the VR audio signal includes:
according to the collected scene video signals and/or sensor signals, effective motion information and redundant motion information corresponding to VR audio signals are determined through a deep learning network in combination with a user motion mode;
determining valid motion information and redundant motion information corresponding to the VR audio signal by pattern recognition, including:
predicting valid motion information according to the acquired scene video signal and/or sensor signal, in combination with a user motion pattern, by matching against a preset general motion pattern;
and determining the valid motion information corresponding to the VR audio signal according to the predicted valid motion information and the valid motion information obtained through pattern recognition.
Optionally, the smoothing processing on the redundant motion information includes:
and determining the redundant motion information after the current frame is smoothed according to the effective motion information of the current frame of the VR audio signal, the redundant motion information of the current frame and the redundant motion information after the previous frame is smoothed.
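The patent does not spell out the smoothing formula, but one plausible realization of the dependence just described, treating the user action possibility as a measure of how intentional the current motion is, might look like:

```python
def smooth_redundant(r_curr, r_prev_smooth, p_action, alpha=0.8):
    """Hypothetical smoothing rule (the exact formula is not given in the
    source): blend the current frame's raw redundant-motion parameter with
    the previous frame's smoothed value. When the user-action probability
    p_action is high, the motion is likely intentional, so the raw redundant
    estimate is trusted less and the smoothed history dominates."""
    w = alpha * (1.0 - p_action)   # weight on the raw current-frame estimate
    return w * r_curr + (1.0 - w) * r_prev_smooth
```

With `p_action = 1` (motion certainly intentional) the redundant estimate is held at its previous smoothed value; with `p_action = 0` the rule reduces to ordinary exponential smoothing.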
Optionally, the redundant motion information includes: redundant angular rotation parameters and/or redundant displacement parameters.
Optionally, the adjusting the VR audio signal according to the redundant motion information includes:
performing redundant angle elimination processing on the VR audio signal according to the redundant angle rotation parameter; and/or
performing redundant displacement elimination processing on the VR audio signal according to the redundant displacement parameter.
Optionally, the VR audio signal comprises a multi-channel audio signal;
according to the redundant angle rotation parameter, carrying out redundant angle elimination processing on the multi-channel audio signal, comprising the following steps:
determining a first virtual loudspeaker array before redundant angle elimination processing according to the multi-channel audio signal;
and determining the audio signal of each second virtual loudspeaker in the second virtual loudspeaker array after redundant angle elimination according to the redundant angle rotation parameter and the audio signal of each first virtual loudspeaker in the first virtual loudspeaker array.
Optionally, determining the audio signal of each second virtual speaker in the second virtual speaker array after performing the redundant angle elimination processing includes:
for each second virtual speaker of the second virtual speaker array, performing the following operations, respectively:
determining an angle relationship between the second virtual loudspeaker and each first virtual loudspeaker in the first virtual loudspeaker array according to the redundant angle rotation parameter; determining the audio signal of the second virtual loudspeaker according to the audio signal of each first virtual loudspeaker and the determined angle relation; or
Determining the angle relationship between the second virtual loudspeaker and each adjacent first virtual loudspeaker according to the redundant angle rotation parameters; and determining the audio signal of the second virtual loudspeaker according to the audio signals of the first virtual loudspeakers adjacent to the second virtual loudspeaker and the determined angular relationship.
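A minimal 2-D sketch of the second alternative above, interpolating each second (output) virtual speaker from its two adjacent first (input) virtual speakers (a uniform speaker ring starting at azimuth 0 is assumed, as are all function names; the sign convention for the yaw depends on whether the parameter describes device or sound-field rotation):

```python
import numpy as np

def eliminate_redundant_angle(signals, speaker_azimuths, redundant_yaw):
    """Re-derive each output virtual speaker signal from the input virtual
    speaker array so that the redundant rotation is cancelled.
    signals: array of shape (n_speakers, n_samples).
    speaker_azimuths: uniform ring of azimuths in radians, starting at 0."""
    signals = np.asarray(signals, float)
    az = np.asarray(speaker_azimuths, float)
    n = len(az)
    spacing = 2 * np.pi / n
    out = np.zeros_like(signals)
    for i in range(n):
        # direction this output speaker must pick up from the input array
        target = (az[i] + redundant_yaw) % (2 * np.pi)
        # two adjacent input speakers around that direction
        j = int(target // spacing) % n
        k = (j + 1) % n
        frac = (target - j * spacing) / spacing
        # linear amplitude interpolation between the two neighbours
        out[i] = (1 - frac) * signals[j] + frac * signals[k]
    return out
```

When the redundant yaw is exactly one speaker spacing, the operation degenerates to a pure re-indexing of the speaker feeds, which is a useful sanity check.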
Optionally, the VR audio signal comprises an Ambisonic (high-fidelity surround sound) signal;
performing redundant angle elimination processing on the Ambisonic signal according to the redundant angle rotation parameter includes:
determining the rotation angle of the Ambisonic signal according to the redundant angle rotation parameter;
and rotating the Ambisonic signal according to that rotation angle.
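For a first-order B-format signal, rotation about the vertical axis reduces to a 2-D rotation of the horizontal components; a minimal sketch (the W, X, Y, Z channel ordering and the sign convention are assumptions, and to cancel a redundant yaw the field is rotated by the negative of that yaw):

```python
import numpy as np

def rotate_foa_yaw(w, x, y, z, theta):
    """Rotate a first-order B-format (W, X, Y, Z) sound field about the
    vertical axis by theta radians. W (omnidirectional) and Z (vertical)
    are invariant under yaw; X and Y rotate as a 2-D vector. Positive
    theta turns the field counter-clockwise (toward +Y) in this sketch."""
    c, s = np.cos(theta), np.sin(theta)
    x_r = c * x - s * y
    y_r = s * x + c * y
    return w, x_r, y_r, z
```

Higher-order Ambisonic signals rotate analogously, with each spherical-harmonic order transformed by its own rotation matrix.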
Optionally, performing redundant displacement elimination processing on the VR audio signal according to the redundant displacement parameter includes:
determining a third virtual loudspeaker array before redundant displacement elimination according to the VR audio signal;
and determining the audio signal of each fourth virtual loudspeaker in the fourth virtual loudspeaker array after the redundant displacement elimination processing is carried out according to the redundant displacement parameter and the audio signal of each third virtual loudspeaker in the third virtual loudspeaker array.
Optionally, determining the audio signal of each fourth virtual speaker in the fourth virtual speaker array after performing the redundant displacement elimination processing includes:
for each fourth virtual speaker of the fourth virtual speaker array, performing the following operations, respectively:
determining the relative position relationship between the fourth virtual loudspeaker and each third virtual loudspeaker according to the redundant displacement parameters; and determining the audio signal of the fourth virtual loudspeaker according to the audio signal of each third virtual loudspeaker and the determined relative position relation.
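One simple choice of interpolation for this step (the patent does not fix one, so the inverse-distance weighting below is only an illustrative stand-in, shown in 2-D):

```python
import numpy as np

def eliminate_redundant_displacement(signals, positions, displacement):
    """Sketch: re-render the virtual speaker array as heard from the point
    displaced by the redundant displacement. Each fourth (output) virtual
    speaker signal is an inverse-distance-weighted mix of the third (input)
    virtual speaker signals.
    signals: (n_speakers, n_samples); positions: (n_speakers, dims)."""
    signals = np.asarray(signals, float)
    positions = np.asarray(positions, float)
    shifted = positions - np.asarray(displacement, float)  # undo redundant shift
    out = np.zeros_like(signals)
    for i, p in enumerate(shifted):
        d = np.linalg.norm(positions - p, axis=1)
        w = 1.0 / np.maximum(d, 1e-6)   # inverse-distance weights
        w /= w.sum()
        out[i] = w @ signals
    return out
```

With zero redundant displacement each output speaker coincides with one input speaker and the signals pass through essentially unchanged.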
Optionally, if the VR audio signal is an Ambisonic signal, before performing the redundant displacement elimination processing on the VR audio signal, the method further includes:
performing multi-channel conversion processing on the Ambisonic signal;
after redundant displacement elimination processing is carried out on the VR audio signal, the method further comprises the following steps:
and converting the audio signal subjected to the redundant displacement elimination processing into an Ambisonic signal.
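A minimal sketch of such a round trip for a horizontal first-order signal, decoding to a virtual speaker ring via the pseudo-inverse of the encoding matrix and re-encoding afterwards (normalisation conventions are simplified and the W=1, X=cos, Y=sin gains are an assumption; real systems apply channel normalisation):

```python
import numpy as np

def foa_encode_matrix(azimuths):
    """Horizontal first-order encoding matrix: each column encodes one
    speaker direction as (W, X, Y) = (1, cos az, sin az). Shape (3, N)."""
    az = np.asarray(azimuths, float)
    return np.stack([np.ones_like(az), np.cos(az), np.sin(az)])

def foa_to_speakers(bformat, azimuths):
    """Decode W/X/Y channels to a horizontal virtual speaker ring via the
    pseudo-inverse of the encoding matrix (multi-channel conversion)."""
    return np.linalg.pinv(foa_encode_matrix(azimuths)) @ bformat

def speakers_to_foa(speaker_signals, azimuths):
    """Re-encode the virtual speaker feeds (e.g. after redundant
    displacement elimination) back into first-order Ambisonics."""
    return foa_encode_matrix(azimuths) @ speaker_signals
```

Because the encoding matrix has full row rank for four or more distinct directions, decoding and immediately re-encoding reproduces the original first-order signal, which confirms the conversion is lossless at this order.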
An embodiment of the present invention further provides a processing device for a virtual reality VR audio signal, including:
the acquisition module is used for acquiring redundant motion information corresponding to the VR audio signal;
and the processing module is used for adjusting and processing the VR audio signal according to the redundant motion information.
An embodiment of the present invention further provides an electronic device, including:
a processor; and
a memory configured to store machine-readable instructions that, when executed by the processor, cause the processor to perform the method of any of the above.
Compared with the prior art, the processing method and the corresponding equipment for the VR audio signal provided by the embodiment of the invention can adjust and process the VR audio signal according to the obtained redundant motion information corresponding to the VR audio signal, thereby eliminating the influence of redundant motion on the VR audio signal and improving the stability and accuracy of the VR audio signal.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a diagram, provided as background art, of the immersion produced in a VR application by the combination of visual and auditory senses;
FIG. 2 is a diagram, provided as background art, illustrating how redundant motion causes sound and picture to fall out of registration;
fig. 3 is a flowchart illustrating a processing method of a VR audio signal according to an embodiment of the present invention;
fig. 4 is an exploded schematic view of a redundant angular rotation parameter and a redundant displacement parameter according to an embodiment of the present invention;
fig. 5 is a schematic diagram of the separation of the effective motion and the redundant motion provided by the embodiment of the present invention.
Fig. 6 is a schematic flowchart of a process of acquiring redundant motion information when a VR audio is acquired in a pattern recognition manner according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of a process of acquiring redundant motion information when a VR audio is acquired through a deep learning network according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating position correction of a VR audio acquisition device and a VR video acquisition device according to a second embodiment of the present invention;
fig. 9 is a flowchart illustrating a VR audio signal processing method according to a second embodiment of the present invention;
fig. 10 is a schematic diagram of a method for calculating a relative movement parameter of a VR audio acquisition device corresponding to a VR video acquisition device according to a second embodiment of the present invention;
fig. 11 is a schematic flowchart of a processing method for processing relative movement parameters of a VR audio acquisition device corresponding to a VR video acquisition device according to a second embodiment of the present invention;
fig. 12 is a flowchart illustrating a method for processing redundant motion information during VR audio playback according to a third embodiment of the present invention;
fig. 13 is a schematic flowchart of a process of acquiring redundant motion information when playing VR audio through a deep learning network according to a third embodiment of the present invention;
fig. 14 is a schematic diagram of a flow of acquiring acceleration information of a VR playback device according to a third embodiment of the present invention;
fig. 15 is a schematic diagram illustrating obtaining acceleration information of a VR playback device according to a third embodiment of the present invention;
FIG. 16a is a schematic diagram of the smoothing without user action possibility according to the fourth embodiment of the present invention;
FIG. 16b is a schematic diagram of the smoothing of the possibility of user action according to the fourth embodiment of the present invention;
fig. 17 is a schematic flowchart of a processing method for redundant motion information during VR audio acquisition according to a fourth embodiment of the present invention;
fig. 18 is a schematic view of an angular rotation existing in a three-dimensional space according to a fifth embodiment of the present invention;
FIG. 19 is a schematic view of an angular rotation according to the fifth embodiment of the present invention;
FIG. 20 is a first diagram illustrating a redundant angle elimination process for multi-channel signals according to a fifth embodiment of the present invention;
FIG. 21 is a second diagram illustrating a redundant angle elimination process for multi-channel signals according to a fifth embodiment of the present invention;
fig. 22 is a schematic diagram of displacement existing in a three-dimensional space according to a fifth embodiment of the present invention;
fig. 23 is a schematic flowchart of a redundant displacement elimination process according to a fifth embodiment of the present invention;
FIG. 24 is a schematic diagram of a redundant shift elimination process according to a fifth embodiment of the present invention;
FIG. 25 is a schematic diagram of multi-channel conversion of Ambisonic signals provided in the fifth embodiment of the present invention;
fig. 26 is a schematic structural diagram of a VR audio signal processing apparatus according to a sixth embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The technical solution of the embodiments of the present invention is specifically described below with reference to the accompanying drawings.
An embodiment of the present invention provides a method for processing a VR audio signal, as shown in fig. 3, including:
step S310: and acquiring redundant motion information corresponding to the VR audio signal.
In the embodiment of the present invention, the redundant motion information mainly includes:
(1) redundant motion information during VR audio acquisition
That is, when the user performs VR content acquisition, redundant motion is generated, such as jitter generated when the user holds the VR acquisition device to acquire VR content. In this case, redundant motion information of the VR audio signal at the time of acquisition needs to be acquired so as to perform adjustment processing on the acquired VR audio signal.
(2) Redundant motion information while playing VR Audio
That is, when a user plays VR content using a VR playback device, redundant motion is generated in the VR playback device due to the influence of the external environment. In this case, redundant motion information of the VR audio signal during playing needs to be obtained, so as to adjust the VR audio signal to be played.
Specifically, in the embodiment of the invention, the redundant motion information corresponding to the VR audio signal is determined, according to the acquired scene video signal and/or sensor signal, through a deep learning network, pattern recognition, or the like.
The VR acquisition device in the embodiment of the present invention may also be called a VR recording device, and acquiring VR content may also be called recording VR content.
The inventor of the present invention has observed that, in practical cases, the redundant motion is composed of two position changes, i.e. in the embodiment of the present invention, the obtained redundant motion information at least includes: redundant angular rotation parameters and/or redundant displacement parameters.
In practical applications, the scene video signal may be a video signal of the real scene captured by a camera of the VR acquisition device, or a video signal of the field of view captured by a camera of the VR playback device; the scene video signal contains redundant motion data, which can be obtained by analyzing its video frames.
The sensor signal may be captured by a sensor of the VR acquisition device or the VR playback device, and the signal includes redundant motion data information. Specifically, the displacement-related data information of the redundant motion may be obtained from sensors such as an acceleration sensor or a displacement sensor; the angular rotation-related data information of the redundant motion may be obtained from a magnetic sensor, an angular sensor, or the like. When a plurality of sensors exist, the data obtained by the sensor with higher precision can be selected, or the plurality of data are weighted and averaged to obtain the final parameter.
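The weighted averaging of readings from multiple sensors can be sketched as a precision-weighted fusion (the exact choice of weights is an assumption; the source only says the data may be weighted and averaged, or that the more precise sensor may be selected):

```python
def fuse_sensor_readings(readings, precisions):
    """Weighted average of redundant-motion estimates from several sensors
    (e.g. acceleration, displacement, magnetic, angular sensors), weighting
    each reading by a stated precision. With one dominant precision this
    degenerates to simply selecting the most precise sensor."""
    total = sum(precisions)
    return sum(r * p for r, p in zip(readings, precisions)) / total
```

For two equally trusted sensors the result is the plain mean; skewing the precisions pulls the fused estimate toward the more trusted reading.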
Step S320: and adjusting the VR audio signal according to the redundant motion information.
The VR audio signal may be an audio signal of a real scene captured by a VR audio capturing device (hereinafter, described by taking a microphone as an example) of the VR capturing device, and may be played in a VR playing device, where the signal may be an Ambisonic (Ambisonic) signal or a multichannel audio signal.
Redundant motion can be analyzed in terms of displacement changes and angular changes and decomposed into these two parts. As shown in fig. 4, the rotation angle and the displacement corresponding to the redundant motion of the VR audio signal are calculated and then eliminated separately in post-processing.
Adjusting the VR audio signal specifically means performing redundant angle elimination processing on the VR audio signal according to the redundant angle rotation parameter, and/or performing redundant displacement elimination processing on the VR audio signal according to the redundant displacement parameter, generating stable VR audio and effectively improving the consistency of video and audio.
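The decomposition of fig. 4 can be illustrated as follows, extracting a redundant yaw angle and a redundant displacement from the frame-to-frame pose change (representing the pose as a rotation matrix plus a translation vector, and reading only the yaw component, are simplifications of ours):

```python
import numpy as np

def decompose_redundant_motion(pose_prev, pose_curr):
    """Split the frame-to-frame pose change into a redundant angular
    rotation parameter (here only the yaw extracted from the relative
    rotation) and a redundant displacement parameter.
    Each pose is (R, t): a 3x3 rotation matrix and a translation vector."""
    R0, t0 = pose_prev
    R1, t1 = pose_curr
    R_rel = R1 @ R0.T                         # relative rotation between frames
    yaw = np.arctan2(R_rel[1, 0], R_rel[0, 0])  # yaw component of R_rel
    displacement = t1 - t0                    # redundant displacement
    return yaw, displacement
```

The two outputs then feed the two independent elimination steps: the angle parameter drives redundant angle elimination and the displacement parameter drives redundant displacement elimination.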
In order to more clearly illustrate the technical solution of the present invention, a plurality of embodiments of the present invention are specifically described below.
Example one
The embodiment of the invention provides a description of how to eliminate redundant motion generated when a VR audio signal is collected.
For example, when a user holds a VR acquisition device to acquire VR content while walking, running or in another such state, redundant motion such as shake and sway may be produced. After the acquisition device detects the redundant motion, it may process the acquired audio accordingly, or it may send the related information to another processing device, which then processes the acquired audio accordingly.
As shown in fig. 3, when performing the processing of eliminating the redundant motion, the redundant motion information may be determined first, and then the VR audio signal may be adjusted according to the redundant motion information. Specifically, in the first embodiment of the present invention, the redundant motion information includes redundant motion information during VR audio acquisition, and the acquired VR audio signal is adjusted according to the redundant motion information during VR audio acquisition.
As can be seen from the above, redundant motion information can be obtained from both the captured scene video signal and the sensor signal. In the first embodiment of the present invention, either of the two signals may be selected for processing; if both are available, the data may be weighted and averaged to make the processing more accurate.
Specifically, in step S310, the valid motion information and the redundant motion information corresponding to the VR audio signal are obtained. In practical applications, the captured scene video signal or sensor signal contains both user actions and redundant motion. The purpose of this step is to determine which part of the acquired signal is redundant motion information and which part is valid motion information. In the present invention, the valid motion information may be given in the form of the user action possibility (a probability), which refers to the probability that the user's current motion belongs to a particular action.
Preferably, as shown in fig. 5, the valid motion information and the redundant motion information corresponding to the VR audio signal can be identified more accurately in combination with the user motion pattern. The user motion pattern is a personalized pattern that characterizes the user's individual motion regularities. It may be obtained from the user's history, for example as historical redundant motion information, that is, the redundant angle rotation parameters and/or redundant displacement parameters recorded during previous VR audio acquisitions.
By combining the user motion pattern with the captured scene video signals and/or sensor signals, both the redundant motion information (redundant angle rotation parameters and/or redundant displacement parameters) and the valid motion information (user action possibility) can be obtained more accurately. For example, when different users perform the same motion, their motion information, and hence the corresponding redundant motion information, may differ; referring to the user's motion pattern therefore allows the redundant and valid motion information to be extracted more accurately, which in turn improves the accuracy of the subsequent audio signal adjustment.
As can be seen from the above, the present invention may determine the redundant motion information corresponding to the VR audio signal through the deep learning network or the pattern recognition mode, and specifically, determine the effective motion information and the redundant motion information corresponding to the VR audio signal through the deep learning network or the pattern recognition mode according to the collected scene video signal and/or the sensor signal.
The specific processing mode based on the pattern recognition mode is as follows:
the captured scene video signals and/or sensor signals are subjected to low-pass filtering, windowing, or adaptive filtering, and the valid motion information and redundant motion information are obtained using a pattern recognition method such as KNN (k-nearest-neighbor classification) or a Bayesian algorithm.
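As an illustration of this classification step, the following sketch uses a generic k-nearest-neighbor vote on motion features; the feature values, labels, and the use of plain KNN (rather than any trained model described elsewhere in the document) are purely illustrative assumptions.

```python
from collections import Counter
import math

def knn_classify(sample, train_data, k=3):
    """Classify a feature vector by majority vote of its k nearest
    labeled neighbors; the vote fraction serves as a crude estimate
    of the 'user action possibility' (probability)."""
    dists = sorted(
        (math.dist(sample, feat), label) for feat, label in train_data
    )
    votes = Counter(label for _, label in dists[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical training set: (mean |acceleration|, step frequency) -> activity
train = [
    ((1.0, 1.8), "walking"), ((1.2, 2.0), "walking"),
    ((3.5, 2.8), "running"), ((3.8, 3.0), "running"),
    ((3.3, 2.9), "running"),
]
action, prob = knn_classify((3.4, 2.9), train, k=3)
```

In practice the features would first be extracted from the low-pass-filtered or windowed sensor stream, as the text describes.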
Preferably, as shown in fig. 6, redundant motion information (including redundant angle rotation parameters and/or redundant displacement parameters) and valid motion information (which may be the user action possibility) can be obtained from the captured scene video signal and/or sensor signal based on the pattern recognition approach. The captured signals are combined with the user motion pattern and matched against the preset general motion patterns to predict the valid motion information (corresponding to the predicted user motion in fig. 6); the predicted valid motion information is then used to further confirm the user action possibility (i.e., the valid motion information corresponding to the VR audio signal). The probability value corresponding to the predicted user motion is selected from the probability values of the user action possibilities obtained via pattern recognition and taken as the finally determined valid motion information, i.e., the user action possibility.
And determining effective motion information corresponding to the VR audio signal according to the predicted effective motion information and the effective motion information obtained in the mode identification mode.
The preset general motion patterns comprise the basic common motion types and their corresponding rotation and displacement parameters, as well as conventional lens-direction movements; for example, rotating the lens horizontally by 90 degrees to the right means the displacement is unchanged while the angle rotates 90 degrees clockwise about the z-axis. The general motion patterns characterize the prevailing motion regularities of the basic common motion types.
As an example, the user motion pattern and the general motion patterns are matched against the scene video signal and the sensor signal, and the predicted user motion obtained by pattern matching is "running". The pattern recognition method yields a probability of 80% for "running" and 20% for "walking"; based on the predicted user motion obtained by pattern matching, the probability of "running", 80%, is taken as the user action possibility.
As shown in fig. 7, the specific processing manner of the Artificial Intelligence (AI) based deep learning network is as follows:
and determining effective motion information and redundant motion information corresponding to the VR audio signal through a deep learning network according to the collected scene video signal and/or sensor signal and in combination with the user motion mode.
Specifically, in the deep learning network training stage, the redundant motion data, i.e., the redundant angle rotation parameters and redundant displacement parameters, are mixed with the user action data excluding the redundant motion to form mixed movement data, on which feature extraction is performed. The user action data excluding the redundant motion serves as the training label, and features are likewise extracted from it. Both, together with the user motion pattern, are input into the deep learning network for model training (corresponding to the AI-based shake capture in fig. 7), yielding a trained deep learning network model.
Applying the network model to perform online processing: the collected scene video signal and/or sensor signal is used as mixed movement data, after feature extraction, the collected scene video signal and/or sensor signal and the user movement mode are input into the network model for processing (corresponding to the AI-based shake capture in fig. 7), and then the redundant angle rotation parameter, the redundant displacement parameter and the user action possibility during collecting VR audio can be obtained.
The user action possibility derived by the AI-based method likewise refers to the probability of the action the user's current motion most likely belongs to. For example, if the analysis shows that the user's current action corresponds to "running" with 80% probability and to "walking" with 20% probability, the user action possibility under the AI-based method is the highest probability value, namely 80%.
In the first embodiment of the present invention, after obtaining the redundant angle rotation parameter, the redundant displacement parameter, and the user action possibility corresponding to the VR audio signal, the VR audio signal may be adjusted, and a specific processing method will be described in detail below.
Example two
The second embodiment of the present invention also introduces how to eliminate redundant motion generated when a VR audio signal is captured. It differs from the first embodiment in that it is difficult for a VR capture device to place the VR audio capture device (such as a microphone) and the VR video capture device (such as a camera) at the same position. As shown in fig. 8, when a user records VR content with such a device, the microphone and the camera shake together, and because the shake occurs about different holding points, this positional difference can cause serious audio-visual misalignment. The misalignment is especially pronounced when the VR audio capture device shakes, and it significantly degrades the accuracy of the VR video's sense of space.
Therefore, taking the case where the VR audio capture device is located above the VR video capture device, and addressing the influence of this positional offset between the VR audio capture device and the VR video capture device on the VR audio signal during capture, the processing method for the VR audio signal provided by the second embodiment of the present invention, as shown in fig. 9, further includes:
step S910: determining relative movement parameters of the VR audio acquisition device corresponding to the VR video acquisition device.
Specifically, the rotation angle of the acquisition device needs to be acquired according to the acquired sensor signal, and as shown in fig. 10, the movement angle θ of the VR acquisition device relative to the horizontal plane when the VR audio is acquired is extracted.
The positional relationship between the VR audio capture device and the VR video capture device within the VR capture device (corresponding to the relationship between the microphone and the camera in fig. 11) is also acquired. This relationship is obtained from the device calibration parameters, stored in advance, and yields the distance L between the VR audio capture device and the VR video capture device.
The relative movement parameters of the VR audio capture device with respect to the VR video capture device are determined from the rotation angle θ of the capture device and the pre-stored distance L between the VR audio capture device and the VR video capture device, so that the positions of the two devices can be matched.
Continuing with FIG. 10, the relative movement parameters are Lcos θ in the x-direction and Lsin θ in the y-direction.
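A small sketch of this geometry, computing the microphone's offset relative to the camera from the tilt angle θ and the device separation L (function and variable names are illustrative):

```python
import math

def relative_movement(L, theta_deg):
    """Relative movement parameters of the VR audio capture device
    (microphone) with respect to the VR video capture device (camera),
    given their separation L and the rig's tilt angle theta relative
    to the horizontal plane (fig. 10): L*cos(theta) along x,
    L*sin(theta) along y."""
    theta = math.radians(theta_deg)
    return L * math.cos(theta), L * math.sin(theta)

dx, dy = relative_movement(2.0, 60.0)
```

These offsets are then used to correct the auditory center to the camera position, as described in step S920.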
Optionally, when determining valid motion information and redundant motion information corresponding to the VR audio signal, the relative movement parameter may be used in the AI deep learning network at the same time to ensure that the redundant motion information includes the relative movement parameter of the VR audio capture device corresponding to the VR video capture device.
Step S920: and carrying out relative movement correction processing on the VR audio signal according to the relative movement parameter.
In the second embodiment of the present invention, once the relative movement parameters corresponding to the VR audio signal have been obtained for the positional offset between the visual and auditory centers, the VR audio signal can be corrected for relative movement. The microphone relative-movement correction matches the auditory center to the camera position, so that the VR audio sounds more natural.
The relative movement parameter corresponding to the VR audio signal is equivalent to a redundant displacement parameter, that is, performing the relative movement correction on the VR audio signal is equivalent to performing the redundant displacement elimination processing on the VR audio signal in the first embodiment, and a specific processing method will be described in detail below.
It should be noted that, as shown in fig. 11, the VR audio signal that is subjected to the relative movement correction processing by acquiring the relative movement parameter also needs to perform the processing procedure in the first embodiment. That is, after the VR audio signal is subjected to the relative movement correction processing, the redundant motion elimination processing in the first embodiment needs to be performed on the VR audio signal after the correction processing.
EXAMPLE III
The third embodiment of the present invention describes how to eliminate redundant motion generated by the external environment when VR audio is played.
When a user watches played VR content, in addition to actions manipulating the VR content, the user may be subject to certain redundant motions, such as the rocking of a ship or the side-to-side sway of an automobile. Redundant motions generated by such external factors need to be eliminated to ensure the stability and accuracy of the VR audio.
In the VR audio signal processing method provided by the third embodiment of the present invention, the VR playback device may process the audio played through the speaker or earphone according to the redundant motion of the external environment to eliminate that motion; alternatively, another processing device may perform the corresponding processing according to the redundant motion of the external environment and then send the processed audio to the VR playback device for playback through the speaker or earphone.
Specifically, the redundant motion information includes redundant motion information when playing VR audio; according to the redundant motion information, adjusting the VR audio signal, including: and adjusting the VR audio signal to be played according to the redundant motion information when the VR audio is played.
As shown in fig. 12, before obtaining the redundant motion information corresponding to the VR audio signal, a VR scene analysis is performed on the VR video signal to be played first, and a redundant motion model extracted from the VR scene is determined. And analyzing the playing scene of the acquired sensor signal, and determining a redundant motion model extracted from the real scene of the playing VR content.
The redundant motion model determined from the VR video signal to be played (corresponding to the dither model in fig. 12) is matched against the redundant motion model determined from the acquired sensor signal (also corresponding to the dither model in fig. 12). If the two redundant motion models are the same, no redundant motion exists in the real scene, and the original audio signal is output.
When they do not match, the step of obtaining the redundant motion information corresponding to the VR audio signal is executed, and the VR audio signal is then adjusted according to that information. Specifically, in the third embodiment of the present invention, the redundant motion information is the redundant motion information of the external environment in the playback scene when the VR audio is played, and the VR audio signal to be played is adjusted according to it.
Optionally, the playback scene analysis may also be performed on the scene video signal captured in real time to determine a redundant motion model extracted from the real scene in which the VR content is played; the redundant motion model determined from the VR video signal to be played is then matched against the one determined from the real-time scene video signal. When they do not match, the step of obtaining the redundant motion information corresponding to the VR audio signal is executed, and the VR audio signal is adjusted accordingly. Alternatively, the redundant motion model determined from the VR video signal to be played, the one determined from the real-time scene video signal, and the one determined from the captured sensor signal may all be matched; when they do not match, the same steps of obtaining the redundant motion information and adjusting the VR audio signal are executed.
Specifically, in step S310, the valid motion information and the redundant motion information corresponding to the VR audio signal are obtained. In practice, the sensor signal also contains both user actions and redundant movements of the external environment. The purpose of this step is to determine which part of the collected signal is redundant motion information and which part is valid motion information, i.e., the user action probability.
Preferably, the valid motion information and the redundant motion information corresponding to the VR audio signal can be more accurately identified in combination with the user motion pattern. For a detailed description of the user motion pattern, refer to the first embodiment, and are not described herein again.
In the third embodiment of the present invention, the redundant motion information corresponding to the VR audio signal may be determined through a deep learning network, and specifically, as shown in fig. 13, the redundant motion information corresponding to the VR audio signal is determined through the deep learning network according to at least one of the following items: acceleration information of the VR playing device; collected sensor signals; a captured scene audio signal; a user motion pattern. The acceleration information of the VR player can be determined by the collected scene video signal and/or the collected sensor signal.
Assume that the displacement vector generated by the user's own motion is A, that the external environment (a vehicle, for example) generates displacement vector B, and that the vector detected by the sensor is S. Then S = A + B. The user motion pattern that produces displacement vector A differs greatly from the vehicle motion pattern that produces displacement vector B.
The motion of a vehicle generally has two states: constant-speed driving and acceleration. At constant speed the vehicle's motion is regular and predictable, such as a slight upward or downward movement or a side-to-side sway. Under acceleration the vehicle's motion is irregular, as when turning, starting or stopping, and its speed is generally higher than the speed of the user's own movement. This makes it possible to distinguish well between the vehicle's constant-speed and acceleration states, and between the vehicle's motion and the user's own motion.
In view of the above, the problem of extracting the user's own motion under complex conditions can be regarded as a noise suppression problem, in which the goal is to obtain the user action A, and the noise is the displacement vector B generated by the external environment, i.e., the redundant motion information corresponding to the redundant motion.
Specifically, as shown in fig. 13, in the training phase, the redundant motion data of the external environment and the user action data excluding the redundant motion are mixed into mixed movement data, on which feature extraction is performed. The user action data excluding the redundant motion serves as the training label, and features are likewise extracted from it. In addition, the sound of the external environment scene can be distinguished by capturing its audio signal (i.e., the scene audio signal); for example, the ambient sounds of ships, airplanes and road vehicles differ greatly from one another. Acceleration information of the VR playback device, determined from the captured scene video signal and/or sensor signal, is also used (fig. 13 does not show the sensor signal in the training phase). All of this information, together with the user motion pattern, is input into the deep learning network for training (corresponding to the AI-based jitter detection in fig. 13), yielding a network model.
As shown in fig. 14, a camera captures a video signal of the real environment to obtain a scene video signal. A reference object is marked in the scene video signal, that is, an object in the field of view is designated as a reference. The relative acceleration a of the external environment is calculated from the reference object's positions in the current frame and the next frame, and is then input into the trained network model.
The relative acceleration a is calculated as follows:

a = 2x / (t(n+1) - t(n))²

as shown in fig. 15, where t(n) is the time of the nth frame, t(n+1) is the time of the (n+1)th frame, and x is the distance moved by the reference object from the nth frame to the (n+1)th frame.
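As a sketch, assuming the calculation treats the reference object as uniformly accelerating from rest over one frame interval (so that x = a·Δt²/2 — this interpretation is an assumption, since only the variables t(n), t(n+1) and x are defined here):

```python
def relative_acceleration(x, t_n, t_n1):
    """Relative acceleration of the external environment estimated from
    the reference object's displacement x between frame n (time t_n)
    and frame n+1 (time t_n1), assuming uniform acceleration from rest
    over the interval: x = a * (t_n1 - t_n)**2 / 2."""
    dt = t_n1 - t_n
    return 2.0 * x / (dt * dt)
```

The resulting value would be fed into the trained network model together with the other inputs listed above.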
And then, applying the network model obtained by the training to perform online processing: the collected sensor signals are used as mixed movement data, and are input into the network model for processing (corresponding to AI-based shake detection in fig. 13) together with collected scene audio signals, acceleration information of VR playback equipment determined based on collected scene video signals and/or sensor signals, and user movement patterns, so that redundant angle rotation parameters, redundant displacement parameters, and user movement possibility during VR audio playback can be obtained.
In the third embodiment of the present invention, after obtaining the redundant angle rotation parameter, the redundant displacement parameter, and the user action possibility corresponding to the VR audio signal, the VR audio signal may be adjusted and processed to render a new audio signal for playing, and a specific processing method will be described in detail below.
Example four
Before the VR audio signal is adjusted, for the redundant motion information and the effective motion information corresponding to the VR audio signal acquired in the first embodiment, the second embodiment, and the third embodiment, in the fourth embodiment of the present invention, the redundant motion information may be smoothed based on the effective motion information, so that the parameters are more accurate and effective, the stability of the parameters is improved, the residual noise is eliminated, and the delay of the parameter output is avoided.
Specifically, the redundant motion information after the current frame smoothing is determined according to the effective motion information of the current frame of the VR audio signal, the redundant motion information of the current frame and the redundant motion information after the previous frame smoothing.
Hereinafter, a method of data smoothing that can be used will be described by taking smoothing of redundant displacement parameters based on the likelihood of user action as an example.
① moving average method
The moving average method is an improved arithmetic averaging method: as the time series progresses, averages over a fixed number of periods are computed in sequence to form a time series of averages that reflects the development trend. The size of the moving window is chosen according to circumstances: a small window reacts quickly to changes but cannot reflect the overall trend, while a large window reflects the trend but gives predictions with a noticeable lag.
The basic idea of the moving average method is as follows: the moving average can eliminate or reduce the random variation of time sequence data caused by the interference of accidental factors, and is suitable for short-term prediction. The calculation formula is as follows:
Y_t = (X_{t-1} + X_{t-2} + … + X_{t-n}) / n

where Y_t is the predicted displacement for the next period, n is the number of periods in the moving average, and X_{t-1}, X_{t-2}, X_{t-3}, …, X_{t-n} are the actual displacement values of the preceding periods.
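A minimal sketch of this moving-average prediction (names are illustrative):

```python
def moving_average_forecast(history, n):
    """Predict the next-period displacement Y_t as the arithmetic mean
    of the n most recent actual values X_{t-1} .. X_{t-n}."""
    window = history[-n:]  # the n most recent actual values
    return sum(window) / len(window)

forecast = moving_average_forecast([1.0, 2.0, 3.0, 4.0], n=2)
```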
② weighted average method
The weighted moving average method assigns unequal weights to the values within a fixed time span. The principle is that data from different historical periods do not influence the prediction equally: apart from periodic variation with period n, values farther from the target period have relatively less influence and should be given lower weights. The calculation formula is as follows:
Y_t = w_1·X_{t-1} + w_2·X_{t-2} + w_3·X_{t-3} + … + w_n·X_{t-n}

where Y_t is the predicted displacement for the next period, n is the number of periods in the moving average, and X_{t-1}, X_{t-2}, X_{t-3}, …, X_{t-n} are the actual displacement values of the preceding periods. w_1 is the weight of the actual data in period t-1, w_2 the weight in period t-2, w_3 the weight in period t-3, …, and w_n the weight in period t-n, with w_1 + w_2 + w_3 + … + w_n = 1.
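A corresponding sketch of the weighted moving average, with weights ordered from the most recent period to the oldest (names are illustrative):

```python
def weighted_moving_average(history, weights):
    """Predict Y_t = w_1*X_{t-1} + w_2*X_{t-2} + ... + w_n*X_{t-n};
    weights[0] applies to the most recent value, and the weights
    must sum to one."""
    assert abs(sum(weights) - 1.0) < 1e-9
    recent = history[::-1]  # most recent actual value first
    return sum(w * x for w, x in zip(weights, recent))

forecast = weighted_moving_average([1.0, 2.0, 3.0], [0.5, 0.3, 0.2])
```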
③ exponential smoothing method
The exponential smoothing method, without discarding historical data, gives greater weight to the historical data closer to the prediction point, with the weights decreasing exponentially from near to far. It performs a weighted average of the actual value and the predicted value of the current period via a smoothing coefficient α to predict the next period's value, giving the time-series data a weighted smoothing that reveals its regular changes and trends. The calculation formula is as follows:
Y_t = α·X_{t-1} + (1-α)·Y_{t-1}
where Y_t is the smoothed displacement value at time t, X_{t-1} is the actual displacement value at time t-1, Y_{t-1} is the smoothed displacement value at time t-1, and α is the smoothing coefficient, ranging from 0 to 1.
In the fourth embodiment of the present invention, the user action possibility can be used in the smoothing of the displacement parameter as follows: the smoothed redundant displacement parameter of the current frame is determined from the user action possibility of the current frame, the redundant displacement parameter of the current frame, and the smoothed redundant displacement parameter of the previous frame. The calculation formula is as follows:
D̂_n = P·D_n + (1-P)·D̂_{n-1}

where D̂_n is the smoothed redundant displacement parameter of the nth frame, D_n is the redundant displacement parameter of the nth frame acquired in the first to third embodiments, D̂_{n-1} is the smoothed redundant displacement parameter of the previous frame, and P is the user action possibility.
The smoothing of the redundant angle rotation parameter is analogous: in the above formula, the redundant displacement parameter of the nth frame is replaced by the redundant angle rotation parameter of the nth frame, and the smoothed redundant displacement parameter of the previous frame by the smoothed redundant angle rotation parameter of the previous frame; the result is the smoothed redundant angle rotation parameter of the nth frame.
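A minimal sketch of one plausible form of this per-frame smoothing, assuming the user action possibility P plays the role of the exponential-smoothing coefficient (this exact mixing rule is an assumption, not a formula stated verbatim in the source):

```python
def smooth_with_action_possibility(current, prev_smoothed, p):
    """Blend the current frame's redundant parameter (displacement or
    rotation angle) with the previous frame's smoothed value, using the
    user action possibility p as the mixing coefficient."""
    return p * current + (1 - p) * prev_smoothed

value = smooth_with_action_possibility(2.0, 1.0, p=0.5)
```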
Redundant motion information without smoothing may contain residual noise, i.e., estimation error; the result of smoothing without reference to the user action possibility is shown in fig. 16a. Smoothing without reference to the user action possibility can also introduce delay, as shown in fig. 16b. Taking the moving average method as an example, the smoothed value at time t depends on the actual values at times t-1 through t-n, so it can only be obtained once those values are available, giving a delay of length n as shown in fig. 16b. If the user action possibility is introduced, however, the weights on the actual values can be adjusted to reduce this delayed interference. For example, the user action possibility indicates what motion the user is performing at the moment, so the user's motion, including both redundant and valid components, can be predicted; smoothing with the user action possibility therefore eliminates both the delay and the residual noise.
As shown in fig. 17, the redundant angle rotation parameters, the redundant displacement parameters, and the user action possibility obtained in the first embodiment may be subjected to the smoothing processing, so as to obtain smoothed redundant angle rotation parameters and redundant displacement parameters, so as to more accurately perform adjustment processing on the VR audio signal.
As shown in fig. 11, the redundant angle rotation parameter, the redundant displacement parameter, and the user action possibility obtained in the second embodiment may be subjected to the smoothing processing, so as to obtain the smoothed redundant angle rotation parameter and the smoothed redundant displacement parameter, and the VR audio signal may be more accurately adjusted by combining the relative movement parameter.
As shown in fig. 12, the redundant angle rotation parameters, the redundant displacement parameters, and the user action possibility obtained in the third embodiment may be subjected to the smoothing processing, so as to obtain the smoothed redundant angle rotation parameters and the smoothed redundant displacement parameters, so as to more accurately perform the adjustment processing on the VR audio signal.
The specific processing method for the VR audio signal will be described in detail in the fifth embodiment.
EXAMPLE five
For the redundant angle rotation parameter, the redundant displacement parameter, and the relative movement parameter corresponding to the VR audio signal acquired in the first embodiment, the second embodiment, the third embodiment, and the fourth embodiment, in the fifth embodiment of the present invention, the VR audio signal is adjusted, that is, the specific processing method in the step S320 is described in detail.
As can be seen from the above, in the present invention, the VR audio signal may be an Ambisonic (Ambisonic) signal or a multichannel audio signal. The adjusting the VR audio signal according to the redundant motion information may include: according to the redundant angle rotation parameter, redundant angle elimination processing is carried out on the VR audio signal; and/or performing redundant displacement elimination processing on the VR audio signal according to the redundant displacement parameter.
In the fifth embodiment of the present invention, an implementation method of specific redundant angle elimination processing is as follows:
as shown in fig. 18, the angular rotation of the redundant motion in the three-dimensional space is decomposed into the amount of change in the clockwise angular rotation in the x, y, and z directions.
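The per-axis decomposition can be sketched as follows: cancelling a redundant rotation amounts to rotating the sound field by the negative of each per-axis angle change. The axis order and sign conventions below are assumptions for illustration, not the patent's specification.

```python
import math

def rot_x(a):
    """Rotation matrix about the x-axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def cancel_redundant_rotation(direction, dx, dy, dz):
    """Apply the inverse of the redundant per-axis rotations (dx, dy, dz)
    to a source direction vector, undoing the unwanted rotation."""
    v = matvec(rot_x(-dx), direction)
    v = matvec(rot_y(-dy), v)
    return matvec(rot_z(-dz), v)
```

For example, a redundant 90-degree rotation about the z-axis is cancelled by rotating every source direction 90 degrees the opposite way.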
If the VR audio signal is a multichannel audio signal, the multichannel signals are first mapped to virtual speakers according to their positions; the corresponding weights are then calculated from the redundant angle rotation parameters, and the signals are finally synthesized into the rotated speakers, as shown in fig. 19. From the multichannel audio signal, a first virtual speaker array before redundant-angle elimination is determined, i.e., the speaker array corresponding to the redundant angle rotation. The audio signal of each second virtual speaker in the second virtual speaker array after redundant-angle elimination is then determined from the redundant angle rotation parameters and the audio signals of the first virtual speakers in the first virtual speaker array. Specifically, there are two processing methods:
Method one: considering only the influence of neighboring virtual speakers.
In this method, for each second virtual speaker in the second virtual speaker array, the angular relationship between that second virtual speaker and each adjacent first virtual speaker is determined according to the redundant angle rotation parameter, and the audio signal of the second virtual speaker is then determined from the audio signals of the adjacent first virtual speakers and the determined angular relationship.
As shown in FIG. 20, A and B denote two adjacent first virtual speakers in the first virtual speaker array, with corresponding signals $S_A$ and $S_B$. C is a second virtual speaker located between A and B in the second virtual speaker array (for convenience and clarity, the second virtual speakers at positions other than C are hidden in the figure; they are processed by analogy with C and are not described again). From the obtained redundant angle rotation parameter, the included angles between C and A, B are $\alpha$ and $\beta$, respectively. The signal $S_C$ corresponding to the second virtual speaker C is then:

$$S_C = S_A \cos\alpha + S_B \cos\beta$$
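Method one can be sketched as follows (a minimal illustration; the function name and scalar per-sample interface are assumptions, not part of the patent):

```python
import math

def second_speaker_signal(s_a, s_b, alpha, beta):
    """Signal of a rotated (second-array) virtual speaker C lying
    between its two adjacent first-array speakers A and B:
    S_C = S_A*cos(alpha) + S_B*cos(beta), where alpha and beta are
    the included angles (radians) obtained from the redundant angle
    rotation parameter."""
    return s_a * math.cos(alpha) + s_b * math.cos(beta)

# When C coincides with A (alpha = 0) and B carries no signal,
# C simply inherits A's signal.
print(second_speaker_signal(1.0, 0.0, 0.0, math.pi / 4))  # -> 1.0
```

In practice this is applied sample by sample (or frame by frame) to each second-array speaker position.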
Method two: taking into account the influence of all virtual speakers.
In this method, for each second virtual speaker in the second virtual speaker array, the angular relationship between that second virtual speaker and each first virtual speaker in the first virtual speaker array is determined according to the redundant angle rotation parameter, and the audio signal of the second virtual speaker is then determined from the audio signal of each first virtual speaker and the determined angular relationships.
As shown in fig. 21, the dark dots indicate the first virtual speaker array, where the signal corresponding to the i-th first virtual speaker is $S_i$. The light dot A is any second virtual speaker in the second virtual speaker array (for convenience and clarity, the second virtual speakers at positions other than A are hidden in the figure; they are processed by analogy with A and are not described again). The included angle between the i-th first virtual speaker (dark dots) and A (light dot) is $\theta_i$.

The signal $S_A$ corresponding to the second virtual speaker A in the second virtual speaker array is then:

$$S_A = \sum_i S_i \cos\theta_i$$
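Method two can be sketched as follows (a hypothetical helper; clamping the contribution of speakers more than 90° away to zero is an assumption, since the patent figure does not specify how negative cosine weights are handled):

```python
import math

def second_speaker_signal_all(signals, thetas):
    """Cosine-weighted sum over every first-array speaker i, where
    thetas[i] is the included angle (radians) between speaker i and
    the target second-array position A. Speakers facing away (angle
    over 90 degrees) are given zero weight here - an assumption."""
    return sum(s * max(math.cos(t), 0.0) for s, t in zip(signals, thetas))
```

A speaker aligned with A contributes fully, a perpendicular one contributes nothing, and distant ones are suppressed.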
If the VR audio signal is an Ambisonic signal, the rotation angle of the Ambisonic signal is determined according to the acquired redundant angle rotation parameter, and the Ambisonic signal is rotated by that angle.
The processing of a first-order Ambisonic signal is taken as an example:
An Ambisonic signal is composed of multiple recording channel signals. A first-order Ambisonic signal has four recording channel signals, W, X, Y, and Z, where W is the omnidirectional channel and does not need to be rotated, and X, Y, and Z point along the x, y, and z axes, respectively. X', Y', and Z' denote the rotated X, Y, Z channel signals. The rotation formula is:
$$[X' \; Y' \; Z'] = [X \; Y \; Z]\,J$$
wherein the obtained redundant angle rotation parameters give the clockwise rotation angles of the Ambisonic signal about the x, y, and z axes as $\theta$, $\varphi$, and $\omega$, respectively, from which the rotation matrix $J$ can be calculated as the product of the three per-axis rotation matrices:

$$J = J_x(\theta)\,J_y(\varphi)\,J_z(\omega), \qquad J_z(\omega) = \begin{bmatrix} \cos\omega & -\sin\omega & 0 \\ \sin\omega & \cos\omega & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

with $J_x$ and $J_y$ defined analogously for rotation about the x and y axes.
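A z-axis-only sketch of this channel rotation (the sign convention for "clockwise" is an assumption):

```python
import math

def rotate_foa_z(x, y, z, omega):
    """Rotate the X, Y, Z channels of a first-order Ambisonic frame
    by omega radians about the z axis via [X' Y' Z'] = [X Y Z] J_z;
    the omnidirectional W channel needs no rotation and is omitted."""
    x_r = x * math.cos(omega) + y * math.sin(omega)
    y_r = -x * math.sin(omega) + y * math.cos(omega)
    return x_r, y_r, z  # Z is unchanged by a rotation about the z axis
```

Full three-axis rotation chains three such per-axis steps in the order of the matrix product above.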
In the fifth embodiment of the present invention, a specific implementation of the redundant displacement elimination processing is as follows:
as shown in fig. 22, the positional shift of the redundant motion in three-dimensional space is decomposed into displacement variations along the x, y, and z directions.
As shown in fig. 23, it is necessary to determine whether the VR audio signal is an Ambisonic signal or a multichannel signal.
If the VR audio signal is a multichannel audio signal, a third virtual speaker array before the redundant displacement elimination is determined from the VR audio signal, and the audio signal of each fourth virtual speaker in the fourth virtual speaker array after the redundant displacement elimination processing is determined from the redundant displacement parameter and the audio signal of each third virtual speaker in the third virtual speaker array.
As shown in fig. 23, if the VR audio signal is an Ambisonic signal, it is first converted to a multichannel representation (corresponding to the multichannel structure conversion in fig. 23) before the redundant displacement elimination processing is performed. The same redundant displacement elimination process as for a multichannel sound signal is then executed, and the processed audio signal is converted back into an Ambisonic signal, yielding the Ambisonic signal after redundant displacement elimination.
The redundant displacement elimination process is described as follows:
as shown in fig. 24, for each fourth virtual speaker in the fourth virtual speaker array, the relative positional relationship between that fourth virtual speaker and each third virtual speaker is determined according to the redundant displacement parameter, and the audio signal of the fourth virtual speaker is then determined from the audio signal of each third virtual speaker and the determined relative positional relationships.
Specifically, in the first step, a fourth virtual speaker array B is placed at the default position, with each fourth virtual speaker in the horizontal plane at radius r from the listener. If the fourth virtual speaker array B has n fourth virtual speakers distributed equidistantly, the default positions are equidistant points on a circle of radius r. The non-equidistant case is calculated similarly to the equidistant one; the equidistant case is taken as the example in the fifth embodiment of the invention. In this step, taking the top fourth virtual speaker as an example, its coordinates are (a, b).
In the second step, the fourth virtual speaker array B is moved to the position of the third virtual speaker array A according to the acquired redundant displacement parameter. Assuming the redundant displacement parameter is a variation $\Delta c$ in the y direction, the coordinates of the top third virtual speaker of array A become $(a, b + \Delta c)$. In this case, the i-th third virtual speaker of array A lies at distance $x_i$ from the listener in the plane, and the line connecting it to the listener forms an included angle $\theta_i$ with the positive y direction.

In the third step, the relative position of each third virtual speaker in array A with respect to each fourth virtual speaker in array B is calculated. Continuing with the top fourth virtual speaker as the example, by the law of cosines the relative distance $d_i$ between the third virtual speaker i in array A and the top fourth virtual speaker in array B is:

$$d_i = \sqrt{x_i^2 + r^2 - 2\,x_i\,r\cos\theta_i}$$
and fourthly, calculating the audio signal of each fourth virtual loudspeaker in the fourth virtual loudspeaker array B according to the relative distance between each fourth virtual loudspeaker in the fourth virtual loudspeaker array B and each third virtual loudspeaker in the third virtual loudspeaker array A and the audio signal of each third virtual loudspeaker in the third virtual loudspeaker array A.
Specifically, the distance of each third virtual speaker in array A from the listener may be divided by its relative distance to the fourth virtual speaker in array B obtained in the previous step, giving the attenuation due to relative distance. Continuing with the top fourth virtual speaker as the example, its adjusted audio signal is:

$$y(t) = \sum_i \frac{x_i}{d_i}\,s_i(t)$$

wherein y(t) is the adjusted audio signal corresponding to the fourth virtual speaker in array B, and $s_i(t)$ is the audio signal of the third virtual speaker i in array A. Performing this calculation for every fourth virtual speaker in array B yields the audio signals of array B with the displacement jitter eliminated.
In the displacement elimination process, the position of the virtual speaker is actually kept unchanged, and the audio signal of the virtual speaker is adjusted according to the displacement parameter.
In the fifth embodiment of the present invention, the order of the redundant displacement elimination processing and the redundant angle elimination processing for the VR audio signal may be reversed, which is not limited herein.
In the fifth embodiment of the present invention, the multichannel conversion processing on the Ambisonic signal specifically includes:
An Ambisonic signal is composed of multiple recording channel signals; a first-order Ambisonic signal has four recording channel signals, W, X, Y, and Z, where W is the omnidirectional channel and X, Y, and Z point along the x, y, and z axes, respectively. As shown in fig. 25, taking four virtual speakers uniformly distributed in the xOy plane (at azimuths 0°, 90°, 180°, and 270°) as an example, the following relationship holds with their corresponding multichannel outputs $s_1, \ldots, s_4$:

$$s_j = \tfrac{1}{2}\left(W + X\cos\varphi_j + Y\sin\varphi_j\right), \qquad \varphi_j \in \{0°, 90°, 180°, 270°\}$$
In this way, the Ambisonic signal can be converted to multichannel form, and the audio signal after the redundant displacement elimination processing can be converted back into an Ambisonic signal.
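The round-trip conversion for the four-speaker xOy layout can be sketched as follows (a textbook first-order decode is assumed here; the patent's exact matrix is given in fig. 25):

```python
def foa_to_quad(w, x, y):
    """Decode the W, X, Y channels of a first-order Ambisonic frame
    to four virtual speakers at azimuths 0, 90, 180, 270 degrees in
    the xOy plane: s_j = (W + X*cos(phi_j) + Y*sin(phi_j)) / 2."""
    return (0.5 * (w + x), 0.5 * (w + y), 0.5 * (w - x), 0.5 * (w - y))

def quad_to_foa(s1, s2, s3, s4):
    """Inverse conversion: recover W, X, Y from the four speaker
    feeds (e.g., after the redundant displacement elimination)."""
    return 0.5 * (s1 + s2 + s3 + s4), s1 - s3, s2 - s4
```

The decode and its inverse round-trip exactly for this symmetric layout, which is what makes the convert-process-convert-back pipeline lossless apart from the intended displacement adjustment.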
Therefore, the complete VR audio signal processing procedure for user redundant motion in the first embodiment continues as shown in fig. 17: according to the collected scene video signal and/or sensor signal, in combination with the user motion mode, the redundant motion information (including the redundant angle rotation parameter and/or the redundant displacement parameter) and the effective motion information (including the user action likelihood) corresponding to the VR audio signal are determined, and the redundant angle rotation parameter and the redundant displacement parameter are smoothed using the user action likelihood to obtain the smoothed redundant angle rotation parameter and the smoothed redundant displacement parameter. Redundant angle elimination processing is then performed on the VR audio signal according to the smoothed redundant angle rotation parameter, and redundant displacement elimination processing according to the smoothed redundant displacement parameter, yielding the adjusted VR audio signal.
The complete VR audio signal processing procedure for redundant motion when there is a positional offset between the VR audio acquisition device and the VR video acquisition device in the second embodiment continues as shown in fig. 11: according to the acquired sensor signal and the pre-stored positional relationship between the VR audio acquisition device and the VR video acquisition device (corresponding to the microphone-camera positional relationship in fig. 11), the relative movement parameter of the VR audio acquisition device with respect to the VR video acquisition device is determined and used to correct the relative displacement of the microphone so that it matches the camera position. Meanwhile, according to the collected scene video signal and/or sensor signal, in combination with the user motion mode, the redundant motion information (including the redundant angle rotation parameter and/or the redundant displacement parameter) and the effective motion information (including the user action likelihood) corresponding to the VR audio signal are determined, and the parameters are smoothed using the user action likelihood to obtain the smoothed redundant angle rotation parameter and the smoothed redundant displacement parameter. Relative movement correction processing is then performed on the VR audio signal according to the relative movement parameter, followed by redundant angle elimination processing according to the smoothed redundant angle rotation parameter and redundant displacement elimination processing according to the smoothed redundant displacement parameter, yielding the adjusted VR audio signal.
The complete VR audio signal processing procedure for redundant motion of the external environment in the third embodiment continues as shown in fig. 12: the redundant motion model determined from the VR video signal to be played is matched against the redundant motion model determined from the acquired sensor signal (both corresponding to the jitter models in fig. 12). If the two redundant motion models are the same, no redundant motion exists in the real scene, and the original VR audio signal need only be output. If the two redundant motion models differ, the redundant motion information (including the redundant angle rotation parameter and/or the redundant displacement parameter) and the effective motion information (including the user action likelihood) corresponding to the VR audio signal are determined from the collected scene video signal and/or sensor signal in combination with the user motion mode, and the parameters are smoothed using the user action likelihood to obtain the smoothed redundant angle rotation parameter and the smoothed redundant displacement parameter. Redundant angle elimination processing is then performed on the VR audio signal according to the smoothed redundant angle rotation parameter, and redundant displacement elimination processing according to the smoothed redundant displacement parameter, yielding the adjusted VR audio signal.
It should be noted that the user redundant motion of the first embodiment, the redundant motion due to the positional offset between the VR audio acquisition device and the VR video acquisition device of the second embodiment, and the external-environment redundant motion of the third embodiment may occur simultaneously. When two or three of them occur together, the acquisition-side problem is handled by selecting the corresponding processing method of the first or second embodiment, depending on whether the microphone-camera positional offset is involved, in combination with the fourth and fifth embodiments; the playback-side problem is handled by the processing method of the third embodiment in combination with the fourth and fifth embodiments.
When the redundant motion of the acquisition end in the first embodiment or the second embodiment is processed, the VR audio acquisition device may process the VR audio signal, and when the redundant motion of the playback end in the third embodiment is processed, the VR audio playback device may process the VR audio signal.
Embodiment Six
An embodiment of the present invention provides a processing device for a VR audio signal, as shown in fig. 26, including:
an obtaining module 2610, configured to obtain redundant motion information corresponding to the VR audio signal;
the processing module 2620 is configured to perform adjustment processing on the VR audio signal according to the redundant motion information.
Wherein the redundant motion information includes: redundant motion information during VR audio acquisition or redundant motion information during VR audio playback;
optionally, the processing module 2620 is specifically configured to adjust the collected VR audio signal according to the redundant motion information during the collection of the VR audio;
or, the processing module 2620 is specifically configured to perform adjustment processing on the VR audio signal to be played according to the redundant motion information when the VR audio is played.
Optionally, the obtaining module 2610 is specifically configured to determine, according to the acquired scene video signal and/or the sensor signal, redundant motion information corresponding to the VR audio signal through a deep learning network or a pattern recognition mode.
Optionally, the redundant motion information is redundant motion information when the VR audio is played;
the obtaining module 2610 is specifically configured to determine, through a deep learning network, redundant motion information corresponding to the VR audio signal according to at least one of the following:
acceleration information of the VR playing device;
collected sensor signals;
captured scene audio signals;
a user motion pattern.
Optionally, the redundant motion information is redundant motion information when the VR audio is played;
the processing device provided by the sixth embodiment of the present invention further includes:
the matching module is used for matching a redundant motion model determined according to the VR video signal to be played with a redundant motion model determined according to the acquired sensor signal;
the obtaining module 2610 is specifically configured to perform the step of obtaining the redundant motion information corresponding to the VR audio signal when the two redundant motion models do not match.
Optionally, the redundant motion information is redundant motion information during VR audio acquisition;
the processing device provided by the sixth embodiment of the present invention further includes:
the determining module is used for determining relative movement parameters of the VR audio acquisition equipment corresponding to the VR video acquisition equipment;
the processing module 2620 is specifically configured to perform relative movement correction processing on the VR audio signal according to the relative movement parameter.
Optionally, the determining module is specifically configured to obtain a rotation angle of the acquisition device according to the acquired sensor signal;
and the determining module is specifically used for determining relative movement parameters of the VR audio acquisition equipment corresponding to the VR video acquisition equipment according to the rotation angle of the acquisition equipment and the pre-stored position relationship between the VR audio acquisition equipment and the VR video acquisition equipment.
Optionally, the obtaining module 2610 is specifically configured to obtain valid motion information and redundant motion information corresponding to the VR audio signal.
The processing device provided by the sixth embodiment of the present invention further includes:
and the smoothing module is used for smoothing the redundant motion information based on the effective motion information.
Optionally, the obtaining module 2610 is specifically configured to determine, according to the acquired scene video signal and/or the sensor signal, effective motion information and redundant motion information corresponding to the VR audio signal through a deep learning network or a pattern recognition mode.
Optionally, the obtaining module 2610 is specifically configured to determine, according to the collected scene video signal and/or sensor signal, effective motion information and redundant motion information corresponding to the VR audio signal through a deep learning network in combination with the user motion mode;
or, the obtaining module 2610 is specifically configured to predict effective motion information by matching with a preset general motion pattern according to the collected scene video signal and/or sensor signal in combination with the user motion pattern;
the obtaining module 2610 is specifically configured to determine effective motion information corresponding to the VR audio signal according to the predicted effective motion information and the effective motion information obtained through the pattern recognition.
Optionally, the smoothing module is specifically configured to determine the redundant motion information of the current frame after the smoothing processing according to the effective motion information of the current frame of the VR audio signal, the redundant motion information of the current frame, and the redundant motion information of the previous frame after the smoothing processing.
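One plausible realization of such per-frame smoothing (the linear blend and the use of the action likelihood as the blend weight are assumptions; the patent does not fix a formula here, and whether a high likelihood should track the raw value more or less closely depends on the model of the fourth embodiment):

```python
def smooth_redundant(prev_smoothed, current, action_likelihood):
    """Smooth one redundant motion parameter per frame: a high
    user-action likelihood treats the motion as intentional and
    follows the raw value; a low likelihood leans on the previous
    frame's smoothed value to suppress jitter."""
    a = max(0.0, min(1.0, action_likelihood))  # clamp to [0, 1]
    return a * current + (1.0 - a) * prev_smoothed
```

Applied independently to each redundant angle rotation and redundant displacement component, this yields the smoothed parameters fed to the elimination processing.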
Specifically, the redundant motion information includes: redundant angular rotation parameters and/or redundant displacement parameters.
Optionally, the processing module 2620 is specifically configured to perform redundant angle elimination processing on the VR audio signal according to the redundant angle rotation parameter;
and/or the processing module 2620 is specifically configured to perform redundant displacement elimination processing on the VR audio signal according to the redundant displacement parameter.
Optionally, the VR audio signal comprises a multi-channel audio signal;
the processing module 2620 is specifically configured to determine, according to the multi-channel audio signal, a first virtual speaker array before performing the redundant angle elimination processing;
the processing module 2620 is specifically configured to determine, according to the redundant angle rotation parameter and the audio signal of each first virtual speaker in the first virtual speaker array, an audio signal of each second virtual speaker in the second virtual speaker array after the redundant angle elimination.
Optionally, the processing module 2620 is specifically configured to, for each second virtual speaker in the second virtual speaker array, respectively perform the following operations:
determining an angle relationship between the second virtual loudspeaker and each first virtual loudspeaker in the first virtual loudspeaker array according to the redundant angle rotation parameter; determining the audio signal of the second virtual loudspeaker according to the audio signal of each first virtual loudspeaker and the determined angle relation; or
According to the redundant angle rotation parameters, determining the angle relationship between the second virtual loudspeaker and each adjacent first virtual loudspeaker; and determining the audio signal of the second virtual loudspeaker according to the audio signals of the first virtual loudspeakers adjacent to the second virtual loudspeaker and the determined angular relationship.
Optionally, the VR audio signal comprises an Ambisonic signal;
the processing module 2620 is specifically configured to determine a rotation angle of the Ambisonics signal according to the redundant angle rotation parameter; and rotating the Ambisonics signal according to the rotation angle of the Ambisonics signal.
Optionally, the processing module 2620 is specifically configured to determine, according to the VR audio signal, a third virtual speaker array before performing the redundant displacement elimination processing;
the processing module 2620 is specifically configured to determine, according to the redundant displacement parameter and the audio signal of each third virtual speaker in the third virtual speaker array, an audio signal of each fourth virtual speaker in the fourth virtual speaker array after the redundant displacement elimination processing is performed.
Optionally, the processing module 2620 is specifically configured to, for each fourth virtual speaker in the fourth virtual speaker array, respectively perform the following operations:
determining the relative position relationship between the fourth virtual loudspeaker and each third virtual loudspeaker according to the redundant displacement parameters; and determining the audio signal of the fourth virtual loudspeaker according to the audio signal of each third virtual loudspeaker and the determined relative position relation.
Optionally, if the VR audio signal is an Ambisonic signal, the processing module 2620 is specifically configured to perform multi-channel conversion processing on the Ambisonic signal before performing redundancy displacement elimination processing on the VR audio signal;
the processing module 2620 is specifically configured to convert the audio signal after the redundancy offset cancellation process is performed on the VR audio signal into an Ambisonic signal after the redundancy offset cancellation process is performed on the VR audio signal.
The implementation principle and technical effects of the device provided by the embodiment of the present invention are the same as those of the method embodiments; for brevity, where the device embodiment is silent, reference may be made to the corresponding content in the method embodiments, which is not repeated here.
When processing the redundant motion of the acquisition end, the processing equipment of the VR audio signal can be VR audio acquisition equipment; when processing the redundant motion of the playing end, the processing device of the VR audio signal may be a VR audio playing device.
According to the processing equipment for the VR audio signal, provided by the embodiment of the invention, the VR audio signal can be adjusted according to the obtained redundant motion information corresponding to the VR audio signal, so that the influence of redundant motion on the VR audio signal is eliminated, and the stability and the accuracy of the VR audio signal are improved.
An embodiment of the present invention further provides an electronic device, including: a processor; and a memory configured to store machine-readable instructions that, when executed by the processor, cause the processor to perform the method of any of the above.
When processing the redundant movement of the acquisition end, the electronic equipment can be VR audio acquisition equipment; when processing the redundant motion of the playing end, the electronic device may be a VR audio playing device.
The electronic device may be any terminal device including a computer, a mobile phone, a tablet computer, a PDA (Personal digital assistant), a POS (Point of Sales), a vehicle-mounted computer, or a server.
The memory may be used to store software programs and modules, and the processor may execute various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor is a control center, connects various parts of the whole terminal by using various interfaces and lines, and executes various functions and processes data by operating or executing software programs and/or modules stored in the memory and calling data stored in the memory, thereby integrally monitoring the terminal. Alternatively, the processor may include one or more processing units; preferably, the processor may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor.
An embodiment of the present invention further provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method according to any of the above embodiments.
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. Those skilled in the art will appreciate that the computer program instructions may be implemented by a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the features specified in the block or blocks of the block diagrams and/or flowchart illustrations of the present disclosure.
Those of skill in the art will appreciate that various operations, methods, steps in the processes, acts, or solutions discussed in the present application may be alternated, modified, combined, or deleted. Further, various operations, methods, steps in the flows, which have been discussed in the present application, may be interchanged, modified, rearranged, decomposed, combined, or eliminated. Further, steps, measures, schemes in the various operations, methods, procedures disclosed in the prior art and the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.

Claims (21)

1. A method for processing a Virtual Reality (VR) audio signal, comprising the steps of:
acquiring redundant motion information corresponding to the VR audio signal;
and adjusting the VR audio signal according to the redundant motion information.
2. The processing method of claim 1, wherein the redundant motion information comprises: collecting redundant motion information during VR audio or playing redundant motion information during VR audio;
according to the redundant motion information, adjusting the VR audio signal, including:
adjusting the collected VR audio signal according to the redundant motion information during the collection of the VR audio; or,
and adjusting the VR audio signal to be played according to the redundant motion information when the VR audio is played.
3. The processing method of claim 2, wherein obtaining redundant motion information corresponding to the VR audio signal comprises:
and determining redundant motion information corresponding to the VR audio signal through a deep learning network or a mode identification mode according to the collected scene video signal and/or sensor signal.
4. The processing method of claim 3, wherein the redundant motion information is redundant motion information during VR audio playback;
determining the redundant motion information corresponding to the VR audio signal through the deep learning network comprises:
determining the redundant motion information corresponding to the VR audio signal through the deep learning network according to at least one of the following:
acceleration information of the VR playback device;
a captured sensor signal;
a captured scene audio signal;
a user motion pattern.
5. The processing method according to any one of claims 2 to 4, wherein the redundant motion information is redundant motion information during VR audio playback;
before obtaining the redundant motion information corresponding to the VR audio signal, the method further comprises:
matching a redundant motion model determined from the VR video signal to be played against a redundant motion model determined from the captured sensor signal;
and, when the two redundant motion models do not match, performing the step of obtaining the redundant motion information corresponding to the VR audio signal.
6. The processing method according to any one of claims 2 to 5, wherein the redundant motion information is redundant motion information during VR audio capture;
before adjusting the VR audio signal, the method further comprises:
determining relative movement parameters of the VR audio capture device with respect to the VR video capture device;
and performing relative-movement correction processing on the VR audio signal according to the relative movement parameters.
7. The processing method of claim 6, wherein determining the relative movement parameters of the VR audio capture device with respect to the VR video capture device comprises:
obtaining the rotation angle of the capture device from the captured sensor signal;
and determining the relative movement parameters of the VR audio capture device with respect to the VR video capture device according to the rotation angle of the capture device and a pre-stored positional relationship between the VR audio capture device and the VR video capture device.
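The geometry behind claim 7 can be illustrated with a small sketch: when the rig rotates, the microphone's stored offset from the camera rotates with it, and the relative movement parameter is the displacement of that offset. The restriction to a single vertical axis, the 2-D geometry, and the function name are all illustrative assumptions, not taken from the patent.

```python
import numpy as np

def relative_movement(rotation_deg, mic_offset):
    """Estimate how the audio capture device moves relative to the video
    capture device when the rig rotates by rotation_deg about the
    vertical axis. mic_offset is the pre-stored (x, y) position of the
    microphone relative to the camera's rotation centre, in metres."""
    theta = np.radians(rotation_deg)
    # Rotate the stored offset by the measured device rotation.
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    offset = np.asarray(mic_offset, dtype=float)
    # The relative movement parameter is the displacement of the mic.
    return rot @ offset - offset
```

For example, a microphone 1 m in front of the rotation centre, after a 90-degree turn, has moved 1 m sideways and 1 m back relative to its old position.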
8. The processing method of any one of claims 1 to 7, wherein obtaining the redundant motion information corresponding to the VR audio signal comprises:
obtaining effective motion information and redundant motion information corresponding to the VR audio signal;
before adjusting the VR audio signal, the method further comprises:
smoothing the redundant motion information based on the effective motion information.
9. The processing method of claim 8, wherein obtaining the effective motion information and the redundant motion information corresponding to the VR audio signal comprises:
determining the effective motion information and the redundant motion information corresponding to the VR audio signal through a deep learning network or through pattern recognition, according to the captured scene video signal and/or sensor signal.
10. The processing method of claim 9, wherein determining the effective motion information and the redundant motion information corresponding to the VR audio signal through the deep learning network comprises:
determining the effective motion information and the redundant motion information corresponding to the VR audio signal through the deep learning network in combination with a user motion pattern, according to the captured scene video signal and/or sensor signal;
and determining the effective motion information and the redundant motion information corresponding to the VR audio signal through pattern recognition comprises:
predicting effective motion information by matching against a preset general motion pattern, in combination with the user motion pattern and according to the captured scene video signal and/or sensor signal;
and determining the effective motion information corresponding to the VR audio signal according to the predicted effective motion information and the effective motion information obtained through pattern recognition.
11. The processing method according to any one of claims 8 to 10, wherein smoothing the redundant motion information comprises:
determining the smoothed redundant motion information of the current frame according to the effective motion information of the current frame of the VR audio signal, the redundant motion information of the current frame, and the smoothed redundant motion information of the previous frame.
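The frame-recursive smoothing of claim 11 admits a simple sketch: a first-order recursive filter over the redundant motion, whose coefficient is relaxed when the effective (intentional) motion of the current frame is large, so deliberate movement is not over-damped. The patent does not give a formula; the gating rule and the `base_alpha` constant below are illustrative assumptions.

```python
def smooth_redundant_motion(effective_t, redundant_t, smoothed_prev,
                            base_alpha=0.2):
    """Smooth the redundant motion of the current frame from the three
    quantities named in claim 11: effective motion of the current frame,
    redundant motion of the current frame, and the smoothed redundant
    motion of the previous frame."""
    # Larger effective motion -> trust the current redundant estimate more.
    alpha = min(1.0, base_alpha + abs(effective_t))
    return alpha * redundant_t + (1.0 - alpha) * smoothed_prev
```

In use, the function is applied once per audio frame, feeding each frame's output back in as `smoothed_prev` for the next.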
12. The processing method according to any one of claims 1 to 11, wherein the redundant motion information comprises: a redundant angular rotation parameter and/or a redundant displacement parameter.
13. The processing method of claim 12, wherein adjusting the VR audio signal according to the redundant motion information comprises:
performing redundant angle elimination processing on the VR audio signal according to the redundant angular rotation parameter; and/or
performing redundant displacement elimination processing on the VR audio signal according to the redundant displacement parameter.
14. The processing method of claim 13, wherein the VR audio signal comprises a multi-channel audio signal;
performing the redundant angle elimination processing on the multi-channel audio signal according to the redundant angular rotation parameter comprises:
determining a first virtual loudspeaker array before the redundant angle elimination processing according to the multi-channel audio signal;
and determining the audio signal of each second virtual loudspeaker in a second virtual loudspeaker array after the redundant angle elimination processing, according to the redundant angular rotation parameter and the audio signal of each first virtual loudspeaker in the first virtual loudspeaker array.
15. The processing method of claim 14, wherein determining the audio signal of each second virtual loudspeaker in the second virtual loudspeaker array after the redundant angle elimination processing comprises:
performing, for each second virtual loudspeaker in the second virtual loudspeaker array, the following operations:
determining an angular relationship between the second virtual loudspeaker and each first virtual loudspeaker in the first virtual loudspeaker array according to the redundant angular rotation parameter, and determining the audio signal of the second virtual loudspeaker according to the audio signal of each first virtual loudspeaker and the determined angular relationship; or
determining an angular relationship between the second virtual loudspeaker and each adjacent first virtual loudspeaker according to the redundant angular rotation parameter, and determining the audio signal of the second virtual loudspeaker according to the audio signals of the first virtual loudspeakers adjacent to the second virtual loudspeaker and the determined angular relationship.
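The second option of claim 15 — interpolating each second virtual loudspeaker from its adjacent first virtual loudspeakers — can be sketched for a uniform horizontal ring of speakers. Uniform spacing and linear amplitude panning between the two nearest neighbours are assumptions of this sketch, not requirements of the claim.

```python
import numpy as np

def remove_redundant_rotation(signals, redundant_angle_deg):
    """signals: array of shape (n_speakers, n_samples) for 'first' virtual
    speakers equally spaced on a horizontal ring starting at 0 degrees.
    Returns the 'second' speaker signals after cancelling the redundant
    rotation, each interpolated from its two adjacent input speakers."""
    signals = np.asarray(signals, dtype=float)
    n = signals.shape[0]
    spacing = 360.0 / n
    # Where each output speaker falls inside the rotated input array.
    pos = (np.arange(n) * spacing + redundant_angle_deg) % 360.0
    lower = (pos // spacing).astype(int) % n       # adjacent speaker below
    upper = (lower + 1) % n                        # adjacent speaker above
    frac = (pos % spacing) / spacing               # angular relationship
    return ((1.0 - frac)[:, None] * signals[lower]
            + frac[:, None] * signals[upper])
```

With a redundant angle that is an exact multiple of the speaker spacing, this reduces to a pure channel permutation; otherwise each output is a weighted mix of its two neighbours.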
16. The processing method of claim 13, wherein the VR audio signal comprises a high-fidelity stereophonic (Ambisonics) signal;
performing the redundant angle elimination processing on the Ambisonics signal according to the redundant angular rotation parameter comprises:
determining the rotation angle of the Ambisonics signal according to the redundant angular rotation parameter;
and rotating the Ambisonics signal according to the determined rotation angle.
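For first-order Ambisonics material, the rotation of claim 16 about the vertical axis reduces to a 2×2 rotation of the X and Y channels, with W and Z invariant under yaw. Restricting to first order, to yaw only, and the counter-clockwise-positive sign convention are illustrative choices of this sketch; higher orders would need full spherical-harmonic rotation matrices.

```python
import numpy as np

def rotate_foa_yaw(w, x, y, z, angle_deg):
    """Rotate a first-order Ambisonics (FOA) signal about the vertical
    axis by angle_deg. Channels may be scalars or sample arrays; the
    omnidirectional W and vertical Z channels are unchanged by yaw."""
    a = np.radians(angle_deg)
    x2 = np.cos(a) * x - np.sin(a) * y
    y2 = np.sin(a) * x + np.cos(a) * y
    return w, x2, y2, z
```

Applying the negative of the redundant angle here counter-rotates the sound field, cancelling the unintended rotation.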
17. The processing method of claim 13, wherein performing the redundant displacement elimination processing on the VR audio signal according to the redundant displacement parameter comprises:
determining a third virtual loudspeaker array before the redundant displacement elimination processing according to the VR audio signal;
and determining the audio signal of each fourth virtual loudspeaker in a fourth virtual loudspeaker array after the redundant displacement elimination processing, according to the redundant displacement parameter and the audio signal of each third virtual loudspeaker in the third virtual loudspeaker array.
18. The processing method of claim 17, wherein determining the audio signal of each fourth virtual loudspeaker in the fourth virtual loudspeaker array after the redundant displacement elimination processing comprises:
performing, for each fourth virtual loudspeaker in the fourth virtual loudspeaker array, the following operations:
determining a relative positional relationship between the fourth virtual loudspeaker and each third virtual loudspeaker according to the redundant displacement parameter, and determining the audio signal of the fourth virtual loudspeaker according to the audio signal of each third virtual loudspeaker and the determined relative positional relationship.
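One hedged reading of claim 18: shift the effective positions of the third virtual loudspeakers by the redundant displacement, then mix each fourth loudspeaker feed from the shifted sources according to the resulting relative positions. Inverse-distance weighting is an illustrative choice here; the claim only requires that the relative positional relationship be used in some form.

```python
import numpy as np

def eliminate_displacement(signals, positions, redundant_disp):
    """signals: (n_speakers, n_samples) feeds of the 'third' speakers;
    positions: their nominal (x, y) positions; redundant_disp: the
    redundant displacement to cancel. Each output ('fourth') speaker sits
    at the original position and mixes the shifted input speakers with
    inverse-distance weights."""
    signals = np.asarray(signals, dtype=float)
    pos = np.asarray(positions, dtype=float)
    shifted = pos - np.asarray(redundant_disp, dtype=float)
    out = []
    for p in pos:                                  # each fourth speaker
        d = np.linalg.norm(shifted - p, axis=1)    # relative positions
        weights = 1.0 / np.maximum(d, 1e-6)        # nearer -> louder
        weights /= weights.sum()
        out.append(weights @ signals)
    return np.array(out)
```

With zero redundant displacement each output speaker reproduces (essentially) its own input feed, which is a useful sanity check.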
19. The processing method of claim 13, 17 or 18, wherein, if the VR audio signal is an Ambisonics signal, before performing the redundant displacement elimination processing on the VR audio signal, the method further comprises:
performing multi-channel conversion processing on the Ambisonics signal;
and after performing the redundant displacement elimination processing on the VR audio signal, the method further comprises:
converting the audio signal after the redundant displacement elimination processing back into an Ambisonics signal.
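The conversion steps of claim 19 (Ambisonics to multi-channel, process, then back to Ambisonics) can be sketched for horizontal first-order material using a plane-wave encoding matrix and its pseudo-inverse as the decoder. This particular matrix pair is an illustrative design choice, not taken from the patent.

```python
import numpy as np

def foa_codec(azimuths_deg):
    """Build matrices that convert horizontal first-order Ambisonics
    (channels W, X, Y) to a ring of virtual speakers and back.
    Returns (decode, encode) with speakers = decode @ foa and
    foa = encode @ speakers."""
    az = np.radians(np.asarray(azimuths_deg, dtype=float))
    # Each column encodes one speaker direction into (W, X, Y).
    encode = np.vstack([np.ones_like(az), np.cos(az), np.sin(az)])
    decode = np.linalg.pinv(encode)   # least-squares decoder
    return decode, encode

# Round trip: decode to speakers, apply the displacement elimination
# of claims 17-18 on the speaker feeds, then re-encode to Ambisonics.
decode, encode = foa_codec([0.0, 90.0, 180.0, 270.0])
foa = np.array([[1.0], [0.5], [0.0]])    # one sample of W, X, Y
speakers = decode @ foa                  # multi-channel conversion
foa_back = encode @ speakers             # back to an Ambisonics signal
```

With at least three well-spread speaker directions the encoding matrix has full row rank, so this round trip is lossless for first-order content.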
20. A processing device for a virtual reality (VR) audio signal, comprising:
an acquisition module configured to acquire redundant motion information corresponding to the VR audio signal;
and a processing module configured to adjust the VR audio signal according to the redundant motion information.
21. An electronic device, comprising:
a processor; and
a memory configured to store machine-readable instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-19.
CN201810114171.5A 2018-02-05 2018-02-05 Processing method of virtual reality VR audio signal and corresponding equipment Active CN110120229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810114171.5A CN110120229B (en) 2018-02-05 2018-02-05 Processing method of virtual reality VR audio signal and corresponding equipment

Publications (2)

Publication Number Publication Date
CN110120229A true CN110120229A (en) 2019-08-13
CN110120229B CN110120229B (en) 2024-09-20

Family

ID=67519822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810114171.5A Active CN110120229B (en) 2018-02-05 2018-02-05 Processing method of virtual reality VR audio signal and corresponding equipment

Country Status (1)

Country Link
CN (1) CN110120229B (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1575620A (en) * 2001-10-22 2005-02-02 西门子公司 Method and device for the interference elimination of a redundant acoustic signal
CN102789313A (en) * 2012-03-19 2012-11-21 乾行讯科(北京)科技有限公司 User interaction system and method
CN102985277A (en) * 2010-12-31 2013-03-20 北京星河易达科技有限公司 Intelligent traffic safety system based on comprehensive state detection and decision method thereof
US20150302651A1 (en) * 2014-04-18 2015-10-22 Sam Shpigelman System and method for augmented or virtual reality entertainment experience
US20160026253A1 (en) * 2014-03-11 2016-01-28 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
CN105931645A (en) * 2016-04-12 2016-09-07 深圳市京华信息技术有限公司 Control method of virtual reality device, apparatus, virtual reality device and system
CN106331371A (en) * 2016-09-14 2017-01-11 维沃移动通信有限公司 Volume adjustment method and mobile terminal
CN106464907A (en) * 2014-06-30 2017-02-22 韩国电子通信研究院 Apparatus and method for eliminating redundancy of view synthesis prediction candidate in merge mode
CN106537290A (en) * 2014-05-09 2017-03-22 谷歌公司 Systems and methods for biomechanically-based eye signals for interacting with real and virtual objects
CN106650816A (en) * 2016-12-28 2017-05-10 深圳信息职业技术学院 Video quality evaluation method and device
CN106937531A (en) * 2014-06-14 2017-07-07 奇跃公司 Method and system for producing virtual and augmented reality


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG Lei, HE Kezhong, GUO Muhe, ZHANG Bo: "Application and prospects of virtual reality technology in robotics", Robot (机器人), no. 01 *
SHEN Yanfei; LI Jintao; ZHU Zhenmin; ZHANG Yongdong: "High efficiency video coding", Chinese Journal of Computers (计算机学报), no. 11 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110703911A (en) * 2019-09-26 2020-01-17 深圳市酷开网络科技有限公司 Motion equipment control method, system and storage medium based on VR equipment
TWI738543B (en) * 2020-10-07 2021-09-01 財團法人工業技術研究院 Orientation predicting method, virtual reality headset and non-transitory computer-readable medium
CN114296539A (en) * 2020-10-07 2022-04-08 财团法人工业技术研究院 Direction prediction method, virtual reality device and non-transitory computer readable medium
US11353700B2 (en) 2020-10-07 2022-06-07 Industrial Technology Research Institute Orientation predicting method, virtual reality headset and non-transitory computer-readable medium
CN114296539B (en) * 2020-10-07 2024-01-23 财团法人工业技术研究院 Direction prediction method, virtual reality device and non-transitory computer readable medium
CN113112982A (en) * 2021-04-13 2021-07-13 上海联影医疗科技股份有限公司 Medical equipment, noise reduction method of medical equipment and storage medium
CN113112982B (en) * 2021-04-13 2022-08-16 上海联影医疗科技股份有限公司 Medical equipment, noise reduction method of medical equipment and storage medium
WO2022242479A1 (en) * 2021-05-17 2022-11-24 华为技术有限公司 Three-dimensional audio signal encoding method and apparatus, and encoder
WO2023183053A1 (en) * 2022-03-25 2023-09-28 Magic Leap, Inc. Optimized virtual speaker array

Also Published As

Publication number Publication date
CN110120229B (en) 2024-09-20

Similar Documents

Publication Publication Date Title
CN110120229B (en) Processing method of virtual reality VR audio signal and corresponding equipment
EP3440538B1 (en) Spatialized audio output based on predicted position data
CN109564504B (en) Multimedia device for spatializing audio based on mobile processing
EP3729829B1 (en) Enhanced audiovisual multiuser communication
CN108370471A (en) Distributed audio captures and mixing
RU2759012C1 (en) Equipment and method for reproducing an audio signal for playback to the user
US20120207308A1 (en) Interactive sound playback device
JP2020532914A (en) Virtual audio sweet spot adaptation method
EP3503592B1 (en) Methods, apparatuses and computer programs relating to spatial audio
CN111050271B (en) Method and apparatus for processing audio signal
WO2018121524A1 (en) Data processing method and apparatus, acquisition device, and storage medium
US20220225050A1 (en) Head tracked spatial audio and/or video rendering
US20210092545A1 (en) Audio processing
US12010490B1 (en) Audio renderer based on audiovisual information
WO2013083875A1 (en) An apparatus and method of audio stabilizing
CN110572710B (en) Video generation method, device, equipment and storage medium
JP7483852B2 (en) Discordant Audiovisual Capture System
JP6807744B2 (en) Image display method and equipment
US20190313174A1 (en) Distributed Audio Capture and Mixing
US20240096334A1 (en) Multi-order optimized ambisonics decoding
KR102670181B1 (en) Directional audio generation with multiple arrangements of sound sources
US20240098439A1 (en) Multi-order optimized ambisonics encoding
CN117012217A (en) Data processing method, device, equipment, storage medium and program product
CN117676002A (en) Audio processing method and electronic equipment
CN118413803A (en) Obtaining calibration data for capturing spatial audio

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant