WO2018152004A1 - Contextual filtering for immersive audio - Google Patents

Contextual filtering for immersive audio

Info

Publication number
WO2018152004A1
WO2018152004A1 (PCT/US2018/017460)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
orientation
dominant
audio stream
sound source
Prior art date
Application number
PCT/US2018/017460
Other languages
English (en)
Inventor
Pasi Sakari Ojala
Original Assignee
Pcms Holdings, Inc.
Priority date
Filing date
Publication date
Application filed by Pcms Holdings, Inc.
Publication of WO2018152004A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303: Tracking of listener position or orientation
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/167: Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11: Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15: Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • Live streaming of a 360-degree video can contain spatial audio to create an immersive 3D user experience.
  • the spatial audio captures the whole audio scenery at the recording spot and delivers the environment to the receiver.
  • the live video content consumer is able to explore the 360-degree view by scrolling the screen and changing the view point direction.
  • a consuming user may also use wearable devices, in which case the application renders the view according to the direction the user is looking and changes the view when the user turns the device.
  • Audio sources remain in the same location relative to the user regardless of whether the user alters the screen view or turns a device to change the viewpoint. For example, an audio source appearing at the right-hand side remains on the right-hand side, even if the user turns the view towards the audio source. The locations of sound sources appearing in the video are thus mismatched with the presented audio when the user changes the viewpoint.
  • Such applications presenting the streamed content lack the ability to control the audio chain from the receiving application.
  • Systems and methods described herein manage 3D immersive audio for video streaming player applications.
  • One embodiment is directed to a method including receiving immersive video data and a multi-channel audio stream, receiving directional information regarding at least one dominant sound source, generating at least one dominant source audio stream from the multi-channel audio stream, determining user view orientation information, generating orientation-adjusted audio by applying perceptual audio coding to the at least one dominant source audio stream based on the user view orientation information and the directional information regarding the at least one dominant sound source, and presenting orientation-adjusted audio and a view of the immersive video data that correspond to the user view orientation information.
  • the perceptual audio coding includes calculating an inter-channel level difference, a time difference, and an inter-channel coherence for each transform domain time frequency slot. In another embodiment, perceptual audio coding is applied to one or more coherent sound sources selected from the at least one dominant sound source. In another embodiment, perceptual audio coding includes applying a head-related transfer function if the user view orientation information includes horizontal and vertical movement.
  • generating orientation-adjusted audio further includes applying at least two band-splitting filters to at least one dominant source audio stream to generate filter output signals, down-sampling the filter output signals to generate down-sampled output signals, sub-band domain contextual filtering the down-sampled outputs to generate contextual filtering output signals, up-sampling the contextual filtering outputs to generate up-sampled outputs, and applying synthesis filters to the up-sampled outputs to generate orientation-adjusted audio signals.
  • Another method is directed to receiving a multi-channel audio stream, receiving directional information associated with at least one dominant sound source, generating at least one dominant source audio stream from the multi-channel audio stream, receiving user view orientation information, generating orientation-adjusted audio by applying binaural cue coding using signal coherence parameters to the at least one dominant source audio stream based on the user view orientation information and the directional information regarding the at least one dominant sound source, and presenting orientation-adjusted audio that corresponds to the user view orientation information.
  • Another embodiment is directed to a device including a processor, and a non-transitory computer-readable medium coupled to the processor that stores instructions that are operative, when executed on the processor, to perform the functions of receiving an immersive video data and a multichannel audio stream, receiving directional information regarding at least one dominant sound source, generating at least one dominant source audio stream from the multi-channel audio stream, determining user view orientation information, generating orientation-adjusted audio by applying perceptual audio coding to the at least one dominant source audio stream based on the user view orientation information and the directional information regarding the at least one dominant sound source, and presenting orientation-adjusted audio and a view of the immersive video data that correspond to the user view orientation information.
  • FIG. 1 is a plan view schematic of an example user microphone array for capturing sound sources encircling a user in accordance with an embodiment.
  • FIG. 2 is a flow diagram of an exemplary method for extracting context information for user actions and captured audio signals in accordance with an embodiment.
  • FIG. 3 is a block diagram of a time-frequency analysis of a multi-channel signal appropriate for embodiments herein.
  • FIG. 4 is a message structure block diagram for a synchronous MPEG-2 TS metadata stream in accordance with an embodiment.
  • FIG. 5 is a flow diagram of an embodiment of a process for creating a natural audio presentation using context data and local sensor signals.
  • FIG. 6 is a plan view schematic for a clockwise rotation of an audio image for a receiving user rotating counter-clockwise relative to the sound sources in accordance with an embodiment.
  • FIG. 7 is a three-dimensional diagram for turning an audio image in both a horizontal and a vertical direction in accordance with an embodiment.
  • FIG. 8 shows a block diagram for contextual filtering of a multi-channel audio signal in accordance with an embodiment.
  • FIG. 9 is a block diagram of sub-band contextual filtering with analysis and synthesis filter banks in accordance with an embodiment.
  • FIG. 10 is a time frequency (TF) domain diagram that shows classification of time frequency slots for different sound sources in accordance with an embodiment.
  • FIG. 11 is a system diagram showing an example overall architecture and data flows for a live video streaming service in accordance with an embodiment.
  • FIG. 12 is a system flow diagram showing one embodiment of modification of audio data for use with a 360-degree video presentation.
  • FIG. 13A depicts an example wireless transmit/receive unit (WTRU) that may be used as a client within an exemplary immersive audio/video system.
  • FIG. 13B depicts an exemplary network entity that may be used as a server within an exemplary immersive audio/video system.
  • a recording device on a user 100 is shown with sound source 1 (110), sound source 2 (120), and sound source 3 (130).
  • the recording user is shown with binaural (or multi-channel) microphones used to determine contextual information about an audio image, such as dominant sound source locations and their coherence.
  • the recording user is shown listening to several simultaneous sound sources in different locations, 110, 120 and 130.
  • contextual metadata contains information about three separate sound sources and their characteristics.
  • the resulting audio context, as well as the recording user context, for location and motion is transmitted as metadata to the receiver together with the audio-visual content.
  • the metadata is sent to a receiving application that uses contextual filtering to analyze the audio according to context metadata and relative motion context of recording and receiving users.
  • the contextual filter processes the audio presentation by positioning identified audio sources accordingly to maintain the natural immersive 3D audio experience.
  • the direction of the audio sources is sent in the metadata to allow tracking of audio sources.
  • a client live video streaming application can post-process the directionality of audio sub-streams, such as a multi-channel audio stream, and apply the user context to render the audio presentation as if the receiving user were in the middle of the events.
  • contextual analysis and filtering of the audio image exists in both recording and receiving applications to enable a correct live rendering of the audio image for the listener.
  • because a recording application is able to use raw microphone signals at recording device 100, instead of a receiving application attempting to extract all details from a lossy, compressed signal received over a wireless channel, sound source identification can be more reliable.
  • front-back confusion, that is, determining whether a sound source is in front of or behind a user, may be resolved in a recording application if all recording condition information is available.
  • a recording application may also have a priori data about the traced sources, which may be applied as part of the audio context metadata.
  • a recording application captures live video, audio images around a user, context data about an audio environment, and metadata about a user's head movements, such as shown in FIG. 1 100.
  • live video is recorded with a smartphone.
  • a 360-degree stream is recorded with an external camera module, such as device 100.
  • a recording application may capture multi-channel audio with a microphone array.
  • the device is connected to a headset with binaural microphones, 100, and an application streams surround sound.
  • the captured audio-visual content is streamed to a live video streaming service.
  • captured audio-visual content is edited and distributed via a content service.
  • content is compressed with standard video and audio codecs.
  • Many live video streaming protocols work with any media codec (or are media codec agnostic).
  • a recording application may use any standard ("state-of-the-art" or otherwise) to encode audio-visual content.
  • multi-channel audio of two or more channels is compressed with stereo or spatial audio codecs. Such a methodology maintains a perceived audio image, even in 3D.
  • An MPEG DASH player supports video coding with the DASH-AVC/264 codec (Advanced Video Codec) and stereo audio with the HE-AACv2 Profile, level 2 codec. If a recording device supports multi-channel audio or binaural audio capture, the MPEG Spatial Audio Codec (SAC) may be used.
  • the MPEG SAC provides a backward compatible way to transmit audio to a receiver. Under MPEG SAC, the core bit stream is mono audio (e.g., AAC), and spatial audio is in a separate field. For such an embodiment, the codec selection is listed in the MPEG DASH Media Presentation Description file.
  • FIG. 2 is a flow diagram of an exemplary recording application that extracts context information for user actions and captured audio signals. More particularly, the flow diagram includes recording user context data on location, orientation and motion 210 (for example, as shown by device 100 in FIG. 1).
  • a recording application may record sounds for audio media and perform user context handling.
  • a recording application collects information from motion sensors such as a GPS, gyro, compass, and accelerometer to determine the location and orientation of a recording user and equipment.
  • block 220 includes external knowledge about the sound sources.
  • the audio signal can be from multi-channel microphone equipment 230.
  • audio and video content recorders are fixed to physical coordinates and the orientation of the recording device relative to audio visual targets, for example, is determined.
  • Both the external knowledge about sound sources 220 and the audio signal from multi-channel microphone equipment (or binaural equipment) 230 are provided in block 240 to an audio manager that analyzes the audio to trace sound sources and locate diffuse sounds.
  • audio context analysis is performed for a raw microphone signal.
  • a recording device 100 captures an audio signal with a multi-microphone setup (the input signal has two or more channels), 230.
  • An audio context manager then analyzes the signal and extracts the most dominant sound sources with their locations within the audio environment, 240.
  • the audio signal analysis 240 may be conducted in the time and frequency domains simultaneously to extract information on both time- and frequency-related phenomena.
  • Each sound source and corresponding location determined in block 240 may be used in determining context information.
  • audio context is packetized together with user motion context in metadata.
  • Block 270 provides that the packetized context information is transmitted together with the audio (and video) content from block 260 to a live video streaming service 279, which could also be to a content service.
  • FIG. 3 is a block diagram of a time-frequency analysis of a multi-channel signal for audio context analysis in recording applications.
  • two channels (x1(z) 310 and x2(z) 312) are filtered with low-pass and high-pass filters (H0(z) 320, 324 and H1(z) 322, 326) and down-sampled by 2 as shown in blocks 330, 332, 334, and 336 to keep the overall number of samples constant.
  • a filter bank has several stages of band-splitting filters to achieve a certain (or sufficient) level of frequency resolution.
  • a different filter configuration may be used to get a non-uniform band split. Different configurations regarding sampling can also be used. If the resulting band-limited signals are segmented, the result may be a time- frequency domain parameterization of the original time series. Signal components limited by frequency and time slots may be used as time frequency parameters of the original signal.
  • the down-sampled signals are used for sub-band domain contextual analysis 340, and context data is output 350.
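  • As an illustration of this analysis stage, the sketch below builds a small multi-stage band-splitting filter bank in Python; the Haar low-pass/high-pass pair and the three-stage octave split are assumptions chosen for readability, not the filters used by the patent.

```python
import numpy as np

H0 = np.array([1.0,  1.0]) / np.sqrt(2.0)   # low-pass H0(z) (illustrative Haar pair)
H1 = np.array([1.0, -1.0]) / np.sqrt(2.0)   # high-pass H1(z)

def split_band(x):
    """One band-splitting stage: filter with H0/H1, then down-sample by 2
    so the overall number of samples stays roughly constant."""
    return np.convolve(x, H0)[::2], np.convolve(x, H1)[::2]

def analysis_bank(x, stages=3):
    """Repeatedly split the low band to obtain a coarse, non-uniform
    time-frequency parameterization of one input channel."""
    bands = []
    low = np.asarray(x, dtype=float)
    for _ in range(stages):
        low, high = split_band(low)
        bands.append(high)
    bands.append(low)
    return bands  # band-limited, down-sampled signals, ready for sub-band analysis

# Two-channel capture, as in FIG. 3 (x1(z) 310 and x2(z) 312).
x1, x2 = np.random.randn(1024), np.random.randn(1024)
tf_x1, tf_x2 = analysis_bank(x1), analysis_bank(x2)
print([b.size for b in tf_x1])
```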
  • the contextual analysis block 340 searches for potential dominant sound sources and their diffuseness in multi-channel audio signals.
  • different beam forming methods may be used, for example, applying the level and time difference of different channels.
  • parametric methods such as binaural cue coding (BCC) may be used to extract location (direction of arrival) cues from a sound source.
  • a BCC method may use signal coherence parameterization to classify (or sort) sound sources based on their diffuseness.
  • BCC analysis may comprise computation of inter-channel level difference (ILD), time difference (ITD), and inter-channel coherence (ICC) parameters estimated within each transform domain time frequency slot (in each frequency band of each input frame).
  • a high coherence may indicate that the sound source is point-like with an accurate direction of arrival, while low coherence may mean a diffuse sound without a clear direction of arrival. For example, reverberation and reflected signals coming from many different directions may have a low coherence.
  • the number of different sound sources may be estimated using the direction of arrival estimation cues in different time frequency slots. If an audio image contains one or more discrete sound sources, a distribution of a direction of arrival estimates may reveal distinct peaks. A peak in a direction of arrival parameter distribution may represent a separate sound source. The mean value of all coherence cues determined in the time frequency slots corresponding to these direction of arrival estimates may describe the coherence of a given sound source.
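  • A minimal sketch of this cue extraction and source counting is shown below; it uses a plain time-domain cross-correlation per slot instead of the transform-domain computation described above, and the histogram bin count and peak threshold are assumptions.

```python
import numpy as np

def bcc_cues(slot_l, slot_r, fs, max_lag=32):
    """Per-slot cues: inter-channel level difference (dB), time difference
    (seconds, via a circular cross-correlation for brevity), and coherence
    (normalized correlation peak). Simplified relative to a full BCC analysis."""
    eps = 1e-12
    ild = 10.0 * np.log10((np.sum(slot_l**2) + eps) / (np.sum(slot_r**2) + eps))
    lags = np.arange(-max_lag, max_lag + 1)
    xc = np.array([np.sum(slot_l * np.roll(slot_r, k)) for k in lags])
    norm = np.sqrt(np.sum(slot_l**2) * np.sum(slot_r**2)) + eps
    icc = np.max(np.abs(xc)) / norm
    itd = lags[int(np.argmax(np.abs(xc)))] / fs
    return ild, itd, icc

def count_sources(doa_estimates_deg, bins=36, min_share=0.1):
    """Crude peak picking in a direction-of-arrival histogram: each distinct
    peak is treated as one dominant sound source."""
    hist, edges = np.histogram(doa_estimates_deg, bins=bins, range=(-180, 180))
    peaks = [i for i in range(bins)
             if hist[i] >= min_share * max(len(doa_estimates_deg), 1)
             and hist[i] >= hist[i - 1] and hist[i] >= hist[(i + 1) % bins]]
    centers = [(edges[i] + edges[i + 1]) / 2.0 for i in peaks]
    return len(centers), centers

doas = np.concatenate([np.random.normal(35, 3, 200), np.random.normal(-105, 3, 150)])
print(count_sources(doas))   # expect two peaks, near +35 and -105 degrees
```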
  • a recording application may also have other information about the sound sources around a recording user. For some embodiments, such information is included in contextual metadata. For other embodiments, such information is used with estimating the context, such as regarding the direction of arrival cues. A search for sound source parameters for audio content may be concentrated around initial values.
  • audio context information in a recording application contains the following parameters: the number of separate sound sources (the number of peaks in a direction of arrival distribution), the direction of arrival in a horizontal plane for each identified sound source (the mean value of each peak in the direction of arrival distribution curve), and a coherence of each sound source (the mean of coherence parameters corresponding to the direction of arrival cues in a given distribution peak).
  • the direction of arrival in a horizontal plane for each identified sound source is a value of 0 to 360 degrees.
  • the direction of arrival may have both horizontal and vertical angle parameters. In general, a higher level of coherence indicates a narrower source.
  • Some embodiments operate to distinguish between front and back for direction of arrival estimations.
  • Level and time difference cues are in some cases not sufficient to distinguish between front and back.
  • combining level and time differences with user location and orientation may be used to determine the difference between front and back.
  • the motion of a sound source that appears in the front is inversely proportional to the motion of the recording device, while the motion of a sound source that appears in the back is proportional to the motion of the recording device.
  • the direction of arrival cue listed above is revised accordingly if the user context is available.
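  • The front/back rule above can be expressed as a small heuristic; the sketch below assumes the recording application already has the change in the (front-assumed) direction of arrival estimate and the device yaw change over the same interval.

```python
def front_or_back(doa_change_deg, device_yaw_change_deg, tol_deg=1.0):
    """A source in front appears to move opposite to the recording device's
    rotation, while a source behind appears to move with it."""
    if abs(device_yaw_change_deg) < tol_deg:
        return "unknown"   # too little device motion to disambiguate
    return "front" if doa_change_deg * device_yaw_change_deg < 0 else "back"

print(front_or_back(doa_change_deg=-10.0, device_yaw_change_deg=+10.0))  # front
print(front_or_back(doa_change_deg=+10.0, device_yaw_change_deg=+10.0))  # back
```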
  • the recording user and device context may be used for analyzing the relative motion of recording and receiving devices.
  • the receiving application may use the relative motion of recording and receiving devices to create the presentation in local coordinates.
  • the presentation may be fixed relative to a physical location or to a camera's coordinates. If the presentation is fixed relative to a physical location, the recording device motion may be cancelled and the presentation may be kept static relative to the environment.
  • the data structure for three identified sound sources (audio environment) and a recording user context is presented, e.g., in XML format as shown below:
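  • The patent's actual XML listing is not reproduced in this text; the snippet below is only a hypothetical Python sketch of a structure carrying the fields just described (number of sources, per-source direction of arrival and coherence, and the recording user's location, orientation, and motion). All element and attribute names are assumptions.

```python
import xml.etree.ElementTree as ET

ctx = ET.Element("audio_context")
env = ET.SubElement(ctx, "audio_environment", num_sources="3")
for idx, (doa_deg, coherence) in enumerate([(30, 0.9), (120, 0.7), (250, 0.4)], start=1):
    ET.SubElement(env, "sound_source", id=str(idx),
                  direction_of_arrival_deg=str(doa_deg), coherence=str(coherence))

user = ET.SubElement(ctx, "recording_user_context")
ET.SubElement(user, "location", lat="0.0", lon="0.0")
ET.SubElement(user, "orientation", yaw_deg="0.0", pitch_deg="0.0")
ET.SubElement(user, "motion", yaw_rate_deg_s="0.0")

print(ET.tostring(ctx, encoding="unicode"))
```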
  • Audio environment metadata may be dynamically updated for changes to the number of sound sources, direction of arrival, or coherence.
  • the context update rate is in the range of seconds because the data is provided for the receiving application as guidance for finding the context in the received audio.
  • Live streaming protocols, such as MPEG DASH with its Media Presentation Description (MPD) defined in the ISO/IEC 23009-1 specification, may contain information about media segments, such as the relationships between media segments and information for choosing between them.
  • MPD may contain information about different streams with different bit rates.
  • An XML file may contain information about codec mime types, audio bandwidth, bit rates, image resolution, and image size.
  • Each media segment location may have an explicit URL.
  • the receiver picks the appropriate bit rate stream to fit the transmission channel and device capabilities.
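  • A toy selection policy is sketched below; the representation fields, URLs, and the "highest bit rate that fits the channel" rule are assumptions, since the MPD itself only lists the available alternatives.

```python
def pick_representation(representations, channel_kbps):
    """Choose the highest-bit-rate stream that still fits the estimated
    channel capacity, falling back to the lowest-rate stream."""
    ordered = sorted(representations, key=lambda r: r["bandwidth_kbps"])
    fitting = [r for r in ordered if r["bandwidth_kbps"] <= channel_kbps]
    return (fitting[-1] if fitting else ordered[0])["url"]

# Hypothetical representations; URLs and bit rates are placeholders.
reps = [{"url": "seg_low.mp4", "bandwidth_kbps": 500},
        {"url": "seg_mid.mp4", "bandwidth_kbps": 1500},
        {"url": "seg_high.mp4", "bandwidth_kbps": 4000}]
print(pick_representation(reps, channel_kbps=2000))   # -> seg_mid.mp4
```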
  • MPEG-2 TS (Transport Stream), specified in ISO/IEC 13818-1, is applied in MPEG DASH as a media container format and for transmitting media packets over HTTP.
  • Video and Audio elementary streams are a continual series (or set) of video frames and audio data samples.
  • audio image metadata contains a continuous (or continual) stream of information about sound sources and is streamed as timed metadata within a transport system.
  • the MPD describing a presentation has URL links where a receiver may access a stream.
  • the MPD does not contain any contextual metadata.
  • MPEG-2 TS supports encapsulation of synchronous and asynchronous metadata encapsulated in a stream.
  • Synchronous metadata may be used because the audio context is tightly related to the audio content.
  • metadata is synchronized with video using a Presentation Time Stamp (PTS) found in a Packetized Elementary Stream (PES) header. This time stamp may be coded in the MPEG-2 Systems PES layer.
  • the packet payload may be for a single metadata cell that includes a five-byte header followed by KLV (Key-Length-Value) metadata.
  • context data contains a fixed data structure.
  • metadata segments containing context data about an audio environment and recording user information is described above by the XML source code.
  • each segment uses a PTS for a start time, and each segment is handled on the same timeline as the media components.
  • Referring to FIG. 4, a message structure block diagram in accordance with an embodiment is illustrated for a synchronous MPEG-2 TS metadata stream (or multiplex).
  • a three-byte start code 410 is followed by a 1-byte stream identifier (stream_id) 420, a two-byte packet length field 430, three bytes of flags 440, a five-byte Presentation Time Stamp 450, and a packet payload 460.
  • a stream_id 420 equal to 0xFC (252 in decimal) indicates a "metadata stream."
  • a packet payload 460 is a single metadata segment that includes a five-byte header followed by KLV metadata.
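  • The sketch below packs a byte string following the FIG. 4 layout (3-byte start code, stream_id 0xFC, 2-byte length, three flag bytes, 5-byte PTS, then a 5-byte metadata cell header followed by KLV data). The default flag bytes assume a plain "PTS-only" PES header and the cell header content is a placeholder; a production MPEG-2 TS multiplexer has additional constraints not shown here.

```python
import struct

def encode_pts(pts):
    """Pack a 33-bit PTS into the 5-byte PES layout ('0010' prefix plus marker bits)."""
    pts &= (1 << 33) - 1
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 0x01,
        (pts >> 22) & 0xFF,
        (((pts >> 15) & 0x7F) << 1) | 0x01,
        (pts >> 7) & 0xFF,
        ((pts & 0x7F) << 1) | 0x01,
    ])

def metadata_pes_packet(pts, klv_payload, flags=b"\x80\x80\x05"):
    """Assemble one synchronous metadata PES packet per the FIG. 4 structure."""
    cell_header = b"\x00" * 5                    # placeholder 5-byte metadata cell header
    body = flags + encode_pts(pts) + cell_header + klv_payload
    return b"\x00\x00\x01" + b"\xfc" + struct.pack(">H", len(body)) + body

pkt = metadata_pes_packet(pts=90_000, klv_payload=b"KLV...")
print(pkt[:12].hex())
```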
  • In FIG. 5, a flow diagram illustrates an embodiment of a process for creating a natural audio presentation using context data and local sensor signals.
  • FIG. 5 shows one embodiment for a receiving application to handle audio media and user context data.
  • a receiving application receives a live video stream bundle with a video stream (or immersive video data), an audio stream (which may be multi-channel), and metadata about the recording context within the audio transport stream.
  • block 510 provides for metadata on user and audio context that is provided to an audio manager.
  • Block 520 represents a multi-channel audio stream also provided to an audio manager.
  • a receiving application analyzes an audio signal and picks out parameters related to sound sources in metadata.
  • An audio manager determines parameters for a multi-channel audio signal and matches parameters that correspond to location (direction of arrival) of dominant sound sources identified in context data.
  • Block 530 represents an audio manager analyzing the audio to trace sound sources and to classify the diffuse sounds.
  • a search is controlled by coherence information and time and level difference cues extracted from the direction of arrival metadata.
  • contextual filtering as shown in block 550 is performed with modification cues.
  • context filtering can operate based on location and orientation information of recording and receiving devices.
  • relative location and orientation of the receiving user and the recording user are provided to enable context filtering.
  • the receiving device determines the receiving user motion context, for example using a head tracking instrumentation on headphones.
  • the application determines relative motion of the recording and receiving devices at block 540 which may determine how to modify the audio image. The same information may be applied to select the point of view from the 360-degree video.
  • a contextual filter 550 in a receiving application combines this information and determines relative location, orientation, and motion of recording and receiving entities.
  • a receiving application renders a spatial audio presentation based on contextual filtering with modification cues as shown in block 570.
  • a receiving user may turn the device or a headset or pointer on screen or the like to explore a 360-degree presentation or near 360-degree presentation as shown in block 580.
  • a search for the recorded device context data pertaining to location, orientation and motion 560 can occur, which then enables collection of the relative location and orientation of both the receiving user and the recording user 540, to once again perform context filtering with new modification cues due to the movement of the receiving user.
  • an audio image at an angle is created based on a relative rotation of recording and receiving devices. If a receiving user rotates in the horizontal direction to the left, the audio manager creates a natural audio experience by performing an equal and opposite relative rotation of the audio image around the receiving user, for example 90° to the right.
  • the contextual filtering 550 implements an audio image rotation by using extracted audio parameters.
  • the parameters related to individual sound sources are modified by altering the time and level difference for the rotation angle.
  • the rotation is implemented using only parameters related to coherent sound sources, while parameters related to diffuse, non-coherent sounds, such as background noise and reverberant sounds, are not used.
  • rotation effects for non-coherent sound components that lack a location (direction of arrival) are not used.
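  • The rotation step can be sketched as below for a single coherent source; the spherical-head ITD formula and the sinusoidal ILD curve are simple stand-in models (assumptions), and diffuse components are deliberately passed over.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, simple spherical-head assumption

def rotate_source_cues(doa_deg, rotation_deg, coherence, coh_threshold=0.5):
    """Return new target ITD/ILD values for one sound source after rotating the
    audio image; low-coherence (diffuse) components are left untouched."""
    if coherence < coh_threshold:
        return None                                  # diffuse sound: no rotation applied
    new_doa = (doa_deg + rotation_deg) % 360.0
    az = np.deg2rad(new_doa)
    itd_s = (2.0 * HEAD_RADIUS / SPEED_OF_SOUND) * np.sin(az)   # toy ITD model
    ild_db = 6.0 * np.sin(az)                                   # toy ILD model
    return {"doa_deg": new_doa, "itd_s": itd_s, "ild_db": ild_db}

# Receiving user turns 90 degrees to the left -> rotate the audio image 90 degrees right.
print(rotate_source_cues(doa_deg=30.0, rotation_deg=90.0, coherence=0.9))
```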
  • In FIG. 6, a plan view schematic illustrates clockwise rotation of an audio image for a receiving user rotating counter-clockwise relative to the sound sources.
  • FIG. 6 shows an example of turning an audio image including sound sources 1 (610), 2 (620), and 3 (630) around a receiving user 602.
  • the receiving (listening) user 602 turns his or her head to the left, and the audio image is turned to the right as shown by the arrow and dashed sound sources 1 (612), 2 (622), and 3 (632).
  • the user experiences a natural audio environment and is able to explore the audio image as if listening to the environment without headphones.
  • the time and level difference modifications of the audio parameters are sufficient for turning an audio image in a horizontal direction.
  • (conventional) audio cues on time and level differences are used for audio parameter modifications made only in a horizontal plane.
  • In FIG. 7, a three-dimensional diagram illustrates turning an audio image in both a horizontal and a vertical direction.
  • 360-degree video streams include data conveying that a user is looking up and down in a three-dimensional environment.
  • relative rotation has both horizontal and vertical axis components.
  • FIG. 7 shows relative rotation, with the angle 710 representing horizontal rotation and the angle 720 representing vertical rotation.
  • contextual filtering applies a Head-Related Transfer Function (HRTF) to modify audio image content if a user has horizontal and vertical movement.
  • an audio image is a representation of directionality and/or context of audio with respect to an image and/or video.
  • An HRTF enables positioning of audio content anywhere around the user.
  • a contextual filter removes level and time differences of parameters corresponding to dominant and coherent sound sources and applies an HRTF filter to move the audio component to another location (or direction).
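  • A minimal version of that repositioning step is sketched below; real HRIRs would be loaded from a measured HRTF database, so the impulse responses used here are crude stand-ins.

```python
import numpy as np

def reposition_with_hrtf(mono_component, hrir_left, hrir_right):
    """Convolve a component (with its original level/time differences already
    removed) with an HRIR pair for the target azimuth/elevation."""
    return np.convolve(mono_component, hrir_left), np.convolve(mono_component, hrir_right)

# Crude stand-in HRIRs; a real system would use measured responses for the new direction.
src = np.random.randn(4800)
hrir_l = np.zeros(128); hrir_l[0] = 1.0        # near-identity left response
hrir_r = np.zeros(128); hrir_r[8] = 0.7        # delayed, attenuated right response
out_l, out_r = reposition_with_hrtf(src, hrir_l, hrir_r)
print(out_l.shape, out_r.shape)
```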
  • FIG. 8 illustrates a block diagram for contextual filtering of a multi-channel audio signal.
  • a receiving application performing contextual filtering renders an audio image for a user.
  • the receiving application applies the received context data, as well as local context data for a receiving user's location, orientation, and motion to create a natural 3D audio presentation.
  • FIG. 8 shows a sub-band domain contextual filtering process 800 (which may be within a receiving application) that receives recording and receiving context input data 810 and 820 as well as audio channel input data (xi(z)) 830 and outputs a modified multi-channel audio presentation (yi(z)) 840.
  • contextual filtering (or perceptual audio coding) is applied to a multi-channel audio signal (or stream) to generate orientation-adjusted audio data based on user view orientation information (which may include a user's location, orientation, and motion) and directional information for a determined dominant sound source associated with a dominant source audio stream.
  • FIG. 9 is a block diagram of sub-band contextual filtering with analysis and synthesis filter banks.
  • contextual filtering is implemented with a filter bank, similar to the audio analysis in the recording application.
  • contextual filtering may use a Quadrature Mirror Filter (QMF) or a Fourier transform domain filter.
  • a critically-sampled filter bank uses band-splitting low-pass H0(z) and high-pass H1(z) analysis filters on the multi-channel audio inputs x1(z) 902 and x2(z) 904 and down-samples the analysis filter outputs by 2.
  • the sub-band domain contextual filtering 950 operates in the time frequency domain on the down-sampled inputs.
  • the sub-band domain contextual filtering outputs are up-sampled by 2, and a synthesis filter bank with low-pass G0(z) and high-pass G1(z) synthesis filter components is applied to the up-sampled outputs to reconstruct the signal back into the time domain as y1(z) 980 and y2(z) 990.
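  • A minimal end-to-end sketch of this chain (analysis, sub-band modification, synthesis) is given below using a Haar QMF pair, which reconstructs the input exactly up to a one-sample delay; the actual filters and the contextual modification in a real implementation would differ, so the per-band gains here are only placeholders.

```python
import numpy as np

h0 = np.array([1.0,  1.0]) / np.sqrt(2.0)    # analysis low-pass H0(z)
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)    # analysis high-pass H1(z)
g0 = np.array([1.0,  1.0]) / np.sqrt(2.0)    # synthesis low-pass G0(z)
g1 = np.array([-1.0, 1.0]) / np.sqrt(2.0)    # synthesis high-pass G1(z)

def analysis(x):
    return np.convolve(x, h0)[::2], np.convolve(x, h1)[::2]

def contextual_filter(low, high, gain_low=1.0, gain_high=1.0):
    """Placeholder for the sub-band domain contextual filtering: here it only
    scales the bands; a real implementation would adjust per-slot cues."""
    return gain_low * low, gain_high * high

def synthesis(low, high, n):
    up = lambda b: np.repeat(b, 2) * np.tile([1.0, 0.0], b.size)  # up-sample by 2
    y = np.convolve(up(low), g0) + np.convolve(up(high), g1)
    return y[1:1 + n]                       # compensate the one-sample filter-bank delay

x = np.random.randn(256)
lo, hi = contextual_filter(*analysis(x))
y = synthesis(lo, hi, x.size)
print(np.max(np.abs(x - y)))                # ~1e-15: perfect reconstruction at unit gains
```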
  • the band-splitting may have several stages and may also form a non-uniform structure, which may be similar to a recording application.
  • the resampling and filtering may be conducted in a different order.
  • the audio is split into bands, e.g., with perceptual bands.
  • the audio split into bands may be different in the contextual filtering implementations in recording and receiving applications.
  • the number of frequency bands and time window length may differ.
  • time resolution (the length of analysis frame window) may be aligned with the metadata timing.
  • the analysis time frame window may be extended if the metadata does not have any updates.
  • the time frequency signal is analyzed to extract direction of arrival cues for each slot.
  • context metadata contains the number of sound sources present in a signal and corresponding direction of arrival and coherence data.
  • context filtering determines time and level differences for each frequency slot and selects slots that correspond to each identified sound source for a corresponding direction of arrival cue.
  • contextual filtering is a search for time frequency slots that match the direction of arrival cues of each sound source found in received metadata.
  • a sound source search is allocated according to coherence cues.
  • the search range for direction of arrival level and time difference cues, interaural level differences (ILD) and interaural time differences (ITD), may be set according to the coherence of each sound source.
  • a low coherence (for example, a coherence less than 0.5) may correspond to a diffuse sound source and a wider search range, while a high coherence (for example, a coherence greater than 0.5) may be similar to a point-like sound source with a direction of arrival search range of +/-5 degrees.
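  • The slot-matching step can be sketched as below; the +/-5 degree window for high-coherence sources follows the text, while the wider window for low-coherence sources and the field names are assumptions.

```python
def match_slots_to_sources(slot_doas, sources, narrow_deg=5.0, wide_deg=20.0):
    """Assign each time-frequency slot to the metadata source whose direction of
    arrival lies within a coherence-dependent window; unmatched slots are
    treated as diffuse background (None)."""
    assignments = {}
    for slot_id, slot_doa in slot_doas.items():
        assignments[slot_id] = None
        for src in sources:
            window = narrow_deg if src["coherence"] > 0.5 else wide_deg
            diff = abs((slot_doa - src["doa_deg"] + 180.0) % 360.0 - 180.0)
            if diff <= window:
                assignments[slot_id] = src["id"]
                break
    return assignments

sources = [{"id": 1, "doa_deg": 30.0, "coherence": 0.9},
           {"id": 2, "doa_deg": 250.0, "coherence": 0.3}]
slots = {(0, 0): 28.0, (0, 1): 240.0, (1, 0): 150.0}
print(match_slots_to_sources(slots, sources))   # slot (1, 0) stays diffuse (None)
```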
  • FIG. 10 illustrates a time frequency (TF) domain chart that shows classification of time frequency slots for different sound sources.
  • frequency 1010 is shown as the y-axis, and time 1030 is shown as the x-axis.
  • the classification may be based on direction of arrival.
  • there are two sound sources (marked as boxes with vertical and horizontal lines), as shown on the left side of the chart.
  • Information about sound locations and coherence is found in metadata.
  • at time stamp 1464008378, a new sound source appears in the recording.
  • a new set of metadata is issued in the Media Presentation Description (MPD) that provides information about the location and coherence of the new sound source.
  • the new sound source 1040 found in the audio image is shown as solid black boxes.
  • the time frequency slots not matched to any sound source in the metadata are left blank. Such time frequency slots may contain diffuse background noise and sound components without any particular direction.
  • filtering a multi-channel audio stream includes using an analysis filter bank as shown in FIG. 10 that parameterizes the multi-channel audio stream into a plurality of time and frequency defined areas (slots), and matching the direction information as indicated by direction of arrival cues in received metadata, to one or more of the plurality of time and frequency defined areas.
  • the sound sources of FIG. 10 may be the same ones illustrated earlier in FIG. 6:
  • sound source 1 610 of FIG. 6 corresponds to the time frequency slots shown as solid black boxes in FIG. 10
  • sound source 2 620 corresponds to the time frequency slots shown with horizontal lines
  • sound source 3 630 corresponds to the time frequency slots shown with vertical lines.
  • This example categorization corresponds to direction of arrival.
  • received context data regarding user location data is compared to corresponding data from a receiving user.
  • the differences in location, orientation, and motion may be used to determine the relative motion of users and how to modify an audio image. For many embodiments, if a difference is above zero, the receiving user is experiencing the received audio content differently than the recording user, and the audio image is modified accordingly. As shown in FIG. 6, the user may have turned his or her head or the device, and the audio image is turned accordingly.
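  • One way to express that comparison is sketched below; the context field names and the small tolerance are assumptions, and only the horizontal (yaw) component is handled for brevity.

```python
def relative_rotation_deg(recording_ctx, receiving_ctx, tol_deg=0.5):
    """Wrapped yaw difference between the receiving and recording orientations;
    a non-zero result means the audio image should be rotated by that amount."""
    delta = (receiving_ctx["yaw_deg"] - recording_ctx["yaw_deg"] + 180.0) % 360.0 - 180.0
    return 0.0 if abs(delta) < tol_deg else delta

rec = {"yaw_deg": 10.0}
rcv = {"yaw_deg": -80.0}
print(relative_rotation_deg(rec, rcv))   # -90.0 -> rotate the audio image accordingly
```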
  • contextual filtering for the audio image modification phase uses modifying parameters in a time frequency slot corresponding to each detected sound source by applying an artificial level and time difference adjustment for the amount of rotation.
  • the overall level of sound sources, for example sources appearing in front of the receiving user, may be tuned to improve perception.
  • Parameters for time frequency slots not corresponding to a detected directional sound source are not adjusted or not used.
  • a filter may add artificial diffuseness to reduce signal coherence and reduce directional components present in the signal. Time frequency domain components are converted back into the time domain with a synthesis filter bank.
  • Various embodiments use different methods to estimate the direction of arrival. Described herein are systems and methods that use level and time difference-based systems and methods. Other embodiments use different beam forming systems and methods. Some embodiments use different methods to convey an audio environment and user context data in metadata segments. Some embodiments use an XML file containing a time series of location information instead of single values for each sound source and user. For some embodiments, context data is conveyed as a continuous signal similar to audio and video signals. Other embodiments exist for managing the level of identified sound sources. For example, a receiving application may emphasize sources with a high coherence and point-like direction of arrival pattern.
  • In FIG. 11, a system diagram illustrates an example overall architecture and data flows for a live video streaming service.
  • Systems and methods described herein may be applied to live video streaming services that support 360-degree videos.
  • a live video recording device 1110 captures live audio/video content and transmits (relevant) contextual data via Dynamic Adaptive Streaming over HTTP (DASH) protocol 1120 to a live streaming service 1130, which sends DASH bundles 1150 to a live video receiving device 1160 requesting live streams 1140.
  • Real Time Streaming Protocol (RTSP) and the MPEG ISO/IEC standard DASH support real-time audio-visual content and metadata transfer between a client and server.
  • Recording clients may stream the audio-visual content over RTSP to a live streaming server.
  • a server collects an incoming stream, extracts an encoded bit stream from an RTSP stream, and bundles the encoded bit stream into a DASH media presentation.
  • a stream from a recording application is presented in a Media Presentation Description (MPD) manifest file and corresponding encoded bit streams are included in a Media Presentation Data Model.
  • a receiving client requests a live video stream, and a media composition is transmitted over HTTP in data blocks of short segments for each audio-visual stream.
  • a receiving device receives via the DASH protocol a live video stream containing live audio-video content with related metadata.
  • Referring to FIG. 12, a system flow diagram illustrates one embodiment of modification of audio data for use with a 360-degree video presentation.
  • 360-degree video 1210, stereo audio signals 1220, and directional information 1230 are received by a system.
  • a system determines a dominant sound source 1240 and generates dominant source streams 1250.
  • Perceptual audio coding 1260 is applied to dominant source audio streams to produce orientation-adjusted audio data 1270.
  • Orientation-adjusted audio 1270 is combined with the 360-degree video to output a user view 1280 for a receiving user.
  • a wireless transmit/receive unit may be used as a receiving user device in embodiments described herein.
  • FIG. 13A is a system diagram of an example WTRU 1302.
  • the WTRU 1302 may include a processor 1318, a transceiver 1320, a transmit/receive element 1322, a speaker/microphone 1324, a keypad 1326, a display/touchpad 1328, a non-removable memory 1330, a removable memory 1332, a power source 1334, a global positioning system (GPS) chipset 1336, and other peripherals 1338.
  • the transceiver 1320 may be implemented as a component of decoder logic 1319.
  • the transceiver 1320 and decoder logic 1319 may be implemented on a single LTE or LTE-A chip.
  • the decoder logic may include a processor operative to perform instructions stored in a non-transitory computer-readable medium. As an alternative, or in addition, the decoder logic may be implemented using custom and/or programmable digital logic circuitry.
  • the WTRU 1302 may include any sub-combination of the foregoing elements while remaining consistent with an embodiment.
  • the base stations 1314a and 1314b, and/or the nodes that base stations 1314a and 1314b may represent, such as but not limited to a base transceiver station (BTS), a Node-B, a site controller, an access point (AP), a home node-B, an evolved home node-B (eNodeB), a home evolved node-B (HeNB), a home evolved node-B gateway, and proxy nodes, among others, may include some or all of the elements depicted in FIG. 13A and described herein.
  • the processor 1318 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like.
  • the processor 1318 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1302 to operate in a wireless environment.
  • the processor 1318 may be coupled to the transceiver 1320, which may be coupled to the transmit/receive element 1322. While FIG. 13A depicts the processor 1318 and the transceiver 1320 as separate components, the processor 1318 and the transceiver 1320 may be integrated together in an electronic package or chip.
  • the transmit/receive element 1322 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 1314a) over the air interface 1316.
  • the transmit/receive element 1322 may be an antenna configured to transmit and/or receive RF signals.
  • the transmit/receive element 1322 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples.
  • the transmit/receive element 1322 may be configured to transmit and receive both RF and light signals.
  • the transmit/receive element 1322 may be configured to transmit and/or receive any combination of wireless signals.
  • the WTRU 1302 may include any number of transmit/receive elements 1322. More specifically, the WTRU 1302 may employ MIMO technology. Thus, in one embodiment, the WTRU 1302 may include two or more transmit/receive elements 1322 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1316.
  • the transceiver 1320 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1322 and to demodulate the signals that are received by the transmit/receive element 1322.
  • the WTRU 1302 may have multi-mode capabilities.
  • the transceiver 1320 may include multiple transceivers for enabling the WTRU 1302 to communicate via multiple RATs, such as UTRA and IEEE 802.11 , as examples.
  • the processor 1318 of the WTRU 1302 may be coupled to, and may receive user input data from, the speaker/microphone 1324, the keypad 1326, and/or the display/touchpad 1328 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
  • the processor 1318 may also output user data to the speaker/microphone 1324, the keypad 1326, and/or the display/touchpad 1328.
  • the processor 1318 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1330 and/or the removable memory 1332.
  • the non-removable memory 1330 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
  • the removable memory 1332 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
  • the processor 1318 may access information from, and store data in, memory that is not physically located on the WTRU 1302, such as on a server or a home computer (not shown).
  • the processor 1318 may receive power from the power source 1334, and may be configured to distribute and/or control the power to the other components in the WTRU 1302.
  • the power source 1334 may be any suitable device for powering the WTRU 1302.
  • the power source 1334 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, and the like.
  • the processor 1318 may also be coupled to the GPS chipset 1336, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1302.
  • the WTRU 1302 may receive location information over the air interface 1316 from a base station (e.g., base stations 1314a, 1314b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations.
  • the WTRU 1302 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
  • the processor 1318 may further be coupled to other peripherals 1338, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
  • the peripherals 1338 may include an accelerometer, an e- compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
  • FIG. 13B depicts an example network entity 1390 that may be used within the communication system 1300 of FIG. 13A.
  • network entity 1390 includes a communication interface 1392, a processor 1394, and non-transitory data storage 1396, all of which are communicatively linked by a bus, network, or other communication path 1398.
  • Communication interface 1392 may include one or more wired communication interfaces and/or one or more wireless-communication interfaces. With respect to wired communication, communication interface 1392 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, communication interface 1392 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. And further with respect to wireless communication, communication interface 1392 may be equipped at a scale and with a configuration appropriate for acting on the network side— as opposed to the client side— of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, communication interface 1392 may include the appropriate equipment and circuitry (including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.
  • Processor 1394 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.
  • Data storage 1396 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM) to name but a few, as any one or more types of non-transitory data storage deemed suitable by those of skill in the relevant art may be used.
  • data storage 1396 contains program instructions 1397 executable by processor 1394 for carrying out various combinations of the various network-entity functions described herein.
  • the network-entity functions described herein are carried out by a network entity having a structure similar to that of network entity 1390 of FIG. 13B. In some embodiments, one or more of such functions are carried out by a set of multiple network entities in combination, where each network entity has a structure similar to that of network entity 1390 of FIG. 13B.
  • network entity 1390 is— or at least includes— one or more of (one or more entities in) RAN 1303, (one or more entities in) RAN 1304, (one or more entities in) RAN 1305, (one or more entities in) core network 1306, (one or more entities in) core network 1307, (one or more entities in) core network 1309, base station 1314a, base station 1314b, Node-B 1340a, Node-B 1340b, Node-B 1340c, RNC 1342a, RNC 1342b, MGW 1344, MSC 1346, SGSN 1348, GGSN 1350, eNode B 1360a, eNode B 1360b, eNode B 1360c, MME 1362, serving gateway 1364, PDN gateway 1366, base station 1380a, base station 1380b, base station 1380c, ASN gateway 1382, MIP-HA 1384, AAA 1386, and gateway 1388. And certainly other network entities and/or combinations of network entities may be used in various embodiments.
  • modules include hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation.
  • Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and those instructions may take the form of or include hardware (hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as those commonly referred to as RAM or ROM.
  • Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
  • a processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Abstract

Systems and methods are described that relate to receiving immersive video data and a multi-channel audio stream, receiving directional information regarding at least one dominant sound source, generating at least one dominant source audio stream from the multi-channel audio stream, determining user view orientation information, generating orientation-adjusted audio by applying perceptual audio coding to the dominant source audio stream(s) based on the user view orientation information and the directional information regarding the dominant sound source(s), and presenting orientation-adjusted audio and a view of the immersive video data corresponding to the user view orientation information.
PCT/US2018/017460 2017-02-15 2018-02-08 Contextual filtering for immersive audio WO2018152004A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762459379P 2017-02-15 2017-02-15
US62/459,379 2017-02-15

Publications (1)

Publication Number Publication Date
WO2018152004A1 true WO2018152004A1 (fr) 2018-08-23

Family

ID=61557329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/017460 WO2018152004A1 (fr) Contextual filtering for immersive audio

Country Status (1)

Country Link
WO (1) WO2018152004A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102099450B1 (ko) * 2018-11-14 2020-05-15 서울과학기술대학교 산학협력단 360°영상에서 영상과 음향의 정위 합치 방법
US11410666B2 (en) 2018-10-08 2022-08-09 Dolby Laboratories Licensing Corporation Transforming audio signals captured in different formats into a reduced number of formats for simplifying encoding and decoding operations
RU2798821C2 (ru) * 2018-10-08 2023-06-28 Долби Лабораторис Лайсэнзин Корпорейшн Преобразование звуковых сигналов, захваченных в разных форматах, в уменьшенное количество форматов для упрощения операций кодирования и декодирования

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100328419A1 (en) * 2009-06-30 2010-12-30 Walter Etter Method and apparatus for improved matching of auditory space to visual space in video viewing applications
WO2013093187A2 (fr) * 2011-12-21 2013-06-27 Nokia Corporation Lentille audio
WO2013117806A2 (fr) * 2012-02-07 2013-08-15 Nokia Corporation Signal audio spatial visuel
WO2016014254A1 (fr) * 2014-07-23 2016-01-28 Pcms Holdings, Inc. Système et procédé pour déterminer un contexte audio dans des applications de réalité augmentée
WO2017087650A1 (fr) * 2015-11-17 2017-05-26 Dolby Laboratories Licensing Corporation Suivi des mouvements de tête pour système et procédé de sortie binaurale paramétrique

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Text of ISO/IEC CD 23000-20 Omnidirectional Media Application Format", 117. MPEG MEETING;16-1-2017 - 20-1-2017; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. N16636, 30 January 2017 (2017-01-30), XP030023307 *
"Thoughts on AR/VR and 3D Audio", 116. MPEG MEETING;17-10-2016 - 21-10-2016; CHENGDU ; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. N16385, 21 October 2016 (2016-10-21), XP030023057 *
JAN PLOGSTIES: "3D Audio AR/VR Use Case and Technology Considerations", 117. MPEG MEETING; 16-1-2017 - 20-1-2017; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. m39874, 11 January 2017 (2017-01-11), XP030068219 *

Similar Documents

Publication Publication Date Title
US11082662B2 (en) Enhanced audiovisual multiuser communication
CN109906616B (zh) 用于确定一或多个音频源的一或多个音频表示的方法、系统和设备
JP5990345B1 (ja) サラウンド音場の生成
US10397722B2 (en) Distributed audio capture and mixing
US8773589B2 (en) Audio/video methods and systems
US20160345092A1 (en) Audio Capture Apparatus
CN108432272A (zh) 用于回放控制的多装置分布式媒体捕获
US20130304244A1 (en) Audio alignment apparatus
KR102462067B1 (ko) Vr 오디오 처리 방법 및 대응하는 장치
WO2015008576A1 (fr) Dispositif de traitement d'informations et procédé de traitement d'informations
US20150245158A1 (en) Apparatus and method for reproducing recorded audio with correct spatial directionality
JP2022550372A (ja) オーディオビジュアルコンテンツについてバイノーラルイマーシブオーディオを作成するための方法及びシステム
US20170347218A1 (en) Method and apparatus for processing audio signal
CN110876051A (zh) 视频数据的处理,传输方法及装置,视频数据的处理系统
WO2013088208A1 (fr) Appareil d'alignement de scène audio
US11632643B2 (en) Recording and rendering audio signals
CN114067810A (zh) 音频信号渲染方法和装置
WO2018152004A1 (fr) Filtrage contextuel pour audio immersif
WO2018132385A1 (fr) Zoom audio dans un service de contenu vidéo audio naturel
US10623828B2 (en) Method and systems for generating and utilizing contextual watermarking
US10419865B2 (en) Methods and systems for rendering binaural audio content
WO2018039060A1 (fr) Systèmes et procédés d'approvisionnement de flux en direct
KR101999235B1 (ko) Mmtp기반 하이브리드 브로드캐스트 브로드밴드 서비스 제공 방법 및 시스템
WO2014016645A1 (fr) Appareil de scène audio partagée

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18708507

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18708507

Country of ref document: EP

Kind code of ref document: A1