WO2010131105A1 - Synchronization of audio or video streams - Google Patents
- Publication number
- WO2010131105A1 PCT/IB2010/001101
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- stream
- signal stream
- value
- dependent
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4341—Demultiplexing of audio and video streams
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2368—Multiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/242—Synchronization processes, e.g. processing of PCR [Program Clock References]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2665—Gathering content from different sources, e.g. Internet and satellite
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/27—Server based end-user applications
- H04N21/274—Storing end-user multimedia data in response to end-user request, e.g. network recorder
- H04N21/2743—Video hosting of uploaded data from client
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8547—Content authoring involving timestamps for synchronizing content
Definitions
- the present invention relates to apparatus for the processing of audio or video signals.
- the invention further relates to, but is not limited to, apparatus for processing audio or video signals in mobile devices.
- Multiple 'feeds' may be found in sharing services for video and audio signals (such as those employed by YouTube).
- Such systems are known and widely used to share user generated content, which is recorded and uploaded or up-streamed to a server and then downloaded or down-streamed to a viewing/listening user.
- Such systems rely on users recording and uploading or up-streaming a recording of an event using the recording facilities at hand to the user. This may typically be in the form of the camera and microphone arrangement of a mobile device such as a mobile phone. Often the event may be attended and recorded from more than one position by different recording users. The viewing/listening end user may then select one of the up-streamed or uploaded data to view or listen.
- the major obstacle preventing both of the above is that the user generated content recordings are typically made in an unsynchronised manner.
- each user may be recording using different sample frequencies, and/or encoding the recording at different bit rates, and/or even using different encoding formats.
- different users may be up-streaming over different parts of the network, or using different network parameters with a differing latency resulting.
- Documents such as "Synchronisation of Acoustic Sensors for Distributed Ad-hoc Audio Networks and its use for Blind Source Separation", proceedings of the IEEE Sixth Symposium on Multimedia Software Engineering (ISMSE'04), have indicated that synchronisation of acoustic sensors may be achieved using dedicated synchronisation signals to time stamp the recording prior to uploading.
- These synchronisation signals can be some type of beacon signal received at the sensor device. For example, global positioning system (GPS) data may be used as the beacon signal to provide a time stamp that is added to the recorded signal prior to uploading to the server, so that the server can synchronise the data.
- The use of GPS receivers is itself problematic in that it may significantly increase the cost of the device, may require significantly more power to operate the device, or may not be permitted in the local jurisdiction within which the devices operate.
- GPS signals may not be received well indoors and such systems as described above are not suitable for indoor operation.
- other received beacon signals such as using timing information from a wireless communications downlink may produce similarly poor results in indoor environments.
- an apparatus comprising: a frame value generator configured to generate for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; an alignment generator configured to determine at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and a synchronizer configured to synchronize at least one signal stream to another signal stream dependent on the indicator values.
- the apparatus may be able to offer more than one possible signal stream and furthermore synthesise further signal streams.
- the further apparatus may therefore select and display by viewing and/or listening a signal stream or synthesized signal stream without putting significant processing load on the further apparatus.
- the apparatus may further comprise a receiver configured to receive each of the at least two signal streams from different recording apparatus.
- the alignment generator is preferably configured to generate a first indicator for each signal stream dependent on the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
- the first indicator may comprise the ratio of the variance and mean values of the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
- the alignment generator is preferably further configured to select the signal stream, of the at least two signal streams, with the lowest indicator value as a base stream, and the another signal stream is the base stream.
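The indicator and base stream selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the lag range `max_lag`, the use of an unnormalised cross-correlation over non-negative lags, and the summing of pairwise indicators into one value per stream are all hypothetical choices made for the example.

```python
from statistics import fmean, pvariance


def cross_correlation(a, b, max_lag):
    """Unnormalised cross-correlation of a against b for lags 0..max_lag."""
    out = []
    for lag in range(max_lag + 1):
        n = min(len(a) - lag, len(b))
        out.append(sum(a[lag + t] * b[t] for t in range(n)))
    return out


def indicator(a, b, max_lag):
    """Ratio of the variance to the mean of the correlation values."""
    corr = cross_correlation(a, b, max_lag)
    mean = fmean(corr)
    # Assumption: a zero mean (no correlation at all) is treated as worst case.
    return pvariance(corr) / mean if mean else float("inf")


def select_base(streams, max_lag):
    """Return (index of the stream with the lowest indicator, all indicators).

    Assumption: a stream's indicator is the sum of its pairwise indicators
    against every other stream.
    """
    totals = [
        sum(indicator(a, b, max_lag) for j, b in enumerate(streams) if j != i)
        for i, a in enumerate(streams)
    ]
    return min(range(len(streams)), key=totals.__getitem__), totals
```

The inputs here are the per-frame signal values of each stream, not raw samples, so the correlation runs over a much shorter sequence than the original recording.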
- the synchronizer is preferably further configured to synchronize at least one of: at least one signal stream audio stream to another signal stream audio stream dependent on the indicator values; at least one signal stream video stream to another signal stream video stream dependent on the indicator values; and at least one signal stream positional data stream to another signal stream positional data stream dependent on the indicator values.
- the apparatus may further comprise an output selector receiver configured to receive selection information indicating at least one of: a recording apparatus; a recording location; and a recording direction.
- the apparatus may further comprise an output selector processor, wherein the output selector processor may be configured to carry out at least one of: selecting one synchronized signal stream to be output dependent on the selection information; and combining at least two synchronized signal streams dependent on the selection information to form a compound signal stream to be output.
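The two output selector options, selecting one synchronized stream or combining several into a compound stream, might look like the sketch below. The weighted average used for combining, and the idea that the selection information is reduced to a stream index or a list of weights, are assumptions for illustration only; the text does not specify how streams are combined.

```python
def select_stream(streams, index):
    """Return the single synchronized stream chosen by the selection information."""
    return streams[index]


def combine_streams(streams, weights):
    """Form a compound stream as a weighted average of synchronized streams.

    Assumption: the weights are derived elsewhere from the selection
    information (for example, from a requested recording location).
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    length = min(len(s) for s in streams)
    return [
        sum(w * s[t] for w, s in zip(weights, streams)) / total
        for t in range(length)
    ]
```

Combining only makes sense after synchronization, since a sample-wise mix of misaligned streams would smear the content in time.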
- an apparatus comprising: an input selector configured to select a display variable; a transmitter configured to transmit the display variable to a further apparatus; a receiver configured to receive a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and a display for displaying the signal stream.
- the apparatus may be able to select and display, by viewing and/or listening, a signal stream or synthesized signal stream without putting significant processing load on the apparatus.
- the display may comprise at least one of: an audio display for displaying audio signal components of the at least one signal stream received from a recording apparatus; and a video display for displaying video signal components of the at least one signal stream received from a recording apparatus.
- the display variable may comprise at least one of: a recording apparatus; a recording location; and a recording direction.
- a method comprising: generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and synchronizing at least one signal stream to another signal stream dependent on the indicator values.
- Generating the first indicator may comprise determining the ratio of the variance and mean values of the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
- the method may further comprise selecting the signal stream, of the at least two signal streams, with the lowest indicator value as a base stream, and wherein the another signal stream is the base stream.
- Synchronizing may further comprise at least one of: synchronizing at least one signal stream audio stream to another signal stream audio stream dependent on the indicator values; synchronizing at least one signal stream video stream to another signal stream video stream dependent on the indicator values; and synchronizing at least one signal stream positional data stream to another signal stream positional data stream dependent on the indicator values.
- the method may further comprise receiving selection information indicating at least one of: a recording apparatus; a recording location; and a recording direction.
- the method may further comprise selecting one synchronized signal stream to be output dependent on the selection information.
- the method may further comprise combining at least two synchronized signal streams dependent on the selection information to form a compound signal stream to be output.
- a method comprising: selecting a display variable; transmitting the display variable to a further apparatus; receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and displaying the signal stream.
- Displaying may comprise at least one of: displaying audio signal components of the at least one signal stream received from a recording apparatus; and displaying video signal components of the at least one signal stream received from a recording apparatus.
- the display variable may comprise at least one of: a recording apparatus; a recording location; and a recording direction.
- An electronic device may comprise apparatus as described above.
- a chipset may comprise apparatus as described above.
- a computer-readable medium encoded with instructions that, when executed by a computer, perform: generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and synchronizing at least one signal stream to another signal stream dependent on the indicator values.
- a computer-readable medium encoded with instructions that, when executed by a computer, perform: selecting a display variable; transmitting the display variable to a further apparatus; receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and displaying the signal stream.
- an apparatus comprising: means for generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; means for determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and means for synchronizing at least one signal stream to another signal stream dependent on the indicator values.
- an apparatus comprising: means for selecting a display variable; means for transmitting the display variable to a further apparatus; means for receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and means for displaying the signal stream.
- Embodiments of the present invention aim to address the above problems.
- Figure 1 shows schematically an electronic device suitable for being employed in embodiments of the application;
- Figure 2 shows schematically a multi-user free-viewpoint sharing services system which may encompass embodiments of the application;
- Figure 3 shows schematically a network orientated view of the system shown in Figure 2 within which embodiments of the application may be implemented;
- Figure 4 shows schematically a method of operation of the system shown in Figure 2 within which embodiments of the application may be implemented;
- Figure 5 shows a schematic view of the server shown in Figure 3 in further detail;
- Figure 6 shows schematically a method of operation of the server shown in Figure 5 according to embodiments of the application;
- Figure 7 shows schematically the synchronisation of signals in embodiments of the application; and
- Figure 8 shows schematically a method of operation of the server shown in Figure 5 according to further embodiments of the application.
- Figure 1 shows a schematic block diagram of an exemplary apparatus or electronic device 10, which may be used to record or listen to the audio signals and similarly to record or view the audio-visual images and data.
- the electronic device 10 may for example be a mobile terminal or user equipment of a wireless communication system.
- the electronic device 10 may comprise an audio subsystem 11.
- the audio subsystem may comprise a microphone(s) or inputs for microphones for audio signal capture and a loudspeaker(s) or outputs for loudspeaker(s) or headphones for audio signal output.
- the audio subsystem 11 may be linked via an audio analogue-to-digital converter (ADC) and digital-to-analogue converter (DAC) 14 to a processor 21.
- the electronic device 10 may further comprise a video subsystem 33.
- the video subsystem 33 may comprise a camera or input for a camera for image or moving image capture and a display or output for a display for video signal output.
- the video subsystem 33 may also be linked via a video analogue-to-digital converter (ADC) and digital-to-analogue converter (DAC) 32 to the processor 21.
- the processor 21 may be further linked to a transceiver (TX/RX) 13, to a user interface 15 and to a memory 22.
- the processor 21 may be configured to execute various program codes.
- the implemented program codes may comprise audio and/or video encoding code routines.
- the implemented program codes 23 may further comprise an audio and/or video decoding code.
- the implemented program codes 23 may be stored for example in the memory 22 for retrieval by the processor 21 whenever needed.
- the memory 22 may further provide a section 24 for storing data, for example data that has been encoded in accordance with the invention for playback.
- the same section 24 of the memory 22 may also store data that is to be encoded prior to transmission.
- the encoding and decoding code may in embodiments of the invention be implemented in hardware or firmware.
- the user interface 15 may enable a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via the display.
- the transceiver 13 enables a communication with other electronic devices, for example via a wireless communication network.
- the transceiver 13 may in some embodiments of the invention be configured to communicate to other electronic devices by a wired connection.
- a user of the electronic device 10 may use the microphone 11 for audio signal capture that is to be transmitted to some other electronic device or apparatus or that is to be stored in the data section 24 of the memory 22.
- a corresponding application may be activated to this end by the user via the user interface 15.
- This application, which may be run by the processor 21, causes the processor 21 to execute the encoding code stored in the memory 22.
- the user of the device may use the camera or video sub-system input for video signal capture for video images that are to be transmitted to some other electronic device or apparatus or to be stored in the data section 24 of the memory 22.
- a corresponding application may be activated to this end by the user via the user interface 15.
- This application, which may be run by the processor 21, causes the processor 21 to execute the encoding code stored in the memory 22.
- the audio analogue-to-digital converter 14 may convert the input analogue audio signal into a digital audio signal and provide the digital audio signal to the processor 21.
- the video analogue-to-digital converter may convert an input analogue video signal into a digital signal format and provide the digital video signal to the processor 21.
- the processor 21 may then process the digital audio signal and/or digital video signal as described hereafter.
- the resulting audio and/or video bit stream is provided to the transceiver 13 for transmission to another electronic device.
- the coded data could be stored in the data section 24 of the memory 22, for instance for a later transmission or for a later presentation by the same electronic device 10.
- the electronic device 10 may also receive a bit stream with correspondingly encoded data from another electronic device via the transceiver 13.
- the processor 21 may execute decoding program code stored in the memory 22.
- the processor 21 may therefore decode the received data, and provide the decoded data to either of the audio or video sub systems such as audio DAC 14 or the video digital-to-analogue converter 32.
- the audio and/or video digital-to-analogue converter 14, 32 may convert the digital decoded data into analogue data and output the analogue audio signal to the loudspeakers 11, or the analogue video signal to the display 33.
- In some embodiments the display and/or loudspeakers are themselves digital in operation, in which case the digital audio signal may be passed directly to the loudspeakers 11 and the digital video signal may be passed directly to the display 33.
- Execution of the decoding program code for audio and/or video signals may also be triggered by an application that has been called by the user via the user interface 15.
- Instead of being presented immediately via the loudspeakers 11 and display 33, the received encoded data could also be stored in the data section 24 of the memory 22, for instance to enable a later presentation or forwarding to a further electronic device (not shown).
- the loudspeakers 11 may be supplemented with or replaced by a headphone set which may communicate to the electronic device 10 or apparatus wirelessly, for example by a Bluetooth profile to communicate via the transceiver 13, or using a conventional wired connection.
- Figure 2 shows a schematic overview of the system which may incorporate embodiments of the application.
- Figure 2 shows a plurality of recording electronic devices or recording apparatus 210, which may be apparatus 10 such as shown in Figure 1, configured to record or capture an activity 171 from various angles or directions as shown in Figure 2 by an associated beam 121.
- the recording apparatus or recording devices 210 closest to the activity 171 are shown in Figure 2 as recording apparatus 210a to 210g.
- Each of the closest recording apparatus 210a to 210g has an associated beam 121a to 121g.
- Each of these recording devices or apparatus 210 may then upload or upstream the recorded signals.
- Figure 2 shows an arrow 191 representing the recorded signals, which may be sent via a transmission channel 101 to a server 103.
- the server 103 may then process the received recorded signals and transmit signal data associated with a 'selected viewpoint', in other words a single recorded or synthesized signal, via a second transmission channel 105 to an end user or viewing apparatus or device 201a.
- In some embodiments the recording apparatus 210 configured to transmit the recording may be only a recording apparatus, and the end user apparatus 201 configured to receive the recorded or synthesized signals associated with the selected viewpoint may be a viewing or listening apparatus only. In other embodiments the recording apparatus 210 and/or end user apparatus 201 may each comprise both recording and viewing/listening capacity.
- Figure 3 shows schematically the component parts which may be used in implementing the embodiments of the application.
- the system within which embodiments may operate may comprise recording apparatus/electronic devices 210 configured to operate as recording devices, uploading or upstreaming network/transmission channel 101, server or network apparatus 103, downloading or downstreaming network/transmission channel 105, and further end user apparatus/electronic devices 201 configured to operate as end users (viewers/listeners).
- Figure 3 shows two recording apparatus/electronic devices 210 which may be configured to operate as recording devices, a first recording apparatus 210a and an n'th recording apparatus 210n, either of which may be connected to the server 103 via the uplink network/transmission channel 101.
- Figure 3 further shows the recording device 210 (which may be considered to be implemented by the apparatus 10 shown in Figure 1) comprising a recorder 211 configured to record and encode content in the form of a recorded signal.
- the recorded signal may be audio which may for example be captured by a microphone or microphone array, video images which may be captured by a camera, or audio-video data captured by microphone and camera.
- the recorder 211 may also perform encoding on the recorded signal data according to any suitable encoding methodology to generate an audio, video, or audio-video encoded signal.
- the operation of recording the content to form a recorded signal is shown in figure 4 by step 401.
- Figure 4 shows that more than one recording device 210 may record the event generating a recorded signal associated to the recording device's position and recording capability.
- the first recording device 210a carries out the recording as step 401a
- a second recording device (not shown) carries out the recording operation as step 401b
- the n'th recording device 210n carries out the recording operation as step 401n
- the recording device 210 may also comprise an up-loader or transmitter 213 which formats the recorded signal values for transmission over the network/transmission channel 101. Furthermore in embodiments of the invention the up-loader or transmitter 213 may optionally encode positional information to assist the server in locating the recorded signal. This recorded signal and positional data 191 may be transmitted over the uplink transmission channel 101.
- the uploading (transmission) of the content data and optionally the positional data is shown in figure 4 by step 403. Figure 4 shows more than one recording device 210 may upload the recorded signals.
- the first recording device 210a carries out the uploading of first recording device recorded signal (and possibly positional) data 191a as step 403a
- a second recording device carries out the uploading operation of second recording device recorded signal (and possibly positional) data 191b as step 403b
- the n'th recording device 210n carries out the uploading operation of n'th recording device recorded signal (and possibly positional) data 191n as step 403n.
- the uplink network/transmission channel 101 may be a single network, for example a cellular communications link between the devices and the server, or may be a channel operating across multiple channels. For example, the data may pass over a channel operating over a wireless communications link to an internet gateway in the wireless communications system and then pass over an internet protocol related physical link to the server.
- the uplink network/transmission channel 101 may be a simplex network or part of a duplex or half-duplex network.
- the uplink network/communications channel 101 may comprise any one of a cellular communication network such as a third generation cellular communication system, a Wi-Fi communications network, or any suitable wireless or wired communication link.
- the recording and uploading operations may occur concurrently or substantially concurrently so that the information received at the server may be considered to be real time or streamed data.
- the uploading operation may be carried out at a time substantially later than the recording operation and the information may be considered to be uploaded data.
- the server 103 in Figure 3 is shown in further detail in figure 5 and the operation of the server described with reference to figure 4 is described in further detail in figure 6. Where the same (or similar) components or operations are described the same reference number may be used.
- the server 103 may comprise a receiver or buffer 221 which may receive the recorded signal data (and in some embodiments the positioning data) 191 from the uplink network/communications channel.
- the receiver/buffer 221 may be any suitable receiver for receiving the recorded signal data (and in some embodiments the positioning data) according to the format used on the uplink network/communications channel 101.
- the receiver/buffer 221 may be configured to output the received recorded signal data and in some embodiments the positioning data to the synchronizer 223.
- the buffering may enable the server 103 to receive the recorded signal data from the recording devices 210 for the time reference required.
- This buffering may therefore be short term buffering. For example, in real-time or near real-time streaming of the recorded signal data, the buffering or storage of the recorded signal data may be in the order of seconds, and the receiver/buffer may use solid state memory to buffer the recorded signal data.
- the receiver/buffer 221 may store the recorded signal data in a long term storage device, for example using magnetic media such as a RAID (Redundant Array of Independent Disks) storage. This long term storage may thus store the recorded signal data for an amount of time to enable several different capture devices to upload the recorded signal data at their convenience.
- the receiving/buffering operation is shown in both the system operation as shown in Figure 4 and the server operation as shown in Figure 6 as step 405.
- the server 103 further comprises a synchronizer 223.
- the synchronizer 223 receives at least two independently recorded signal data streams 191a, 191b, ..., 191n from the receiver/buffer 221 and outputs at least two synchronized recorded signal data streams.
- the synchronizer 223 does so by variable length framing of the recorded signal data, selecting a base recorded signal data and then aligning the remainder of the recorded signal data with the base recorded signal.
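The final step, aligning a recorded signal with the base recorded signal, can be illustrated with a cross-correlation search over frame offsets. This is a hedged sketch only: the text states that the remaining signals are aligned with the base, but the lag search, the function names and the zero padding used here are assumptions made for the example.

```python
def correlation_at(base, other, lag):
    """Sum of base[t + lag] * other[t] over the overlapping range."""
    start = max(0, -lag)
    stop = min(len(other), len(base) - lag)
    return sum(base[t + lag] * other[t] for t in range(start, stop))


def estimate_offset(base, other, max_lag):
    """Lag (in frames) at which 'other' best matches 'base'."""
    return max(
        range(-max_lag, max_lag + 1),
        key=lambda lag: correlation_at(base, other, lag),
    )


def align(base, other, max_lag):
    """Shift 'other' so that its content lines up with 'base'.

    A positive lag means 'other' started later than 'base', so it is padded
    with leading zeros; a negative lag means its leading frames are dropped.
    """
    lag = estimate_offset(base, other, max_lag)
    if lag >= 0:
        return [0.0] * lag + list(other)
    return list(other[-lag:])
```

Because the search runs on framed values, an offset in frames would be multiplied by the input mapping size to obtain an offset in input samples.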
- the at least two synchronized recorded signal data are then passed to the processor/transmitter 227 for further processing.
- the synchronization operation is shown in Figure 4 by step 407. With respect to Figures 5 and 6 the configuration and operation of the synchronizer 223 may be described in further detail.
- the synchronizer 223 may comprise a variable length framer 301.
- the variable length framer may receive the at least two recorded signal data values 191 from the receiver/buffer 221.
- the variable length framer 301 may generate framed recorded signal values, by generating a single sample value from a first number of recorded signal data sample values.
- the variable length framer 301 may carry out the variable length framing according to the following equation
- $vlf_i^{f_j}(k)$ is the output sample value for the first number of recorded signal data samples for the i'th recorded signal data, $f_j$ is the first number (otherwise known as the input mapping size), and $b_i(k \cdot f_j + h)$ is the input sample value for the $(k \cdot f_j + h)$'th sample.
- $k \cdot f_j$ defines the first input sample index
- $k \cdot f_j + f_j - 1$ defines the last input sample index.
- the index k defines the output sample or variable frame index.
- the variable length framer 301 outputs $N / f_j$ output sample values, each of which is formed dependent on $f_j$ adjacent input sample values.
- the index vlf_idx indicates the run time mode for the variable length framing.
- the value of vlf_idx is set to 0 where the duration of $f_j$ input samples is less than 2 ms, otherwise the value of vlf_idx is set to 1.
- the decision as to which mode is to be used depends on the duration of $f_j$. If the duration of $f_j$ is less than 2 milliseconds the amplitude envelope calculation path may be selected, otherwise the energy envelope calculation path may be used. In other words, for small input mapping sizes it is more advantageous to track the amplitude envelope than the energy envelope. This may improve the resilience to false synchronization results.
- variable length framer 301 may then repeat the operation of variable length framing for each of the number of signals identified for the selected space to generate an output for each of the recorded signals so that the output samples for each of the recorded signals have the same number of sample values for the same time period.
- the operation of the variable length framer 301 may be such that in embodiments all of the recorded signal data are variable length framed in a serial format, in other words one after another. In some embodiments the operation of the variable length framer 301 may be such that more than one of the recorded signal data may be processed at the same time or substantially at the same time to speed up the variable length processing for the time period in question.
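The framing step described above can be sketched as follows. This is a minimal illustration rather than the patent's own equation, which is not reproduced in this text. Averaging over each frame, the function name `variable_length_frame` and the `sample_rate_hz` parameter are assumptions of the sketch.

```python
def variable_length_frame(samples, f, sample_rate_hz):
    """Collapse every `f` adjacent input samples into one output sample.

    Mode 0 (amplitude envelope) is used when f samples last less than 2 ms,
    mode 1 (energy envelope) otherwise. Averaging over the frame is an
    assumption of this sketch, not something stated in the source.
    """
    mode = 0 if f / sample_rate_hz < 0.002 else 1
    out = []
    for k in range(len(samples) // f):        # k is the output frame index
        frame = samples[k * f : k * f + f]    # inputs k*f .. k*f + f - 1
        if mode == 0:
            out.append(sum(abs(x) for x in frame) / f)
        else:
            out.append(sum(x * x for x in frame) / f)
    return out


signal = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0]
amplitude = variable_length_frame(signal, f=2, sample_rate_hz=8000)  # 0.25 ms frames
energy = variable_length_frame(signal, f=2, sample_rate_hz=500)      # 4 ms frames
```

Running the same function on every recorded signal with the same `f` gives each signal the same number of output samples for the same time period, which is what the later correlation step relies on.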
- the output of the variable length framer may be passed to the indicator selector 303.
- the synchronizer 223 may also comprise an indicator selector 303 configured to receive the variable length framed sample values for each of the selected space of recorded signal data and generate a time alignment indicator for each recorded data signal.
- the function may in embodiments of the invention be defined as
- $T_{upper}$ defines the upper limit for the delay in seconds.
- the upper limit may be set to two seconds as this has been found to be a fair value for the delay in practical recording and networking conditions.
- wSize describes the number of items used in the maximum calculation for each $f_j$. In some embodiments, the number of items used in the maximisation calculation may be
- the above equation as performed in embodiments therefore returns the value "lag" which maximises the correlation between the signals.
- $tCorr_{i,k}^{f_j} = xCorr(vlf_i^{f_j}, vlf_k^{f_j})$, $0 \le i < U$, $0 \le k < U$, $0 \le j < M$, may provide the correlation value.
- the indicator selector 303 may then pass the generated time alignment indicator (tInd) values to the base signal determiner 305.
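The idea of returning the lag that maximises the correlation between two framed signals, bounded by an upper delay limit, can be sketched as below. This is a simplified illustration: the function name `best_lag`, the plain sum-of-products correlation and the framed-sample rate are assumptions, and the wSize windowing mentioned above is left out.

```python
def best_lag(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] that maximises sum(x[n] * y[n + lag])."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, xn in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):       # ignore samples that fall outside y
                score += xn * y[m]
        if score > best_score:
            best, best_score = lag, score
    return best


t_upper_s = 2.0          # upper delay limit in seconds, as suggested in the text
frames_per_second = 4    # hypothetical rate of the framed samples
max_lag = int(t_upper_s * frames_per_second)

a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0]   # same event, 3 frames later
lag = best_lag(a, b, max_lag)
```

Limiting the search to `max_lag` keeps the cost bounded and avoids spurious matches at delays that are implausible in practice.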
- the synchronizer 223 may also comprise a base signal determiner 305 which may be configured to receive the time alignment indicator values from the indicator selector 303 and indicate which of the received recorded signal data is suitable to synchronize the remainder of the recorded signal data to.
- the base signal determiner 305 may first generate a series of time aligned indicators from the time alignment indicator values.
- the time aligned indicators may be a time aligned index average (tIndAve), a time aligned index variance (tIndVar) and a time aligned index ratio (tIndRatio), which may be generated by the base signal determiner 305 according to three equations, each evaluated for $0 \le i < U$, $0 \le j < U$.
- the time aligned index variance $tIndVar_{i,j}$ is formed from the differences $tInd_{i,k}(j) - tIndAve_{i,j}$, $0 \le i < U$, $0 \le j < U$.
- the base signal determiner 305 may sort the indicator tIndRatio in increasing order of importance. For example the base signal determiner 305 may sort the indicator tIndRatio so that the ratio value having the smallest value appears first, the ratio value having the second smallest value appears second and so on. The base signal determiner 305 may output the sorted indicator as the ratio vector tIndRatioSorted. The base signal determiner 305 may also record the order of the time indicator values tIndRatio by generating an index tIndRatioSortedIndex which contains the corresponding original position indices for the sorted result. Thus if the smallest ratio value was found at index 2, the next smallest at index 5, and so on, the base signal determiner 305 may generate a vector with the values [2, 5, ...].
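Sorting the ratio values while remembering their original positions can be sketched as follows. The ratio values are made up for illustration, and the helper name `sort_with_index` is hypothetical.

```python
def sort_with_index(ratios):
    """Return (sorted values, original positions), smallest ratio first."""
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    return [ratios[i] for i in order], order


t_ind_ratio = [0.9, 0.7, 0.1, 0.8, 0.6, 0.3]   # made-up ratio values
t_ind_ratio_sorted, t_ind_ratio_sorted_index = sort_with_index(t_ind_ratio)
```

Here the smallest ratio sits at index 2 and the next smallest at index 5, so the index vector starts with 2, 5, matching the example in the text.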
- the base signal determiner 305 may then use the generated indicators to determine the base signal according to the following equation:
- the determination of the base signal is shown in Figure 6 by step 4075.
- the base signal determiner 305 may also determine the time alignment factors for the other recorded signal data from the average time alignment indicator values according to the following equation:
- $time\_align(i) = tIndAve_{base\_signal\_idx,\, i}$, $0 \le i < U$, $i \ne base\_signal\_idx$
- the base signal determiner 305 may then pass the base signal indicator value base_signal_idx and also the time alignment factor values time_align for the remaining recorded signals to the signal synchronizer 307.
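A minimal sketch of the base signal choice and the read-out of the time alignment factors is given below. It assumes the ratio values and the pairwise average indicators have already been computed, that the base signal is the one with the smallest ratio, and that the alignment of signal i is the average indicator between the base signal and signal i. The index order and the input values are assumptions, and the function name is hypothetical.

```python
def choose_base_and_alignment(t_ind_ratio, t_ind_ave):
    """Pick the signal with the smallest ratio and read off the other signals' offsets."""
    count = len(t_ind_ratio)
    base = min(range(count), key=lambda i: t_ind_ratio[i])
    # Assumed index order: row = base signal, column = signal to be aligned.
    time_align = {i: t_ind_ave[base][i] for i in range(count) if i != base}
    return base, time_align


# Hypothetical inputs for three recorded signals (indices 0..2).
t_ind_ratio = [0.40, 0.05, 0.25]
t_ind_ave = [
    [0.0, 1.5, -0.5],
    [-1.5, 0.0, -2.0],
    [0.5, 2.0, 0.0],
]
base_signal_idx, time_align = choose_base_and_alignment(t_ind_ratio, t_ind_ave)
```

The base signal itself gets no entry in `time_align`, which corresponds to its alignment value being zero.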
- the synchronizer 223 may also comprise a signal synchronizer 307 configured to receive the recorded signals via the receiver/buffer 221 and the base signal indicator value and the time alignment factor values for the remaining recorded signals. The signal synchroniser 307 may then synchronize the recorded signals by adding the time alignment value to the current time indices of each of the signals.
- Figure 7 shows four recorded signals. These recorded signals may be a first signal (signal 1) 501, a second signal (signal 2) 503, a third signal (signal 3) 505 and a fourth signal (signal 4) 507.
- the signal synchronizer 307 may receive a base signal indicator value base_signal_idx 561 with a value of 3, and furthermore receive time_align values for the first signal Time_align(1) 551, the second signal Time_align(2) 553, the third signal Time_align(3), which is equal to zero, and the fourth signal Time_align(4) 557.
- the signal synchronizer 307 may delay the first signal 501 by the Time_align(1) 551 value to generate a synchronized first signal 511.
- the signal synchronizer 307 may delay the second signal 503 by the Time_align(2) 553 value to generate a synchronized second signal 513.
- the signal synchronizer 307 may also delay the fourth signal 507 by the Time_align(4) 557 value to generate a synchronized fourth signal 517.
- the synchronized recorded data signals may then be output to the processor/transmitter 227.
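The delaying of each signal by its time alignment value can be sketched as below. The sample values and the delays are made up, delays are expressed in whole samples, and only non-negative delays are handled. All of these are assumptions of the sketch.

```python
def delay_signal(samples, delay):
    """Delay a signal by `delay` samples by prepending silence (zeros)."""
    if delay < 0:
        raise ValueError("this sketch only handles non-negative delays")
    return [0] * delay + list(samples)


# Four hypothetical signals; signal 3 is the base, so its delay is zero.
signals = {1: [5, 6], 2: [7, 8], 3: [1, 2], 4: [3, 4]}
time_align = {1: 2, 2: 1, 3: 0, 4: 3}   # made-up delays, in samples
synchronized = {idx: delay_signal(s, time_align[idx]) for idx, s in signals.items()}
```

The base signal passes through unchanged, while every other signal is shifted onto the base signal's timeline.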
- the apparatus of the server may be considered to comprise a frame value generator which may generate for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream.
- the same server apparatus may also comprise an alignment generator to determine at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream.
- the server apparatus may comprise a synchronizer to synchronize at least one signal stream to another signal stream dependent on the indicator values. The operation of synchronising the signals is shown in Figure 6 by operation 4079.
- the server 103 may comprise a viewpoint receiver/buffer 225.
- the viewpoint receiver/buffer 225 may be configured to receive from the end user apparatus 201 data in the form of positional or recording viewpoint information signal - in other words the apparatus may communicate a request to hear or view the event from a specific recording device or from a specified position.
- although the term viewpoint is used, it would be understood that this applies to audio-only as well as audio-visual data.
- the data may indicate for selection or synthesis a specific recording device from which audio or audio-visual recorded signal data is to be selected or a position such as a longitude and latitude or other geographical co-ordinate system.
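One way to picture the viewpoint selection data is as a small record that names a recording device, a position, or both. The sketch below is purely illustrative: the class name `ViewpointRequest`, its fields and the validation rule are assumptions, and the source does not specify any data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViewpointRequest:
    """Names a recording device and/or gives a geographical position."""

    device_id: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None

    def __post_init__(self) -> None:
        has_position = self.latitude is not None and self.longitude is not None
        # At least one way of identifying the viewpoint must be present.
        if self.device_id is None and not has_position:
            raise ValueError("give a device id or a latitude/longitude pair")


by_device = ViewpointRequest(device_id="recorder-3")
by_position = ViewpointRequest(latitude=60.17, longitude=24.94)
```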
- the viewpoint selection data may be received from the end user apparatus via the downlink network/transmission channel 105.
- the downlink network/transmission channel 105 may be a single network, for example a cellular communications link between the end user apparatus 201 and the server 103, or may be a channel operating across multiple channels, for example the data may pass over a channel operating over a wireless communications link to an internet gateway in the wireless communications system and then pass over an internet protocol related physical link to the server 103.
- the viewpoint selection is shown in Figure 4 by step 408.
- the downlink network/communications channel 105 may also comprise any one of a cellular communication network such as a third generation cellular communication system, a Wi-Fi communications network, or any suitable wireless or wired communication link.
- in some embodiments the uplink network/communications channel 101 and the downlink network/communications channel 105 are the same network/communications channel.
- in some embodiments the uplink network/communications channel 101 and the downlink network/communications channel 105 share parts of the same network/communications channel.
- in some embodiments both the uplink 101 and the downlink 105 network/communications channel may be a pair of simplex channels, or a duplex or half duplex channel, configured to carry information to and from the server either at the same time or substantially at the same time.
- the processor/transmitter 227 may comprise a viewpoint synthesizer or selector signal processor 309.
- the viewpoint synthesizer or selector signal processor 309 may receive the viewpoint selection information from any end user apparatus and then select or synthesize suitable audio or audio-visual data to be sent to the end user apparatus to provide the end user apparatus 201 with the content experience desired.
- the signal processor 309 selects the synchronized recorded signal data from the recording apparatus indicated.
- the signal processor 309 selects the synchronized recorded signal data which is positioned and/or directed closest to the desired position/direction.
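Selecting the recording positioned closest to a desired position can be sketched as a nearest-neighbour lookup. The sketch assumes planar (x, y) coordinates rather than longitude and latitude, ignores direction, and uses made-up device positions. The function name `closest_device` is hypothetical.

```python
import math


def closest_device(positions, target):
    """Return the id of the device whose (x, y) position is nearest to `target`."""
    return min(positions, key=lambda dev: math.dist(positions[dev], target))


positions = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (4.0, 3.0)}  # made-up layout
chosen = closest_device(positions, target=(5.0, 2.0))
```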
- where a specific location and/or direction is specified, a synthesis of more than one nearby synchronized recorded signal data may be generated.
- the signal processor 309 may generate a weighted average of the synchronized recorded signal data near the specified location/direction, which may be used to provide an estimate of the audio or audio-visual data which may have been recorded at the specified position.
- the signal processor 309 may compensate for the missing or corrupted recorded signal data by synthesizing the recorded signal data from the synchronized recorded signal data from neighbouring recording apparatus 210.
- the signal processor 309 may in some embodiments determine the nearby and neighbouring recording apparatus 210 and further identify the closest recording apparatus to the desired position by using the positional data provided by the recording devices.
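The weighted averaging of nearby synchronized signals can be sketched as below. The source does not say how the weights are chosen; inverse-distance weighting, planar coordinates and the function name `synthesize` are assumptions made for illustration only.

```python
import math


def synthesize(signals, positions, target, eps=1e-9):
    """Inverse-distance weighted average of time-aligned signals."""
    # Closer devices get larger weights; eps avoids division by zero.
    weights = {d: 1.0 / (math.dist(positions[d], target) + eps) for d in signals}
    total = sum(weights.values())
    length = min(len(s) for s in signals.values())
    return [
        sum(weights[d] * signals[d][n] for d in signals) / total
        for n in range(length)
    ]


signals = {"a": [0.0, 2.0], "b": [4.0, 6.0]}       # already synchronized
positions = {"a": (0.0, 0.0), "b": (2.0, 0.0)}     # made-up device positions
midpoint = synthesize(signals, positions, target=(1.0, 0.0))
```

At a point midway between two devices both weights are equal, so the result is the plain average of the two signals. The same routine could stand in for a missing or corrupted recording by targeting that device's position and leaving its signal out.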
- the output of the signal processor 309 in the form of desired (in other words selected recorded or synthesized) signal data 195 may be passed to the transmitter/buffer 311.
- The selection/processing of the recorded signal data is shown in Figure 4 by step 409.
- the processor/transmitter 227 may further comprise a transmitter/buffer configured to transmit the desired signal data 195 via the downlink network/transmission channel 105 which has been described previously.
- the server 103 may therefore be connected via the downlink network/transmission channel 105 to end user apparatus (or devices) 201 configured to generate viewpoint or selection information and receive the desired signal data associated with the viewpoint or selection information.
- end user apparatus 201 may receive signal data from the server 103 and transmit data to the server 103 via the downlink network/transmission channel 105.
- the end user apparatus 201 such as the first end user apparatus 201a may comprise a viewpoint selector and transmitter 231a.
- where the end user apparatus is the apparatus shown in Figure 1, the viewpoint selector and transmitter 231a may use the user interface 15 to allow the user to specify the desired viewing position and/or desired recording device.
- the viewpoint selector and transmitter 231a may then encode this information in order that it may be transmitted via the downlink network/communications channel 105 to the server 103.
- the end user apparatus 201 such as the first end user apparatus 201a may also comprise a receiver 233a configured to receive the desired signal data as described above via the down-link network/communications channel 105.
- the receiver may decode the transmitted desired signal (in other words a selected synchronized recorded signal or synthesized signal from the synchronized recorded signals) to generate content data in a format suitable for viewing.
- the end user apparatus 201 such as the first end user apparatus 201a may also comprise a viewer 235a configured to display or output the desired signal data as described above.
- where the end user apparatus 201 is the apparatus shown in Figure 1, the audio stream may be processed by the audio ADC/DAC 14 and then passed to the loudspeaker 11, and the video stream may be processed by the video ADC/DAC 32 and output via the display 33.
- the viewing/listening of the desired signal data is shown in Figure 4 by step 415. Therefore in summary the apparatus in the form of the end user apparatus may be considered to comprise an input selector configured to select a display variable.
- the display variable may be an indication of at least one of a recording apparatus, a recording location, which may or may not be marked as a recording apparatus location, and a recording direction or orientation.
- the apparatus in the form of the end user apparatus may furthermore be summarised as being considered to comprise a transmitter configured to transmit the display variable to a further apparatus, wherein the further apparatus may be the server as described previously.
- the same apparatus may be considered to comprise a receiver configured to receive a signal stream from the server apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus.
- the same, end user, apparatus may also be summarized as comprising a display for displaying the signal stream.
- End users may in embodiments select between recorded signal data from different recording devices with improved timing and cueing performance as the recorded signal data is synchronized.
- the generation of synthesized signal data using the synchronized recorded signal data allows the end user to experience the content from locations not originally recorded or improve on the recorded data from a single source to allow for deficiencies in the original signal data - such as loss of recorded signal data due to network issues, failure to record due to partial or total device failure, or poorly recorded signal data due to interference or noise.
- With respect to Figure 8, the operation of further embodiments of the server with respect to buffering and synchronization of the recorded signal data is shown. In these embodiments, rather than synchronizing the recorded signal data using a single time alignment indicator, further time alignment indicators may be generated for further time instances or periods.
- the operations similar to the operations shown in steps 405, and 4071 to 4079 are marked with the same references.
- the buffer/receiver 221 may receive the recorded signal data streams in step 405.
- the buffered recorded signal may be defined as $b_n$, where the subindex n describes the time instant from which the recorded signal is buffered.
- the subindex n is an element of the set G, in other words the number of different time instants to be used to determine the base signal.
- the starting position for each buffering location may be described by $Tloc_n$, that is, the signal is buffered starting from $Tloc_n \cdot T$ seconds.
- variable length framer 301 may perform a variable length framing operation in step 4071 on each of the sub-periods using the previously described methods.
- the indicator selector 303 may calculate the time alignment indicators in step 4073 by applying the following equations to determine the time index average, the time index variance and the time index ratio for all the sub-periods according to the following equations:
- the base signal determiner 305 may, in addition to the determination of the base signal and the generation of the time alignment factors, carry out an additional precursor step and make a decision on whether to include a new time instant or period in the calculations. This decision may be for example according to the following expressions:
- the base signal determiner 305 may make the above decision with a condition which would limit the number of new time instants to be added to some predefined threshold to disable a potential infinite loop of iterations being carried out.
- The decision of whether a new time location is to be added is shown in Figure 8 by step 701. Where a new time location is to be added the base signal determiner 305 may add a new time period to G, in other words the process performs another check at a different time than before and the loop passes back to step 407. This addition of a new time instant to G can be seen in Figure 8 as step 703.
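The loop of adding new time instants to G, with a cap that prevents endless iteration, can be sketched as below. The decision expressions themselves are not reproduced in this text, so the stopping rule used here (a ratio falling below a threshold) is a hypothetical stand-in, as are the function and parameter names.

```python
def collect_time_instants(evaluate_ratio, first_instant, candidates,
                          ratio_threshold, max_new_instants):
    """Add time instants to the set G until the result looks reliable or a cap is hit."""
    g = [first_instant]
    added = 0
    for candidate in candidates:
        # Hypothetical decision rule: stop once the ratio is small enough,
        # or once the predefined number of new instants has been added.
        if evaluate_ratio(g) <= ratio_threshold or added >= max_new_instants:
            break
        g.append(candidate)
        added += 1
    return g


# Toy evaluation: the ratio shrinks as more time instants are considered.
toy_ratio = lambda g: 1.0 / len(g)
g_reliable = collect_time_instants(toy_ratio, 0, [10, 20, 30, 40, 50], 0.3, 3)
g_capped = collect_time_instants(toy_ratio, 0, [10, 20, 30, 40, 50], 0.3, 2)
```

The cap `max_new_instants` plays the role of the predefined threshold on new time instants, so the loop terminates even if the reliability criterion is never met.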
- the base signal determiner 305 may then perform the operation of determining the base signal based on the indicators as described previously. The determination of the base signal is shown in Figure 8 by step 4075.
- base signal determiner 305 may also determine the time alignment factors for the remaining signals as described previously and shown in Figure 8 in step 4077.
- the signal synchronizer 307 may then use this base signal determination and the time alignment factors for the remaining recorded signals to synchronize the recorded signals as described previously and shown in Figure 8 in step 4079.
- in some embodiments the loop is disabled or not present and time alignment indicators are determined for at least two of the sub-sets of the total time periods using the equations described above in order to improve the synchronization between recorded signals as the indicators are determined for different time periods.
- embodiments may also be applied to audio-video signals where the audio signal components of the recorded data are processed in terms of the determining of the base signal and the determination of the time alignment factors for the remaining signals and the video signal components may be synchronised using the above embodiments of the invention.
- the video parts may be synchronised using the audio synchronisation information.
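Reusing the audio synchronisation information for the video parts amounts to converting an audio offset into a video frame offset. The sketch below assumes the offset is given in audio samples and that the video has a constant frame rate; the function name and the example rates are hypothetical.

```python
def audio_offset_to_video_frames(offset_samples, audio_rate_hz, video_fps):
    """Convert an audio time offset in samples to the nearest whole video frame count."""
    offset_seconds = offset_samples / audio_rate_hz
    return round(offset_seconds * video_fps)


# A 2 second audio offset at 48 kHz corresponds to 50 frames at 25 frames per second.
frames = audio_offset_to_video_frames(96000, audio_rate_hz=48000, video_fps=25)
```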
- user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
- PLMN public land mobile network
- the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
- some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
- While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
- the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
- any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
- the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
- the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
- the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
- Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
- the design of integrated circuits is by and large a highly automated process.
- Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
- Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
- the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- Astronomy & Astrophysics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
Apparatus comprising a frame value generator configured to generate for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream, an alignment generator configured to determine at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream, and a synchronizer configured to synchronize at least one signal stream to another signal stream dependent on the indicator values.
Description
Synchronization of audio or video streams
BACKGROUND
The present invention relates to apparatus for the processing of audio or video signals. The invention further relates to, but is not limited to, apparatus for processing audio or video signals in mobile devices.
SUMMARY OF THE INVENTION
Viewing recorded or streamed audio-video or audio content is well known. Commercial broadcasters covering an event often have more than one recording device (video-camera/microphone) and a programme director will select a 'mix' where an output from a recording device or combination of recording devices is selected for transmission. Such systems are problematic and lack flexibility in that to reduce transmission resources even 'interactive' services only offer a very limited selection of possible feeds or recording positions.
Multiple 'feeds' may be found in sharing services for video and audio signals (such as those employed by YouTube). Such systems are known and are widely used to share user generated content, which is recorded and uploaded or up-streamed to a server and then downloaded or down-streamed to a viewing/listening user. Such systems rely on users recording and uploading or up-streaming a recording of an event using the recording facilities at hand to the user. This may typically be in the form of the camera and microphone arrangement of a mobile device such as a mobile phone. Often the event may be attended and recorded from more than one position by different recording users. The viewing/listening end user may then select one of the up-streamed or uploaded data to view or listen.
Furthermore where there may be multiple user generated content for the same event it may be possible to generate a three dimensional rendering of the event by combining various different recordings from different users or may be possible to improve upon a single user generated content as it may be possible to reduce background noise by combining different users content in different amounts to attempt to overcome local interference.
However it currently is not possible to seamlessly switch between one recorded feed and another feed. Similarly it is currently not possible using such user generated content systems to combine the content to attempt to remove interference or improve on the listening or viewing experience.
The major obstacle preventing both of the above is that the user generated content recordings are typically made in an unsynchronised manner. In other words each user may be recording using different sample frequencies, and/or encoding the recording at different bit rates, and/or even using different encoding formats. Furthermore even in 'real-time' streaming situations different users may be up-streaming over different parts of the network, or using different network parameters, with a differing latency resulting.
The effect is that as each user independently records and up-streams (or uploads) the content to the server, there is a time delay associated with the process of each recording that is not constant for each recording user.
Documents such as "Synchronisation of Acoustic Sensors for Distributed Ad-hoc Audio Networks and its use for Blind Source Separation", proceedings of the IEEE Sixth Symposium on Multimedia Software Engineering (ISMSE'04), have indicated that synchronisation of acoustic sensors may be achieved using dedicated synchronisation signals to time stamp the recording prior to uploading. These synchronisation signals can be some type of beacon signal received at the sensor device, for example global positioning system data may be used as the beacon signal to provide a time stamp to be added to the recorded signal prior to uploading to the server in order that the server can synchronise the data. However provision of GPS receivers is itself problematic in that it may significantly increase the cost of the device, require significantly more power to operate the device or may not be permitted according to the local jurisdiction within which the devices operate. Furthermore GPS signals may not be received well indoors and such systems as described above are not suitable for indoor operation. Similarly other received beacon signals such as using timing information from a wireless communications downlink may produce similarly poor results in indoor environments.
There is provided according to the invention an apparatus comprising: a frame value generator configured to generate for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; an alignment generator configured to determine at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and a synchronizer configured to synchronize at least one signal stream to another signal stream dependent on the indicator values.
Thus in embodiments the apparatus may be able to offer more than one possible signal stream and furthermore synthesise further signal streams. In such embodiments the further apparatus may therefore select and display by viewing and/or listening a signal stream or synthesized signal stream without putting significant processing load on the further apparatus.
The apparatus may further comprise a receiver configured to receive each of the at least two signal streams from different recording apparatus.
The alignment generator is preferably configured to generate a first indicator for each signal stream dependent on the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
The first indicator may comprise the ratio of the variance and mean values of the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
The alignment generator is preferably further configured to select the at least two streams with the lowest indicator value as a base stream, and the another signal stream is the base stream.
The synchronizer is preferably further configured to synchronize at least one of: at least one signal stream audio stream to another signal stream audio stream dependent on the indicator values; at least one signal stream video stream to another signal stream video stream dependent on the indicator values; and at least one signal stream positional data stream to another signal stream positional data stream dependent on the indicator values.
The apparatus may further comprise an output selector receiver configured to receive selection information indicating at least one of: a recording apparatus; a recording location; and a recording direction.
The apparatus may further comprise an output selector processor, wherein the output selector processor may be configured to carry out at least one of: selecting one synchronized signal stream to be output dependent on the selection information; and combining at least two synchronized signal streams dependent on the selection information to form a compound signal stream to be output.
According to a second aspect of the invention there is provided an apparatus comprising: an input selector configured to select a display variable; a transmitter configured to transmit the display variable to a further apparatus; a receiver configured to receive a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and a display for displaying the signal stream. Thus in embodiments the apparatus may be able to select and display, by viewing and/or listening, a signal stream or synthesized signal stream without putting significant processing load on the apparatus.
The display may comprise at least one of: an audio display for displaying audio signal components of the at least one signal stream received from a recording apparatus; and a video display for displaying video signal components of the at least one signal stream received from a recording apparatus.
The display variable may comprise at least one of: a recording apparatus; a recording location; and a recording direction.
According to a third aspect of the invention there is provided a method comprising: generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and synchronizing at least one signal stream to another signal stream dependent on the indicator values.
The method may further comprise receiving each of the at least two signal streams from different recording apparatus. Determining indicator values may further comprise generating a first indicator for each signal stream dependent on the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
Generating the first indicator may comprise determining the ratio of the variance and mean values of the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
The method may further comprise selecting the at least two streams with the lowest indicator value as a base stream, and wherein the another signal stream is the base stream.
Synchronizing may further comprise at least one of: synchronizing at least one signal stream audio stream to another signal stream audio stream dependent on the indicator values; synchronizing at least one signal stream video stream to another signal stream video stream dependent on the indicator values; and synchronizing at least one signal stream positional data stream to another signal stream positional data stream dependent on the indicator values.
The method may further comprise receiving selection information indicating at least one of: a recording apparatus; a recording location; and a recording direction. The method may further comprise selecting one synchronized signal stream to be output dependent on the selection information.
The method may further comprise combining at least two synchronized signal streams dependent on the selection information to form a compound signal stream to be output.
According to a fourth aspect of the invention there is provided a method comprising: selecting a display variable; transmitting the display variable to a further apparatus; receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and displaying the signal stream.
Displaying may comprise at least one of: displaying audio signal components of the at least one signal stream received from a recording apparatus; and displaying video signal components of the at least one signal stream received from a recording apparatus.
The display variable may comprise at least one of: a recording apparatus; a recording location; and a recording direction.
An electronic device may comprise apparatus as described above. A chipset may comprise apparatus as described above.
According to a fifth aspect of the invention there is provided a computer-readable medium encoded with instructions that, when executed by a computer, perform: generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and synchronizing at least one signal stream to another signal stream dependent on the indicator values.
According to a sixth aspect of the invention there is provided a computer-readable medium encoded with instructions that, when executed by a computer, perform: selecting a display variable; transmitting the display variable to a further apparatus; receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and displaying the signal stream.
According to a seventh aspect of the invention there is provided an apparatus comprising: means for generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; means for determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and means for synchronizing at least one signal stream to another signal stream dependent on the indicator values.
According to an eighth aspect of the invention there is provided an apparatus comprising: means for selecting a display variable; means for transmitting the display variable to a further apparatus; means for receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and means for displaying the signal stream.
Embodiments of the present invention aim to address the above problems.
BRIEF DESCRIPTION OF THE DRAWINGS
For better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows schematically an electronic device suitable for being employed in embodiments of the application;
Figure 2 shows schematically a multi-user free-viewpoint sharing services system which may encompass embodiments of the application;
Figure 3 shows schematically a network orientated view of the system shown in Figure 2 within which embodiments of the application may be implemented;
Figure 4 shows schematically a method of operation of the system shown in Figure 2 within which embodiments of the application may be implemented;
Figure 5 shows a schematic view of the server shown in Figure 3 in further detail;
Figure 6 shows schematically a method of operation of the server shown in Figure 5 according to embodiments of the application;
Figure 7 shows schematically the synchronisation of signals in embodiments of the application; and
Figure 8 shows schematically a method of operation of the server shown in Figure 5 according to further embodiments of the application.
DETAILED DESCRIPTION OF THE DRAWINGS
The following describes in further detail suitable apparatus and possible mechanisms for the provision of effective synchronisation for audio signals and similarly audio-visual images and data. In this regard reference is first made to Figure 1 which shows a schematic block diagram of an exemplary apparatus or electronic device 10, which may be used to record or listen to the audio signals and similarly to record or view the audio-visual images and data.
The electronic device 10 may for example be a mobile terminal or user equipment of a wireless communication system.
The electronic device 10 may comprise an audio subsystem 11. The audio subsystem may comprise a microphone(s) or inputs for microphones for audio signal capture and a loudspeaker(s) or outputs for loudspeaker(s) or headphones for audio signal output. The audio subsystem 11 may be linked via an audio analogue-to-digital converter (ADC) and digital-to-analogue converter (DAC) 14 to a processor 21. The electronic device 10 may further comprise a video subsystem 33. The video subsystem 33 may comprise a camera or input for a camera for image or moving image capture and a display or output for a display for video signal output. The video subsystem 33 may also be linked via a video analogue-to-digital converter (ADC) and digital-to-analogue converter (DAC) 32 to the processor 21. The processor 21 may be further linked to a transceiver (TX/RX) 13, to a user interface (UI) 15 and to a memory 22.
The processor 21 may be configured to execute various program codes. The implemented program codes may comprise audio and/or video encoding code routines. The implemented program codes 23 may further comprise an audio and/or video decoding code. The implemented program codes 23 may be stored for example in the memory 22 for retrieval by the processor 21 whenever needed. The memory 22 may further provide a section 24 for storing data, for example data that has been encoded in accordance with the invention for playback. The memory 22 may also further provide in the same section 24 for storing data, data to be encoded prior to transmission. The encoding and decoding code may in embodiments of the invention be implemented in hardware or firmware.
The user interface 15 may enable a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via the display. The transceiver 13 enables a communication with other electronic devices, for example via a wireless communication network. The transceiver 13 may in some embodiments of the invention be configured to communicate to other electronic devices by a wired connection.
It is to be understood again that the structure of the electronic device 10 could be supplemented and varied in many ways.
A user of the electronic device 10 may use the microphone 11 for audio signal capture that is to be transmitted to some other electronic device or apparatus or that is to be stored in the data section 24 of the memory 22. A corresponding application may be activated to this end by the user via the user interface 15. This application, which may be run by the processor 21, causes the processor 21 to execute the encoding code stored in the memory 22.
Similarly the user of the device may use the camera or video sub-system input for video signal capture for video images that are to be transmitted to some other electronic device or apparatus or to be stored in the data section 24 of the memory 22. A corresponding application may be activated to this end by the user via the user interface 15. This application, which may be run by the processor 21, causes the processor 21 to execute the encoding code stored in the memory 22.
The audio analogue-to-digital converter 14 may convert the input analogue audio signal into a digital audio signal and provide the digital audio signal to the processor 21. Similarly the video analogue-to-digital converter may convert an input analogue video signal into a digital signal format and provide the digital video signal to the processor 21.
The processor 21 may then process the digital audio signal and/or digital video signal in the same way as described with reference to the description hereafter. The resulting audio and/or video bit stream is provided to the transceiver 13 for transmission to another electronic device. Alternatively, the coded data could be stored in the data section 24 of the memory 22, for instance for a later transmission or for a later presentation by the same electronic device 10.
The electronic device 10 may also receive a bit stream with correspondingly encoded data from another electronic device via the transceiver 13. In this case, the processor 21 may execute decoding program code stored in the memory 22. The processor 21 may therefore decode the received data, and provide the decoded data to either of the audio or video subsystems, such as the audio DAC 14 or the video digital-to-analogue converter 32. The audio and/or video digital-to-analogue converter 14, 32 may convert the digital decoded data into analogue data and output the analogue audio signal to the loudspeakers 11, or the analogue video signal to the display 33. It would be appreciated that in some embodiments the display and/or loudspeakers are themselves digital in operation, in which case the digital audio signal may be passed directly to the loudspeakers 11 and the digital video signal may be passed directly to the display 33. Execution of the decoding program code for audio and/or video signals may be triggered as well by an application that has been called by the user via the user interface 15.
Instead of being presented immediately via the loudspeakers 11 and display 33, the received encoded data could also be stored in the data section 24 of the memory 22, for instance for enabling a later presentation or a forwarding to a further electronic device (not shown).
In some embodiments of the invention the loudspeakers 11 may be supplemented with or replaced by a headphone set which may communicate to the electronic device 10 or apparatus wirelessly, for example by a Bluetooth profile to communicate via the transceiver 13, or using a conventional wired connection.
Although the above apparatus has been described with respect to being suitable for both the upstream operations, in other words recording the event and transmitting the recording, and the downstream operations, in other words receiving the recording and playing the received recording, it would be understood that in some embodiments of the application separate apparatus may perform the upstream and downstream operations.
Figure 2 shows a schematic overview of the system which may incorporate embodiments of the application. Figure 2 shows a plurality of recording electronic devices or recording apparatus 210, which may be apparatus 10 such as shown in Figure 1, configured to record or capture an activity 171 from various angles or directions as shown in Figure 2 by an associated beam 121. The recording apparatus or recording devices 210 closest to the activity 171 are shown in Figure 2 as recording apparatus 210a to 210g. Each of the closest recording apparatus 210a to 210g has an associated beam 121a to 121g.
Each of these recording devices or apparatus 210 may then upload or upstream the recorded signals. Figure 2 shows an arrow 191 representing the recorded signals which may be sent over a transmission channel 101 to a server 103. The server 103 may then process the received recorded signals and transmit signal data associated with a 'selected viewpoint', in other words a single recorded or synthesized signal, via a second transmission channel 105 to an end user or viewing apparatus or device 201a. As indicated above the recording apparatus 210 configured to transmit the recording may be only a recording apparatus and the end user apparatus 201 configured to receive the recorded or synthesized signals associated with the selected viewpoint may be a viewing or listening apparatus only; however in other embodiments the recording apparatus 210 and/or end user apparatus 201 may each comprise both recording and viewing/listening capacity.
With respect to Figures 3 and 4, the system shown in Figure 2 is described in further detail. Figure 3 shows schematically the component parts which may be used in implementing the embodiments of the application.
The system within which embodiments may operate may comprise recording apparatus/electronic devices 210 configured to operate as recording devices, uploading or upstreaming network/transmission channel 101, server or network apparatus 103, downloading or downstreaming network/transmission channel 105, and further end user apparatus/electronic devices 201 configured to operate as end users (viewers/listeners).
The example shown in Figure 3 shows two recording apparatus/electronic devices 210 which may be configured to operate as recording devices, a first recording apparatus 210a and an n'th recording apparatus 210n, either of which may be connected to the server 103 via the uplink network/transmission channel 101. To further understand concepts of the application an example from recording at the recording apparatus 210 to displaying a recorded or synthesized signal associated with a selected viewpoint at the end user apparatus 201 is described hereafter.
Figure 3 further shows the recording device 210 (which may be considered to be implemented by the apparatus 10 shown in Figure 1) comprising a recorder 211 configured to record and encode content in the form of a recorded signal. The recorded signal may be audio which may for example be captured by a microphone or microphone array, video images which may be captured by a camera, or audio-video data captured by microphone and camera. The recorder 211 may also perform encoding on the recorded signal data according to any suitable encoding methodology to generate an audio, video, or audio-video encoded signal. The operation of recording the content to form a recorded signal is shown in figure 4 by step 401. Figure 4 shows that more than one recording device 210 may record the event generating a recorded signal associated to the recording device's position and recording capability. Thus the first recording device 210a carries out the recording as step 401a, a second recording device (not shown) carries out the recording operation as step 401b, and the n'th recording device 210n carries out the recording operation as step 401n.
The recording device 210 may also comprise an up-loader or transmitter 213 which formats the recorded signal values for transmission over the network/transmission channel 105. Furthermore in embodiments of the invention the up-loader or transmitter 213 may optionally encode positional information to assist the server in locating the recorded signal. This recorded signal and positional data 191 may be transmitted over the uplink transmission channel 101. The uploading (transmission) of the content data and optionally the positional data is shown in figure 4 by step 403. Figure 4 shows more than one recording device 210 may upload the recorded signals. Thus the first recording device 210a carries out the uploading of first recording device recorded signal (and possibly positional) data 191a as step 403a, a second recording device carries out the uploading operation of second recording device recorded signal (and possibly positional) data 191b as step 403b, and the n'th recording device 210n carries out the uploading operation of n'th recording device recorded signal (and possibly positional) data 191n as step 403n.
It would be appreciated that any number of recording devices 210 may be connected to the server 103. Furthermore it would be appreciated that in embodiments of the application the uplink network/transmission channel 101 may be a single network, for example a cellular communications link between the devices and the server or may be a channel operating across multiple channels, for example the data may pass over a channel operating over a wireless communications link to an internet gateway in the wireless communications system and then pass over an internet protocol related physical link to the server. The uplink network/transmission channel 101 may be a simplex network or part of a duplex or half-duplex network.
The uplink network/communications channel 101 may comprise any one of a cellular communication network such as a third generation cellular communication system, a Wi-Fi communications network, or any suitable wireless or wired communication link. In some embodiments the recording and uploading operations may occur concurrently or substantially concurrently so that the information received at the server may be considered to be real time or streamed data. In other embodiments the uploading operation may be carried out at a time substantially later than the recording operation and the information may be considered to be uploaded data.
The server 103 in Figure 3 is shown in further detail in figure 5 and the operation of the server described with reference to figure 4 is described in further detail in figure 6. Where the same (or similar) components or operations are described the same reference number may be used.
The server 103 may comprise a receiver or buffer 221 which may receive the recorded signal data (and in some embodiments the positioning data) 191 from the uplink network/communications channel. The receiver/buffer 221 may be any suitable receiver for receiving the recorded signal data (and in some embodiments the positioning data) according to the format used on the uplink network/communications channel 101. The receiver/buffer 221 may be configured to output the received recorded signal data and in some embodiments the positioning data to the synchronizer 223. The buffering may enable the server 103 to receive the recorded signal data from the recording devices 210 for the time reference required. This buffering may therefore be short term buffering, for example in real-time or near real-time streaming of the recorded signal data the buffering or storage of the recorded signal data may be in the order of seconds and the receiver/buffer may use solid state memory to buffer the recorded signal data. However where the recording devices 210 themselves store and upload the recorded signal data at a later time the receiver/buffer 221 may store the recorded signal data in a long term storage device, for example using magnetic media such as a RAID (Redundant Array of Independent Disks) storage. This long term storage may thus store the recorded signal data for an amount of time to enable several different capture devices to upload the recorded signal data at their convenience.
For the following example we may define the vector $b_i$ as the received and buffered i'th recorded signal data for a time period of length T seconds. Furthermore where the sample rate of the i'th recorded signal data is S Hz, the number of time samples N within $b_i$ may then be defined by the following equation: $N = \lfloor S \cdot T \rfloor$.
The receiving/buffering operation is shown in both the system operation as shown in Figure 4 and the server operation as shown in Figure 6 as step 405.
The server 103 further comprises a synchronizer 223. The synchronizer 223 receives at least two independently recorded signal data 191a, 191b, ..., 191n from the receiver/buffer and outputs at least two synchronized recorded signal data signals.
The synchronizer 223 does so by variable length framing of the recorded signal data, selecting a base recorded signal data and then aligning the remainder of the recorded signal data with the base recorded signal. The at least two synchronized recorded signal data are then passed to the processor/transmitter 227 for further processing.
The synchronization operation is shown in Figure 4 by step 407. With respect to Figures 5 and 6 the configuration and operation of the synchronizer 223 may be described in further detail.
The synchronizer 223 may comprise a variable length framer 301. The variable length framer may receive the at least two recorded signal data values 191 from the receiver/buffer 221. The variable length framer 301 may generate framed recorded signal values, by generating a single sample value from a first number of recorded signal data sample values.
An example of the variable length framer 301 carrying out variable length framing may be according to the following equation
where $vlf_{i,j}(k)$ is the output sample value for the first number of recorded signal data samples for the i'th recorded signal data, $f_j$ the first number (otherwise known as the input mapping size), and $b_i(k \cdot f_j + h)$ the input sample value for the $(k \cdot f_j + h)$ sample. For each mapping or frame, $k \cdot f_j$ defines the first input sample index and $k \cdot f_j + f_j - 1$ the last input sample index. The index k defines the output sample or variable frame index.
Thus as described previously for a time period T where there are N input sample values, the variable length framer 301 outputs $N / f_j$ output sample values, each of which is formed dependent on $f_j$ adjacent input sample values.
The index vlf_idx indicates the run-time mode for the variable length framing. In some embodiments the value of vlf_idx is set to 0 where $f_j / S < 2$ ms, otherwise the value of vlf_idx is set to 1. The run-time mode may indicate the calculation path for the variable length framing operation, that is, whether the output value of $vlf_{i,j}(k)$ is calculated from the amplitude envelope directly (vlf_idx == 1) or from the sign adjusted energy envelope (vlf_idx != 1). The decision as to which mode is used depends on the duration of $f_j$. If the duration of $f_j$ is less than 2 milliseconds the amplitude envelope calculation path may be selected, otherwise the energy envelope calculation path may be used. In other words, for small input mapping sizes it is more advantageous to track the amplitude envelope than the energy envelope. This may improve the resilience to false synchronization results.
The variable length framer 301 may then repeat the operation of variable length framing for each of the number of signals identified for the selected space to generate an output for each of the recorded signals so that the output samples for each of the recorded signals have the same number of sample values for the same time period. The operation of the variable length framer 301 may be such that in embodiments all of the recorded signal data are variable length framed in a serial format, in other words one after another. In some embodiments the operation of the variable length framer 301 may be such that more than one of the recorded signal data may be processed at the same time or substantially at the same time to speed up the variable length processing for the time period in question. The output of the variable length framer may be passed to the indicator selector 303.
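The framing step described above can be sketched as follows. The exact framing formula is not legible in this text, so the sketch assumes each output value is the mean, over $f_j$ adjacent input samples, of either the absolute amplitude or the sign adjusted energy (x multiplied by |x|). The function name and the `use_amplitude` flag are hypothetical, and the mapping from the run-time mode to a path is left to the caller.

```python
def variable_length_frame(samples, frame_size, use_amplitude):
    """Collapse each run of `frame_size` input samples into one output value.

    ASSUMPTION: the per-frame statistic is a mean of |x| (amplitude path)
    or of x*|x| (sign adjusted energy path); the source formula is not
    reproduced in the text.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    n_frames = len(samples) // frame_size  # N / f_j whole frames
    out = []
    for k in range(n_frames):
        # Frame k covers input indices k*f_j .. k*f_j + f_j - 1.
        frame = samples[k * frame_size:(k + 1) * frame_size]
        if use_amplitude:
            total = sum(abs(x) for x in frame)
        else:
            total = sum(x * abs(x) for x in frame)
        out.append(total / frame_size)
    return out
```

Because every recorded signal is reduced with the same frame size, signals covering the same time period yield the same number of output values, which is what the later correlation step relies on.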
The operation of variable length framing is shown in Figure 6 by step 4071. The synchronizer 223 may also comprise an indicator selector 303 configured to receive the variable length framed sample values for each of the selected space of recorded signal data and generate a time alignment indicator for each recorded data signal. The indicator selector 303 may for example generate the time alignment indicator tInd for the i'th signal and for all variable time frame sample values j from 0 to M using the following equation.
$tInd_{i,j}(k) = \max_{\tau}\{vlf_{i,j}, vlf_{k,j}\}, \quad 0 \le i < U, \ 0 \le k < U, \ 0 \le j < M$
where $\max_{\tau}$ maximises the correlation between the given signals with respect to the delay $\tau$. This maximisation function locates the delay $\tau$ where the signals are best time aligned. The function may in embodiments of the invention be defined as
$\max_{\tau}\{x, y\} = \max_{lag}\left(xCorr_{lag}\right), \quad 0 < lag < T_{upper}$
where $T_{upper}$ defines the upper limit for the delay in seconds. In suitable embodiments, the upper limit may be set to two seconds as this has been found to be a fair value for the delay in practical recording and networking conditions. Furthermore, $wSize_j$ describes the number of items used in the maximum calculation for each $f_j$. In some embodiments, the number of items used in the maximisation calculation may be about $T_{window} = 2.5$ s, which corresponds to $wSize_j = T_{window} \cdot S / f_j$ in samples for each $f_j$. The above equation as performed in embodiments therefore returns the value "lag" which maximises the correlation between the signals. Furthermore the equation
$tCorr_{i,j}(k) = xCorr_{\tau}\{vlf_{i,j}, vlf_{k,j}\}, \quad 0 \le i < U, \ 0 \le k < U, \ 0 \le j < M$
may provide the correlation value.
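A minimal sketch of the delay search follows. It assumes a plain, unnormalised dot-product correlation over a window of `w_size` framed values and a lag measured in framed samples, searched over 0 up to but not including `max_lag`; the precise definition of xCorr is not legible in this text, and the names `x_corr` and `best_lag` are hypothetical.

```python
def x_corr(x, y, lag, w_size):
    """Correlation of x against y shifted by `lag`, over at most w_size items.

    ASSUMPTION: unnormalised dot product; the source may normalise.
    """
    n = min(w_size, len(x), len(y) - lag)
    return sum(x[k] * y[k + lag] for k in range(max(n, 0)))


def best_lag(x, y, max_lag, w_size):
    """Return (lag, correlation) for the lag that maximises the correlation."""
    best = (0, x_corr(x, y, 0, w_size))
    for lag in range(1, max_lag):
        c = x_corr(x, y, lag, w_size)
        if c > best[1]:
            best = (lag, c)
    return best
```

For example, with `x = [0, 1, 3, 1, 0, 0, 0, 0]` and `y` equal to `x` delayed by two positions, `best_lag(x, y, 5, 8)` returns a lag of 2. The returned lag plays the role of the time alignment indicator and the returned correlation the role of the correlation value.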
The indicator selector 303 may then pass the generated time alignment indicator (tInd) values to the base signal determiner 305.
The calculation of time alignment indicator values is shown in Figure 6 in step 4073. The synchronizer 223 may also comprise a base signal determiner 305 which may be configured to receive the time alignment indicator values from the indicator selector 303 and indicate which of the received recorded signal data is suitable to synchronize the remainder of the recorded signal data to.
The base signal determiner 305 may first generate a series of time aligned indicators from the time alignment indicator values. For example the time aligned indicators may be a time aligned index average, a time aligned index variance and a time aligned index ratio which may be generated by the base signal determiner 305 according to the following three equations.
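$tIndAve_{i,j} = \frac{1}{M} \sum_{k=0}^{M-1} tInd_{i,k}(j), \quad 0 \le i < U, \ 0 \le j < U$
$tIndVar_{i,j} = \frac{1}{M} \sum_{k=0}^{M-1} \left(tInd_{i,k}(j) - tIndAve_{i,j}\right)^2, \quad 0 \le i < U, \ 0 \le j < U$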
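As an illustration of the time aligned index average and variance, the sketch below takes the lags found for every pair of signals at each of the M frame sizes and reduces them over the frame sizes. It assumes a population variance (division by M) and a nested-list layout `t_ind[i][k][j]` for signal i, frame size index k and other signal j; both the layout and the function name are hypothetical.

```python
def t_ind_stats(t_ind):
    """Return (ave, var), each a U x U nested list.

    t_ind[i][k][j] is the lag between signals i and j found with the
    k'th frame size. ASSUMPTION: population variance over the M frame sizes.
    """
    n_signals = len(t_ind)
    n_sizes = len(t_ind[0])
    ave = [[0.0] * n_signals for _ in range(n_signals)]
    var = [[0.0] * n_signals for _ in range(n_signals)]
    for i in range(n_signals):
        for j in range(n_signals):
            lags = [t_ind[i][k][j] for k in range(n_sizes)]
            mean = sum(lags) / n_sizes
            ave[i][j] = mean
            var[i][j] = sum((v - mean) ** 2 for v in lags) / n_sizes
    return ave, var
```

A pair of signals whose lag estimate barely changes across frame sizes gets a small variance relative to its mean, which is the property the ratio indicator is described as exploiting.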
The base signal determiner 305 may sort the indicator tIndRatio in increasing order of importance. For example the base signal determiner 305 may sort the indicator tIndRatio so that the ratio value having the smallest value appears first, the ratio value having the second smallest value appears second and so on. The base signal determiner 305 may output the sorted indicator as the ratio vector tIndRatioSorted. The base signal determiner 305 may also record the order of the time indicator values tIndRatio by generating an index tIndRatioSortedIndex which contains the corresponding original position indices for the sorted result. Thus if the smallest ratio value was found at index 2, the next smallest at index 5, and so on the base signal determiner 305 may generate a vector with the values [2, 5, ...].
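The sorting step can be sketched as an index sort. The function name `sort_ratios` and the sample ratio values are illustrative only; how the per-signal ratio is derived is not shown here.

```python
def sort_ratios(ratios):
    """Return (sorted_ratios, original_indices), smallest ratio first."""
    indices = sorted(range(len(ratios)), key=lambda i: ratios[i])
    return [ratios[i] for i in indices], indices


ratios = [0.9, 0.4, 0.1, 0.7, 0.8, 0.2]  # hypothetical per-signal ratios
sorted_ratios, sorted_indices = sort_ratios(ratios)
print(sorted_indices)  # [2, 5, 1, 3, 4, 0]
```

Here the smallest ratio sits at original index 2 and the next smallest at index 5, matching the [2, 5, ...] pattern described above.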
The base signal determiner 305 may then use the generated indicators to determine the base signal according to the following equation:
$base\_signal\_idx = tIndRatioSortedIndices(0)$
$time\_align(base\_signal\_idx) = 0$
The determination of the base signal is shown in Figure 6 by step 4075. The base signal determiner 305 may also determine the time alignment factors for the other recorded signal data from the average time alignment indicator values according to the following equation:
$time\_align(i) = tIndAve_{base\_signal\_idx,\, i}, \quad 0 \le i < U, \ i \ne base\_signal\_idx$
The determination of the time alignment values for the remaining signals is shown in Figure 6 by step 4077.
The base signal determiner 305 may then pass the base signal indicator value base_signal_idx and also the time alignment factor values time_align for the remaining recorded signals to the signal synchronizer 307.
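The base signal selection and the time alignment factors can be sketched together. The sketch assumes the sorted index vector and the U x U average matrix are already available as Python lists; the function name is hypothetical.

```python
def select_base_and_align(sorted_indices, t_ind_ave):
    """Pick the base signal and read each other signal's delay from its row.

    sorted_indices: original signal indices, smallest ratio first.
    t_ind_ave: U x U nested list of average lags between signal pairs.
    """
    base = sorted_indices[0]
    time_align = [
        0 if i == base else t_ind_ave[base][i]
        for i in range(len(t_ind_ave))
    ]
    return base, time_align
```

With `sorted_indices = [2, 0, 1]` and an average matrix whose row 2 is `[5, 2, 0]`, the base signal index is 2 and the time alignment factors are `[5, 2, 0]`.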
The synchronizer 223 may also comprise a signal synchronizer 307 configured to receive the recorded signals via the receiver/buffer 221 and the base signal indicator value and the time alignment factor values for the remaining recorded signals. The signal synchroniser 307 may then synchronize the recorded signals by adding the time alignment value to the current time indices of each of the signals.
This operation may be shown with respect to Figure 7. Figure 7 shows four recorded signals. These recorded signals may be a first signal (signal 1) 501, a second signal (signal 2) 503, a third signal (signal 3) 505 and a fourth signal (signal 4) 507. After being processed by the variable length framer 301, the indicator selector 303 and the base signal determiner 305, the signal synchronizer 307 may receive a base signal indicator value base_signal_idx with a value of 3 561, and furthermore receive time_align values for the first signal Time_align(1) 551, second signal Time_align(2), third signal Time_align(3) which is equal to zero, and fourth signal Time_align(4).
As the third signal is the base signal and therefore has no time alignment value, no time delay is added to the signal sample values and the synchronized third signal 515 is output. The signal synchronizer 307 may delay the first signal 501 by the Time_align(1) 551 value to generate a synchronized first signal 511. The signal synchronizer 307 may delay the second signal 503 by the Time_align(2) 553 value to generate a synchronized second signal 513. The signal synchronizer 307 may also delay the fourth signal 507 by the Time_align(4) 557 value to generate a synchronized fourth signal 517.
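The delaying of each signal by its time alignment value can be sketched as below. The sketch assumes the delays have already been converted to non-negative whole numbers of samples and realises each delay by prepending fill values; the function name and the fill value are hypothetical.

```python
def synchronize(signals, time_align, fill=0.0):
    """Delay each signal by its time alignment value, given in samples.

    ASSUMPTION: delays are non-negative integers; a delay of 0 (the base
    signal) leaves the samples unchanged.
    """
    synced = []
    for samples, delay in zip(signals, time_align):
        if delay < 0:
            raise ValueError("this sketch only handles non-negative delays")
        synced.append([fill] * delay + list(samples))
    return synced
```

For three signals with delays `[2, 1, 0]`, the first gains two leading fill samples, the second gains one, and the third (the base signal) is passed through as it is.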
The synchronized recorded data signals may then be output to the processor/transmitter 227.
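The delay operation of Figure 7 may be sketched as follows. This is only an illustration: it assumes the time alignment values are non-negative whole numbers of samples and that a delay is realised by prepending silence. Neither assumption is stated in the description, and the function name and sample values are hypothetical.

```python
def synchronize(signals, time_align, base_signal_idx):
    """Delay each non-base signal by its time_align value (in samples)."""
    synchronized = {}
    for idx, samples in signals.items():
        delay = 0 if idx == base_signal_idx else time_align[idx]
        # Delaying by `delay` samples shifts every time index forward;
        # the gap at the start is filled with silence (zeros).
        synchronized[idx] = [0.0] * delay + list(samples)
    return synchronized


# Four short stand-in signals; signal 3 is the base signal.
signals = {1: [0.1, 0.2], 2: [0.3, 0.4], 3: [0.5, 0.6], 4: [0.7, 0.8]}
time_align = {1: 2, 2: 1, 4: 3}  # no entry for the base signal
out = synchronize(signals, time_align, base_signal_idx=3)
```

In this sketch the base signal passes through unchanged, while the others are shifted by their alignment values.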
Thus in summary the apparatus of the server may be considered to comprise a frame value generator which may generate for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream. The same server apparatus may also comprise an alignment generator to determine at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream. Furthermore the server apparatus may comprise a synchronizer to synchronize at least one signal stream to another signal stream dependent on the indicator values. The operation of synchronising the signals is shown in Figure 6 by operation 4079.
The server 103 may comprise a viewpoint receiver/buffer 225. The viewpoint receiver/buffer 225 may be configured to receive from the end user apparatus 201 data in the form of a positional or recording viewpoint information signal; in other words the apparatus may communicate a request to hear or view the event from a specific recording device or from a specified position. Although this is discussed hereafter as the viewpoint it would be understood that this applies to audio only as well as audio-visual data. Thus in embodiments the data may indicate for selection or synthesis a specific recording device from which audio or audio-visual recorded signal data is to be selected or a position such as a longitude and latitude or other geographical co-ordinate system.
The viewpoint selection data may be received from the end user apparatus via the downlink network/transmission channel 105. It would be appreciated that in embodiments of the application the downlink network/transmission channel 105 may be a single network, for example a cellular communications link between the end user apparatus 201 and the server 103, or may be a channel operating across multiple channels, for example the data may pass over a channel operating over a wireless communications link to an internet gateway in the wireless communications system and then pass over an internet protocol related physical link to the server 103.
The viewpoint selection is shown in Figure 4 by step 408.
The downlink network/communications channel 105 may also comprise any one of a cellular communication network such as a third generation cellular communication system, a Wi-Fi communications network, or any suitable wireless or wired communication link. In some embodiments the uplink network/communications channel 101 and the downlink network/communications channel 105 are the same network/communications channel. In other embodiments the uplink network/communications channel 101 and the downlink network/communications channel 105 share parts of the same network/communications channel. Furthermore in embodiments the downlink network/communications channel 105 is a pair of simplex channels, or a duplex or half duplex channel, configured to carry information to and from the server either at the same time or substantially at the same time.
The processor/transmitter 227 may comprise a viewpoint synthesizer or selector signal processor 309. The viewpoint synthesizer or selector signal processor 309 may receive the viewpoint selection information from any end user apparatus and then select or synthesize suitable audio or audio-visual data to be sent to the end user apparatus to provide the end user apparatus 201 with the content experience desired.
Thus in some embodiments, where the viewpoint selection information may identify a specific recording apparatus or device 201 the signal processor 309 selects the synchronized recorded signal data from the recording apparatus indicated. In other embodiments, where the viewpoint selection information may identify a specific location or direction, the signal processor 309 selects the synchronized recorded signal data which is positioned and/or directed closest to the desired position/direction. In other embodiments where a specific location/direction is specified a synthesis of more than one nearby synchronized recorded signal data may be generated. For example the signal processor 309 may generate a weighted average of the synchronized recorded signal data near the specified location/direction to provide an estimate of the audio or audio-visual data which may have been recorded at the specified position.
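The selection and synthesis options above may be sketched as follows. The sketch assumes flat two-dimensional device positions and inverse-distance weighting. The description only calls for a weighted averaging, so the weighting scheme, the function names and the sample data are illustrative assumptions.

```python
import math


def closest_device(positions, target):
    """Return the id of the recording device nearest to the target position."""
    return min(positions, key=lambda dev: math.dist(positions[dev], target))


def weighted_average(signals, positions, target, eps=1e-9):
    """Blend synchronized signals, weighting nearer devices more heavily."""
    # Inverse-distance weights (an assumption); eps avoids division by zero.
    weights = {d: 1.0 / (math.dist(positions[d], target) + eps) for d in signals}
    total = sum(weights.values())
    length = min(len(s) for s in signals.values())
    return [
        sum(weights[d] * signals[d][n] for d in signals) / total
        for n in range(length)
    ]


# Two hypothetical recording devices with already synchronized signals.
positions = {"a": (0.0, 0.0), "b": (4.0, 0.0)}
signals = {"a": [1.0, 1.0], "b": [3.0, 3.0]}
nearest = closest_device(positions, target=(1.0, 0.0))
blend = weighted_average(signals, positions, target=(2.0, 0.0))
```

Because the blend operates sample by sample, it relies on the recorded signals having been synchronized beforehand.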
Furthermore in other embodiments where a recording device 201 suffers a failure of a recording component, or recorded signal data is missing or corrupted the signal processor 309 may compensate for the missing or corrupted recorded signal data by synthesizing the recorded signal data from the synchronized recorded signal data from neighbouring recording apparatus 210.
The signal processor 309 may in some embodiments determine the nearby and neighbouring recording apparatus 210 and further identify the closest recording apparatus to the desired position by using the positional data provided by the recording devices.
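Compensation for missing or corrupted data may be sketched in a similar way. The sketch below assumes missing samples are marked with `None` and replaces each one with the unweighted mean of the neighbouring devices' synchronized samples at the same index. The marker, the averaging rule and the names are hypothetical choices made for illustration.

```python
def fill_missing(target, neighbours):
    """Replace None samples in `target` with the mean of neighbour samples."""
    repaired = []
    for n, sample in enumerate(target):
        if sample is None:
            available = [
                s[n] for s in neighbours if n < len(s) and s[n] is not None
            ]
            # Fall back to silence if no neighbour has data at this index.
            sample = sum(available) / len(available) if available else 0.0
        repaired.append(sample)
    return repaired


# A damaged signal and two neighbouring synchronized signals.
damaged = [0.5, None, None, 0.25]
neighbours = [[0.25, 0.5, 0.75, 0.0], [0.75, 1.0, None, 0.5]]
repaired = fill_missing(damaged, neighbours)
```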
The output of the signal processor 309 in the form of desired (in other words selected recorded or synthesized) signal data 195 may be passed to the transmitter/buffer 311.
The selection/processing of the recorded signal data is shown in Figure 4 by step 409.
The processor/transmitter 227 may further comprise a transmitter/buffer configured to transmit the desired signal data 195 via the downlink network/transmission channel 105 which has been described previously. The server 103 may therefore be connected via the downlink network/transmission channel 105 to end user apparatus (or devices) 201 configured to generate viewpoint or selection information and receive the desired signal data associated with the viewpoint or selection information.
Although two end user devices 201 are shown in Figure 3, a first end user apparatus 201a and a second end user apparatus 201m, it would be appreciated that any suitable number of end user apparatus may receive signal data from the server 103 and transmit data to the server 103 via the downlink network/transmission channel 105.
The end user apparatus 201 such as the first end user apparatus 201a may comprise a viewpoint selector and transmitter 231a. The viewpoint selector and transmitter 231a may use the user interface 15, where the end user apparatus may be the apparatus shown in Figure 1, to allow the user to specify the desired viewing position and/or desired recording device. The viewpoint selector and transmitter 231a may then encode this information in order that it may be transmitted via the downlink network/communications channel 105 to the server 103.
The end user apparatus 201 such as the first end user apparatus 201a may also comprise a receiver 233a configured to receive the desired signal data as described above via the down-link network/communications channel 105. The receiver may decode the transmitted desired signal (in other words a selected synchronized recorded signal or synthesized signal from the synchronized recorded signals) to generate content data in a format suitable for viewing.
The reception of the desired signal data is shown in Figure 4 by step 413.
Furthermore the end user apparatus 201 such as the first end user apparatus 201a may also comprise a viewer 235a configured to display or output the desired signal data as described above. For example where the end user apparatus 201 may be the apparatus as shown in Figure 1 the audio stream may be processed by the audio ADC/DAC 14 and then passed to the loudspeaker 11 and the video stream may be processed by the video ADC/DAC 32 and output via the display 33. The viewing/listening of the desired signal data is shown in Figure 4 by step 415.
Therefore in summary the apparatus in the form of the end user apparatus may be considered to comprise an input selector configured to select a display variable. As described above the display variable may be an indication of at least one of a recording apparatus, a recording location, which may or may not be marked as a recording apparatus location, and a recording direction or orientation.
The apparatus in the form of the end user apparatus may furthermore be summarised as being considered to comprise a transmitter configured to transmit the display variable to a further apparatus, wherein the further apparatus may be the server as described previously.
Furthermore the same apparatus may be considered to comprise a receiver configured to receive a signal stream from the server apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus. The same, end user, apparatus may also be summarized as comprising a display for displaying the signal stream.
Thus in embodiments the ability to mix between recording devices is made possible. End users may in embodiments select between recorded signal data from different recording devices with improved timing and cueing performance as the recorded signal data is synchronized. Furthermore the generation of synthesized signal data using the synchronized recorded signal data allows the end user to experience the content from locations not originally recorded or improve on the recorded data from a single source to allow for deficiencies in the original signal data - such as loss of recorded signal data due to network issues, failure to record due to partial or total device failure, or poorly recorded signal data due to interference or noise.
With respect to Figure 8, the operation of further embodiments of the server with respect to buffering and synchronization of the recorded signal data is shown. In these embodiments rather than synchronizing the recorded signal data using a single time alignment indicator further time alignment indicators may be generated for further time instances or periods.
In these further embodiments the operations similar to the operations shown in steps 405, and 4071 to 4079 are marked with the same references. Thus the buffer/receiver 221 may receive the recorded signal data streams in step 405.
In the server 103 shown with respect to these embodiments the buffered recorded signal may be defined as $b_n$, where the subindex n describes the time instant from which the recorded signal is buffered. The subindex n is an element of the set G, in other words the number of different time instants to be used to determine the base signal. The starting position for each buffering location may be described by $Tloc_n$, that is the signal buffered starting from $Tloc_n \cdot T$ seconds.
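The buffering of sub-periods may be sketched as follows. The sketch assumes a known sample rate and a fixed buffer duration per time instant. Neither is given in this passage, so `sample_rate`, `duration` and the function name are hypothetical parameters introduced for illustration.

```python
def buffer_segments(signal, sample_rate, t_loc, T, duration):
    """Return {n: segment}, where segment n starts at t_loc[n] * T seconds."""
    length = int(duration * sample_rate)
    segments = {}
    for n, loc in enumerate(t_loc):
        start = int(loc * T * sample_rate)  # start position in samples
        segments[n] = signal[start:start + length]
    return segments


signal = list(range(100))  # stand-in for recorded samples
segments = buffer_segments(
    signal, sample_rate=10, t_loc=[0, 2, 5], T=1.0, duration=2.0
)
```

Each returned segment could then be framed and analysed in the same way as a single buffered signal.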
The variable length framer 301 may perform a variable length framing operation in step 4071 on each of the sub-periods using the previously described methods.
The indicator selector 303 may calculate the time alignment indicators in step 4073 by determining the time index average, the time index variance and the time index ratio for all the sub-periods according to the following equations:
where $tInd_{t,i,j}$ describes the $tInd_{i,j}$ for the $t$-th time instant.
The base signal determiner 305 may, in addition to the determination of the base signal and the generation of the time alignment factors, carry out an additional precursor step and make a decision on whether to include a new time instant or period in the calculations. This decision may be for example according to the following expressions:
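$$\text{add\_new\_time\_location} = \begin{cases} \text{Yes}, & tIndThr = 1 \\ \text{No}, & \text{otherwise} \end{cases}$$

$$tIndThr = \begin{cases} 1, & \dfrac{tIndRatio(1)}{tIndRatio(0)} \;\ldots\; \dfrac{accCorr(tIndRatioSortedIndices(0))}{accCorr(tIndRatioSortedIndices(1))} \;\ldots \quad \text{or} \quad \dfrac{accCorr(tIndRatioSortedIndices(1))}{accCorr(tIndRatioSortedIndices(0))} \;\ldots \\ 0, & \text{otherwise} \end{cases}$$

where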
where $tCorr_{t,i,j}$ describes the $tCorr_{i,j}$ for the $t$-th time instant.
In some embodiments of the invention the base signal determiner 305 may make the above decision with a condition which would limit the number of new time instants to be added to some predefined threshold to disable a potential infinite loop of iterations being carried out.
The decision of whether a new time location is to be added is shown in Figure 8 by step 701. Where a new time location is to be added the base signal determiner 305 may add a new time period to G, in other words the process performs another check at a different time than before and the loop passes back to step 407. This addition of a new time instant to G can be seen in Figure 8 as step 703.
When no further time location/instants are to be added the base signal determiner 305 may then perform the operation of determining the base signal based on the indicators as described previously. The determination of the base signal is shown in Figure 8 by step 4075.
Furthermore the base signal determiner 305 may also determine the time alignment factors for the remaining signals as described previously and shown in Figure 8 in step 4077.
The signal synchronizer 307 may then use this base signal determination and the time alignment factors for the remaining recorded signals to synchronize the recorded signals as described previously and shown in Figure 8 in step 4079.
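The loop of Figure 8, including a limit on the number of added time instants, may be sketched as follows. The decision itself is passed in as a placeholder callable rather than implemented. The names `choose_time_instants`, `needs_new_location` and `max_new`, and the candidate time instants, are hypothetical and serve only to show how a cap guarantees termination.

```python
def choose_time_instants(initial, candidates, needs_new_location, max_new=3):
    """Grow the set G of time instants until the decision says stop.

    `needs_new_location(G)` stands in for the add_new_time_location test.
    `max_new` caps the additions so the loop always terminates.
    """
    G = list(initial)
    remaining = list(candidates)
    added = 0
    while remaining and added < max_new and needs_new_location(G):
        G.append(remaining.pop(0))  # step 703: add a new time instant to G
        added += 1
    return G


# Hypothetical decision: keep adding until three instants are in use.
G = choose_time_instants([0], [10, 20, 30, 40], lambda g: len(g) < 3)
# A decision that never stops is still bounded by max_new.
capped = choose_time_instants([0], [10, 20, 30, 40], lambda g: True, max_new=1)
```

Once the loop exits, the base signal determination and time alignment steps would run over the indicators gathered for every instant in `G`.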
In some embodiments of the invention the loop is disabled or not present and time alignment indicators are determined for at least two of the sub-sets of the total time periods using the equations described above in order to improve the synchronization between recorded signals as the indicators are determined for different time periods.
Although the above has been described with regards to audio signals, or audio-visual signals it would be appreciated that embodiments may also be applied to audio-video signals where the audio signal components of the recorded data are processed in terms of the determining of the base signal and the determination of the time alignment factors for the remaining signals and the video signal components may be synchronised using the above embodiments of the invention. In other words the video parts may be synchronised using the audio synchronisation information.
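Reusing the audio synchronisation information for video may be sketched as a unit conversion. The sketch assumes the time alignment value is expressed in audio samples and that the video has a constant frame rate. Both are assumptions made for illustration, as are the function and parameter names.

```python
def video_frame_offset(time_align_samples, audio_rate_hz, video_fps):
    """Convert an audio alignment offset in samples to whole video frames."""
    offset_seconds = time_align_samples / audio_rate_hz
    # Round to the nearest frame, since video can only shift by whole frames.
    return round(offset_seconds * video_fps)


# A 96000-sample offset at 48 kHz is 2 seconds, i.e. 50 frames at 25 fps.
frames = video_frame_offset(96000, audio_rate_hz=48000, video_fps=25)
```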
It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
Furthermore elements of a public land mobile network (PLMN) may also comprise apparatus as described above.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
Claims
1. Apparatus comprising: a frame value generator configured to generate for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; an alignment generator configured to determine at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and a synchronizer configured to synchronize at least one signal stream to another signal stream dependent on the indicator values.
2. The apparatus as claimed in claim 1, further comprising a receiver configured to receive each of the at least two signal streams from different recording apparatus.
3. The apparatus as claimed in claims 1 and 2, wherein the alignment generator is configured to generate a first indicator for each signal stream dependent on the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
4. The apparatus as claimed in claim 3 wherein the first indicator comprises the ratio of the variance and mean values of the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
5. The apparatus as claimed in claims 1 to 4, wherein the alignment generator is further configured to select the at least two streams with the lowest indicator value as a base stream, and the another signal stream is the base stream.
6. The apparatus as claimed in claims 1 to 5, wherein the synchronizer is further configured to synchronize at least one of: at least one signal stream audio stream to another signal stream audio stream dependent on the indicator values; at least one signal stream video stream to another signal stream video stream dependent on the indicator values; and at least one signal stream positional data stream to another signal stream positional data stream dependent on the indicator values.
7. The apparatus as claimed in claims 1 to 6, further comprising an output selector receiver configured to receive selection information indicating at least one of: a recording apparatus; a recording location; and a recording direction.
8. The apparatus as claimed in claim 7, further comprising an output selector processor, wherein the output selector processor may be configured to carry out at least one of: selecting one synchronized signal stream to be output dependent on the selection information; and combining at least two synchronized signal streams dependent on the selection information to form a compound signal stream to be output.
9. An apparatus comprising: an input selector configured to select a display variable; a transmitter configured to transmit the display variable to a further apparatus; a receiver configured to receive a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and a display for displaying the signal stream.
10. The apparatus as claimed in claim 9, wherein the display comprises at least one of: a audio display for displaying audio signal components of the at least one signal stream received from a recording apparatus; and a video display for displaying video signal components of the at least one signal stream received from a recording apparatus.
11. The apparatus as claimed in claims 9 and 10, wherein the display variable comprises at least one of: a recording apparatus; a recording location; and a recording direction.
12. A method comprising: generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and synchronizing at least one signal stream to another signal stream dependent on the indicator values.
13. The method as claimed in claim 12, further comprising receiving each of the at least two signal streams from different recording apparatus.
14. The method as claimed in claims 12 and 13, wherein determining indicator values further comprises generating a first indicator for each signal stream dependent on the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
15. The method as claimed in claim 14 wherein generating the first indicator comprises determining the ratio of the variance and mean values of the correlation between the at least one signal value for a first signal stream and the at least one signal value for a second signal stream.
16. The method as claimed in claims 12 to 15, further comprising selecting the at least two streams with the lowest indicator value as a base stream, and wherein the another signal stream is the base stream.
17. The method as claimed in claims 12 to 16, wherein synchronizing further comprises at least one of: synchronizing at least one signal stream audio stream to another signal stream audio stream dependent on the indicator values; synchronizing at least one signal stream video stream to another signal stream video stream dependent on the indicator values; and synchronizing at least one signal stream positional data stream to another signal stream positional data stream dependent on the indicator values.
18. The method as claimed in claims 12 to 17, further comprising receiving selection information indicating at least one of: a recording apparatus; a recording location; and a recording direction.
19. The method as claimed in claim 18, further comprising selecting one synchronized signal stream to be output dependent on the selection information.
20. The method as claimed in claim 18, further comprising combining at least two synchronized signal streams dependent on the selection information to form a compound signal stream to be output.
21. A method comprising: selecting a display variable; transmitting the display variable to a further apparatus; receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and displaying the signal stream.
22. The method as claimed in claim 21, wherein displaying comprises at least one of: displaying audio signal components of the at least one signal stream received from a recording apparatus; and displaying video signal components of the at least one signal stream received from a recording apparatus.
23. The method as claimed in claims 21 and 22, wherein the display variable comprises at least one of: a recording apparatus; a recording location; and a recording direction.
24. An electronic device comprising apparatus as claimed in claims 1 to 11.
25. A chipset comprising apparatus as claimed in claims 1 to 11.
26. A computer-readable medium encoded with instructions that, when executed by a computer, perform: generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and synchronizing at least one signal stream to another signal stream dependent on the indicator values.
27. A computer-readable medium encoded with instructions that, when executed by a computer, perform: selecting a display variable; transmitting the display variable to a further apparatus; receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and displaying the signal stream.
28. Apparatus comprising: means for generating for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream; means for determining at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream; and means for synchronizing at least one signal stream to another signal stream dependent on the indicator values.
29. Apparatus comprising: means for selecting a display variable; means for transmitting the display variable to a further apparatus; means for receiving a signal stream from the further apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus; and means for displaying the signal stream.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0908153.0 | 2009-05-12 | | |
GB0908153A (GB2470201A (en)) | 2009-05-12 | 2009-05-12 | Synchronising audio and image data |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010131105A1 (en) | 2010-11-18 |
Family
ID=40833878
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2010/001101 (WO2010131105A1 (en)) | 2009-05-12 | 2010-05-12 | Synchronization of audio or video streams |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB2470201A (en) |
WO (1) | WO2010131105A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017092007A1 (en) * | 2015-12-03 | 2017-06-08 | SZ DJI Technology Co., Ltd. | System and method for video processing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5640388A (en) * | 1995-12-21 | 1997-06-17 | Scientific-Atlanta, Inc. | Method and apparatus for removing jitter and correcting timestamps in a packet stream |
JP2004032012A (en) * | 2002-06-21 | 2004-01-29 | Sony Corp | Multi-viewpoint image recording apparatus, method of synchronous processing for multi-viewpoint image frame, and computer program |
WO2008066930A2 (en) * | 2006-11-30 | 2008-06-05 | Dolby Laboratories Licensing Corporation | Extracting features of video & audio signal content to provide reliable identification of the signals |
WO2008148732A1 (en) * | 2007-06-08 | 2008-12-11 | Telefonaktiebolaget L M Ericsson (Publ) | Timestamp conversion |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL119504A (en) * | 1996-10-28 | 2000-09-28 | Elop Electrooptics Ind Ltd | Audio-visual content verification method and system |
US6654933B1 (en) * | 1999-09-21 | 2003-11-25 | Kasenna, Inc. | System and method for media stream indexing |
US20030105794A1 (en) * | 2001-11-09 | 2003-06-05 | Jasinschi Radu S. | Systems for sensing similarity in monitored broadcast content streams and methods of operating the same |
DK1504445T3 (en) * | 2002-04-25 | 2008-12-01 | Landmark Digital Services Llc | Robust and invariant sound pattern matching |
US20040143675A1 (en) * | 2003-01-16 | 2004-07-22 | Aust Andreas Matthias | Resynchronizing drifted data streams with a minimum of noticeable artifacts |
US7907211B2 (en) * | 2003-07-25 | 2011-03-15 | Gracenote, Inc. | Method and device for generating and detecting fingerprints for synchronizing audio and video |
DE102005014477A1 (en) * | 2005-03-30 | 2006-10-12 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for generating a data stream and generating a multi-channel representation |
DE102006036562B4 (en) * | 2006-08-04 | 2014-04-10 | Hewlett-Packard Development Co., L.P. | Method and system for transmitting data streams related to one another and / or for synchronizing data streams related to one another |
- 2009-05-12: GB application GB0908153A (patent/GB2470201A/en), not active, Withdrawn
- 2010-05-12: WO application PCT/IB2010/001101 (patent/WO2010131105A1/en), active, Application Filing
Non-Patent Citations (3)
Title |
---|
DATABASE INSPEC [online] Database accession no. 10178666 * |
PATENT ABSTRACTS OF JAPAN * |
RADHAKRISHNAN R. ET AL: "Audio and video signatures for synchronization", 2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, 26 August 2008 (2008-08-26), pages 1549 - 1552, XP031313030 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013088208A1 (en) * | 2011-12-15 | 2013-06-20 | Nokia Corporation | An audio scene alignment apparatus |
EP2832112A4 (en) * | 2012-03-28 | 2015-12-30 | Nokia Technologies Oy | Determining a Time Offset |
WO2014193593A3 (en) * | 2013-05-28 | 2015-01-22 | Google Inc. | Automatically synchronizing multiview video recordings between two or more content recording devices |
US9646650B2 (en) | 2013-05-28 | 2017-05-09 | Google Inc. | Automatically syncing recordings between two or more content recording devices |
US10008242B2 (en) | 2013-05-28 | 2018-06-26 | Google Llc | Automatically syncing recordings between two or more content recording devices |
Also Published As
Publication number | Publication date |
---|---|
GB2470201A (en) | 2010-11-17 |
GB0908153D0 (en) | 2009-06-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10236031B1 (en) | Timeline reconstruction using dynamic path estimation from detections in audio-video signals | |
US20130226324A1 (en) | Audio scene apparatuses and methods | |
US20130304244A1 (en) | Audio alignment apparatus | |
US10097943B2 (en) | Apparatus and method for reproducing recorded audio with correct spatial directionality | |
CN101960865A (en) | Apparatus for capturing and rendering a plurality of audio channels | |
US20130297053A1 (en) | Audio scene processing apparatus | |
WO2013088208A1 (en) | An audio scene alignment apparatus | |
KR20170009650A (en) | Method and apparatus for processing audio signal | |
TW201145887A (en) | Data feedback for broadcast applications | |
US20150146874A1 (en) | Signal processing for audio scene rendering | |
US20150089051A1 (en) | Determining a time offset | |
US9195740B2 (en) | Audio scene selection apparatus | |
US20150310869A1 (en) | Apparatus aligning audio signals in a shared audio scene | |
WO2010131105A1 (en) | Synchronization of audio or video streams | |
US20150302892A1 (en) | A shared audio scene apparatus | |
US20150271599A1 (en) | Shared audio scene apparatus | |
GB2575509A (en) | Spatial audio capture, transmission and reproduction | |
CN103180907B (en) | audio scene device | |
WO2016139392A1 (en) | An apparatus and method to assist the synchronisation of audio or video signals from multiple sources | |
Dang et al. | : A Universal Timeline-Synchronizing Solution for Live Streaming | |
EP3540735A1 (en) | Spatial audio processing | |
WO2014016645A1 (en) | A shared audio scene apparatus | |
WO2015086894A1 (en) | An audio scene capturing apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10774605; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 10774605; Country of ref document: EP; Kind code of ref document: A1 |