WO2021111988A1 - Moving image playback device, moving image playback system, and moving image playback method - Google Patents
- Publication number
- WO2021111988A1 (application PCT/JP2020/044099)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- moving image
- image
- output
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/038—Cross-faders therefor
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43072—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/437—Interfacing the upstream path of the transmission network, e.g. for transmitting client requests to a VOD server
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8547—Content authoring involving timestamps for synchronizing content
Definitions
- The present invention relates to a moving image playback device, a moving image playback system, and a moving image playback method for playing back a moving image over a network.
- The distribution server prepares data of segments, obtained by dividing a moving image into predetermined lengths of several seconds to several tens of seconds, and a playlist that defines the storage location and playback order of that data.
- The client first acquires the playlist and plays the moving image by requesting the necessary segment data from the server (see, for example, Patent Document 1).
- Similar standards include MPEG-DASH (Dynamic Adaptive Streaming over HTTP) and CMAF (Common Media Application Format) (see, for example, Patent Documents 2 and 3).
- Video streaming distribution is based on enabling multiple clients to view a common video.
- flexibility is being realized according to the individual circumstances of each client, such as playback by random access and switching of image quality level according to the network environment. In the future, it is desired to enable more diverse operations according to the tastes and intentions of each viewer.
- the present invention has been made in view of these problems, and an object of the present invention is to provide a technique for suitably realizing a user operation for a moving image to be streamed.
- An aspect of the present invention relates to a moving image playback device.
- This moving image playback device is characterized by including a data acquisition unit that acquires data of a plurality of moving images representing the same space, stream-transferred from a server; a data separation unit that acquires audio data from one of the plurality of moving images and image data from another moving image; and an output control unit that synchronizes and outputs the audio data and the image data.
- This video playback system includes a server that stream-transfers data of a plurality of moving images representing the same space, and a video playback device that outputs a moving image to a display using the data of the plurality of moving images.
- The moving image playback device in this system is characterized by including a data acquisition unit that acquires the data of the plurality of moving images from the server, a data separation unit that acquires audio data from one of the plurality of moving images and image data from another moving image, and an output control unit that synchronizes and outputs the audio data and the image data.
- In the moving image playback method, the moving image playback device performs a step of acquiring data of a plurality of moving images representing the same space, stream-transferred from a server; a step of acquiring audio data from one of the plurality of moving images and image data from another moving image; and a step of synchronizing the audio data and the image data and outputting them to a display.
- a user operation for a moving image to be streamed can be suitably realized.
- Among the drawings, one schematically shows an example in which the time axes of a plurality of moving images constituting one content are shifted relative to each other, and another illustrates the flow of image and audio output when the moving image playback device in this embodiment adjusts the output timing to accommodate the time lag of the moving images provided by the moving image distribution server.
- FIG. 1 illustrates a moving image playback system to which this embodiment can be applied.
- The illustrated moving image playback system has a configuration in which a plurality of moving image playback devices 10a, 10b, 10c, ... are connected to the moving image distribution server 12 via the network 8.
- The moving image playback devices 10a, 10b, 10c, ... are client terminals operated by their respective users, and are connected by wire or wirelessly to the input devices 14a, 14b, 14c, ... and the displays 16a, 16b, 16c, ..., respectively.
- Hereinafter, the moving image playback devices 10a, 10b, 10c, ... may be collectively referred to as the moving image playback device 10, the input devices 14a, 14b, 14c, ... as the input device 14, and the displays 16a, 16b, 16c, ... as the display 16.
- the moving image playback device 10, the input device 14, and the display 16 may each have a separate housing as shown in the figure, or two or more of them may be integrally provided.
- it may be a portable terminal integrally provided with a moving image reproducing device 10, an input device 14, and a display 16.
- the display 16 may be a general flat-plate display such as a television receiver, or a wearable display such as a head-mounted display. In any case, the display 16 includes a display panel for displaying an image and a speaker for outputting sound. However, the speaker may be provided separately from the display 16.
- the moving image playback device 10 may be any of a personal computer, a game machine, a content playback device, and the like.
- The network 8 may be of any scale, such as the Internet or a LAN (Local Area Network).
- The moving image playback device 10 requests the video distribution server 12 to distribute a moving image based on a user operation, and the video distribution server 12 streams the requested moving image.
- the communication protocol used, the form of the moving image playback device 10, the configuration of the moving image distribution server 12, and the like are not particularly limited.
- the moving image distribution server 12 may distribute a recorded moving image, or may deliver a moving image being photographed or being created live. At this time, the moving image distribution server 12 may connect to another content providing server, acquire moving image data, and then transmit the moving image data to the moving image playback device 10.
- FIG. 2 is a diagram for explaining an example of a moving image streamed and distributed to the moving image playback device 10 in the present embodiment.
- a plurality of cameras 20a and 20b are installed in the concert hall 18, and the state of the concert is photographed from different directions and delivered as a moving image.
- a plurality of moving images having different fields of view are acquired on a common time axis.
- The video distribution server 12 distributes any one of these moving images to the moving image playback device 10, and accepts at any time a request from the moving image playback device 10 to switch the distribution target to a moving image with a different field of view. That is, in the present embodiment, the display can be switched to an image in another field of view at an arbitrary timing during playback. As a result, a user watching on the moving image playback device 10 can freely switch, as the concert progresses, between a video that mainly shows the performer he or she wants to see and a video that gives a bird's-eye view of the entire venue.
- the display target and display purpose are not particularly limited.
- a video of a sports competition or various events may be used, and not only a live-action image but also computer graphics representing a virtual space from another field of view may be used.
- FIG. 3 is a diagram for explaining the data structure of the moving image to be distributed in the present embodiment.
- A "moving image" here includes both images and sound.
- A group of a plurality of switchable moving images is referred to as "content".
- The field of view displayed on the moving image playback device 10 can be switched at an arbitrary timing. Even when the image on the display is switched, seamless switching is realized by continuing to output the sound of a single moving image.
- a moving image for audio reproduction and a plurality of moving images for display are prepared separately, and the former audio is output while one of the latter images is displayed.
- the display target is switched to one of the latter images.
- any one of the plurality of moving images for display may also be used for audio reproduction.
- the main focus will be on the aspect of separately preparing a moving image for audio reproduction. In this case, since the moving image itself is not displayed, the data size can be suppressed by expressing it at a low bit rate.
- the video distribution server 12 can transmit data using the same protocol as before.
- the data of each moving image is held and transmitted in a state of being divided at predetermined time intervals of about several seconds to several tens of seconds.
- Each piece of the divided data will be referred to as "segment data".
- the video distribution server 12 generates a plurality of segment data divided into each moving image and a playlist which is definition information of each segment data.
- the playlist 132 of the moving image for audio reproduction defines each storage location, reproduction time, reproduction order, etc. of the segment data 136 for audio reproduction.
- the playlists 134a and 134b of the plurality of display moving images define the storage location, reproduction time, reproduction order, and the like of the segment data 138a and 138b for display, respectively.
- For a recorded moving image, each playlist is static data. For a moving image being shot, new segment data is generated over time, and each playlist is updated accordingly.
- the video distribution server 12 further associates these moving images and generates an index file 130 that defines them as one content.
- The index file 130 describes information related to each moving image, such as the details of the content, the storage locations of the playlists 132, 134a, and 134b of the plurality of moving images prepared as the content, and field-of-view information.
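- The relationship between the index file, the playlists, and the segment data can be sketched as follows. This is only an illustration under assumptions: the class names, field names, field-of-view labels, and storage paths are hypothetical, and real standards such as HLS or MPEG-DASH define their own file formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Segment:
    uri: str          # storage location of the segment data (hypothetical path)
    start: float      # playback start time in seconds on the common time axis
    duration: float   # playback time of this segment in seconds


@dataclass
class Playlist:
    # Segments are listed in playback order.
    segments: List[Segment] = field(default_factory=list)


@dataclass
class IndexFile:
    title: str                          # details of the content
    audio_playlist: str                 # location of the audio-reproduction playlist
    display_playlists: Dict[str, str]   # field-of-view label -> playlist location


def make_playlist(prefix: str, count: int, duration: float = 3.0) -> Playlist:
    """Build a playlist of equally long segments (illustrative data only)."""
    return Playlist(
        [Segment(f"{prefix}/seg{i}.ts", i * duration, duration) for i in range(count)]
    )


def resolve(index: IndexFile, playlists: Dict[str, Playlist], view: str) -> Tuple[Playlist, Playlist]:
    """Trace the index file to the audio playlist and the playlist for one field of view."""
    return playlists[index.audio_playlist], playlists[index.display_playlists[view]]


index = IndexFile(
    title="concert",
    audio_playlist="audio/playlist",
    display_playlists={"stage": "cam_a/playlist", "overview": "cam_b/playlist"},
)
playlists = {
    "audio/playlist": make_playlist("audio", 4),
    "cam_a/playlist": make_playlist("cam_a", 4),
    "cam_b/playlist": make_playlist("cam_b", 4),
}
audio_pl, display_pl = resolve(index, playlists, "overview")
```

- Here the audio playlist is resolved independently of the selected field of view, which mirrors the idea that the sound source stays fixed while the display source changes.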
- Several standards such as HLS, MPEG-DASH, and CMAF have been put into practical use as a technique for time-division distribution of moving images, and any of them may be adopted in the present embodiment. It will be understood by those skilled in the art that the names and description formats of the files to be prepared vary depending on the standard.
- The video distribution server 12 identifies and transmits the moving image data for audio reproduction and the moving image data for displaying the requested field of view, among the content specified by the moving image playback device 10, by tracing the index file 130 and the playlists 132, 134a, and 134b. Specifically, the video distribution server 12 first transmits the playlists of the necessary moving images to the moving image playback device 10, then receives a transmission request specifying the segment data for the required time, and transmits that segment data to the moving image playback device 10.
- Each segment data includes image data and audio data at predetermined time intervals.
- the video distribution server 12 packetizes the segment data to be transmitted in chronological order and transmits it.
- image data packets and audio data packets are sequentially transmitted in the form of multiple streams.
- The moving image data string (stream) 138 for audio reproduction is transmitted continuously, while the moving image data strings 139a and 139b for display are switched and transmitted according to the user's switching operation.
- FIG. 4 shows the configuration of a system for realizing synchronization of image and sound.
- the video distribution server 12 side generates a PTS (Presentation Time Stamp) that defines the output timing of images and sounds based on its own STC (System Time Clock), and assigns it to each segment data.
- the video distribution server 12 also generates PCR (Program Clock Reference) representing a counter value in a predetermined cycle based on STC.
- The system coding unit 140 of the video distribution server 12 generates a multiplexed stream consisting of a sequence of packets that include the segment data of images and sounds, their respective PTS, and the PCR at a predetermined cycle, and transmits it to the moving image playback device 10.
- The system decoding unit 142 of the moving image playback device 10 separates the data from the transmitted multiplexed stream. The STC reproduction unit 144 then reproduces the STC by adjusting the frequency of an oscillator so that the reception time of each packet corresponds to the counter value indicated by the PCR.
- In this way, the moving image playback device 10 operates on a time axis shared with the moving image distribution server 12. Specifically, the moving image playback device 10 uses the buffers 146a and 146b to adjust the output timing of each piece of data so that the corresponding images and sounds are output at their PTS on that time axis. If PTS on the same time axis are given to all the moving images that make up one content, adjusting the output timing on that basis allows the image and the sound to be output without deviation even when they come from different moving images. In this figure, the data coding and decoding processes are not shown.
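- The idea of reproducing the STC from PCR values and releasing buffered data at its PTS can be sketched in software as follows. This is a simplified stand-in, not the described circuit: the text adjusts an oscillator frequency, whereas this sketch merely smooths a clock offset, and all class names and tick values are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple


class RecoveredClock:
    """Receiver-side estimate of the server STC, built from PCR samples."""

    def __init__(self) -> None:
        self.offset: Optional[int] = None  # server STC minus local time, in ticks

    def on_pcr(self, local_time: int, pcr: int) -> None:
        sample = pcr - local_time
        # Simple smoothing stands in for adjusting an oscillator frequency.
        self.offset = sample if self.offset is None else (self.offset * 7 + sample) // 8

    def stc(self, local_time: int) -> int:
        if self.offset is None:
            raise RuntimeError("no PCR received yet")
        return local_time + self.offset


class OutputBuffer:
    """Holds decoded units and releases each one when the STC reaches its PTS."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []

    def push(self, pts: int, payload: str) -> None:
        heapq.heappush(self._heap, (pts, payload))

    def pop_due(self, stc: int) -> List[str]:
        due = []
        while self._heap and self._heap[0][0] <= stc:
            due.append(heapq.heappop(self._heap)[1])
        return due


clock = RecoveredClock()
clock.on_pcr(local_time=1000, pcr=91000)   # server STC is 90000 ticks ahead
audio, video = OutputBuffer(), OutputBuffer()
audio.push(93000, "audio PTS 93000")
video.push(93000, "image PTS 93000")       # from a different moving image, same time axis
early = audio.pop_due(clock.stc(2000)) + video.pop_due(clock.stc(2000))
on_time = audio.pop_due(clock.stc(3000)) + video.pop_due(clock.stc(3000))
```

- Because both buffers are compared against the same recovered clock, an image and a sound carrying the same PTS are released together even though they were taken from different moving images.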
- FIG. 5 shows the internal circuit configuration of the moving image playback device 10.
- the moving image playback device 10 includes a CPU (Central Processing Unit) 23, a GPU (Graphics Processing Unit) 24, and a main memory 26. Each of these parts is connected to each other via a bus 30.
- An input / output interface 28 is further connected to the bus 30.
- The following are connected to the input / output interface 28: a peripheral device interface such as USB or IEEE1394; a communication unit 32 that consists of a wired or wireless LAN network interface and establishes communication with the video distribution server 12; a storage unit 34 such as a hard disk drive or non-volatile memory; an output unit 36 that outputs data to the display 16; an input unit 38 that inputs data from the input device 14; and a recording medium driving unit 40 that drives a removable recording medium such as a magnetic disk, an optical disk, or a semiconductor memory.
- the CPU 23 controls the entire moving image playback device 10 by executing the operating system stored in the storage unit 34.
- the CPU 23 also executes various programs read from the removable recording medium, loaded into the main memory 26, or downloaded via the communication unit 32.
- The GPU 24 has a geometry engine function and a rendering processor function, performs drawing processing according to drawing commands from the CPU 23, and outputs the result to the output unit 36.
- the main memory 26 is composed of a RAM (Random Access Memory) and stores programs and data required for processing.
- the video distribution server 12 may have the same circuit configuration.
- FIG. 6 shows the configuration of the functional blocks of the video playback device 10 and the video distribution server 12.
- In terms of hardware, each functional block shown in the figure can be realized by the CPU 23, GPU 24, main memory 26, and the like shown in FIG. 5; in terms of software, it is realized by a program, loaded from a recording medium into memory, that provides functions such as information processing, image drawing, data input / output, and communication. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof, and are not limited to any one of them.
- The video distribution server 12 includes a request acquisition unit 50 that acquires requests from the moving image playback device 10, a data preparation unit 52 that prepares data according to each request, a data storage unit 54 that stores content data, and a data transmission unit 56 that transmits the prepared data to the moving image playback device 10.
- the request acquisition unit 50 acquires a moving image transmission request including switching of display targets from the moving image playback device 10. Therefore, the request acquisition unit 50 may transmit information necessary for selecting a moving image, such as selectable content and information relating to the field of view of the image representing the selectable content, to the moving image playback device 10 in advance.
- The data preparation unit 52 cooperates with the request acquisition unit 50 to prepare data according to the content of the request acquired from the moving image playback device 10. For example, the data preparation unit 52 acquires the index file corresponding to the selected content and identifies the image (field of view) options that represent it. By passing this information to the request acquisition unit 50, a transfer request for a moving image with a designated field of view is then obtained from the moving image playback device 10. In response, the data preparation unit 52 acquires the corresponding playlist and transmits it to the moving image playback device 10 via the request acquisition unit 50, and receives a request for the necessary segment data.
- The data preparation unit 52 acquires the segment data of the moving image for audio reproduction and of the moving image for display in the field of view specified by the moving image playback device 10, in order starting from the entry listed at the beginning of each playlist.
- When the display target is switched, the data preparation unit 52 continues to acquire the subsequent segment data of the moving image for audio reproduction and, for the moving image for display, acquires the segment data from the corresponding time onward as described in the playlist of the moving image after switching.
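- A minimal sketch of how segment data might be selected when the display target is switched is shown below. The playlist representation (a list of start time and storage location pairs), the 3-second segment length, and the function name are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple

Entry = Tuple[float, str]  # (start time in seconds, storage location)


def segments_from(playlist: List[Entry], t: float) -> List[str]:
    """Return storage locations from the segment being played at time t onward."""
    starts = [start for start, _ in playlist]
    # Index of the last segment whose start time is not after t.
    i = max(bisect_right(starts, t) - 1, 0)
    return [uri for _, uri in playlist[i:]]


audio = [(0.0, "audio/0"), (3.0, "audio/1"), (6.0, "audio/2"), (9.0, "audio/3")]
cam_b = [(0.0, "cam_b/0"), (3.0, "cam_b/1"), (6.0, "cam_b/2"), (9.0, "cam_b/3")]

switch_time = 7.5  # hypothetical time of the switching operation
# The audio-reproduction moving image is unaffected by the switch.
audio_queue = segments_from(audio, 0.0)
# The display moving image restarts on the new playlist at the corresponding time.
after_switch = segments_from(cam_b, switch_time)
```

- Looking up the position by time rather than by index works because all playlists of one content share a common time axis.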
- the data storage unit 54 stores index files, playlists of a plurality of moving images, and segment data thereof for each content.
- each data is appropriately compressed and coded.
- those data are updated at any time, but the update means is not shown.
- the moving image that is the source of the data may be acquired from another server (not shown) or the like, and the acquisition timing is not particularly limited.
- the data transmission unit 56 sequentially packetizes the segment data prepared by the data preparation unit 52 and transmits it to the moving image playback device 10. At this time, the data transmission unit 56 imparts PTS to the image and audio data in a predetermined unit such as a segment data unit as described above, and also periodically imparts PCR. In addition, the data transmission unit 56 may appropriately add information given in general streaming transfer.
- The moving image playback device 10 includes an input information acquisition unit 60 that acquires the contents of user operations, a data acquisition unit 62 that acquires streams of moving images, a data separation unit 64 that separates data from the streams, an image decoding unit 66 that decodes image data, an audio decoding unit 68 that decodes audio data, and an output control unit 70 that controls the output of the moving image.
- the input information acquisition unit 60 acquires the contents of user operations such as selection of contents, selection of display images, and switching of display targets in the middle from the input device 14, and requests necessary data from the video distribution server 12.
- the input information acquisition unit 60 may acquire information related to selectable contents and images representing the selectable contents in advance from the video distribution server 12 and display them as options on the display 16 via the output control unit 70.
- The input information acquisition unit 60 also acquires from the video distribution server 12 the playlists of the moving image for audio reproduction and of the moving image selected for display, and requests from the video distribution server 12 the segment data corresponding to the time at which the playback start operation or the display switching operation was performed.
- the data acquisition unit 62 continuously acquires the moving image data stream-transferred from the video distribution server 12 in response to the user operation.
- the data includes moving image data for audio reproduction and moving image data for display.
- Each moving image stream includes image data and audio data to which PTS is added in a predetermined unit.
- the data separation unit 64 separates such multiplexed data for each moving image, and further separates the image data and the audio data.
- the stream transmitted from the video distribution server 12 contains information for identifying such data. A method that has been put into practical use can be applied to data separation using the information.
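- The two-stage separation (first per moving image, then into image and audio) can be sketched as follows. The packet fields and stream identifiers are hypothetical; the identifying information actually carried in a stream depends on the standard in use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Packet:
    stream_id: str   # hypothetical identifier of the moving image
    kind: str        # "image" or "audio"
    pts: int
    payload: bytes


def separate(packets: List[Packet]) -> Dict[Tuple[str, str], List[Packet]]:
    """Group packets by moving image and by image/audio, preserving arrival order."""
    out: Dict[Tuple[str, str], List[Packet]] = defaultdict(list)
    for p in packets:
        out[(p.stream_id, p.kind)].append(p)
    return dict(out)


def select_for_output(
    packets: List[Packet], audio_stream: str, display_stream: str
) -> Tuple[List[Packet], List[Packet]]:
    """Keep audio from the audio-reproduction stream and images from the display stream."""
    groups = separate(packets)
    return (
        groups.get((audio_stream, "audio"), []),
        groups.get((display_stream, "image"), []),
    )


mux = [
    Packet("audio_src", "image", 1, b"i0"),
    Packet("audio_src", "audio", 1, b"a0"),
    Packet("cam_a", "image", 1, b"v0"),
    Packet("cam_a", "audio", 1, b"x0"),
]
audio_packets, image_packets = select_for_output(mux, "audio_src", "cam_a")
```

- Only the selected audio and image groups are passed on for decoding; the image of the audio-reproduction moving image and the audio of the display moving image are simply left unused.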
- the image decoding unit 66 decodes the image data included in the moving image to be displayed selected by the user among the separated data.
- the audio decoding unit 68 decodes the audio data included in the moving image for audio reproduction among the separated data.
- the output control unit 70 sequentially outputs the decoded image and sound to the display 16 at an appropriate timing.
- the output control unit 70 includes a PTS detection unit 72, a time adjustment unit 74, an image processing unit 76, and an output unit 78.
- the PTS detection unit 72 detects the PTS added to the image data and the audio data to be output.
- the time adjustment unit 74 adjusts the output timing so that there is no discrepancy between the image and the sound.
- In general, image and audio data are transmitted at approximately the same time and are output immediately, apart from adjusting slight time differences between packets.
- In the present embodiment, by contrast, images and sounds contained in different moving images are output in synchronization, and the display may be switched partway through to images contained in another moving image. Even across such a switch, continuity as a single content can be expressed by keeping the interruption between the images before and after the switch as short as possible while maintaining synchronization between image and sound.
- the time adjustment unit 74 delays the output of the audio of the moving image for audio reproduction by a predetermined time with respect to the data acquisition timing from the video distribution server 12, and then outputs the image so as to match the PTS.
- As a result, the difference between the PTS of the image displayed when the image switching operation is performed and the PTS at the beginning of the switched image transmitted from the video distribution server 12 can be reduced, and the image can be prevented from deviating from the audio that continues to be output. Specific examples of time adjustment will be described later.
- the image processing unit 76 performs processing to fade out the displayed image and fade in the image after switching according to the image switching operation. As a result, seamless display change is realized for the switching operation.
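- One simple way to realize a fade is to scale pixel values by an opacity that changes over time. The sketch below assumes a linear ramp toward black and a hypothetical fade duration; the text does not specify the fade curve or its length.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]


def fade_alpha(elapsed: float, duration: float, fade_in: bool) -> float:
    """Opacity in [0, 1] for a linear fade; the linear ramp is an assumption."""
    if duration <= 0:
        return 1.0 if fade_in else 0.0
    progress = min(max(elapsed / duration, 0.0), 1.0)
    return progress if fade_in else 1.0 - progress


def apply_fade(pixel: Pixel, alpha: float) -> Pixel:
    """Scale an (R, G, B) pixel toward black by the given opacity."""
    return tuple(round(c * alpha) for c in pixel)


halfway_out = fade_alpha(0.25, 0.5, fade_in=False)   # halfway through a 0.5 s fade-out
halfway_in = fade_alpha(0.25, 0.5, fade_in=True)     # halfway through a 0.5 s fade-in
dimmed = apply_fade((200, 100, 50), halfway_out)
```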
- the output unit 78 outputs the image of the moving image for display to the display panel of the display 16, and outputs the sound of the moving image for sound reproduction to the speaker of the display 16.
- FIG. 7 illustrates the flow of image and audio output in the present embodiment.
- the horizontal direction of the figure indicates the passage of time, and each rectangle indicates the time length of the segment data.
- the numbers in the rectangle represent PTS, and the same PTS is simply represented by natural numbers of the same value, but the format of the actual PTS is not limited.
- The top row shows the flow of moving image data being played on the moving image distribution server 12; a common PTS based on the system clock is given in the same way to the segment data of all moving images representing one content.
- the moving image reproduced in this way is distributed to a plurality of moving image playback devices 10 in parallel.
- The moving image distribution server 12 starts transmitting the moving image data from the segment of PTS "1" that is being played when the request is received. In the illustrated example, only the moving image data for audio reproduction is transmitted at first. Depending on the timing of the transmission request, data transfer, decoding processing, and so on, the time t1 at which the segment data of PTS "1" can be output in the moving image playback device 10 is later than the playback start time t0 of the corresponding data in the moving image distribution server 12.
- Ordinarily, the sound or image would be output immediately at the time t1 when output becomes possible, but the time adjustment unit 74 in the present embodiment pauses the sound output in that state and starts it only after a predetermined time, at time t2.
- the pause time (t2-t1) is the time of one segment data, for example, 3 seconds.
- the video distribution server 12 first transmits only the moving image data for audio output, so that at the time t2, only the audio of the PTS “1” is output from the moving image playback device 10.
- The data of the moving image for display is transmitted from the segment of PTS "2" being played at that time, as shown by the arrow 152.
- Depending on the timing of the transmission request, data transfer, decoding processing, and so on, the time t3 at which the first image of PTS "2" can be output is later than the playback start time of the corresponding data on the video distribution server 12.
- The time adjustment unit 74 pauses the first image of PTS "2" in a state where it can be output, and starts outputting it at the time t4 when output of the audio data of the same PTS "2" starts.
- the moving image can be represented in a state where there is no difference between the image and the sound. Further, as shown in the figure, even if the image data is acquired later, it is possible to make it in time for the previously output sound.
- the image processing unit 76 fades in to reduce the sudden feeling.
- The moving image playback device 10 requests the newly selected second moving image for display from the moving image distribution server 12, and stops the output of the first image displayed up to that point at an appropriate timing.
- the image processing unit 76 fades out the first image.
- the image of PTS "4" fades out.
- the data of the moving image for display after switching is transmitted from the segment of PTS “6” being played at the time of request.
- because the audio output of the segment of PTS “1” was delayed for a predetermined time in the moving image playback device 10, the time t6 at which the first image of PTS “6” in the second moving image can be output is before the output start time t7 of the audio data of PTS "6".
- the time adjustment unit 74 suspends the output of the second image of the PTS "6" until the time t7 when the output of the audio data of the PTS "6" is started. Then, at time t7, the pause is released, and the image processing unit 76 and the output unit 78 start output while fading in the second image of the PTS “6”. After that, each time the operation to switch the display target is performed, the displayed image is faded out, and the switched image is paused in a state where it can be output and then faded in when the output of the sound of the same PTS starts.
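The timing relationship of FIG. 7 can be illustrated with a short sketch. The following Python example is only an illustration under stated assumptions, not the implementation of the embodiment: the class `AudioClock`, the function `image_hold_time`, and the numeric times are hypothetical, segments are assumed to have equal length, and the PTS is assumed to increase by 1 per segment as in the figure labels.

```python
from dataclasses import dataclass


@dataclass
class AudioClock:
    """Hypothetical model of the delayed audio output schedule."""
    ready_at: float          # t1: time at which the first audio segment could be output
    delay: float             # t2 - t1: pause given to the audio output
    segment_duration: float  # output time of one segment (assumed constant)
    first_pts: int = 1       # PTS label of the first audio segment

    def start_time(self, pts: int) -> float:
        """Time at which audio output of segment `pts` starts."""
        return self.ready_at + self.delay + (pts - self.first_pts) * self.segment_duration


def image_hold_time(clock: AudioClock, pts: int, image_ready_at: float) -> float:
    """How long the first image of segment `pts` stays paused before output."""
    wait = clock.start_time(pts) - image_ready_at
    if wait < 0:
        raise ValueError("image is not ready in time for the audio of the same PTS")
    return wait


# Illustrative values only: one-segment delay of 3 seconds, as in the example above.
clock = AudioClock(ready_at=0.5, delay=3.0, segment_duration=3.0)
print(clock.start_time(2))             # audio start of PTS 2 (t4 in FIG. 7)
print(image_hold_time(clock, 2, 4.0))  # pause between t3 and t4
```

In this sketch the image is never output on its own schedule; its start time is always taken from the audio schedule of the same PTS, which is why a delayed audio start leaves room for late image data.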
- FIG. 8 is a flowchart showing a processing procedure of the moving image playback device 10 when switching the display image.
- This flowchart is started in a state where the sound obtained from the moving image for sound reproduction and the image obtained from the moving image for display are output to the display 16.
- the input information acquisition unit 60 waits for a user operation to switch the display image (N in S10).
- the output control unit 70 fades out the image being output and stops the output (S12).
- the input information acquisition unit 60 requests the video distribution server 12 to switch the display moving image to be transmitted, and the data acquisition unit 62 starts acquiring the data of the switched moving image from the video distribution server 12, beginning with the segment that was being reproduced at that time (S14).
- the image decoding unit 66 starts decoding the segment data of the image extracted from the moving image data (S16).
- the output control unit 70 suspends the output of the decoded image (S18) and detects the PTS (S20).
- the output control unit 70 compares the detected PTS with the PTS of the sound being output. While the two are different, the image output is left paused (N in S22). As a result, a blackout state in which nothing is displayed on the display 16 may occur for a short time.
- the output control unit 70 releases the pause of the image output (S24) and displays the image while fading in (S26). Strictly speaking, the pause may be released in S24 slightly ahead of the predicted timing at which the PTSs of the image and the sound match.
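The steps S12 to S26 of the flowchart can be sketched as a small simulation. This is a hedged illustration only: the function `switch_display`, the log strings, and the idea of sampling the audio PTS as a sequence of ticks are assumptions made for the example and do not appear in the embodiment.

```python
from typing import Iterable, List


def switch_display(image_pts: int, audio_pts_ticks: Iterable[int]) -> List[str]:
    """Simulate the switching procedure and return the steps taken.

    image_pts:       PTS detected from the first decoded image after switching (S20).
    audio_pts_ticks: PTS of the sound being output, sampled at successive checks.
    """
    log = [
        "S12 fade out current image",
        "S14 request switched moving image",
        "S16 start decoding",
        "S18 pause decoded image",
        f"S20 detected PTS {image_pts}",
    ]
    for audio_pts in audio_pts_ticks:
        if audio_pts == image_pts:
            # The PTSs match: release the pause and show the image.
            log.append("S24 release pause")
            log.append("S26 fade in")
            return log
        # The PTSs differ: the image stays paused and nothing is displayed.
        log.append("S22 N: keep paused")
    raise RuntimeError("audio never reached the PTS of the paused image")


steps = switch_display(6, [5, 5, 6])
for step in steps:
    print(step)
```

The number of "keep paused" entries corresponds to the short blackout period mentioned above.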
- the time for delaying the audio output was fixed to one segment time.
- this control method there is a gap of at least one segment between the playback time on the video distribution server 12 and the display time on the video playback device 10, so it is guaranteed that the image data transmitted from the video distribution server 12 after switching begins at a segment later than the segment displayed at the time of the switching operation. As a result, the display of the image after switching can be prevented from starting too late for the sound.
- FIG. 9 illustrates the flow of image and audio output in the case where switching takes time in the control method of FIG. 7.
- the representation of the figure is the same as in FIG. 7, but in this example, the delay of the output time in the moving image playback device 10 is larger than that in the case of FIG. 7 with respect to the playback time of the moving image in the moving image distribution server 12.
- the time t9 at which the data of the segment can be output on the video playback device 10 has already been delayed by about one segment from the playback start time t8 on the video distribution server 12. Even in such a case, if the audio output is started after further giving a fixed delay time of one segment, a deviation of two segments in video playback arises between the video distribution server 12 and the video playback device 10.
- the moving image playback device 10 has no choice but to acquire data from the segment of PTS "6" in the second moving image data after switching.
- the time from the time t12 when the image can be output to the output start time t13 of the audio data of PTS "6" becomes longer than the time for one segment.
- FIG. 10 illustrates the flow of image and audio output when an image other than the displayed image is always in a state where it can be output.
- the representation of the figure is the same as that of FIG.
- the moving image data for audio reproduction is acquired from the video distribution server 12, a predetermined delay time is provided, and then the audio is output.
- the flow in which the first display moving image data is acquired and the output of the first image is started at the timing corresponding to the audio is the same as that of FIG. 7.
- the second display moving image is also acquired in parallel regardless of the image switching operation.
- the input information acquisition unit 60 of the moving image playback device 10 requests the video distribution server 12 to transmit the second display moving image, which is not being displayed, in units of segments at the timing of segment switching in the moving image being output. Then, the moving image reproduction device 10 continues to output the first image until the image switching operation is performed, and speculatively decodes the second image data so that it can be output.
- the first image of PTS "3" in the second image can be output at time t14, and the first image of PTS "4" can be output at time t15.
- the output of each image is paused until the output of the same PTS sound is started, and the data is discarded unless the user operation to switch the image is performed during the segment output period immediately before that.
- if the switching operation is performed during the output period of the segment immediately before the PTS of the paused second image, the pause is released and the output of the second image is started.
- the output control unit 70 fades out the image of PTS “4” in the first image before switching to stop it, and then fades in the second image after switching.
- the data of the moving image that is not the display target is also acquired in parallel, and the first image is always prepared for each segment, so that switching is possible at the beginning of each segment regardless of the output time difference between the video distribution server 12 and the video playback device 10. As a result, the time required for switching can be minimized.
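The per-segment behavior of FIG. 10 can be sketched as follows. This is a simplified model under assumptions: the function `displayed_stream` and the stream names are hypothetical, and the sketch assumes that the head of every segment of the hidden moving image has already been decoded and paused by the time the preceding segment ends.

```python
from typing import Dict, List, Optional


def displayed_stream(pts_list: List[int], switch_during_pts: Optional[int]) -> Dict[int, str]:
    """Return which moving image is shown for each segment PTS.

    switch_during_pts: PTS of the segment being output when the user performs
                       the switching operation, or None if no switch occurs.
    """
    current = "first"
    shown: Dict[int, str] = {}
    for pts in pts_list:
        # The paused head image of the second moving image for `pts` is released
        # only if the switch was requested during the immediately preceding segment.
        # Otherwise that speculatively decoded data is simply discarded.
        if switch_during_pts is not None and pts == switch_during_pts + 1:
            current = "second"
        shown[pts] = current
    return shown


print(displayed_stream([1, 2, 3, 4, 5, 6], switch_during_pts=4))
```

The cost of this approach is that every non-displayed moving image is fetched and decoded for every segment, which is why its applicability depends on the number of moving images and the available resources.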
- whether this method is applied is determined according to the number of moving images to be selected, the communication environment, the processing performance of the moving image playback device 10, and the like.
- FIGS. 11 and 12 illustrate the flow of image and audio output when the delay time given to the audio output is adaptively determined.
- the representation of the figure is the same as that of FIG. In this method, as the initial processing, the earliest timing at which the start data of the segment can be acquired and the time from the request for the moving image for display to the time when the output becomes possible are actually measured.
- a plurality of functional blocks for requesting a moving image for audio reproduction, decoding the audio data, and detecting PTS are prepared. That is, a plurality of sets of functional blocks including an input information acquisition unit 60, a data acquisition unit 62, a data separation unit 64, a voice decoding unit 68, a PTS detection unit 72, and a time adjustment unit 74 are provided. As a result, the granularity with which the earliest timing of PTS switching is detected in the moving image playback device 10 is made finer.
- two functional blocks are prepared and are referred to as "audio first reproduction" and "audio second reproduction", respectively.
- the block that performs the first audio reproduction requests the data of the moving image for audio reproduction as shown by the arrow 158a.
- the data is transmitted from the video distribution server 12 from the segment of PTS "1" currently being reproduced.
- the block that performs the first audio reproduction decodes the beginning thereof, pauses, and detects the PTS.
- the block that performs the second audio reproduction requests the data of the moving image for audio reproduction as shown by the arrow 158b, and when the video distribution server 12 transmits the segment data being reproduced at that time, the beginning of the segment data is decoded. Then pause and detect PTS.
- the PTS detected at this point is also "1". By alternately repeating such processing, the detected PTS is eventually switched. In the figure, the PTS is switched to "2" in the data transmitted in response to the request of arrow 158c.
- the functional blocks of the first audio reproduction and the second audio reproduction repeat the next data request while discarding the paused audio data until the PTS switching occurs. By preparing two or more such functional blocks, the switching timing can be detected with fine granularity.
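The alternating requests of the two functional blocks can be sketched as a simulation. This is an illustration under assumptions, not the embodiment itself: the function `detect_pts_switch`, the request interval, and the model of the server as a function from time to PTS are hypothetical, and transfer and decoding latency are ignored.

```python
from typing import Callable, Tuple


def detect_pts_switch(
    server_pts_at: Callable[[float], int],
    start: float,
    interval: float,
    max_requests: int = 1000,
) -> Tuple[float, int, str]:
    """Alternate requests between two blocks until the detected PTS changes.

    Returns (request time, new PTS, name of the block that saw the change).
    """
    initial_pts = server_pts_at(start)  # first request, made by the first block
    for i in range(1, max_requests):
        t = start + i * interval
        block = "first" if i % 2 == 0 else "second"
        pts = server_pts_at(t)
        if pts != initial_pts:
            return t, pts, block
        # Same PTS as before: discard the paused audio data and request again.
    raise RuntimeError("no PTS switching detected")


def server(t: float) -> int:
    """Hypothetical server playing 3-second segments labeled 1, 2, 3, ..."""
    return int(t // 3.0) + 1


detected = detect_pts_switch(server, start=1.0, interval=0.5)
print(detected)
```

With more blocks issuing requests, the interval between requests can be shortened, so the detected switching time lies closer to the actual segment boundary on the server.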
- the moving image playback device 10 requests the moving image for display from the moving image distribution server 12 as shown by the arrow 158d, and acquires the segment data being played back from the moving image distribution server 12 at the time of the request. In the figure, it is transmitted from the data of PTS "2".
- the image decoding unit 66 of the moving image playback device 10 starts decoding the transmitted data, and the output control unit 70 pauses in a state where the head image can be output.
- the time adjustment unit 74 measures the time td from the data request to the video distribution server 12 until the image can be output by the internal timer.
- FIG. 11 shows only the initial processing
- the moving image playback device 10 subsequently performs the moving image output processing as shown in FIG. That is, since the moving image for audio reproduction can be acquired in the state where the time difference from the video distribution server 12 is minimized by the initial processing, the time adjusting unit 74 pauses the data of the head PTS "2" and holds it for a predetermined time.
- the stop time at this time is a value obtained by adding the time td from the data request to the ability to output the image, which is acquired in the initial processing, to the output time of one segment data.
- the moving image data for display is also acquired in the initial processing, and the output unit 78 outputs the image of the PTS "2" paused by the time adjustment unit 74 together with the audio output of the PTS "2".
- switching can always be realized in a short time without preparing all moving images that are not to be displayed as shown in FIG. That is, as shown in "second image generation" in the figure, regardless of the timing of the display switching operation, the image of the PTS after switching is prepared during the period when the sound of the PTS before switching is output, so the image can be output at the timing of the next PTS switching.
- in the video playback device 10, the output of the image after switching is in time for the audio output of the same PTS "6". Therefore, it is possible to switch in the minimum time with the same processing load regardless of the number of moving images constituting one content and how high their resolution is. It should be noted that some margin may actually be added to the delay time provided for the audio output.
- the image processing unit 76 performs processing to fade in and fade out, and the like is the same as described above.
- the time td from the request for data to the video distribution server 12 until the data can be output is measured only for the first display image, but the same measurement may be performed for all the display moving images constituting the content. For example, when the image size and bit rate of the moving image for display are different, it is conceivable that the time from requesting the data to being able to output is different. In this case, by adopting the longest time of the time td measured for each moving image and adding it to the delay time given to the audio output, it is possible to guarantee that the image after switching is in time for the audio.
- the time td from requesting the data to the video distribution server 12 until the data can be output may fluctuate even if the moving image is the same. Therefore, the time td may be measured a plurality of times or may be measured periodically, and the longest time may be adopted and added to the delay time given to the audio output. For example, if the condition of the network 8 deteriorates during the streaming transfer, the delay time of the audio output may be adjusted in the increasing direction even during the output of the moving image so that the switched image is in time for the audio. Even in such a case, the measurement target may be one moving image or all moving images.
- the output time of one segment included in the delay time given to the audio output can be obtained from the playlist transmitted from the video distribution server 12, but can also be actually measured by the video playback device 10.
- the PTS switching is detected twice, and the time difference between the two is acquired as the output time of one segment.
- the more times the output time of one segment and the time td from the request of the data to the video distribution server 12 until the data can be output are measured, the longer the initial processing becomes. Therefore, it is possible to shorten the time until the start of playback by acquiring those values in advance and only reading them when playing back the moving image.
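The adaptive delay described in the preceding paragraphs can be sketched as a small calculation. The function names, the `margin` parameter, and the sample values are assumptions introduced for illustration; only the relationships (one segment output time plus the longest measured td, with an optional margin, and the segment output time taken as the difference between two PTS switching detections) come from the description.

```python
from typing import Sequence


def measured_segment_duration(first_switch_at: float, second_switch_at: float) -> float:
    """Output time of one segment, taken as the gap between two PTS switchings."""
    if second_switch_at <= first_switch_at:
        raise ValueError("second PTS switching must come after the first")
    return second_switch_at - first_switch_at


def audio_delay(segment_duration: float, td_samples: Sequence[float], margin: float = 0.0) -> float:
    """Delay given to the audio output.

    td_samples: measured times from data request until the image can be output,
                possibly for several moving images or several measurements.
    margin:     optional extra allowance (hypothetical parameter).
    """
    if not td_samples:
        raise ValueError("at least one td measurement is required")
    # The longest measured td is adopted so that every switched image is in time.
    return segment_duration + max(td_samples) + margin


segment = measured_segment_duration(3.0, 6.0)
print(audio_delay(segment, [0.5, 0.75, 0.25]))
print(audio_delay(segment, [0.5], margin=0.25))
```

If the values are acquired in advance as described above, only `audio_delay` needs to run at playback start, with the stored measurements passed in.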
- FIG. 13 schematically shows an example in which the time axes of a plurality of moving images constituting one content are deviated.
- the horizontal axis of the figure represents the passage of time, and the flow of the playback time of the audio reproduction video, the first display video, and the second display video on the video distribution server 12 is shown with the length of each rectangle representing the playback time of each PTS.
- the first display moving image is delayed by time D1 and the second display moving image is advanced by time D2 with respect to the time axis of the audio reproduction moving image.
- this time lag is acquired by the video distribution server 12.
- the video distribution server 12 stores, for example, the amount and direction of deviation of the images of the other display moving images, relative to the sound of the moving image for audio reproduction, in a storage area accessible from the moving image playback device 10.
- the input information acquisition unit 60 of the moving image playback device 10 requests the moving image distribution server 12 to transmit the moving image data and information related to the time lag of all the moving images for display in response to the selection of the content by the user. Then, the data acquisition unit 62 acquires information related to the time lag in addition to the stream of the moving image.
- FIG. 14 illustrates the flow of image and audio output when the moving image playback device 10 adjusts the output timing so as to correspond to the time lag of the moving image provided by the moving image distribution server 12.
- the representation of the figure is the same as that of FIG. 7, but the playback flow in the moving image distribution server 12 shown at the top is the same as that of the moving image for audio reproduction. Therefore, the moving image reproduction device 10 acquires the moving image for audio reproduction by the same procedure as in FIGS. 7, 10 and 12, and suspends the output of the head data to give a delay of a predetermined time.
- the voice of PTS "1" is output after a delay of a predetermined time.
- the moving image playback device 10 further acquires the first display moving image, pauses the first data in a state where it can be output, and then outputs the data.
- the time adjusting unit 74 adjusts the timing at that time according to the time lag of the moving image acquired in advance. As shown in FIG. 13, if the first display moving image is delayed by the time D1, the time adjusting unit 74, as shown in the figure, starts the output of the image of PTS "2" with a delay of the time D1 after the output of the sound of the same PTS “2” is started.
- the same time adjustment is performed. That is, as shown in FIG. 13, if the second display moving image is advanced by the time D2, the time adjusting unit 74, as shown in the figure, starts the output of the image of PTS "6" earlier by the time D2 than the start of the output of the sound of the same PTS “6”. With these adjustments, it is possible to continue to output the sound and display of different moving images strictly without deviation. Note that this method has the same effect not only on the deviation caused when the moving image is taken or generated, but also on the deviation caused by the decoding process of the moving image.
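The adjustment for the time lag of each display moving image can be sketched as a signed offset applied to the audio start time of the same PTS. The function `image_output_time`, the dictionary `offsets`, and the numeric values standing in for D1 and D2 are hypothetical; the sign convention (positive for a delayed moving image, negative for an advanced one) is an assumption of this sketch.

```python
def image_output_time(audio_start_at: float, offset: float) -> float:
    """Start time of the image of a given PTS.

    audio_start_at: time at which the sound of the same PTS starts to be output.
    offset:         time lag of the display moving image relative to the audio
                    reproduction moving image (positive: delayed, negative: advanced).
    """
    return audio_start_at + offset


# Illustrative values: the first display moving image is delayed by D1 = 0.25 s,
# the second is advanced by D2 = 0.5 s.
offsets = {"first": 0.25, "second": -0.5}

print(image_output_time(6.5, offsets["first"]))    # image of PTS "2" starts D1 after the sound
print(image_output_time(18.5, offsets["second"]))  # image of PTS "6" starts D2 before the sound
```

Because an advanced moving image must be output before the sound of the same PTS starts, its image has to be ready that much earlier, which the delay given to the audio output would need to accommodate.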
- a plurality of moving images representing one content are targeted for streaming transfer, and in a video playback device which is a client terminal, the sound of one moving image and the image of another moving image are output in combination.
- the sound is not interrupted, and the continuity as one content can be maintained.
- by delaying the output of the sound in the moving image playback device by about one segment, the image can be made ready in time for the sound that continues to be output even if it takes time to transfer or decode the image data after switching.
- the PTS switching of the acquired segment is detected at fine time intervals so as to suppress fluctuations in the time lag between playback on the video distribution server and output on the video playback device. Then, by counting from the earliest timing at which the PTS after switching is obtained and giving a delay time to the audio output, the image of the acquired segment can be output in synchronization with the audio that follows the segment before switching, regardless of the timing of the switching operation.
- the present invention can be used for various information processing devices such as video playback devices, mobile terminals, personal computers, television receivers, and video distribution servers, and systems including any of them.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Information Transfer Between Computers (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/777,601 US12231712B2 (en) | 2019-12-03 | 2020-11-26 | Moving image reproduction apparatus, moving image reproduction system, and moving image reproduction method |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019-218746 | 2019-12-03 | ||
| JP2019218746A JP7365212B2 (ja) | 2019-12-03 | 2019-12-03 | Moving image reproduction apparatus, moving image reproduction system, and moving image reproduction method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021111988A1 (ja) | 2021-06-10 |
Family
ID=76220579
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/044099 Ceased WO2021111988A1 (ja) | 2019-12-03 | 2020-11-26 | Moving image reproduction apparatus, moving image reproduction system, and moving image reproduction method |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12231712B2 |
| JP (1) | JP7365212B2 |
| WO (1) | WO2021111988A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114095621A (zh) * | 2021-11-18 | 2022-02-25 | 浙江博采传媒有限公司 | 4D photo-scan audio synchronization method, device, and storage medium |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20250299539A1 (en) * | 2023-03-20 | 2025-09-25 | The Works At Wyomissing, Llc | Systems and methods for remotely playing arcade games |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2004077825A1 (ja) * | 2003-02-27 | 2004-09-10 | Matsushita Electric Industrial Co. Ltd. | Data processing apparatus and method |
| JP2008005112A (ja) * | 2006-06-21 | 2008-01-10 | Matsushita Electric Ind Co Ltd | Stream encoder and stream decoder |
| JP2017225044A (ja) * | 2016-06-16 | 2017-12-21 | Kddi株式会社 | Client device of content distribution system, content acquisition method, and program |
Family Cites Families (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090089352A1 (en) * | 2007-09-28 | 2009-04-02 | Yahoo!, Inc. | Distributed live multimedia switching mechanism and network |
| US8442424B2 (en) * | 2008-09-26 | 2013-05-14 | Deep Rock Drive Partners Inc. | Interactive live political events |
| US8713602B2 (en) * | 2010-07-01 | 2014-04-29 | Comcast Cable Communications, Llc | Alternate source programming |
| US8640181B1 (en) * | 2010-09-15 | 2014-01-28 | Mlb Advanced Media, L.P. | Synchronous and multi-sourced audio and video broadcast |
| JP2012105236A (ja) * | 2010-11-15 | 2012-05-31 | Renesas Electronics Corp | Synchronization control device and synchronization control method |
| US8805158B2 (en) * | 2012-02-08 | 2014-08-12 | Nokia Corporation | Video viewing angle selection |
| US20140098185A1 (en) * | 2012-10-09 | 2014-04-10 | Shahram Davari | Interactive user selected video/audio views by real time stitching and selective delivery of multiple video/audio sources |
| WO2014100374A2 (en) * | 2012-12-19 | 2014-06-26 | Rabbit, Inc. | Method and system for content sharing and discovery |
| US9609373B2 (en) * | 2013-10-25 | 2017-03-28 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Presentation timeline synchronization across audio-video (AV) streams |
| US10754511B2 (en) * | 2013-11-20 | 2020-08-25 | Google Llc | Multi-view audio and video interactive playback |
| US10462524B2 (en) * | 2015-06-23 | 2019-10-29 | Facebook, Inc. | Streaming media presentation system |
| JP6609468B2 (ja) | 2015-12-07 | 2019-11-20 | 日本放送協会 | Receiving device, playback time control method, and program |
| US11012719B2 (en) * | 2016-03-08 | 2021-05-18 | DISH Technologies L.L.C. | Apparatus, systems and methods for control of sporting event presentation based on viewer engagement |
| US20200322406A1 (en) | 2016-05-24 | 2020-10-08 | Sharp Kabushiki Kaisha | Systems and methods for signaling scalable video in a media application format |
| JP6359074B2 (ja) | 2016-12-01 | 2018-07-18 | 株式会社インフォシティ | Content distribution system |
| JP6560696B2 (ja) * | 2017-01-30 | 2019-08-14 | Kddi株式会社 | Client, program, and method for controlling segment reception of data |
| US10616624B2 (en) * | 2017-03-01 | 2020-04-07 | Rhinobird Inc. | Multi-angle video synchronization and multi-angle video interface |
| US10965862B2 (en) * | 2018-01-18 | 2021-03-30 | Google Llc | Multi-camera navigation interface |
2019
- 2019-12-03 JP JP2019218746A patent/JP7365212B2/ja Active
2020
- 2020-11-26 WO PCT/JP2020/044099 patent/WO2021111988A1/ja Ceased
- 2020-11-26 US US17/777,601 patent/US12231712B2/en Active
Also Published As
| Publication number | Publication date |
|---|---|
| JP7365212B2 (ja) | 2023-10-19 |
| US12231712B2 (en) | 2025-02-18 |
| JP2021090118A (ja) | 2021-06-10 |
| US20220408140A1 (en) | 2022-12-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12348799B2 (en) | | Advanced trick-play modes for streaming video |
| CN103190092B (zh) | | System and method for synchronized playback of streaming digital content |
| US10930318B2 (en) | | Gapless video looping |
| CA2976437C (en) | | Methods and apparatus for reducing latency shift in switching between distinct content streams |
| US9942622B2 (en) | | Methods and systems for synchronizing media stream presentations |
| CN111316652A (zh) | | Personalized content streams using aligned encoded content segments |
| CN115398924B (zh) | | Method, device, and storage medium for media playback |
| EP3850860B1 (en) | | Adaptive switching in a whole home entertainment system |
| US20180227586A1 (en) | | Method and system for media synchronization |
| CN114697712A (zh) | | Media stream download method, apparatus, device, and storage medium |
| WO2021111988A1 (ja) | | Moving image reproduction apparatus, moving image reproduction system, and moving image reproduction method |
| US11128914B2 (en) | | Client side stitching of content into a multimedia stream |
| CN114223211A (zh) | | Information processing device and information processing method |
| JP2022066944A (ja) | | Information processing device, computer program, and information processing system |
| US11856242B1 (en) | | Synchronization of content during live video stream |
| JP6987567B2 (ja) | | Distribution device, receiving device, and program |
| JP6941093B2 (ja) | | Synchronization of media rendering in heterogeneous networking environments |
| CN103053170B (zh) | | System and method for providing trick play during streaming playback |
| JP2016174273A (ja) | | Image processing device, image processing system, and program |
| US20250386065A1 (en) | | Cloud-Based Video Splitter |
| KR101810883B1 (ko) | | Live streaming system and streaming client thereof |
| JP2019213125A (ja) | | Video stream receiving device and program |
| WO2020050058A1 (ja) | | Content distribution system, content distribution method, and program |
| WO2016002497A1 (ja) | | Information processing device and method, distribution system, and program |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20897327; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20897327; Country of ref document: EP; Kind code of ref document: A1 |