Detailed Description
Some embodiments of the invention will now be described in detail with reference to the drawings, wherein like reference numerals are used to refer to like or similar elements throughout the several views. These embodiments are only a part of the present invention and do not disclose all of its possible embodiments. Rather, they are merely exemplary of the method and multimedia file generating apparatus of the present invention as set forth in the claims.
FIG. 1 is a block diagram of a multimedia file generating apparatus according to an embodiment of the present invention, which is provided for convenience of illustration only and is not intended to limit the present invention. FIG. 1 first introduces the components of the multimedia file generating apparatus and their configuration relationships; the detailed functions are disclosed together with FIG. 2.
Referring to FIG. 1, the multimedia file generating apparatus 10 may be any electronic apparatus with computing capability, such as a desktop computer, a notebook computer, a server, etc., but the present invention is not limited thereto. The multimedia file generating apparatus 10 includes a processor 110 and a storage device 120, the functions of which are as follows:
The storage device 120 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination thereof. In the present embodiment, the storage device 120 is used for recording a film obtaining module 121, a position obtaining module 122, a file creating module 123, and a file embedding module 124.
The processor 110 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), programmable logic device (PLD), or other similar device or a combination thereof, and is coupled to the storage device 120.
In the present embodiment, the modules stored in the storage device 120 are, for example, computer programs, and can be loaded by the processor 110 to execute the multimedia file generating method of the present embodiment.
FIG. 2 is a flowchart illustrating a multimedia file generating method according to an embodiment of the present invention, and the method flow of FIG. 2 can be implemented by the elements of the multimedia file generating apparatus 10 of FIG. 1. Referring to FIG. 1 and FIG. 2, the detailed steps of the multimedia file generating method of the present embodiment are described below in conjunction with the elements of the multimedia file generating apparatus 10 of FIG. 1.
In step S201, the film obtaining module 121 obtains a panoramic film associated with a time axis, wherein the panoramic film includes at least one image object. Here, the film obtaining module 121 may retrieve the panoramic film from an image acquiring module (not shown) of the multimedia file generating apparatus 10 itself or from another electronic device. A panoramic film, which may also be referred to as a 360-degree film, is composed of video frames corresponding to different timestamps on the time axis, and the video frames are 360-degree images stored in a specific format, such as an equiangular format or the like. It should be noted that, in the embodiment of the present invention, the panoramic film includes at least one image object generated by shooting at least one object; that is, the image object is presented in the video frames of the panoramic film. The image object in the panoramic film is, for example, a human face, but the present invention is not limited thereto, and the image object may be of other kinds.
In step S202, the position obtaining module 122 obtains a plurality of object positions of the image object relative to the time axis. In one embodiment, the object positions may be generated in advance by a film editor who visually observes the image object and manually labels its positions. In other words, by allowing a film editor to watch the panoramic film with the naked eye and label the positions of the image object, the position obtaining module 122 can obtain a plurality of object positions of the image object in a three-dimensional coordinate system. Alternatively, in one embodiment, the object positions of the image object relative to the time axis may be generated automatically by an object detection and recognition algorithm of image processing. In other words, by using the object detection and recognition algorithm to track a specific image object in the panoramic film, the position obtaining module 122 can obtain a plurality of object positions of the image object, relative to different time intervals, in a three-dimensional coordinate system. The object position of the image object may be represented, for example, by the spherical coordinates of a spherical coordinate system.
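For illustration, the following is a minimal sketch of how such automatic position acquisition could be carried out, using OpenCV's Haar-cascade face detector on a single video frame and converting the detected pixel position into spherical coordinates. The conversion assumes the frame is stored in an equirectangular projection, and the function name and the fixed radius are illustrative assumptions rather than a defined implementation of the position obtaining module 122.

```python
import math
import cv2  # OpenCV is used here purely as an example detector

def detect_face_position(frame, radius=1.0):
    """Detect the first face in an equirectangular video frame and return
    an approximate object position as spherical coordinates (r, theta, psi)."""
    height, width = frame.shape[:2]
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    cx, cy = x + w / 2, y + h / 2        # face centre in pixel coordinates
    theta = (cx / width) * 2 * math.pi   # azimuth angle across the frame width
    psi = (cy / height) * math.pi        # polar angle down the frame height
    return (radius, theta, psi)
```

Sampling such a detector once per time interval would yield the sequence of object positions described below.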
In an embodiment, the object positions of the image object respectively correspond to a plurality of time intervals on the time axis. That is, the object positions of the image object can be sampled at fixed or non-fixed time intervals. FIG. 3A is a schematic diagram illustrating a plurality of object positions corresponding to a plurality of time intervals according to an embodiment of the invention. Referring to FIG. 3A, for one image object, the position obtaining module 122 can obtain the object position (r1, θ1, ψ1) corresponding to the time interval P1, the object position (r2, θ2, ψ2) corresponding to the time interval P2, and the object position (r3, θ3, ψ3) corresponding to the time interval P3. It should be noted that the time lengths of the time intervals P1-P3 may be the same or different, and the invention is not limited thereto.
In addition, in an embodiment, the number of image objects in the panoramic film may be two or more. Thus, the at least one image object in the panoramic film may include a first image object and a second image object. Correspondingly, the object positions relative to the time axis will include a plurality of first object positions of the first image object and a plurality of second object positions of the second image object. FIG. 3B is a schematic diagram illustrating a plurality of object positions corresponding to a plurality of time intervals according to an embodiment of the invention. Referring to FIG. 3B, for the first image object, the position obtaining module 122 may obtain the object position (r4, θ4, ψ4) corresponding to the time interval P1 and the object position (r6, θ6, ψ6) corresponding to the time interval P2. For the second image object, the position obtaining module 122 may obtain the object position (r5, θ5, ψ5) corresponding to the time interval P1 and the object position (r7, θ7, ψ7) corresponding to the time interval P2.
Returning to the flow of FIG. 2, in step S203, the file creating module 123 compiles the object positions into an object position file. Specifically, the file creating module 123 may compile the object positions corresponding to the time intervals on the time axis into an object position file in a preset file format. In one embodiment, the object position file may be generated in a manner similar to the generation of a movie subtitle file. FIG. 4 is a diagram illustrating an example of an object position file according to an embodiment of the invention. Referring to FIG. 4, the object position file 40 records the object positions of two image objects in the panoramic film, named "object name A" and "object name B", respectively, and the object positions are recorded at regular time intervals. In the example shown in FIG. 4, the time interval is 1 second, but the invention is not limited thereto. For example, at time 00:01.000, the object position of the image object named "object name A" is (r6, θ6, ψ6), and the object position of the image object named "object name B" is (r7, θ7, ψ7). At time 00:02.000, the object position of the image object named "object name A" is (r8, θ8, ψ8), and the object position of the image object named "object name B" is (r9, θ9, ψ9).
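To make the subtitle-like layout concrete, the following sketch writes the object positions of several image objects into a plain-text object position file grouped by timestamp, roughly in the spirit of FIG. 4. The file layout, the function name, and the input data structure are illustrative assumptions, not a format prescribed by this disclosure.

```python
def write_object_position_file(path, positions_by_object):
    """Write an object position file in a subtitle-like, timestamp-grouped layout.

    positions_by_object maps an object name to a list of
    (timestamp_string, (r, theta, psi)) samples, for example:
    {"object name A": [("00:01.000", (1.0, 0.5, 1.2)), ...]}.
    """
    # Collect every timestamp so that all objects are listed under each entry.
    timestamps = sorted({t for samples in positions_by_object.values()
                         for t, _ in samples})
    with open(path, "w", encoding="utf-8") as fp:
        for ts in timestamps:
            fp.write(f"{ts}\n")
            for name, samples in positions_by_object.items():
                for t, position in samples:
                    if t == ts:
                        fp.write(f"{name}: {position}\n")
            fp.write("\n")
```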
In addition, in an embodiment, the file creating module 123 may map the object positions recorded as three-dimensional position coordinates into two-dimensional position coordinates, and record the two-dimensional position coordinates in the object position file. In general, each video frame in a panoramic film is stored by mapping a panoramic image into a two-dimensional image, for example in an equiangular format. Accordingly, the object positions recorded as three-dimensional position coordinates (e.g., spherical coordinates) can likewise be mapped to two-dimensional position coordinates in a two-dimensional coordinate system and stored, so as to reduce the data size of the object position file.
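The following is a minimal sketch of such a mapping, assuming an equirectangular projection in which the azimuth angle θ spans the image width and the polar angle ψ spans the image height; under that assumption the radius r can be dropped, which is where the data-size reduction comes from. Other projections would use different formulas, so this is illustrative only.

```python
import math

def spherical_to_2d(theta, psi, width, height):
    """Map a spherical direction (azimuth theta in [0, 2*pi), polar angle psi
    in [0, pi]) to two-dimensional coordinates on an equirectangular image."""
    u = (theta / (2 * math.pi)) * width   # horizontal position
    v = (psi / math.pi) * height          # vertical position
    return (u, v)

def two_d_to_spherical(u, v, width, height):
    """Inverse mapping, used when the two-dimensional coordinates stored in
    the object position file have to be converted back for playback."""
    theta = (u / width) * 2 * math.pi
    psi = (v / height) * math.pi
    return (theta, psi)
```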
Then, in step S204, the file embedding module 124 generates at least one data track of the multimedia file according to the object position file, so as to generate a multimedia file that includes the panoramic film and in which the object positions are recorded. Specifically, FIG. 5 is a schematic diagram of a multimedia file architecture according to an embodiment of the present invention. The multimedia file 50 includes a header 51 and multimedia data 52, and the multimedia data 52 can be classified into a plurality of data tracks. In other words, the multimedia file 50 may include a plurality of data tracks. The header 51 records a description of the characteristics of these data tracks and the number of these data tracks, and the data tracks may include a video data track 521, an audio data track 522, a subtitle data track 523, and an object position data track 524. The video data track is used to classify video data; the audio data track is used to classify audio data, and different audio data tracks may represent different languages; the subtitle data track is used to classify subtitle data, and different subtitle data tracks may represent subtitles in different languages.
In one embodiment, when the object position file includes a plurality of first object positions of the first image object and a plurality of second object positions of the second image object (as in the example of FIG. 4), the file embedding module 124 may generate a first data track corresponding to the first image object and embed the first object positions in the object position file (e.g., (r4, θ4, ψ4), (r6, θ6, ψ6), (r8, θ8, ψ8) of FIG. 4) into the first data track. On the other hand, the file embedding module 124 may generate a second data track corresponding to the second image object, and embed the second object positions in the object position file (e.g., (r5, θ5, ψ5), (r7, θ7, ψ7), (r9, θ9, ψ9) of FIG. 4) into the second data track. That is, the number of object position data tracks is determined by the number of labeled image objects, and the object positions of each image object are recorded in the corresponding object position data track. In other words, different object position data tracks may represent position information of different image objects.
It is noted that, compared with a conventional multimedia file, the multimedia file 50 of the present embodiment further includes the object position data track 524 for recording object positions. The file embedding module 124 can establish at least one data track (i.e., the object position data track 524) of the multimedia file 50 according to the object position file, for example by embedding the data in the object position file 40 shown in FIG. 4 into the object position data track 524 of the multimedia file 50. Herein, embedding specific data into a data track of the multimedia file 50 means embedding the data into the data blocks of that data track in the multimedia file 50. Furthermore, the header 51 records the description of the characteristics of the object position data tracks and the number of the object position data tracks. In this way, besides playing the panoramic film in the multimedia file 50, a player that plays the multimedia file 50 can also obtain the position information of one or more image objects in the panoramic film from the object position data track 524.
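The container layout of FIG. 5 and the embedding of step S204 can be sketched roughly as follows. The class names, the header fields, and the in-memory representation of a data track are illustrative assumptions only; they do not correspond to any particular standard container format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DataTrack:
    kind: str      # "video", "audio", "subtitle", or "object_position"
    name: str      # e.g. the labeled image object's name for a position track
    samples: List[Tuple[str, tuple]] = field(default_factory=list)

@dataclass
class MultimediaFile:
    tracks: List[DataTrack] = field(default_factory=list)

    @property
    def header(self):
        """Header 51: characteristic descriptions of the tracks and their counts."""
        return {
            "track_descriptions": [(t.kind, t.name) for t in self.tracks],
            "object_position_track_count":
                sum(1 for t in self.tracks if t.kind == "object_position"),
        }

def embed_object_positions(mm_file, positions_by_object):
    """Mirror step S204: create one object position data track per labeled
    image object and append it to the multimedia file."""
    for name, samples in positions_by_object.items():
        mm_file.tracks.append(DataTrack("object_position", name, list(samples)))
```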
Having described how to generate a multimedia file in which the object positions of image objects in a panoramic film are recorded, the following embodiments describe how to play the panoramic film according to such a multimedia file of the present disclosure.
FIG. 6 is a block diagram of a multimedia file playing apparatus according to an embodiment of the present invention, which is provided for convenience of illustration only and is not intended to limit the present invention. FIG. 6 first introduces the components of the multimedia file playing apparatus and their configuration relationships; the detailed functions are disclosed together with FIG. 7.
Referring to FIG. 6, the multimedia file playing device 60 may be any electronic device with computing capability and image display capability, such as a desktop computer, a notebook computer, a smart phone, or a tablet computer, but the present invention is not limited thereto. The multimedia file playing device 60 includes a processor 610, a storage device 620, and a screen 630.
The storage device 620 can be any type of fixed or removable random access memory, read-only memory, flash memory, or the like, or a combination thereof. In the present embodiment, the storage device 620 is used for recording a film receiving module 621, a track extraction module 622, an interface providing module 623, and a film playing module 624. In one embodiment, these modules may be implemented as a software player.
The processor 610 is, for example, a central processing unit or other programmable general purpose or special purpose microprocessor, digital signal processor, programmable controller, application specific integrated circuit, programmable logic device, or the like, or a combination thereof, coupled to the storage device 620.
The screen 630 is used for displaying the image output by the multimedia file playing device 60 for the user to view. In the present embodiment, the screen 630 is, for example, a liquid crystal display (LCD), a light-emitting diode (LED) display, a field emission display (FED), or another type of display.
In the present embodiment, the modules stored in the storage device 620 are, for example, computer programs, and can be loaded by the processor 610 to execute the multimedia file playing method of the present embodiment.
FIG. 7 is a flowchart illustrating a multimedia file playing method according to an embodiment of the present invention, and the method flow of FIG. 7 can be implemented by the elements of the multimedia file playing device 60 of FIG. 6. Referring to FIG. 6 and FIG. 7, the detailed steps of the multimedia file playing method of the present embodiment are described below in conjunction with the elements of the multimedia file playing device 60 of FIG. 6.
In step S701, the film receiving module 621 receives a multimedia file including a panoramic film associated with a time axis. The film receiving module 621 may receive the multimedia file including the panoramic film via a wired or wireless network, or may read the multimedia file stored in the storage device 620 or another external storage device. In step S702, the track extraction module 622 extracts a first data track of the multimedia file to obtain a plurality of first object positions of a first image object in the panoramic film relative to the time axis. Specifically, the track extraction module 622 can demultiplex (demux) the multimedia file to obtain the multimedia data corresponding to each data track. In one embodiment, the data tracks of the multimedia file may include a video data track, an audio data track, a subtitle data track, and an object position data track. The track extraction module 622 can extract the multimedia data classified into the object position data track from the multimedia file, where the multimedia data classified into the object position data track is the plurality of first object positions of the first image object in the panoramic film relative to the time axis. The object positions in the object position data track are described in detail in the foregoing embodiments and are not repeated here. Similarly, the track extraction module 622 can also extract the video data classified into the video data track from the multimedia file and decode the video data to obtain a plurality of video frames of the panoramic film.
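Continuing the illustrative container structure sketched earlier (the DataTrack and MultimediaFile classes are assumptions, not an actual demultiplexer), step S702 amounts to filtering out the object position data tracks:

```python
def extract_object_position_tracks(mm_file):
    """Return the object positions recorded in every object position data
    track, keyed by the labeled image object's name (step S702)."""
    return {track.name: track.samples
            for track in mm_file.tracks
            if track.kind == "object_position"}
```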
Then, in step S703, when the panoramic film is played, the interface providing module 623 displays an icon corresponding to the first image object on the screen 630. Specifically, the interface providing module 623 can provide a user interface of the player, which may include a frame playing area and a playback control bar. It should be noted that, by parsing the number of object position data tracks in the header of the multimedia file (e.g., the header 51 shown in FIG. 5), the interface providing module 623 can learn how many image objects have been labeled in advance in the film content of the panoramic film. Thus, while playing the panoramic film, the interface providing module 623 may display the icons of the image objects labeled in advance on the screen 630. The icons can be interactive objects of any shape, and each icon presents the name or a representative pattern of the corresponding image object, so as to quickly guide the user to the key points of the panoramic film. In addition, each icon can be displayed at the edge of the playing frame or in the playback control bar of the player, so as not to interfere with the user's viewing of the panoramic film.
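As a continuation of the same illustrative sketch (again assuming the hypothetical MultimediaFile class above rather than a real player API), the interface providing module 623 could derive the list of icons directly from the header:

```python
def icons_to_display(mm_file):
    """List the image object names to show as icons, based on the header's
    object position track descriptions (step S703)."""
    header = mm_file.header
    return [name for kind, name in header["track_descriptions"]
            if kind == "object_position"]
```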
The processor 610 then continuously detects whether the user selects any icon, and responds when a selection operation applied to an icon is detected. Accordingly, in step S704, in response to detecting the selection operation applied to the icon, the film playing module 624 determines a playing view angle for playing the panoramic film according to the first object positions recorded in the first data track, and plays the frame including the first image object based on the playing view angle. That is, when the user selects the icon corresponding to the first image object, the film playing module 624 can obtain the current object position of the first image object in the panoramic film from the object position data track. Then, the film playing module 624 can determine the playing view angle according to the current object position of the first image object, and the playing frame is shifted from a preset area of the panoramic film to a first area where the first image object is located, so that the user can quickly view the selected key object.
It is noted that the first object position of the selected first image object may change. Taking FIG. 3B as an example, the first object position of the first image object may change from (r4, θ4, ψ4) to (r6, θ6, ψ6). If the playing view angle were not adjusted, the first image object might disappear from the playing frame. In an embodiment, in response to recognizing the change of the first object position, the film playing module 624 may switch the playing view angle again according to the changed first object position. Taking FIG. 3B as an example, in response to the first object position of the first image object changing from (r4, θ4, ψ4) to (r6, θ6, ψ6), the film playing module 624 switches the playing view angle from a first view angle to a second view angle. Correspondingly, the playing frame is adjusted from the original first area to a second area where the first image object is located. That is, the film playing module 624 plays the first area of the panoramic film at the first view angle during the time interval P1, and then plays the second area of the panoramic film at the second view angle during the time interval P2. In this way, the user can continuously view the selected key object without manually adjusting the playing view angle.
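One way to realize this tracking behavior is sketched below: the view angle that centres the selected object is derived directly from its spherical object position, and the position whose time interval covers the current playback time is looked up whenever the position changes. The (yaw, pitch) convention, the (start_time, position) sample layout, and the function names are assumptions made for illustration, not the disclosed implementation of the film playing module 624.

```python
import math

def playing_view_angle(object_position):
    """Derive a playing view angle (yaw, pitch) that centres the selected
    image object, given its spherical object position (r, theta, psi)."""
    r, theta, psi = object_position   # the radius r is not needed for direction
    yaw = theta                       # look toward the object's azimuth
    pitch = math.pi / 2 - psi         # convert polar angle to an elevation angle
    return (yaw, pitch)

def current_view_angle(samples, playback_time):
    """Pick the object position whose time interval covers the current
    playback time, so the view angle follows the object as it moves between
    intervals (e.g. from (r4, theta4, psi4) in P1 to (r6, theta6, psi6) in P2).
    `samples` is a list of (start_time, position) pairs sorted by start time."""
    current = None
    for start_time, position in samples:
        if start_time <= playback_time:
            current = position
        else:
            break
    return playing_view_angle(current) if current is not None else None
```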
It is appreciated that the number of image objects may be two or more. In one embodiment, in addition to extracting the first data track of the multimedia file, the track extraction module 622 can also extract a second data track of the multimedia file to obtain a plurality of second object positions of the second image object in the panoramic film relative to the time axis. Thus, when the panoramic film is played, the interface providing module 623 also displays another icon corresponding to the second image object on the screen 630. Then, in response to detecting a selection operation applied to the other icon, the film playing module 624 switches the playing view angle according to the second object positions recorded in the second data track, and plays the frame including the second image object based on the switched playing view angle.
For example, FIG. 8A and FIG. 8B are schematic diagrams illustrating an example of playing a multimedia file according to an embodiment of the invention. Referring to FIG. 8A, when the multimedia file playing device 60 plays a multimedia file generated according to the present disclosure, the panoramic film is played along the time axis. The user can adjust the playing view angle of the panoramic film by operating the virtual control button 82. The multimedia file playing device 60 can obtain the characteristic descriptions of the object position data tracks and the number of the object position data tracks from the header of the multimedia file, so as to learn the number of image objects labeled in advance, their object names, and so on. In the present example, assuming that the number of image objects labeled in advance is 3, the multimedia file playing device 60 displays three icons I1-I3 on the frame 80, and the three icons I1-I3 respectively show the representative names 'A', 'B', and 'C' of the three image objects.
Assuming that the user wants to view the image object 83 corresponding to the icon I1 (i.e., the key character A), in response to detecting the user's selection operation with respect to the icon I1, the multimedia file playing device 60 determines the playing view angle for playing the panoramic film according to the object position of the image object 83 recorded in the object position data track, so as to play the frame 80 including the image object 83 based on the determined playing view angle. In this example, the selected image object 83 is located in the middle of the frame 80. Next, assuming that the user wants to view the image object 84 corresponding to the icon I2 (i.e., the key character B), in response to detecting the user's selection operation with respect to the icon I2, the multimedia file playing device 60 switches the playing view angle according to the object position of the image object 84 recorded in the object position data track, and plays the frame 86 including the image object 84 based on the switched playing view angle. In the present example, after the playing view angle is switched, the selected image object 84 is located in the middle of the frame 86.
In summary, in the embodiments of the present invention, the multimedia file including the panoramic film further includes an object position data track in which the position information of the image object is recorded. Because the multimedia file generating apparatus embeds the object positions of the image object into the multimedia file, the multimedia file playing device can instantly learn the object position of a specific image object from the object position data track while playing the panoramic film. Therefore, the user's multimedia playing device does not need strong computing capability to recognize and track the image object. In addition, after the user selects an image object of interest, the multimedia file playing device can dynamically adjust the playing view angle of the panoramic film according to the object position of the image object, thereby achieving a playing function that tracks the specific image object. The user therefore does not need to manually adjust the playing view angle to keep the image object of interest in view, which greatly improves the convenience of viewing 360-degree films. The invention also allows the user to quickly browse the key points in the panoramic film, so that the user has a direct and fast operating and viewing experience when watching the panoramic film.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.