WO2020140478A1

WO2020140478A1 - Method for playing audio, video, and picture data

Info

Publication number: WO2020140478A1
Application number: PCT/CN2019/106073
Authority: WO
Inventors: 李庆成; 鹿毅忠
Original assignee: 李庆成; 鹿毅忠
Priority date: 2019-01-03
Filing date: 2019-09-17
Publication date: 2020-07-09
Also published as: CN111402935B; CN111402935A

Abstract

A method for playing audio, video, and picture data, comprising: downloading and parsing the audio, video, and picture data to obtain, from upper-layer audio and picture data, an upper-layer picture audio, an upper-layer picture, and/or an upper-layer picture alignment parameter and/or an upper-layer audio alignment parameter; playing the upper-layer picture audio or the upper-layer picture, either automatically or upon receipt of a command to play it; and playing the corresponding upper-layer picture or upper-layer picture audio when the playback time indicated by the upper-layer picture alignment parameter or the upper-layer audio alignment parameter arrives, or upon receipt of a command to play the upper-layer picture audio or the upper-layer picture. Each picture is thus played together with a specific audio segment, so that audio content and picture content are matched or associated; furthermore, chained, layered, and embedded playback of audio and picture data and film and television data can be conveniently achieved.

Description

Method for playing audio-view data

Technical field

The invention relates to media playback technology, and in particular to a method for interleaved playback of multiple media; it belongs to the field of Internet media technology.

Background art

Existing media playback methods include standalone playback of audio, video, animation, or pictures, as well as combined playback of audio and pictures. In existing combined playback, one or more pictures are usually shown in a fixed order while a single piece of audio plays. Although this presents sound and pictures together, it has a major disadvantage: the audio playback is essentially unrelated to the way the pictures are presented. That is, the viewer can step through the pictures forward or backward, but the audio can only play in order and cannot be moved back in step with the pictures. As a result, the audio content and the picture content cannot be matched when they are played together. Because of this defect, existing combined audio-and-picture playback is unusable, or gives a poor user experience, in many online training, lecture, and discussion scenarios.

In addition, in some scenarios it is very useful to combine audio-and-picture playback with short videos or animations. For example, when the principles of machinery are taught online, the teacher needs static mechanical drawings to explain the relevant technical content, while a demonstration of the corresponding mechanism in motion helps students understand the theory being taught. However, existing media playback methods can use either the combined audio-and-picture playback described above or video or animation alone; they cannot combine the two or nest one inside the other. If only audio and pictures are used, the teaching content often becomes dull and students learn poorly. If only video or animation is used, production costs are high, and playback requires better network transmission quality and more bandwidth; this form of online teaching is therefore also restricted in areas where the network is poor or unstable.

Purpose of the invention

The main purpose of the present invention is to provide a method for playing audio-view data. With this method, on the one hand, when audiogram data is played, any picture can be played together with a specific passage of the audio, so that audio content and picture content are matched or associated; on the other hand, when film and television data is played, the necessary preparations are in place for switching to, or nesting, other audiogram data and film and television data at any time.

The purpose of the present invention is achieved using the following technical solutions:

Download the audio-view data and parse it to obtain, from the upper-layer audiogram data, the upper-layer picture audio, the upper-layer picture, and/or the upper-layer picture alignment parameters and/or the upper-layer audio alignment parameters;

Play the upper-layer picture audio or the upper-layer picture; then, when the playback time indicated by the upper-layer picture alignment parameter or the upper-layer audio alignment parameter arrives, or when a command to play the upper-layer picture audio or the upper-layer picture is received, play the corresponding upper-layer picture or upper-layer picture audio;

or,

Download the audio-view data and parse it to obtain, from the upper-layer film and television data, the upper-layer video data or upper-layer animation data and the lower-layer audiogram identifier and/or lower-layer film and television identifier;

Play the upper-layer video data or the upper-layer animation data, either automatically or upon receiving a command to play the upper-layer video data or the upper-layer animation data.

With the above method, the pictures to be played can be associated in advance with the corresponding audio passages, yielding the alignment (association) parameters that govern them during playback. During playback, these pre-formed parameters, downloaded together with the audiogram data or film and television data, control how that data is played. On the one hand, any picture can then be played together with a specific audio passage, so that audio content and picture content are matched or associated; on the other hand, chained, layered, and embedded playback of audiogram data and film and television data can be easily achieved.

In the following, the technical solutions of the present invention will be disclosed in more detail in conjunction with various specific embodiments.

Specific implementation

Before introducing the specific embodiments of the present invention in detail, it is necessary to make a specific description of some data objects and terms involved in the present invention. When researching and developing various technical solutions of the present invention, the present inventor systematically sorted out the various data objects involved in the present invention, thus establishing and defining the following data objects:

1. Audio-view data: Audio-view data mainly includes audiogram data, film and television data, same-layer audiogram identifiers, and same-layer film and television identifiers.

2. Audiogram data: There are two main types of audiogram data.

The first type of audiogram data consists of one static picture and one piece of audio to be played together with it. In the present invention the static picture is referred to as a picture, and the audio is referred to as picture audio. The audiogram data also carries data that the inventors call alignment parameters, which are divided by function into picture alignment parameters and audio alignment parameters.

The second type of audiogram data consists of multiple static pictures and multiple pieces of audio corresponding to them; these are likewise referred to as pictures and picture audio. Because there are multiple pictures and pieces of audio, there are also multiple alignment parameters, their number corresponding to the number of pictures or the number of pieces of audio. As in the first type, they are divided into picture alignment parameters and audio alignment parameters.

In the present invention, audiogram data is a complete data object. It can be assembled from pictures, audio, and information in any existing data formats, or, in a particular scheme, the relevant technical personnel can construct an integrated data object in an entirely new format to suit their needs. In either case, any data object that has the components described above is called audiogram data in the present invention.

3. Film and television data: Film and television data consists mainly of video data or animation data. It also carries two further kinds of data: the lower-layer audiogram identifier and the lower-layer film and television identifier. Film and television data is generally designed to contain a single piece of video data or animation data, although several pieces can also be included. In that case the playback software must be designed more carefully, since data logic errors may otherwise occur; for the technical solution of the present invention, however, this merely adds further combinations.

4. Same-layer audiogram identifier and same-layer film and television identifier.

The same-layer audiogram identifier tells the playback device whether, after the audiogram data or film and television data already downloaded, there is further audiogram data to be downloaded and played. To this end it contains at least a same-layer audiogram download parameter and a same-layer audiogram playback parameter: the download parameter indicates how the subsequent audiogram data is to be downloaded, and the playback parameter tells the playback device when to play it once downloaded.

The same-layer film and television identifier tells the playback device whether, after the audiogram data or film and television data already downloaded, there is further film and television data to be downloaded and played. To this end it contains at least a same-layer film and television download parameter and a same-layer film and television playback parameter: the download parameter indicates how the subsequent film and television data is to be downloaded, and the playback parameter tells the playback device when to play it once downloaded.

5. The meaning of upper layer, lower layer and same layer.

As the specific embodiments below show, a particularly valuable feature of the technical solution is that it achieves the following effects. Multiple pieces of audio-view data can be combined into an audio-view data "chain", and the pieces on this chain can be played in sequence. In addition, playback of any piece of audio-view data on the chain can be interrupted to insert another piece of audio-view data, and the inserted piece may itself belong to an audio-view data chain in the next layer down. Audio-view data therefore stands in same-layer, upper-layer, and lower-layer structural relationships. For convenience of description in the embodiments that follow, the inventors introduce the terms "upper layer", "lower layer", and "same layer" as attributive modifiers of audiogram data, film and television data, and their contents, with the meanings explained above.
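The data objects defined in items 1 to 5 can be sketched as plain records. The following is only an illustrative model, not a format prescribed by the text: all class names, field names, file names, and the example URL are hypothetical, and the alignment parameters are assumed here to be times in seconds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class AudiogramData:
    """Pictures, picture audio, and the alignment parameters linking them."""
    pictures: List[str]                      # hypothetical: picture file names
    picture_audios: List[str]                # hypothetical: audio file names
    picture_alignment: List[float] = field(default_factory=list)  # one per picture
    audio_alignment: List[float] = field(default_factory=list)    # one per picture audio


@dataclass
class FilmTVData:
    """Video or animation data plus identifiers of lower-layer data to insert."""
    media: str                               # hypothetical: video or animation file
    lower_audiogram_ids: List[str] = field(default_factory=list)
    lower_film_tv_ids: List[str] = field(default_factory=list)


@dataclass
class SameLayerIdentifier:
    """Says what to download next on the same layer and when to play it."""
    kind: str                                # "audiogram" or "film_tv"
    download_param: str                      # link address or code string
    playback_param: Optional[float] = None   # assumed start time; None = wait for a command


@dataclass
class AudioViewData:
    """Top-level object: one body plus optional same-layer identifiers."""
    body: Union[AudiogramData, FilmTVData]
    same_layer: List[SameLayerIdentifier] = field(default_factory=list)


# One picture audio with two pictures, followed on the same layer by a video.
lesson = AudioViewData(
    body=AudiogramData(
        pictures=["gear1.png", "gear2.png"],
        picture_audios=["talk.mp3"],
        picture_alignment=[0.0, 42.5],
    ),
    same_layer=[SameLayerIdentifier("film_tv", "https://example.com/demo", 90.0)],
)
```

Modelling the body as either audiogram data or film and television data is a simplification chosen for the sketch; a concrete format could equally carry both.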

The first specific embodiment of the present invention concerns playing audiogram data. First, audio-view data whose main body is audiogram data is downloaded. Once it reaches the playback device, it is parsed; how it is parsed depends on the specific format of the audiogram data. Parsing yields the picture and the picture audio to be played. The picture audio can then be played, with the picture displayed alongside it.

The second specific embodiment covers audiogram data that, unlike that of the first embodiment, contains more than one picture audio or picture. Three situations arise: a. one picture audio and multiple pictures; b. multiple picture audios and one picture; c. multiple picture audios and multiple pictures. Alignment parameters are then needed to instruct the playback device how to play the audiogram data. Accordingly, the alignment parameters are not only divided into picture alignment parameters and audio alignment parameters, but also fall into three cases: one-to-many, many-to-one, and many-to-many.

When the audiogram data contains one picture audio and multiple pictures, the picture audio is played first, and the pictures are displayed one after another at the same time as, or after, playback of the picture audio begins. In this case the audiogram data carries multiple picture alignment parameters, one per picture, each indicating the point during playback of the picture audio at which the corresponding picture starts to be displayed.
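The one-audio, many-pictures case can be illustrated with a short lookup. This is a minimal sketch under the assumption that each picture alignment parameter is a start time in seconds on the audio timeline and that the list is sorted; the function name `picture_at` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional


def picture_at(picture_alignment: List[float], audio_time: float) -> Optional[int]:
    """Return the index of the picture to show at audio_time.

    picture_alignment[i] is the assumed start time (in seconds) of picture i,
    in ascending order. Returns None before the first picture starts.
    """
    i = bisect_right(picture_alignment, audio_time) - 1
    return i if i >= 0 else None


starts = [0.0, 42.5, 90.0]       # three pictures on one picture audio
current = picture_at(starts, 60.0)
```

Because the lookup depends only on the audio position, seeking the audio backward or forward selects the matching picture without any extra bookkeeping.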

When the audiogram data contains one picture and multiple picture audios (a rarer case), the picture is displayed first, and the picture audios are played one after another at the same time as, or after, the picture is displayed. In this case the audiogram data carries multiple audio alignment parameters, one per picture audio, each indicating the point during display of the picture at which the corresponding picture audio starts to play. Audiogram data of this kind is often useful when lower-layer audio-view data needs to be inserted and played, or when it is referenced by other audio-view data.

When the audiogram data contains multiple picture audios and multiple pictures, whether a picture audio or a picture is played first is determined by the picture alignment parameters or audio alignment parameters. The number of picture alignment parameters or audio alignment parameters corresponds to the number of pictures or picture audios, respectively, and each parameter indicates when the corresponding picture or picture audio is to be played.

In some cases, the picture alignment parameter can instead be set to the index or identifier of the corresponding picture audio. When a picture is played, the playback device then uses that index or identifier to locate the audio segment already downloaded to the device and plays it. Similarly, the audio alignment parameter can be set to the index or identifier of the corresponding picture; when a piece of audio is played, the playback device uses that index or identifier to locate the downloaded picture and displays it. The advantage is that, whether playback of the audiogram data is automatic or user-controlled, the picture and the picture audio being played always correspond exactly, and the displayed picture is never unrelated to the audio being played. Sound and picture are thus genuinely presented together.
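Index-based alignment can be sketched as follows. The example assumes that each picture alignment parameter is simply the list index of the matching picture audio; the class `IndexAlignedPlayer` and its method names are hypothetical, and actual display and audio output are replaced by returned file names.

```python
from typing import List, Tuple


class IndexAlignedPlayer:
    """Keeps picture and picture audio in step when the user switches pictures."""

    def __init__(self, pictures: List[str], audios: List[str],
                 picture_alignment: List[int]) -> None:
        # picture_alignment[i] is the index of the picture audio for picture i.
        self.pictures = pictures
        self.audios = audios
        self.picture_alignment = picture_alignment
        self.current = 0

    def show(self, i: int) -> Tuple[str, str]:
        """Select picture i and return it with its aligned picture audio."""
        self.current = i
        return self.pictures[i], self.audios[self.picture_alignment[i]]

    def next(self) -> Tuple[str, str]:
        return self.show(min(self.current + 1, len(self.pictures) - 1))

    def previous(self) -> Tuple[str, str]:
        return self.show(max(self.current - 1, 0))


player = IndexAlignedPlayer(
    pictures=["p0.png", "p1.png", "p2.png"],
    audios=["a0.mp3", "a1.mp3"],
    picture_alignment=[0, 1, 1],   # pictures 1 and 2 share one audio segment
)
forward = player.next()
back = player.previous()
```

Stepping backward returns the audio that belongs to the earlier picture, which is the behaviour the background section says existing combined playback lacks.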

Note also that setting the alignment parameter to an index or identifier can be used for playback control not only of audiogram data but also of film and television data. The only difference is where the alignment parameters are placed, as later specific embodiments will show. In addition, the audiogram data and film and television data discussed above cover the upper-layer, same-layer, and lower-layer cases alike; this is not repeated here.

As described above, the first and second specific embodiments achieve the following technical effect: when audiogram data composed of one or more picture audios and one or more pictures is played, the audio alignment parameters or picture alignment parameters allow any piece of picture audio to be played together with its corresponding picture, so that the picture and audio being played have the correlation the user requires. This overcomes the prior-art defect that audio and pictures cannot be linked. More significantly, this correspondence between pictures and audio makes audiogram data a new form of digital media that is easy to produce and flexible to play.

The third specific embodiment covers the case where the audio-view data is film and television data. Here too, the audio-view data is first downloaded and parsed. Its film and television data usually contains one piece of video data or one piece of animation data. For the viewer there is little difference between the two: each is a series of moving images combined with audio. They differ only in storage format. Once the video data or animation data has been parsed out of the film and television data, it can be played.

At the same time as, or after, the above operations, a necessary step of the present invention is to parse out the lower-layer audiogram identifier and the lower-layer film and television identifier carried in the downloaded audio-view data. As mentioned above, in many specific embodiments a piece of lower-layer audio-view data can be inserted and played while upper-layer audio-view data is playing. To make this possible, the identifier of the audio-view data to be inserted must be set in the upper-layer audio-view data in advance. Depending on the type of data to be inserted, this may be an identifier of audiogram data, an identifier of film and television data, or both. These identifiers are called the lower-layer audiogram identifier and the lower-layer film and television identifier, respectively, and tell the playback device which audio-view data is to be inserted. There may be one or more of each.

The third specific embodiment achieves the following technical effect: thanks to the lower-layer audiogram identifier and the lower-layer film and television identifier, lower-layer audiogram data or film and television data can be inserted and played at any time while the current film and television data is playing. This provides a way to reference and play background information for knowledge-oriented audio-view data, and it gives audio-view data a multi-level structure.
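The multi-level structure can be illustrated with a small recursive walk. This sketch makes two assumptions that the text does not state: that the upper-layer item resumes once an inserted lower-layer item ends, and that lower-layer identifiers never form a cycle. The function `play_nested` and the identifier names are hypothetical.

```python
from typing import Dict, List


def play_nested(item_id: str, lower_ids: Dict[str, List[str]],
                depth: int = 0) -> List[str]:
    """Return the playback events for item_id and everything inserted into it.

    lower_ids maps an item to the lower-layer identifiers it carries.
    Assumes the identifiers contain no cycles.
    """
    indent = "  " * depth
    events = [f"{indent}play {item_id}"]
    for lower_id in lower_ids.get(item_id, []):
        # Insert the lower-layer item, then return to the upper-layer item.
        events.extend(play_nested(lower_id, lower_ids, depth + 1))
        events.append(f"{indent}resume {item_id}")
    return events


library = {
    "lecture": ["gear_demo", "glossary"],
    "gear_demo": ["gear_detail"],
}
events = play_nested("lecture", library)
```

In a real player the insertion would be triggered by a user command or a playback time rather than performed unconditionally; the walk above only shows how the layers nest.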

The specific embodiments above are the most basic forms of the technical solution. Once the audio-view data has been downloaded, playback can start in one of three ways: first, automatically after downloading; second, as directed by the corresponding alignment parameters; third, upon receipt of a command to play the picture audio or pictures, or a command to play the video data or animation data.

In all the specific embodiments described above, the audiogram data, the film and television data, and their respective components can all be regarded as upper-layer data or identifiers. For this reason the attributive "upper-layer" is prefixed to them in this text to indicate their position in the audio-view data playback scheme.

As mentioned above, audiogram data can take four forms: a. one picture audio and multiple pictures; b. multiple picture audios and one picture; c. multiple picture audios and multiple pictures; and d. one picture audio and one picture. This makes the creation of audiogram data very flexible: different kinds of audiogram data can be created to suit how they will be produced, played, cross-referenced, and nested. Against this background, multiple mutually independent pieces of audiogram data may be played in series. The present invention calls such independent, serially played pieces same-layer audiogram data; within the overall audio-view data, they stand in a same-layer relationship to one another.

For this reason, building on any of the first, second, and third specific embodiments, the fourth specific embodiment sets a same-layer audiogram identifier in the audio-view data. After the audio-view data carrying the identifier is downloaded, the playback device parses it and extracts the identifier. The same-layer audiogram identifier generally contains a same-layer audiogram download parameter and a same-layer audiogram playback parameter. The download parameter instructs the playback device how to download the corresponding same-layer audiogram data. For example, it can be a link address pointing directly to the Internet location of the same-layer audiogram data; alternatively, it can be a code string, in which case the playback device sends a download request carrying the code string to a fixed server, the server generates or looks up the corresponding same-layer audiogram data, and the download then proceeds between the server and the playback device. The playback parameter indicates when and in what manner the playback device plays the downloaded same-layer audiogram data. Within one piece of audio-view data, every piece of audiogram data on the same layer as the first-downloaded audiogram data, other than that first piece itself, is same-layer audiogram data. There can be multiple pieces of same-layer audiogram data, in which case there are correspondingly multiple same-layer audiogram identifiers, one for each piece.
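The two forms of download parameter described above (a link address or a code string) can be handled by one small resolver. This is a sketch only: the fixed server address, the `code` query field, and the function name are all hypothetical, since the text does not specify a request format.

```python
from urllib.parse import urlencode, urlparse

# Hypothetical fixed server that turns code strings into same-layer data.
FIXED_SERVER = "https://media.example.com/fetch"


def resolve_download_url(download_param: str) -> str:
    """Turn a same-layer download parameter into a URL to request.

    A parameter that is already an http(s) link address is used as-is;
    anything else is treated as a code string and sent to the fixed server.
    """
    parsed = urlparse(download_param)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return download_param
    return f"{FIXED_SERVER}?{urlencode({'code': download_param})}"


direct = resolve_download_url("https://cdn.example.com/lesson2.bin")
via_server = resolve_download_url("AB12")
```

The same resolver would serve the same-layer film and television download parameter, since the text describes both parameters in identical terms.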

The fourth specific embodiment involves only same-layer audiogram data. In fact, same-layer film and television data can also stand in a same-layer, serial playback relationship with the first downloaded and played audiogram data and with the same-layer audiogram data. That is, an audio-view data "chain" can interleave same-layer audiogram data and same-layer film and television data in series.

For this reason, building on any of the first, second, and third specific embodiments, the fifth specific embodiment also sets a same-layer film and television identifier in the audio-view data. After the audio-view data carrying the identifier is downloaded, the playback device parses it and extracts the identifier. The same-layer film and television identifier contains a same-layer film and television download parameter and a same-layer film and television playback parameter. The download parameter instructs the playback device how to download the corresponding same-layer film and television data. For example, it can be a link address pointing directly to the Internet location of the same-layer film and television data; alternatively, it can be a code string, in which case the playback device sends a download request carrying the code string to a fixed server, the server generates or looks up the corresponding same-layer film and television data, and the download then proceeds between the server and the playback device. The playback parameter indicates when and in what manner the playback device plays the downloaded same-layer film and television data. Within the audio-view data, every piece of film and television data on the same layer as the first-downloaded audiogram data is same-layer film and television data. There can be multiple pieces of same-layer film and television data, in which case there are correspondingly multiple same-layer film and television identifiers, one for each piece.

The fourth and fifth specific embodiments are used together in many cases. In such cases a piece of audio-view data contains, besides the audiogram data that is downloaded and played first, one or more pieces of same-layer audiogram data and one or more pieces of same-layer film and television data; these can be arranged in any order.

Sixth specific embodiment: the fourth and fifth specific embodiments consider only the case where the first item downloaded and played is audiogram data. Where the first item downloaded and played is film and television data, same-layer audiogram data and/or same-layer film and television data can likewise be downloaded and played in sequence after it. The scheme for doing so is the same as the download and playback scheme for same-layer audiogram data and/or same-layer film and television data in the fourth and fifth specific embodiments.

In the fourth, fifth, and sixth specific embodiments, one or more pieces of same-layer audiogram data and one or more pieces of same-layer film and television data are played in series, which makes the organization, download, and playback of audio-view data very flexible. The technical solution also allows network bandwidth to be allocated and used sensibly. Because same-layer audiogram data and same-layer film and television data are played serially, when network bandwidth is limited they can be downloaded in batches in playback order; this uses bandwidth more efficiently and gives users a better playback experience.
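Batch downloading in playback order can be sketched with a simple generator. The batch size and the identifier names are assumptions made for illustration; the text does not say how batches are sized.

```python
from typing import Iterator, List


def download_batches(identifiers: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield same-layer identifiers in playback order, batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(identifiers), batch_size):
        yield identifiers[start:start + batch_size]


chain = ["audiogram_2", "film_1", "audiogram_3", "film_2", "audiogram_4"]
batches = list(download_batches(chain, 2))
```

A player could request the next batch only when the current one is close to finishing, so that bandwidth is not spent on items far down the chain.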

As described in the fourth, fifth, and sixth specific embodiments, same-layer audiogram data and/or same-layer film and television data are downloaded according to the same-layer audiogram identifier and/or same-layer film and television identifier. Playing that data once it reaches the playback device, however, also requires a proper procedure; it cannot simply start playing as soon as it is downloaded. Building on any of the foregoing specific embodiments, the seventh specific embodiment therefore provides the following scheme:

When the playback time indicated by the same-layer audiogram playback parameter arrives, stop playing the upper-layer picture audio or upper-layer picture, or stop playing the upper-layer video data or upper-layer animation data; then play the same-layer picture audio or same-layer picture, and, when the playback time indicated by the same-layer picture alignment parameter or same-layer audio alignment parameter arrives, play the corresponding same-layer picture or same-layer picture audio. The purpose is as follows: if the upper-layer audiogram data or upper-layer film and television data is still playing when the indicated playback time arrives, its playback must be terminated before the same-layer audiogram data starts, so that the two never play at the same time.

In addition, the user may want to play the same-layer audiogram data or same-layer film and television data immediately once it has been downloaded to the playback device, even though the upper-layer audiogram data or upper-layer film and television data has not finished playing; this situation is likely to be common. The seventh specific embodiment therefore lets the user intervene directly: upon receiving a user command to start playing the same-layer audiogram data or film and television data, stop playing the upper-layer picture audio or upper-layer picture, or stop playing the upper-layer video data or upper-layer animation data, and then start playing the same-layer data.

Playing same-layer film and television data works in essentially the same way as playing same-layer audiogram data, and the same two situations arise. For these, building on any of the first to sixth specific embodiments, the eighth specific embodiment is as follows:

When the playback time indicated by the same-layer film and television playback parameter arrives, stop playing the upper-layer picture audio or upper-layer picture, or stop playing the upper-layer video data or upper-layer animation data; then start playing the downloaded same-layer video data or same-layer animation data.

In addition, upon receiving a user command to start playing the same-layer film and television data, stop playing the upper-layer picture audio or upper-layer picture, or stop playing the upper-layer video data or upper-layer animation data; then start playing the downloaded same-layer video data or same-layer animation data.
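The switching rule shared by the seventh and eighth specific embodiments (stop the upper-layer item, then start the same-layer item, on either a playback time or a user command) can be sketched as a small state holder. The class `Player`, its methods, and the use of seconds for the playback time are assumptions; real media output is replaced by a log.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Player:
    """Tracks what is playing so that two items never play at the same time."""
    playing: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def start(self, item: str) -> None:
        # Always stop whatever is playing before starting the next item.
        if self.playing is not None:
            self.log.append(f"stop {self.playing}")
        self.playing = item
        self.log.append(f"play {item}")

    def tick(self, now: float, same_layer_item: str, playback_time: float) -> None:
        """Switch automatically once the indicated playback time has arrived."""
        if now >= playback_time and self.playing != same_layer_item:
            self.start(same_layer_item)

    def on_user_command(self, same_layer_item: str) -> None:
        """Switch immediately when the user asks for the same-layer item."""
        self.start(same_layer_item)


player = Player()
player.start("upper_audiogram")
player.tick(now=5.0, same_layer_item="same_layer_video", playback_time=10.0)   # too early
player.tick(now=10.0, same_layer_item="same_layer_video", playback_time=10.0)  # switches
```

Routing both triggers through `start` is a design choice of the sketch; it keeps the stop-then-play ordering in one place.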

The technical solutions of the 7th and 8th specific embodiments of the present invention mainly address how to start playing same-layer audiogram data or same-layer film and television data; they involve only the operation of starting that playback. In some cases, the playback operation simply stops once the same-layer audiogram data or same-layer film and television data has been played. However, as mentioned above, in other cases there is further same-layer audiogram data or same-layer film and television data besides the item currently playing: the audio view data contains multiple same-layer audiogram identifiers and/or same-layer film and television identifiers, which tell the playback device that multiple items of same-layer audiogram data or same-layer film and television data exist. It is then necessary to handle how to switch from the item currently playing to the other same-layer audiogram data or same-layer film and television data.

In the ninth specific embodiment of the present invention, for the case in which same-layer audiogram data is being played under the 7th and 8th specific embodiments, the following solution is further provided: when a user command to stop playing the same-layer audiogram data currently being played is received, or when playback of the current same-layer audiogram data ends, the other same-layer audiogram data or same-layer film and television data is played in sequence according to the aforementioned same-layer audiogram identifiers and/or same-layer film and television identifiers.

Similarly, in the tenth embodiment of the present invention, for the case in which same-layer film and television data is being played under the 7th and 8th specific embodiments, the following solution is further provided: when a user command to stop playing the same-layer film and television data currently being played is received, or when playback of the current same-layer film and television data ends, the other same-layer audiogram data or same-layer film and television data is played in sequence according to the aforementioned same-layer audiogram identifiers and/or same-layer film and television identifiers.

The 9th and 10th specific embodiments of the present invention specify how multiple items of same-layer audiogram data and/or same-layer film and television data in the audio view data are played, so that the aforementioned "chain" of audio view data has a rich set of playback methods. Moreover, because of these embodiments, audio view programs produced from audio view data offer richer and more flexible means of expression than the cutting and non-linear editing of traditional film and television programs.
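The chained playback of the 9th and 10th embodiments can be sketched as a simple loop. This is only an illustration under assumptions: the chain is represented as a list of item names already ordered according to the same-layer identifiers, and the hypothetical callback `stopped_by_user` stands in for the user stop command.

```python
from typing import Callable, List


def play_chain(chain: List[str], stopped_by_user: Callable[[str], bool]) -> List[str]:
    """Play same-layer items in identifier order and return a playback log.

    Each item ends either naturally or through a user stop command; in both
    cases playback moves on to the next item in the chain.
    """
    log = []
    for item in chain:
        if stopped_by_user(item):
            log.append(f"{item}: stopped by user")
        else:
            log.append(f"{item}: finished")
    return log
```

In both branches the loop advances to the next item, which mirrors the description: a stop command for the current item and its natural end lead to the same transition.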

It should additionally be pointed out that the three aforementioned concepts of "upper layer", "lower layer" and "same layer" apply to any audiogram data and film and television data.

Specifically, an item of "upper-layer audiogram data" may have "same-layer audiogram data" and "same-layer film and television data", as indicated by the same-layer audiogram identifier and same-layer film and television identifier in the audio view data; relative to these, the "upper-layer audiogram data" is itself "same-layer audiogram data". Similarly, an item of "upper-layer film and television data" may have "same-layer audiogram data" and "same-layer film and television data", as indicated by the same identifiers; relative to these, the "upper-layer film and television data" is itself "same-layer film and television data".

In addition, "lower-layer audiogram data" and "lower-layer film and television data" appear in several of the specific embodiments that follow. These are defined relative to "upper-layer audiogram data" and "upper-layer film and television data": they are the layer below them. Just as any "upper-layer audiogram data" or "upper-layer film and television data" may have "same-layer audiogram data" and/or "same-layer film and television data", so may any "lower-layer audiogram data" or "lower-layer film and television data"; and relative to its own same-layer data, any "lower-layer audiogram data" or "lower-layer film and television data" is itself "same-layer audiogram data" or "same-layer film and television data". It follows that the same-layer data of an upper-layer item and the same-layer data of a lower-layer item also stand in the relationship of upper layer to lower layer.

The specific embodiments that follow concern "lower-layer audiogram data" and/or "lower-layer film and television data".

In order to insert and play lower-layer audiogram data or lower-layer film and television data while upper-layer audiogram data is playing, a lower-layer audiogram identifier and/or a lower-layer film and television identifier must also be set in the audio view data of the present invention. The lower-layer audiogram identifier contains at least a lower-layer audiogram download parameter and a lower-layer audiogram playback parameter: the download parameter instructs the playback device how to download the corresponding audiogram data, and the playback parameter instructs the playback device when to start playing it. The lower-layer film and television identifier contains at least a lower-layer film and television download parameter and a lower-layer film and television playback parameter: the download parameter instructs the playback device how to download the corresponding film and television data, and the playback parameter indicates when the playback device starts playing it.
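One possible in-memory form of such an identifier is sketched below in Python. The class name `LowerLayerIdentifier`, the field names, the `kind` labels and the use of seconds for the playback parameter are all assumptions made for illustration; the description only requires that both a download parameter and a playback parameter be present.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class LowerLayerIdentifier:
    """Hypothetical in-memory form of a lower-layer identifier."""
    kind: str                  # "audiogram" or "film_tv" (labels assumed)
    download_parameter: str    # how to download the corresponding data
    playback_parameter: float  # when to start playing it (seconds, assumed unit)


def parse_identifier(raw: Mapping[str, Any]) -> LowerLayerIdentifier:
    """Build an identifier, rejecting records that lack either required parameter."""
    for key in ("download_parameter", "playback_parameter"):
        if key not in raw:
            raise ValueError(f"identifier is missing {key}")
    return LowerLayerIdentifier(
        kind=str(raw.get("kind", "audiogram")),
        download_parameter=str(raw["download_parameter"]),
        playback_parameter=float(raw["playback_parameter"]),
    )
```

Rejecting incomplete records at parse time reflects the "at least" wording: an identifier without both parameters cannot tell the playback device what to download or when to play it.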

For this reason, the twelfth embodiment of the present invention, on the basis of the first, second, and third embodiments, further includes parsing the audio view data to obtain the lower-layer film and television identifier. If this identifier is present in the audio view data, one or more items of lower-layer film and television data need to be inserted and played while the upper-layer film and television data is playing; the playback device therefore downloads, according to the lower-layer film and television identifier, the lower-layer film and television data corresponding to the lower-layer film and television download parameter. The lower-layer film and television data has the same content and structure as the upper-layer film and television data and consists at least of lower-layer video data or lower-layer animation data.

On the basis of the specific embodiments of categories 1, 2, and 3 of the present invention, the twelfth specific embodiment provides a specific technical solution for downloading lower-layer film and television data according to the lower-layer film and television identifier, which prepares the data for subsequently inserting and playing the lower-layer film and television data.

In the specific embodiments of categories 11 and 12 of the present invention, the lower-layer audiogram download parameter and the lower-layer film and television download parameter are similar to the aforementioned same-layer audiogram download parameter and same-layer film and television download parameter. Each can be a link address that points directly to the Internet address from which the lower-layer audiogram data or lower-layer film and television data is downloaded. It can also be a code string: after obtaining the code string, the playback device sends a download request carrying it to a fixed server; the server generates or looks up the corresponding lower-layer audiogram data or lower-layer film and television data according to the request, and then carries out the download with the playback device.
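The two forms of download parameter can be distinguished as in the following minimal sketch. The server address `FIXED_SERVER`, the query parameter name `code` and the function name `build_download_url` are hypothetical; the description does not define the request format used with the fixed server.

```python
from urllib.parse import urlencode, urlparse

# Hypothetical address of the "fixed server" that resolves code strings.
FIXED_SERVER = "https://media.example.com/resolve"


def build_download_url(download_parameter: str) -> str:
    """Return the URL the playback device should request.

    A parameter that is already an http(s) link address is used as-is;
    anything else is treated as a code string and sent to the fixed server.
    """
    parsed = urlparse(download_parameter)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return download_parameter
    return f"{FIXED_SERVER}?{urlencode({'code': download_parameter})}"
```

For example, `build_download_url("https://cdn.example.com/a.bin")` returns the link unchanged, while a code string such as `"AB 12"` is URL-encoded into a request to the fixed server.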

In the 13th embodiment of the present invention, which builds on the aforementioned 11th embodiment, the lower-layer audiogram data has already been downloaded in preparation for playback, so the following operations may further be performed: when the playback time indicated by the lower-layer audiogram playback parameter arrives, or when a user command to start playing the lower-layer audiogram data is received, playback of the upper-layer picture audio or upper-layer picture currently playing is suspended and a corresponding upper-layer audiogram data playback suspension mark is generated; or playback of the upper-layer video data or upper-layer animation data is suspended and a corresponding upper-layer film and television data playback suspension mark is generated.

Because the lower-layer audiogram data is inserted while upper-layer audiogram data or upper-layer film and television data is playing, playback must return to the original insertion point once the lower-layer audiogram data ends and continue from there with the upper-layer data that has not yet been played. The insertion point therefore has to be recorded before the lower-layer audiogram data is played, which is why the 13th embodiment generates a corresponding playback suspension mark for the upper-layer audiogram data or upper-layer film and television data. It should additionally be noted that, as mentioned above, lower-layer audiogram data may have multiple items of upper-layer audiogram data and/or upper-layer film and television data above it. In some cases the insertion time therefore falls exactly between two upper-layer items: after one item of upper-layer audiogram data or upper-layer film and television data has finished playing and before the next item of upper-layer audiogram data or upper-layer film and television data starts. This also counts as insert play, so a corresponding upper-layer audiogram data or upper-layer film and television data playback suspension mark must be generated in this case as well.

At the same time as, or after, the aforementioned playback suspension mark is generated, the lower-layer picture audio or lower-layer picture is played, and when the playback time indicated by the lower-layer picture alignment parameter or lower-layer audio alignment parameter arrives, the corresponding lower-layer picture or lower-layer picture audio is played.
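How a picture alignment parameter could drive the display while the picture audio plays can be illustrated as follows. The sketch assumes, purely for illustration, that the alignment parameter is a list of (start time in seconds, picture name) pairs sorted by time; the function name `picture_at` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def picture_at(alignment: List[Tuple[float, str]], audio_time: float) -> Optional[str]:
    """Return the picture that should be on screen at `audio_time`.

    `alignment` is a time-sorted list of (start_seconds, picture_name)
    pairs. The picture whose start time is the latest one not after
    `audio_time` is selected; None means no picture is due yet.
    """
    starts = [start for start, _ in alignment]
    index = bisect_right(starts, audio_time) - 1
    return alignment[index][1] if index >= 0 else None
```

Because the lookup depends only on the audio position, seeking backward or forward in the audio selects the matching picture again, which is the correspondence between audio segment and picture that the invention aims at.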

In the 14th specific embodiment of the present invention, which builds on the aforementioned 11th specific embodiment, the lower-layer film and television data has already been downloaded in preparation for playback, so the following operations may further be performed: when the playback time indicated by the lower-layer film and television playback parameter arrives, or when a user command to start playing the lower-layer film and television data is received, playback of the upper-layer picture audio or upper-layer picture currently playing is suspended and a corresponding upper-layer audiogram data playback suspension mark is generated; or playback of the upper-layer video data or upper-layer animation data is suspended and a corresponding upper-layer film and television data playback suspension mark is generated.

Because the lower-layer film and television data is inserted while upper-layer audiogram data or upper-layer film and television data is playing, playback must return to the original insertion point once the lower-layer film and television data ends and continue from there with the upper-layer data that has not yet been played. The insertion point therefore has to be recorded before the lower-layer film and television data is played, which is why the 14th embodiment generates a corresponding playback suspension mark for the upper-layer audiogram data or upper-layer film and television data. It should additionally be noted that, as mentioned above, lower-layer film and television data may have multiple items of upper-layer audiogram data and/or upper-layer film and television data above it. In some cases the insertion time therefore falls exactly between two upper-layer items: after one item of upper-layer audiogram data or upper-layer film and television data has finished playing and before the next item of upper-layer audiogram data or upper-layer film and television data starts. This also counts as insert play, so a corresponding upper-layer audiogram data or upper-layer film and television data playback suspension mark must be generated in this case as well.

At the same time as, or after, the aforementioned playback suspension mark is generated, the lower-layer video data or lower-layer animation data in the lower-layer film and television data is played.

The specific embodiments of categories 13 and 14 of the present invention build on the foregoing specific embodiments of categories 1, 2, and 3 and add a technical solution for inserting and playing lower-layer audiogram data or lower-layer film and television data. These two categories can also be combined with the technical solutions of the foregoing specific embodiments of categories 4 to 12, which brings the following more significant technical effects:

As mentioned above, the specific embodiments of categories 1-12 of the present invention provide a variety of solutions for playing same-layer audiogram data and/or same-layer film and television data: a single item of audiogram data or film and television data can be played on its own, or an audio view data chain made up of multiple items of audiogram data and film and television data connected one after another can be played.

The specific embodiments of categories 13 and 14 of the present invention build on the foregoing specific embodiments of categories 1-12 and add a technical solution for inserting and playing lower-layer audiogram data or lower-layer film and television data. As mentioned before, lower-layer audiogram data and lower-layer film and television data can each have their own same-layer audiogram data or same-layer film and television data. The specific embodiments of categories 13 and 14 thus give the present invention upper and lower layers in which each layer can have multiple insertion points and a "chain" of lower-layer audio view data can be inserted at each insertion point. This allows the technical solutions of the present invention to support a wide range of application fields and situations.
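The combination of layers, insertion points and chains can be pictured as a nested structure that is flattened into a playback order. The following Python sketch is an assumption-laden illustration: the `Item` class, its `duration` field and the mapping from insertion time to a lower-layer chain are hypothetical and are not defined in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """Hypothetical audio view item with optional lower-layer insertions."""
    name: str
    duration: float
    # insertion time (seconds) -> chain of lower-layer items played at that point
    insertions: Dict[float, List["Item"]] = field(default_factory=dict)


def playback_order(chain: List[Item]) -> List[str]:
    """Flatten a chain into the sequence of segments actually played.

    Each item is played up to an insertion point, the inserted lower-layer
    chain is played in full (recursively), and the item then resumes.
    """
    order: List[str] = []
    for item in chain:
        start = 0.0
        for point in sorted(item.insertions):
            order.append(f"{item.name}[{start}-{point}]")
            order.extend(playback_order(item.insertions[point]))
            start = point
        order.append(f"{item.name}[{start}-{item.duration}]")
    return order
```

With an upper-layer chain of two items in which the first item has a lower-layer chain inserted at 8 seconds, the flattened order plays the first part of the first item, then the whole lower-layer chain, then the remainder of the first item, and finally the second item.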

The fifteenth embodiment of the present invention provides a technical solution for the end of playback of lower-layer audiogram data or lower-layer film and television data and for the operations performed thereafter. As mentioned above, lower-layer audiogram data or lower-layer film and television data is inserted while upper-layer audiogram data is playing. Therefore, whether a single item of lower-layer audiogram data or lower-layer film and television data is played, or a "same-layer audio view data chain" composed of multiple such items is played in sequence, the following operation must be performed when playback of the item or "chain" ends: playback of the upper-layer audiogram data continues based on the upper-layer audiogram data playback suspension mark, or playback of the upper-layer film and television data continues based on the upper-layer film and television data playback suspension mark.

The 16th embodiment of the present invention provides a technical solution for terminating playback of lower-layer audiogram data or lower-layer film and television data on a user command and for the operations performed thereafter.

The case addressed here differs from that of the 15th specific embodiment: whether a single item of lower-layer audiogram data or lower-layer film and television data is being played, or a "same-layer audio view data chain" composed of multiple such items, if the playback device receives a user command to terminate playback of the lower-layer audiogram data or lower-layer film and television data while any item in the "chain" is playing, playback of the upper-layer audiogram data continues based on the upper-layer audiogram data playback suspension mark, or playback of the upper-layer film and television data continues based on the upper-layer film and television data playback suspension mark.

The technical solutions of the 15th and 16th specific embodiments of the present invention ensure that, after lower-layer audiogram data or lower-layer film and television data has been played, playback can return to the upper-layer audiogram data or upper-layer film and television data. This makes the aforementioned specific embodiments of the present invention more complete.
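The suspend-and-resume behaviour of the 13th to 16th embodiments can be sketched with a small state holder. The class `Player`, its fields and the representation of a playback suspension mark as an (item name, position in seconds) pair are assumptions for illustration only; using a list as a stack is a design choice that also covers lower-layer items that themselves receive insertions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Player:
    """Hypothetical playback state with a stack of playback suspension marks."""
    playing: str
    position: float = 0.0
    marks: List[Tuple[str, float]] = field(default_factory=list)

    def insert_lower(self, lower_name: str) -> None:
        """Suspend the current item, record its mark, and start the lower item."""
        self.marks.append((self.playing, self.position))
        self.playing, self.position = lower_name, 0.0

    def end_lower(self) -> None:
        """Resume from the latest mark when the lower item or chain finishes
        or when the user terminates it."""
        self.playing, self.position = self.marks.pop()
```

If the insertion falls exactly between two upper-layer items, the mark can simply record the next item with position 0.0, so resumption works in the same way.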

All the specific embodiments mentioned above involve information set in the audio view data, such as the upper-layer picture alignment parameter, upper-layer audio alignment parameter, lower-layer audiogram identifier, lower-layer film and television identifier, lower-layer audiogram download parameter, lower-layer audiogram playback parameter, same-layer audiogram identifier, same-layer film and television identifier, same-layer audiogram download parameter, same-layer audiogram playback parameter, same-layer picture alignment parameter, and same-layer audio alignment parameter. This information can exist in the audio view data separately from the audio, picture, video, animation and other data: for example, the information is assembled into its own packet (stream), which is then combined with the audio, picture, video, animation and other data. Alternatively, the information can be embedded into the audio, picture, video, animation and other data so that it is integrated with that data and is transmitted along with it.

The international patent application numbered PCT/CN2016/087445 discloses a technical solution for embedding data into audio data. Video and animation data also contain corresponding audio data, so embedding data in video or animation data amounts to embedding data in the audio data within the video or animation data. In addition, some picture formats, video formats, and animation formats reserve optional fields in which users can store their own data; the aforementioned information can also be stored in such fields so that it is transmitted as part of the picture, video, or animation data. Furthermore, in some specific cases, technical solutions similar to that of the aforementioned international patent application can be used: just as data is embedded in audio, the information can be embedded in the content fields of pictures, videos, and animations instead of being written to the reserved fields.
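The first option, in which the information is built into a separate packet and then combined with the media data, can be illustrated with a minimal sketch. The container layout used here (a 4-byte big-endian length, a JSON header, then the media bytes) and the function names `pack` and `unpack` are purely hypothetical; they are not the format of the present invention or of PCT/CN2016/087445.

```python
import json
import struct
from typing import Any, Dict, Tuple


def pack(info: Dict[str, Any], media: bytes) -> bytes:
    """Combine a parameter packet with media bytes in one blob.

    Assumed layout: 4-byte big-endian header length, JSON header, media.
    """
    header = json.dumps(info, sort_keys=True).encode("utf-8")
    return struct.pack(">I", len(header)) + header + media


def unpack(blob: bytes) -> Tuple[Dict[str, Any], bytes]:
    """Split a blob produced by `pack` back into parameters and media bytes."""
    (length,) = struct.unpack(">I", blob[:4])
    info = json.loads(blob[4:4 + length].decode("utf-8"))
    return info, blob[4 + length:]
```

Keeping the parameters in their own packet leaves the media bytes untouched, whereas the embedding options described above trade that separation for the guarantee that the parameters always travel with the media.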

In view of this, on the basis of all the foregoing specific embodiments, the 17th specific embodiment of the present invention further includes the following: when the upper-layer picture and/or upper-layer picture audio is parsed, the upper-layer picture alignment parameter and/or upper-layer audio alignment parameter embedded in it is parsed or extracted from it. In addition, a lower-layer audiogram identifier or lower-layer film and television identifier embedded in the upper-layer picture, in the upper-layer picture audio, and/or in the audio data and/or private data of the upper-layer video data or upper-layer animation data can be extracted or parsed from that picture, picture audio, audio data and/or private data. The private data is the optional fields mentioned above that picture formats, video formats and animation formats reserve for users.

On the basis of all the foregoing specific embodiments, the 18th specific embodiment of the present invention further includes the following: when the upper-layer picture and/or upper-layer picture audio is parsed, the same-layer audiogram identifier and/or same-layer film and television identifier embedded in it is parsed or extracted from it. In addition, the same-layer picture alignment parameter and/or same-layer audio alignment parameter embedded in the same-layer picture and same-layer picture audio can be extracted or parsed from the corresponding same-layer picture and same-layer picture audio.

On the basis of all the foregoing specific embodiments, the 19th specific embodiment of the present invention further includes the following: when the upper-layer picture and/or upper-layer picture audio is parsed, the lower-layer audiogram download parameter and lower-layer audiogram playback parameter embedded in it are parsed from it. In addition, the lower-layer picture alignment parameter and/or lower-layer audio alignment parameter embedded in the lower-layer picture and lower-layer picture audio can be extracted or parsed from the corresponding lower-layer picture and lower-layer picture audio.

The 17th, 18th, and 19th specific embodiments of the present invention mainly support parsing and extracting the parameters and information that direct the operation of the playback device. These parameters and information can be embedded in various ways into the picture, picture audio, video, or animation data of the upper layer, the same layer, and the lower layer, so that they are carried by that data without any additional transmission. This makes transmission convenient and also enables timely and effective playback control.

Claims

A method for playing audio view data, comprising:

Downloading the audio view data and parsing it to obtain the upper-layer picture audio, the upper-layer picture, and/or the upper-layer picture alignment parameter and/or upper-layer audio alignment parameter in the upper-layer audiogram data;

Playing the upper-layer picture audio or the upper-layer picture automatically or upon receiving a command to play the upper-layer picture audio or the upper-layer picture, and, when the playback time indicated by the upper-layer picture alignment parameter or the upper-layer audio alignment parameter arrives, or upon receiving a command to play the upper-layer picture audio or the upper-layer picture, playing the corresponding upper-layer picture or the corresponding upper-layer picture audio;

or,

Downloading the audio view data and parsing it to obtain the upper-layer video data or upper-layer animation data, and the lower-layer audiogram identifier and/or lower-layer film and television identifier, in the upper-layer film and television data;

Automatically, or upon receiving a command to play the upper-layer video data or upper-layer animation data, playing the upper-layer video data or upper-layer animation data.

The method of claim 1, further comprising:

Parsing the audio view data to obtain a same-layer audiogram identifier and/or a same-layer film and television identifier, wherein the same-layer audiogram identifier includes at least a same-layer audiogram download parameter and a same-layer audiogram playback parameter, and the same-layer film and television identifier includes at least a same-layer film and television download parameter and a same-layer film and television playback parameter;

Downloading, based on the same-layer audiogram identifier, the same-layer audiogram data corresponding to the same-layer audiogram download parameter; wherein the same-layer audiogram data consists at least of same-layer picture audio, a same-layer picture, and/or a same-layer picture alignment parameter and/or same-layer audio alignment parameter;

and/or,

Downloading, based on the same-layer film and television identifier, the same-layer film and television data corresponding to the same-layer film and television download parameter; wherein the same-layer film and television data consists at least of same-layer video data or same-layer animation data.

The method of claim 2, further comprising:

When the playback time indicated by the same-layer audiogram playback parameter arrives, or when a user command to start playing the same-layer audiogram data is received, terminating playback of the upper-layer picture audio or the upper-layer picture, or terminating playback of the upper-layer video data or upper-layer animation data;

Playing the same-layer picture audio or the same-layer picture, and, when the playback time indicated by the same-layer picture alignment parameter or the same-layer audio alignment parameter arrives, playing the corresponding same-layer picture or same-layer picture audio;

or,

When the playback time indicated by the same-layer film and television playback parameter arrives, or when a user command to start playing the same-layer film and television data is received, terminating playback of the upper-layer picture audio or the upper-layer picture, or terminating playback of the upper-layer video data or upper-layer animation data;

Playing the same-layer video data or same-layer animation data.

The method according to claim 3, further comprising:

When a user command to stop playing the current same-layer audiogram data is received, or when playback of the current same-layer audiogram data ends, playing other same-layer audiogram data or same-layer film and television data in sequence according to the same-layer audiogram identifier and/or the same-layer film and television identifier;

or,

When a user command to stop playing the current same-layer film and television data is received, or when playback of the current same-layer film and television data ends, playing other same-layer audiogram data or same-layer film and television data in sequence according to the same-layer audiogram identifier and/or the same-layer film and television identifier.

The method of claim 1, further comprising:

Parsing the audio view data to obtain a lower-layer audiogram identifier and/or a lower-layer film and television identifier, wherein the lower-layer audiogram identifier includes at least a lower-layer audiogram download parameter and a lower-layer audiogram playback parameter, and the lower-layer film and television identifier includes at least a lower-layer film and television download parameter and a lower-layer film and television playback parameter;

Downloading, based on the lower-layer audiogram identifier, the lower-layer audiogram data corresponding to the lower-layer audiogram download parameter; wherein the lower-layer audiogram data consists at least of a combination of lower-layer picture audio, a lower-layer picture, and/or a lower-layer audiogram alignment parameter, and the lower-layer audiogram alignment parameter includes at least a lower-layer picture alignment parameter and/or a lower-layer audio alignment parameter;

and/or,

Downloading, based on the lower-layer film and television identifier, the lower-layer film and television data corresponding to the lower-layer film and television download parameter; wherein the lower-layer film and television data consists at least of lower-layer video data or lower-layer animation data.

The method of claim 5, further comprising:

When the playback time indicated by the lower-layer audiogram playback parameter arrives, or when a user command to start playing the lower-layer audiogram data is received, suspending playback of the upper-layer picture audio or the upper-layer picture and generating a corresponding upper-layer audiogram data playback suspension mark; or suspending playback of the upper-layer video data or upper-layer animation data and generating a corresponding upper-layer film and television data playback suspension mark;

Playing the lower-layer picture audio or the lower-layer picture, and, when the playback time indicated by the lower-layer picture alignment parameter or the lower-layer audio alignment parameter arrives, playing the corresponding lower-layer picture or lower-layer picture audio;

or,

When the playback time indicated by the lower-layer film and television playback parameter arrives, or when a user command to start playing the lower-layer film and television data is received, suspending playback of the upper-layer picture audio or the upper-layer picture and generating a corresponding upper-layer audiogram data playback suspension mark; or suspending playback of the upper-layer video data or upper-layer animation data and generating a corresponding upper-layer film and television data playback suspension mark;

Playing the lower-layer video data or lower-layer animation data.

The method of claim 6, further comprising:

When a user command to stop playing the lower-layer audiogram data is received, or when playback of the lower-layer audiogram data ends, continuing to play the upper-layer audiogram data based on the upper-layer audiogram data playback suspension mark, or continuing to play the upper-layer film and television data based on the upper-layer film and television data playback suspension mark;

or,

When a user command to stop playing the lower-layer film and television data is received, or when playback of the lower-layer film and television data ends, continuing to play the upper-layer audiogram data based on the upper-layer audiogram data playback suspension mark, or continuing to play the upper-layer film and television data based on the upper-layer film and television data playback suspension mark.

The method according to any one of claims 1-7, further comprising:

Parsing, from the upper-layer picture and/or the upper-layer picture audio, the upper-layer picture alignment parameter and/or the upper-layer audio alignment parameter embedded therein; and/or,

Parsing the lower-layer audiogram identifier and/or the lower-layer film and television identifier embedded in the upper-layer picture, the upper-layer picture audio, and/or the audio data and/or private data of the upper-layer video data or upper-layer animation data.

The method according to any one of claims 2-4, further comprising:

Parsing the same-layer audiogram identifier and/or the same-layer film and television identifier embedded in the upper-layer picture and/or the upper-layer picture audio; and/or,

Parsing, from the same-layer picture and/or the same-layer picture audio, the same-layer picture alignment parameter and/or the same-layer audio alignment parameter embedded therein.

The method according to any one of claims 5-7, further comprising:

Parsing the lower-layer audiogram download parameter and lower-layer audiogram playback parameter embedded in the upper-layer picture and/or the upper-layer picture audio; and/or,

Parsing the lower layer picture alignment parameter and/or the lower layer audio alignment parameter embedded therein from the lower layer picture and/or the lower layer picture audio.