CN116194913A - Information processing method, encoder, decoder, storage medium and apparatus - Google Patents


Publication number
CN116194913A
Authority
CN
China
Prior art keywords
narrative
information
narration
digital visual
type
Prior art date
Legal status
Pending
Application number
CN202180055507.5A
Other languages
Chinese (zh)
Inventor
于浩平 (Haoping Yu)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of CN116194913A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • G06F16/438Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Television Signal Processing For Recording (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)

Abstract

The embodiments of the present application disclose an information processing method, an encoder, a decoder, a storage medium, and a device. The method includes: parsing the bitstream to obtain at least one piece of narration information; and optionally presenting the at least one piece of narration information one or more times while the digital visual media is played by a player. In this way, the richness of the digital visual media can be increased by adding narration without changing the original visual content of the digital visual media, and the viewing experience of the digital visual media can be improved.

Description

Information processing method, encoder, decoder, storage medium and apparatus
Cross Reference to Related Applications
This application claims priority to the prior U.S. provisional patent application entitled "A method for narrating digital image(s) or video," filed on August 21, 2020 in the name of Haoping Yu under application No. 63/068,527, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiment of the application relates to the technical field of digital video processing, in particular to an information processing method, an encoder, a decoder, a storage medium and a device.
Background
As social media applications have grown in popularity, people have begun to communicate through pictures and videos in addition to face-to-face or telephone conversations. A picture or video can capture visual information, but it does not necessarily tell a complete story on its own; people therefore also send supplementary text alongside digital visual media to provide background information or to express their emotions about the subject matter in the pictures and videos. Combining pictures and videos with text has thus become a way to convey information and emotion.
However, once pictures or videos are saved in their particular media format, all of the accompanying words expressing emotion and background for the visual content are discarded, and the pictures and videos become far less engaging. Although dedicated video editing software can currently be used to add emotion- and background-related text to pictures or videos, this approach changes the original visual content, hinders sharing and exchange among multiple users, and degrades the viewing experience of the digital visual media.
Disclosure of Invention
The embodiments of the present application provide an information processing method, an encoder, a decoder, a storage medium, and a device, which can increase the richness of digital visual media by adding narration without changing the original visual content of the digital visual media, thereby improving the viewing experience of the digital visual media.
The technical solutions of the embodiments of the present application may be implemented as follows:
in a first aspect, an embodiment of the present application provides an information processing method, applied to a decoder, including:
parsing the bitstream to obtain at least one piece of narration information;
optionally presenting the at least one piece of narration information one or more times while the digital visual media is played by the player.
In a second aspect, an embodiment of the present application provides an information processing method, applied to an encoder, including:
determining at least one piece of narration information to be added;
embedding the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner without changing the original visual content of the digital visual media.
In a third aspect, embodiments of the present application provide an encoder including a determining unit and an encoding unit; wherein,
the determining unit is configured to determine at least one piece of narration information to be added;
the encoding unit is configured to embed the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner without changing the original visual content of the digital visual media.
In a fourth aspect, embodiments of the present application provide an encoder comprising a first memory and a first processor; wherein,
a first memory for storing a computer program capable of running on the first processor;
a first processor for performing the method as described in the second aspect when the computer program is run.
In a fifth aspect, embodiments of the present application provide a decoder, including a decoding unit and a playing unit; wherein,
the decoding unit is configured to parse the bitstream to obtain at least one piece of narration information;
the playing unit is configured to optionally present the at least one piece of narration information one or more times when the digital visual media is played by the player.
In a sixth aspect, embodiments of the present application provide a decoder including a second memory and a second processor; wherein,
a second memory for storing a computer program capable of running on the second processor;
a second processor for performing the method according to the first aspect when the computer program is run.
In a seventh aspect, embodiments of the present application provide a computer storage medium storing a computer program which, when executed by a processor, implements a method as described in the first aspect, or a method as described in the second aspect.
In an eighth aspect, embodiments of the present application provide an electronic device, which at least includes an encoder as described in the third aspect or the fourth aspect and a decoder as described in the fifth aspect or the sixth aspect.
The embodiments of the present application provide an information processing method, an encoder, a decoder, a storage medium, and a device. On the encoder side, at least one piece of narration information to be added is determined; the at least one piece of narration information is embedded into a target media file or bitstream of the digital visual media in a preset manner without changing the original visual content of the digital visual media. On the decoder side, at least one piece of narration information is obtained by parsing the bitstream; the at least one piece of narration information is optionally presented one or more times while the digital visual media is played by the player. Thus, for the at least one piece of narration information to be added, no special video editing software is needed: the narration information can be embedded into the target media file or bitstream of the digital visual media without changing its original visual content, and can be presented one or more times when the digital visual media is played by a player. This simplifies user operation, increases the richness of the digital visual media, and improves the viewing experience of the digital visual media.
Drawings
FIG. 1 is a schematic flowchart of an information processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the overall structure of an ISO-BMFF file according to an embodiment of the present application;
FIG. 3 is a detailed schematic diagram of a narration metadata box according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of another information processing method according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of yet another information processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the composition structure of an encoder according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the composition structure of another encoder according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a specific hardware structure of an encoder according to an embodiment of the present application;
FIG. 9 is a schematic diagram of the composition structure of a decoder according to an embodiment of the present application;
FIG. 10 is a schematic diagram of the composition structure of another decoder according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a specific hardware structure of a decoder according to an embodiment of the present application;
FIG. 12 is a schematic diagram of the composition structure of a narration system according to an embodiment of the present application.
Detailed Description
For a more complete understanding of the features and technical content of the embodiments of the present application, reference should be made to the following detailed description of the embodiments of the present application, taken in conjunction with the accompanying drawings, which are for purposes of illustration only and not intended to limit the embodiments of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
In the following description, reference is made to "some embodiments," which describe a subset of all possible embodiments; it is to be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another without conflict. It should also be noted that the terms "first," "second," and "third" in the embodiments of the present application merely distinguish similar objects and do not represent a specific ordering of those objects; it is understood that "first," "second," and "third" may be interchanged in a specific order or sequence, where allowed, so that the embodiments of the present application described herein can be practiced in an order other than that illustrated or described herein.
In recent years, the smartphone has become the most desirable electronic device for many people owing to its convenience and availability; owning one today is seen as not only necessary but also fashionable. Smartphones have thus had a number of significant effects on society and culture as a whole. Worldwide, people's lifestyles have changed, and one general trend is that users take pictures or videos with their smartphones to record their daily activities. In other words, people today feel the need to capture every moment in life. A user may take photos or record videos of famous landmarks with a smartphone, or record their own life with "selfies." As social media applications have grown in popularity, people have begun to communicate through pictures and videos in addition to face-to-face or telephone conversations. For example, a user takes a "selfie" and immediately sends the picture to friends to show what they are currently doing. Combining pictures and videos with text has thus become a way to express information and emotion.
A picture or video may capture visual information, but that alone may not tell a complete story. People send supplementary text alongside digital visual media to provide background information or to express and reflect emotions related to the subject of the visual content. Technically, in today's computing and communication platforms, digital media and these supplementary words are handled as separate entities. Once a picture or video clip is stored in its particular media format on an electronic device, all the emotion and background words related to the visual content are discarded. As a result, these pictures and videos quickly become uninteresting and gradually lose their value within a short period of time.
In the related art, although a narration can be added to a video clip, doing so currently requires editing the video, either by embedding the narration in text format into a video overlay or by mixing the narration in sound format into the audio track. This process typically involves special video editing software, which is inconvenient for the user. Because the original visual content of the digital visual media changes each time a new narration is added, multiple users cannot use this method to share and exchange their emotions about the visual media content through narration, and the viewing experience of the digital video is degraded.
The embodiments of the present application provide an information processing method. When applied to an encoder, the basic idea of the method is: determining at least one piece of narration information to be added; and embedding the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner without changing the original visual content of the digital visual media. When applied to a decoder, the basic idea of the method is: parsing the bitstream to obtain at least one piece of narration information; and optionally presenting the at least one piece of narration information one or more times while the digital visual media is played by the player. Thus, no special video editing software is needed for the at least one piece of narration information to be added: the narration information can be embedded into the target media file or bitstream without changing the original visual content, and can be presented one or more times when the digital visual media is played by a player, which simplifies user operation, increases the richness of the digital visual media, and improves the viewing experience of the digital visual media.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In an embodiment of the present application, referring to fig. 1, a schematic flow chart of an information processing method provided in an embodiment of the present application is shown. As shown in fig. 1, the method may include:
s101: at least one narrative bystander information to be added is determined.
S102: the at least one narrative bypass message is embedded in a target media file or bitstream of the digital visual media in a preset manner without altering the original visual content of the digital visual media.
It should be noted that the information processing method of the embodiments of the present application is applied to a narration encoder (which may be referred to simply as an "encoder"), and the encoder may be implemented in hardware or in software, for example as an application (APP). In addition, the information processing method of the embodiments of the present application may also be applied to an electronic device capable of recording and playing digital visual media. The electronic device may be a smart phone, a tablet computer, a notebook computer, a palmtop computer, a personal digital assistant (PDA), a portable media player (PMP), a navigation device, a wearable device, or the like, which is not particularly limited herein.
It should be further noted that the narration information (narrative) may also be referred to as "narrative information" or "voice-over information"; it can be recorded, shared, and exchanged as a supplementary part of the digital visual media, and allows people to express, share, and exchange their emotions about the visual subject of the digital visual media. Accordingly, the embodiments of the present application add narration information to digital visual media to enhance the viewing experience of the digital visual media.
In the embodiments of the present application, the type of the narration information may include at least one of: a text type and an audio type. The narration information corresponding to the text type may be referred to as written narration information (written narrative) or text narration information; the narration information corresponding to the audio type may be referred to as speech narration information (speech narrative) or audio narration information.
Additionally, the type of the digital visual media may include at least one of: a video, an image, and an image group. Here, the video specifically refers to a digital video; the image may also be referred to as a "main image," "still image," or "digital image"; and the image group is composed of a series of images, that is, the image group may include at least two images.
In a specific embodiment, when the type of the current narration information is the text type, the method may further include:
creating a text data segment;
accordingly, for S102, embedding the narration information into the target media file or bitstream of the digital visual media in a preset manner may specifically include: embedding the current narration information into the target media file or bitstream of the digital visual media in the form of a text data segment.
That is, if the type of the narration information is the text type, a text data segment may be created so that the narration information can be embedded into the target media file or bitstream of the digital visual media in the form of a text data segment.
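As a rough sketch of the text-data-segment approach described above, the following Python example appends a text narration segment after the original media bytes, leaving the visual content untouched. The start code value and the segment layout (start code, length prefix, UTF-8 payload) are hypothetical illustrations, not the normative syntax of the present application.

```python
# Illustrative sketch only: embed a text-type narration as a trailing
# data segment without modifying the original visual content.
NARRATIVE_START_CODE = b"\x00\x00\x01\xB9"  # hypothetical 4-byte start code

def embed_text_narration(media_bytes: bytes, narration: str) -> bytes:
    payload = narration.encode("utf-8")
    # Segment = start code + 4-byte big-endian payload length + payload.
    segment = NARRATIVE_START_CODE + len(payload).to_bytes(4, "big") + payload
    # The original visual content is preserved verbatim.
    return media_bytes + segment

original = b"<visual content>"
tagged = embed_text_narration(original, "Sunset at the pier")
assert tagged.startswith(original)  # visual content unchanged
```

A decoder could then locate the start code and read the length-prefixed payload back out; the length prefix is an assumption made for this sketch.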
Further, if the type of the narration information is the text type but it needs to be embedded into the target media file or bitstream of the digital visual media in the form of an audio clip, then, in some embodiments, the method may further include:
converting the current narration information into narration information corresponding to the audio type, and creating an audio clip;
accordingly, for S102, embedding the narration information into the target media file or bitstream of the digital visual media in a preset manner may specifically include: embedding the current narration information into the target media file or bitstream of the digital visual media in the form of an audio clip.
That is, if the type of the narration information is the text type, an audio clip may be created, the narration information may be converted into narration information corresponding to the audio type, and the converted narration information may then be embedded into the target media file or bitstream of the digital visual media in the form of an audio clip.
In another specific embodiment, when the type of the current narration information is the audio type, the method may further include:
creating an audio clip;
accordingly, for S102, embedding the narration information into the target media file or bitstream of the digital visual media in a preset manner may specifically include: embedding the current narration information into the target media file or bitstream of the digital visual media in the form of an audio clip.
That is, if the type of the narration information is the audio type, an audio clip may be created so that the narration information can be embedded into the target media file or bitstream of the digital visual media in the form of an audio clip.
Further, if the type of the narration information is the audio type but it needs to be embedded into the target media file or bitstream of the digital visual media in the form of a text data segment, then, in some embodiments, the method may further include:
converting the current narration information into narration information corresponding to the text type, and creating a text data segment;
accordingly, for S102, embedding the narration information into the target media file or bitstream of the digital visual media in a preset manner may specifically include: embedding the current narration information into the target media file or bitstream of the digital visual media in the form of a text data segment.
That is, if the type of the narration information is the audio type, a text data segment may be created, the narration information may be converted into narration information corresponding to the text type, and the converted narration information may then be embedded into the target media file or bitstream of the digital visual media in the form of a text data segment.
It should be noted that converting the type of the narration information from text to audio, or from audio to text, may be implemented by a speech synthesizer, a speech-to-text tool, an audio converter, or the like in the related art, which is not particularly limited herein.
In some embodiments, for multiple users of the digital visual media, the determining at least one piece of narration information to be added may include:
creating narration information for at least one user of the digital visual media to obtain the at least one piece of narration information.
It should be noted that the embodiments of the present application can create narration information for multiple users of the same digital visual media without changing the original visual content of the digital visual media, so that multiple users can share and exchange the narration information.
In some embodiments, regarding the embedding location of the narration information, the embedding the at least one piece of narration information into the target media file or bitstream of the digital visual media in a preset manner may include:
storing the at least one piece of narration information at a target position of the digital visual media in a preset manner to generate the target media file or bitstream.
The target position may be a start position or any other position. In a specific embodiment, the narration information may be stored only at the beginning of the digital visual media, which makes the narration information easier to add and the scheme easier to apply and popularize.
In addition, since the type of the digital visual media may be an image, an image group, or a video, the type of the narration information to be added differs in different situations. In some embodiments, the method may further include:
when the type of the digital visual media is an image or an image group, determining that the type of the at least one piece of narration information is the text type and/or the audio type;
when the type of the digital visual media is a video, determining that the type of the at least one piece of narration information is the text type.
That is, if the type of the digital visual media is an image or an image group, the narration information to be added may be of the text type, of the audio type, or both. If the type of the digital visual media is a video, the narration information to be added may be of the text type. As a special case, if the type of the digital visual media is a video but the narration information to be added is of the audio type, at least two audio decoders would be needed, which not only increases implementation complexity but also limits the popularization of the application.
Further, when the digital visual media is an image or an image group, both types of narration information may be present. In some embodiments, the method may further include:
if the type of the at least one piece of narration information includes both the text type and the audio type, determining that the narration information corresponding to the audio type is stored after the narration information corresponding to the text type.
That is, if the narration information includes both text narration and audio narration, the audio narration is stored after the text narration.
In the narration information, the text narration and the audio narration may be the same or different. The text narration is embedded into the target media file or bitstream of the digital visual media in the text-type handling manner, the audio narration is embedded in the audio-type handling manner, and the audio narration follows the text narration.
It will be appreciated that, in the embodiments of the present application, the narration information may include: narration content and registration information associated with the narration content. The registration information may include at least one of: the name of the narrator, the creation date and time, and ownership information for the associated visual content. Here, the ownership information may be used to indicate whether the user owns the visual content of the digital visual media.
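A minimal sketch of a narration record pairing narration content with its registration information might look as follows; the field names are illustrative assumptions, not the syntax elements defined by the present application.

```python
# Illustrative narration record: content plus registration information
# (narrator name, creation date/time, ownership flag).
from dataclasses import dataclass

@dataclass
class NarrationRecord:
    content: str               # the narration itself (text type here)
    narrator_name: str         # name of the narrator
    creation_date: str         # e.g. "2020-08-21"
    creation_time: str         # e.g. "14:30:00"
    owns_visual_content: bool  # whether the narrator owns the visual content

rec = NarrationRecord("A quiet morning", "Haoping Yu",
                      "2020-08-21", "14:30:00", True)
```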
Further, in some embodiments, for S102, the embedding the at least one piece of narration information into the target media file or bitstream of the digital visual media in a preset manner without changing the original visual content of the digital visual media may include:
combining the narration information, including the narration content and the registration information, and writing it into a preset data set;
embedding the preset data set into the target media file or bitstream of the digital visual media in a preset manner.
That is, the embodiments of the present application may combine the narration information, including the narration content and the associated registration information, into a preset data set and embed it into the target media file or bitstream of the digital visual media without altering the original visual content of the digital visual media.
It should be further noted that the preset data set may include multiple pieces of narration information. In some embodiments, the number of pieces of narration information is at least one, and in this case the method may further include:
writing the at least one piece of narration information into the preset data set.
In a specific embodiment, the writing the at least one piece of narration information into the preset data set may include:
registering at least one narrative entry based on the narrator name and the creation date and time corresponding to the at least one piece of narration information;
writing the at least one piece of narration information into the preset data set based on the at least one narrative entry.
That is, the at least one piece of narration information may come from a creator of the visual content (e.g., a photographer or a viewer); for example, the narration information may be created by the photographer when capturing the visual content, by a viewer while the visual content is being played, or by an organization that owns the visual content and wishes to add comments during editing. In other words, a user may generate and add narration information during video recording, editing, and presentation. In addition, for each narrative entry, the corresponding narrator name may be registered into a specific data structure so that the narration information is stored into the preset data set.
Further, the embodiments of the present application may also add new narration information. In some embodiments, the method may further include:
determining new narration information to be added;
storing the new narration information after the existing narration information.
It should be noted that the embodiments of the present application also allow a user to add new narration information after the existing narration information. When this occurs, the new narration information is added after the existing narration information.
Further, the at least one piece of narration information to be added may conform to the general data structure of the digital visual media, or to the data structure of the International Organization for Standardization Base Media File Format (ISO-BMFF). That is, in some embodiments, for S101, the determining at least one piece of narration information to be added may include:
determining the at least one piece of narration information conforming to a preset data structure, where the preset data structure includes at least one of: a generic data structure and an ISO-BMFF data structure.
In one possible implementation, the digital visual media and the narration information conform to a generic data structure. Here, the information processing method of the embodiments of the present application may be applied to different digital media formats, and its specific implementation depends on the syntax structure of the digital media content. Table 1 shows a syntax example of a generic data structure provided in an embodiment of the present application; it illustrates a typical data set syntax structure and the key data components that enable the technical features of the present application. The first column gives example syntax elements of the preset data set recording the narration information (Example of narrative data set syntax), and the second column gives the descriptor (Descriptor) corresponding to each syntax element.
TABLE 1
Figure PCTCN2021075483-APPB-000001
Figure PCTCN2021075483-APPB-000002
* Note: ANL is equal to the narrator name length (narrator_name_length) in bytes.
In Table 1, the conventions used are as follows:
f(n): a bit string with a fixed pattern, using n bits, written from left to right;
b(8): a byte, which may have any bit-string pattern (8 bits);
u(n): an unsigned integer using n bits.
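The three descriptor conventions above can be illustrated with a small bit-reader sketch (a hypothetical helper for illustration only, not part of the patent's syntax):

```python
# Minimal sketch of the f(n), b(8), and u(n) descriptor conventions from
# Table 1. The class and its method names are assumptions for illustration.
class BitReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bits(self, n: int) -> int:
        # read n bits left to right, most significant bit first
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def u(self, n: int) -> int:
        # u(n): unsigned integer using n bits
        return self.read_bits(n)

    def f(self, n: int, pattern: int) -> bool:
        # f(n): bit string with a fixed pattern; check it matches
        return self.read_bits(n) == pattern

    def b8(self) -> int:
        # b(8): one byte with any bit-string pattern
        return self.read_bits(8)

# A hypothetical start code: three-byte prefix plus one dedicated byte.
r = BitReader(bytes([0x00, 0x00, 0x01, 0xB9]))
assert r.f(24, 0x000001)   # fixed-pattern start code prefix
assert r.b8() == 0xB9      # narration-specific code byte (assumed value)
```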
In addition, in Table 1, the syntax elements may include: narrative_data_start_code, number_of_narratives, text_encoding_standard_id, narrator_name_length, narrator_name, narrative_creation_date, narrative_creation_time, visual_content_ownership_flag, narrative_data_type, text_narrative_data_length, narrative_audio_codec_id, and audio_narrative_data_length.
The semantics of each of these syntax elements are described below:
narrative_data_start_code specifies a four-byte bit string with a fixed pattern that identifies the start position of the narration information in the bit stream. It typically consists of a three-byte start code prefix having a unique sequence, followed by one byte dedicated to identifying narration information;
number_of_narratives specifies the total number of narration information entries. When a new narration information entry is added, it shall be appended to the bottom of the list, immediately after all previous narration information entries, and the number_of_narratives value shall be incremented by one.
text_encoding_standard_id specifies the text encoding standard used for the narrator name and for the text narration in the narration data segment. Table 2 shows an example of code values for common text encoding standards provided in the embodiment of the application. Here, the first column lists the common text encoding standards (Text coding standards), and the second column gives example code values (text_encoding_standard_id value). In other words, in the embodiment of the present application, the text data segment may be encoded using a preset text encoding standard, where the preset text encoding standard may be any common text encoding standard, such as UTF-8, UTF-16, GB2312-80, GBK, Big5, and the like.
TABLE 2
Figure PCTCN2021075483-APPB-000003
narrator_name_length specifies the length of narrator_name in bytes.
narrator_name specifies the name of the narrator, where the narrator may be an individual or a corporate organization.
narrative_creation_date specifies the date on which the narration information is added. Any standard representation of a date may be used here. For example, the date may be represented in a digital format that uses a 4-digit number for the year, followed by a 2-digit number for the month and a 2-digit number for the day: September 21, 2019 is represented as 20190921, and October 30, 2019 as 20191030. In this representation, one byte is used for every two digits.
narrative_creation_time specifies the time at which the narration information is added. Any standard representation of time may be used here. For example, the time may be expressed as hh:mm:ss.TZ, where hh (hours), mm (minutes), and ss (seconds) each use one byte per digit, and TZ (time zone) uses an eight-digit code.
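The digit layouts described above can be sketched as follows. This is only an illustration of the digit ordering; the function names and the time-zone code value are assumptions, and the packing of two digits per byte is not modeled here.

```python
from datetime import datetime

def narrative_creation_date(dt: datetime) -> str:
    # 4-digit year, then 2-digit month, then 2-digit day,
    # e.g. September 21, 2019 -> "20190921"
    return dt.strftime("%Y%m%d")

def narrative_creation_time(dt: datetime, tz_code: str) -> str:
    # hh:mm:ss.TZ, where TZ is an eight-digit time-zone code
    # (the tz_code value passed in below is hypothetical)
    return dt.strftime("%H:%M:%S") + "." + tz_code

d = datetime(2019, 9, 21, 14, 30, 5)
assert narrative_creation_date(d) == "20190921"
assert narrative_creation_time(d, "00000008") == "14:30:05.00000008"
```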
visual_content_ownership_flag equal to 1 indicates that the author of this narration entry owns the visual content; visual_content_ownership_flag equal to 0 indicates that the author of this narration entry does not own the visual content.
narrative_data_type specifies the type (i.e., data format) of the narration: a value equal to 0 indicates that the narration is in text format (i.e., text type); a value equal to 1 indicates that the narration information is in audio format (i.e., audio type); a value equal to 2 indicates that the narration information has both a text format and an audio format (i.e., both text type and audio type). If the original visual content is video, then narrative_data_type can only be equal to 0.
text_narrative_data_length specifies the data length, in bytes, of the narration information in text format, with a default value of 0.
narrative_audio_codec_id specifies the audio codec used when encoding the audio narration information. Table 3 shows an example of code values for common audio codecs provided in an embodiment of the present application. Here, the first column lists the audio codecs (audio codec), and the second column gives example code values (narrative_audio_codec_id value). In other words, in the embodiment of the present application, the audio clip may be encoded using a preset audio encoding standard, where the preset audio encoding standard may be any common audio encoding standard, such as AVS audio, MP3, AAC, and WAV.
TABLE 3
audio codec narrative_audio_codec_id value
MP3 0
AAC 1
AVS-audio 2
WAV 3
…(Reserved for any other audio codec)
audio_narrative_data_length specifies the data length, in bytes, of the narration information in audio format, with a default value of 0.
text_narrative_data represents the actual narration information in text format.
audio_narrative_data represents the actual narration information in audio format.
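The field order of the generic data set can be sketched as a serializer for a single text-type narration entry. This is a hedged illustration only: the field widths (2-byte counter, 4-byte text length) and the start code byte are assumptions, since the exact widths in Table 1 are given as an image in the original publication.

```python
import struct

def pack_text_narration(narrator: bytes, date: bytes, time: bytes,
                        text: bytes, owns_content: bool) -> bytes:
    # Serialize one text-type entry following the field order of Table 1.
    out = bytearray()
    out += b"\x00\x00\x01\xB9"              # narrative_data_start_code (assumed code byte)
    out += struct.pack(">H", 1)              # number_of_narratives (assumed 2-byte width)
    out += struct.pack(">B", 0)              # text_encoding_standard_id: UTF-8 (assumed value 0)
    out += struct.pack(">B", len(narrator))  # narrator_name_length, in bytes
    out += narrator                          # narrator_name
    out += date + time                       # narrative_creation_date / _time digit strings
    out += struct.pack(">B", 1 if owns_content else 0)  # visual_content_ownership_flag
    out += struct.pack(">B", 0)              # narrative_data_type: 0 = text
    out += struct.pack(">I", len(text))      # text_narrative_data_length (assumed 4-byte width)
    out += text                              # text_narrative_data
    return bytes(out)

entry = pack_text_narration(b"Alice", b"20190921", b"143005",
                            b"Sunset over the bay", True)
assert entry[:4] == b"\x00\x00\x01\xB9"   # starts with the start code
assert entry[7] == 5                      # narrator_name_length of b"Alice"
```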
In another possible implementation, the digital visual media and the narration information conform to the ISO-BMFF data structure. ISO-BMFF has been widely adopted in the industry as a container format for digital visual media such as video, still images, and image groups. For example, the most popular video streaming and storage format today is the MP4 format, which fully conforms to the ISO-BMFF data structure. In this embodiment, the narration data structure is designed to be suitable for original digital visual media encapsulated in the ISO-BMFF file format. This data structure fully conforms to the metadata format in ISO-BMFF, and may be embedded in a 'meta' box at the file level or within the 'moov' box at the movie level of the ISO-BMFF file format. In order to facilitate promotion and application of the narration information described in the embodiments of the present application, the narration application format described in Table 1 organizes the related macro information into two related data layers, and such a data structure greatly facilitates the embedding, editing, and playback of narration content in software.
Fig. 2 shows an overall structure diagram of an ISO-BMFF file provided in an embodiment of the present application, including the proposed metadata segment for narration information. In Fig. 2, a standard ISO-BMFF file includes the 'ftyp', 'moov', 'trak', 'mdat' and 'data' boxes. The 'ftyp' box contains general information about the media file, the 'moov' box contains the 'trak' boxes carrying all meta-information about the original visual content of the digital visual media, and the 'mdat' box contains the original visual content of the entire digital visual media. When new narration information is inserted, the narration information proposed in Table 1 is contained in the narration metadata box 'meta (for narration)' and the narration box. The narration information here does not include the actual narration content; the actual text content (text_narrative_data) or audio content (audio_narrative_data) representing the narration is stored in a narration data segment in the 'mdat' box. As shown in Fig. 2, this data segment immediately follows the original visual data segment. As described above, the 'meta (for narration)' box may likewise be placed within the 'moov' box.
In some embodiments, the ISO-BMFF data structure includes at least a narration metadata box, and the narration metadata box may include a narration metadata handler box and a narration application box;
accordingly, the method may further include:
processing metadata of the current narration information through the narration metadata handler box;
describing, through the narration application box, at least one of the following narration information: the start position of the current narration information, the data length of the current narration information, and the total number of narration information entries.
Further, in some embodiments, the narration application box may include a narration description box, and the method may further include:
describing, through the narration description box, at least one of the following description information: the text encoding standard, the narrator name, the creation date, the creation time, the ownership flag of the associated visual content, the type of the narration information, the encoding standard of the narration information, and the text length of the narration information.
That is, the narration metadata box is a meta-data box, shown as 'meta (for narration)' in Fig. 2, which may be simply referred to as 'meta'. Within the narration metadata box, the narration metadata handler box is a meta-data handler box, which may be simply referred to as 'hdlr'. The narration application box (narration application box) may be simply referred to as 'napp'. The narration description box (narrative description box) may be simply referred to as 'nrtd'.
Further, for the narration metadata box, in some embodiments, the method may further include:
if the digital visual media does not have a metadata box at the file level, creating the narration metadata box and describing the at least one piece of narration information through the narration metadata box;
if the digital visual media already has a metadata box at the file level, creating the narration metadata box in a 'meco' container box and describing the at least one piece of narration information through the narration metadata box.
Illustratively, Fig. 3 shows a detailed structural schematic diagram of a metadata box provided in an embodiment of the present application, that is, a detailed structural example of the 'meta (for narration)' box when the narration meta-data box is placed at the file level. The same structure also applies when the narration meta-data box is placed within the 'moov' box; the embodiments of the present application are not particularly limited in this respect.
In Fig. 3, the narration metadata box (represented by meta) shown in (3a) may include a narration metadata handler box (represented by hdlr) and a narration application box (represented by napp), where the narration application box may in turn include a narration description box (represented by nrtd). In (3b), the narration metadata box (meta) is contained in a 'meco' container box, and may in turn include a narration metadata handler box (hdlr) and a narration application box (napp), where the narration application box may in turn include a narration description box (nrtd). Specifically, if the digital visual media, such as a video file or a group of image files, does not have a metadata box at the file level, the metadata for the narration information may be added in the 'meta' box as shown in Fig. 3 (3a). If the digital visual media already has a metadata box at the file level, e.g. the digital visual media is a picture, a new 'meta (for narration)' box is added to the 'meco' container box according to the ISO-BMFF requirements, as shown in Fig. 3 (3b). Here, all the boxes contained in the narration metadata box 'meta (for narration)' are specially defined for the digital visual media narration application format, to carry the narration information necessary to achieve the above functions. The detailed syntax and semantics of these boxes are described below.
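The nesting in Fig. 3 (3a) can be sketched with a minimal box builder. This is an illustrative sketch only: the size/type header follows the general ISO-BMFF box layout, but the body contents beyond size, type, and handler_type are placeholders rather than the patent's exact syntax.

```python
import struct

def box(box_type: bytes, payload: bytes) -> bytes:
    # Generic ISO-BMFF box: 4-byte box_size (header + payload, in bytes),
    # 4-character box_type, then the payload.
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def hdlr_box() -> bytes:
    # Narration metadata handler box: version/flags (4 bytes),
    # pre_defined (4 bytes), handler_type 'napp', reserved (12 bytes),
    # empty name string. Placeholder zero bytes throughout.
    body = b"\x00" * 4 + b"\x00" * 4 + b"napp" + b"\x00" * 12 + b"\x00"
    return box(b"hdlr", body)

def napp_box(napp_payload: bytes) -> bytes:
    # Narration application box; payload left opaque here.
    return box(b"napp", napp_payload)

def narration_meta_box(napp_payload: bytes) -> bytes:
    # 'meta' box containing version/flags, then 'hdlr' and 'napp'.
    return box(b"meta", b"\x00" * 4 + hdlr_box() + napp_box(napp_payload))

meta = narration_meta_box(b"")
assert meta[4:8] == b"meta"
assert b"hdlr" in meta and b"napp" in meta
```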
(1) Narration metadata box (Narrative meta-data box), whose syntax elements are shown in Table 4 below.
TABLE 4
Figure PCTCN2021075483-APPB-000004
The semantics of the syntax elements in Table 4 are as follows:
Note: this metadata structure can be added at the file level or in the 'moov' box at the movie level;
box_size represents the total size of the box, in bytes;
box_type is set to 'meta' (4 lowercase characters), indicating that this is a narration metadata box;
narrative_metadata_handler_box(): the box structure contained in this box is defined by the handler_type described below, as shown in Table 5. Here, ISO-BMFF requires that the metadata handler box be contained;
narration_application_box(): the main box of the narration application format, contained in the narration metadata box. A detailed description is given in Table 6 below.
(2) Narration metadata handler box (Narrative metadata handler box), whose syntax elements are shown in Table 5 below.
TABLE 5
Figure PCTCN2021075483-APPB-000005
The semantics of the syntax elements in Table 5 are as follows:
box_size is the total size of the box, in bytes;
box_type is designated as 'hdlr' (4 lowercase characters), indicating that this is the narration metadata handler box;
handler_type is designated as 'napp' (4 lowercase characters), indicating that the metadata handler 'napp' will be used to define the media narration application format;
the version, flags, predefined, reserved, and name fields may be set according to the ISO-BMFF requirements.
(3) Narration application box (Narration application box), whose syntax elements are shown in Table 6 below.
TABLE 6
Figure PCTCN2021075483-APPB-000006
The semantics of the syntax elements in Table 6 are as follows:
Note: this is the main box of the narration application. The box is defined as a full box so that future versions can be updated.
box_size represents the total size of the box, in bytes;
box_type is designated as 'napp' (4 lowercase characters), indicating the metadata box format defined herein for the narration application;
version and flags are reserved for future updates;
media_type represents the original media format of the digital visual media. Example definitions are shown below (note: the media types defined in ISO-BMFF for still images and picture groups may also be used here):
i. video: 'vide' (4 lowercase characters);
ii. still image: 'imag' (4 lowercase characters);
iii. picture group: 'picg' (4 lowercase characters).
number_of_narratives gives the total number of narration entries. When new narration information is added, it shall be appended to the bottom of the narration list, immediately after all previous narration information, and the number_of_narratives value shall be incremented by one;
narrative_data_start_location indicates the start position, in bytes, of the current narration information in the 'mdat' box associated with the original digital visual media file;
narrative_data_length represents the data length, in bytes, of the current narration information in the 'mdat' box;
narrative_description(): the box that describes the information of each narration entry, as described below.
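How narrative_data_start_location and narrative_data_length locate a narration payload inside 'mdat' can be shown with a simple slice (the byte values and offsets below are illustrative, not from the patent):

```python
# Sketch: the 'mdat' payload holds the original visual data followed by
# the narration data segment; the two fields above locate that segment.
mdat_payload = b"<original visual data>" + b"A spoken note."

narrative_data_start_location = 22   # byte offset of the narration segment
narrative_data_length = 14           # its length, in bytes

narration = mdat_payload[narrative_data_start_location:
                         narrative_data_start_location + narrative_data_length]
assert narration == b"A spoken note."
```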
(4) Narration description box (Narrative description box), whose syntax elements are shown in Table 7 below.
TABLE 7
Figure PCTCN2021075483-APPB-000007
The semantics of the syntax elements in Table 7 are as follows:
Note: this is the main box describing the narration information. The box is defined as a full box so that future versions can be updated.
box_size represents the total size of the box, in bytes;
box_type is designated as 'nrtd' (4 lowercase characters), indicating that this is the narration description box 'nrtd';
version and flags are reserved for future updates;
text_encoding_standard_type describes the text encoding standard of the narrator name. Its definition is the same as that of text_encoding_standard_id in Tables 1 and 2. If the narration is in text format, the text encoding standard specified here also applies to encoding the narration content;
narrator_name_length specifies the length of narrator_name, in bytes;
narrator_name specifies the name of the narration author, where the author may be an individual or an organization. Note that n in Table 7 is equal to the value of narrator_name_length;
narrative_creation_date specifies the creation date of the narration information. Its definition is the same as in Table 1;
narrative_creation_time specifies the creation time of the narration information. Its definition is the same as in Table 1;
media_ownership_flag equal to 1 indicates that the narrator owns the visual content; media_ownership_flag equal to 0 indicates that the narrator does not own the visual content;
narrative_data_type specifies the data format of the narration information: a value equal to 0 indicates that the narration information is in text format, a value equal to 1 indicates that it is in audio format, and a value equal to 2 indicates that it has both text and audio formats. If the original media type of the digital visual media is video, narrative_data_type can only be 0;
audio_encoding_type specifies the encoding standard of narration information in audio format. The coding standards in Table 3 may be followed here;
text_narrative_data_length represents the length of the text narration portion of narration information having a text format, or both text and audio formats. When the narration information has both text and audio formats, the text narration portion is stored first in 'mdat', followed by the audio narration portion. The length of the audio narration portion is equal to narrative_data_length in the narration application box minus text_narrative_data_length.
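The length relation at the end of the semantics above works out as follows (the byte counts are hypothetical example values):

```python
# Worked example of the length relation for a text + audio narration entry:
# the text portion is stored first in 'mdat', and the audio portion's
# length is the total entry length minus the text length.
narrative_data_length = 5000        # total bytes of this entry in 'mdat'
text_narrative_data_length = 1200   # bytes of the text narration portion

audio_narrative_length = narrative_data_length - text_narrative_data_length
assert audio_narrative_length == 3800
```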
Thus, in embodiments of the present application, based on the generic data structure or the ISO-BMFF data structure, at least one piece of narration information may be embedded in a preset manner into a target media file or bit stream of the digital visual media. Specifically, in one possible implementation, embedding the at least one piece of narration information into the bit stream in a preset manner may include: encoding the at least one piece of narration information to generate a first bit stream; and embedding the first bit stream into the bit stream.
It should be noted that the bit stream here refers to a streaming-media bit stream. In this way, the encoder may transmit the bit stream with the first bit stream embedded to a decoder, so that the decoder can subsequently obtain the at least one piece of narration information by parsing the bit stream.
In another possible implementation, the at least one piece of narration information is embedded into a target media file of the digital visual media in a preset manner, and the method may further include: encoding the target media file with the at least one piece of narration information embedded, to generate a new target media file; and transmitting the new target media file to the decoder. Here, the encoder may transmit the new target media file to the decoder in streaming or file form, so that the decoder can subsequently obtain the at least one piece of narration information by parsing the code stream.
The embodiment provides an information processing method applied to an encoder: at least one piece of narration information to be added is determined, and the at least one piece of narration information is embedded in a preset manner into a target media file or bit stream of the digital visual media without altering the original visual content of the digital visual media. Thus, no special video editing software is needed for the narration information to be added; it can be embedded into the target media file or bit stream without changing the original visual content, and can be selectively presented once or several times when the digital visual media is played through a player. This simplifies user operation, increases the richness of the digital visual media, and improves its viewing experience.
In another embodiment of the present application, referring to fig. 4, a schematic flow chart of another information processing method provided in the embodiment of the present application is shown. As shown in fig. 4, the method may include:
S401: parsing the code stream to obtain at least one piece of narration information.
S402: optionally presenting the at least one piece of narration information one or more times while the digital visual media is played through the player.
It should be noted that the information processing method of the embodiment of the present application is applied to a narration decoder (which may be simply referred to as a "decoder"), and the decoder may be implemented in hardware or in software, for example, in an application (APP). In addition, the information processing method of the embodiment of the application may also be applied to an electronic device capable of recording and playing digital video. Here, the electronic device may be a smart phone, a tablet computer, a notebook computer, a palm computer, a personal digital assistant (Personal Digital Assistant, PDA), a portable media player (Portable Media Player, PMP), a navigation device, a wearable device, or the like, which is not limited in any way.
It should be further noted that, in the embodiment of the present application, the decoder may integrate a playing function, and the player may integrate a decoding function; therefore, the decoder described in the embodiment of the present application may also be referred to as a player. In other words, the execution subject of the information processing method of the embodiment of the present application may be a decoder, a player, or even an electronic device, which is not limited in any way.
It should be further noted that, when the digital visual media is played through the player, if the user wishes to present the at least one piece of narration information, the user may choose to present it once or multiple times. If the user does not wish to present the at least one piece of narration information, only the digital visual media is played; the embodiments of the present application are not limited in this respect.
In an embodiment of the present application, the type of the narration information may include at least one of: a text type and an audio type. Narration information of the text type may be referred to as text narration information, and narration information of the audio type may be referred to as audio narration information.
Additionally, the type of the digital visual media may include at least one of: a video, an image, and an image group. Here, the video specifically refers to digital video; the image may also be referred to as a "still image" or "digital image"; and the image group is composed of a series of images, that is, the image group may include at least two images.
In the embodiment of the application, the narration information can serve as a supplementary part of the digital visual media for recording, sharing, and exchange, and can also express, share, and exchange emotions about the visual theme of the digital visual media. Accordingly, in embodiments of the present application, the narration information is added to a target media file or bit stream of the digital visual media in the encoder, so that the decoder can enhance the viewing experience when the digital visual media is played.
In some embodiments, for S401, parsing the code stream to obtain at least one piece of narration information may include:
parsing the code stream to obtain a target media file or bit stream sent by an encoder;
obtaining the at least one piece of narration information from the target media file or the bit stream.
It should be noted that, after the encoder embeds at least one piece of narration information into a target media file or bit stream of the digital visual media, the decoder can obtain the target media file or bit stream by parsing the code stream, and then extract the at least one piece of narration information from the target media file or bit stream.
In some embodiments, narration information may be created by multiple users of the digital visual media. Thus, for S401, parsing the code stream to obtain at least one piece of narration information may include:
parsing the code stream to obtain the at least one piece of narration information created by at least one user of the digital visual media.
That is, embodiments of the present application allow multiple users of the same digital visual media to create narration information, so that the multiple users may share and exchange the narration information.
In some embodiments, the encoder may also write the narration information into a preset data set. Thus, on the decoder side, for S401, parsing the code stream to obtain at least one piece of narration information may include:
parsing the code stream to obtain a preset data set;
obtaining the at least one piece of narration information from the preset data set.
It should be noted that, in the embodiment of the present application, the narration information, including the narration content and the associated registration information, may be combined in the preset data set, so that the at least one piece of narration information may be obtained by parsing the code stream. That is, the narration information may include the narration content and registration information associated with the narration content, where the registration information may include at least one of: the name of the narrator, the creation date and time, and ownership information of the associated visual content. Here, the ownership information may be used to indicate whether the user owns the visual content of the digital visual media.
It should be further noted that the preset data set may include a plurality of pieces of narration information. Such narration information may come from people related to the visual content (e.g., the photographer, a viewer, etc.); for example, the narration information may be created by the photographer when capturing the visual content, by a viewer while the visual content is played, or by an organization that owns the visual content and wishes to add comments during editing. In other words, the user may generate and add narration information during video recording, editing, and presentation. In addition, for each narration entry, the corresponding narrator name may be registered in a specific data structure, so as to store the narration information in the preset data set.
It should also be noted that, in the embodiment of the present application, new narration information may also be added. In some embodiments, the method may further include:
when the digital visual media is played through the player, determining new narration information to be added, and embedding the new narration information into the target media file of the digital visual media.
That is, a narration editor and encoder may also be included in the player, so as to allow the user to add new narration information to the target media file of the digital visual media during playback.
In some embodiments, for S402, optionally presenting the at least one piece of narration information one or more times while playing the digital visual media through the player may include:
optionally presenting the at least one piece of narration information one or more times at the start time of playing the digital visual media through the player.
That is, at the beginning of playing the digital visual media, the at least one piece of narration information may be played once or repeated multiple times. In this way, no presentation time needs to be added when the narration information is added, which makes the information processing method easier to implement, and the narration application easier to popularize.
In a specific embodiment, when the type of the current narration information is the text type, the method may further include:
after decoding the current narration information as a text data segment, displaying the current narration information in text format through the player.
That is, if the type of the narration information is the text type, the narration information may be decoded as a text data segment and then displayed in text format by the player.
Further, if the type of the narration information is the text type but the user wishes the player to play it in audio format, then, in some embodiments, the method may further include:
after decoding the current narration information as a text data segment, converting the decoded narration information into an audio format and playing it through the player.
That is, if the type of the narration information is the text type, after the narration information is decoded as a text data segment, the decoded narration information may be converted into an audio format and played by the player.
In another specific embodiment, when the type of the current narration information is the audio type, the method may further include:
after decoding the current narration information as an audio clip, playing the current narration information in audio format through the player.
That is, if the type of the narration information is the audio type, the narration information may be decoded as an audio clip and then played in audio format by the player.
Further, if the type of the narration information is the audio type but the user wishes the player to display it in text format, then, in some embodiments, the method may further include:
after decoding the current narration information as an audio clip, converting the decoded narration information into a text format and displaying it through the player.
That is, if the type of the narration information is the audio type, after the narration information is decoded as an audio clip, the decoded narration information may be converted into a text format and displayed by the player.
It should be noted that the conversion of the narration information from text to audio, or from audio to text, may be implemented by a speech synthesizer, a voice-recording/text assistant, a quick audio converter, or the like in the related art, which is not limited here.
In addition, since the type of the digital visual media can be an image or an image group or a video, the type of the added narrative bystander information is different in different situations. In some embodiments, the method may further comprise:
determining that the type of the at least one narrative bystander information is a text type and/or an audio type when the type of the digital visual media is an image or an image group;
when the type of the digital visual media is video, determining that the type of the at least one narrative bystander information is a text type.
Further, in some embodiments, the method may further comprise: if the type of the digital visual media is video and the type of the narrative bystander information is audio, determining that at least two audio decoders are arranged in the decoders; wherein the at least two audio decoders include a first audio decoder for decoding the digital visual media and playing, and a second audio decoder for decoding the narrative bypass information and playing.
That is, if the type of the digital visual media is an image or a group of images, the type of the narrative side information to be added may be a text type, an audio type, or even both. If the type of the digital visual media is video, the type of the narrative side information to be added may be a text type. In a special case, if the type of the digital visual media is video but the type of the narrative side information to be added is audio, at least two audio decoders are needed, which not only increases the implementation complexity but also limits the adoption of the application. Thus, if the type of the digital visual media is video, the narrative side information to be added is typically of the text type.
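The type constraints just described can be summarized in a short sketch. This is a policy check written for illustration only; the media-type strings are assumptions, while the two-decoder requirement for audio narration over video comes directly from the discussion above.

```python
def allowed_narration_types(media_type: str) -> set:
    """Narration types typically permitted per media type, following the
    constraints above (a policy sketch, not a normative rule)."""
    if media_type in ('image', 'image_group'):
        return {'text', 'audio'}
    if media_type == 'video':
        # audio narration over video would need a second audio decoder,
        # so text is the usual choice
        return {'text'}
    raise ValueError(f'unknown media type: {media_type}')

def required_audio_decoders(media_type: str, narration_type: str) -> int:
    """Two audio decoders are needed only for audio narration over video:
    one for the video's own audio track, one for the narration itself."""
    return 2 if (media_type == 'video' and narration_type == 'audio') else 1
```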
It will be appreciated that, since the at least one narrative side information need not be presented during playback of the digital visual media, either presentation or non-presentation may be selected. Thus, in some embodiments, as shown in fig. 5, after S401, the method may further include:
S501: determining whether to present the at least one narrative side information.
S502: if the at least one narrative side information is not presented, playing only the digital visual media through the player.
S503: if the at least one narrative side information is presented, optionally presenting the at least one narrative side information one or more times while the digital visual media is played through the player.
It should be noted that, for S501, it may be determined whether to present the at least one narrative bypass message by using a narrative play switch. In particular, the player/decoder may include a narrative play switch that may be used to turn on or off presentation of narrative bypass information.
Thus, for narrative play switches, in some embodiments, the method may further comprise:
playing the at least one narrative side message with the digital visual media by the player when the narrative play switch is turned on;
when the narration playing switch is turned off, only the digital visual media is played through the player.
It should be noted that the user may choose whether to watch or listen to the narrative bypass information while playing the digital visual media. When the user selects that the narration side information does not need to be presented, the narration playing switch can be turned off, and the player only plays the digital visual media at this time, namely, the step of S502 is executed; when the user selects the presence of the narrative side information, the narrative play switch may be turned on, at which time the player may play the narrative side information and the digital visual media simultaneously, i.e., perform step S503.
That is, when the narration play switch is turned on, in one particular embodiment, the at least one narrative side information may optionally be presented one or more times while the digital visual media is played by the player. The user may choose to present the at least one narrative side information once, or to repeat its presentation multiple times.
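As a minimal sketch of the play-switch behavior above (the function and action names are illustrative placeholders, not part of the described method):

```python
def playback_plan(narration_switch_on: bool, repeat_count: int = 1) -> list:
    """Build a flat list of player actions: the media always plays; the
    narration is presented the selected number of times only when the
    narration play switch is turned on."""
    plan = ['play_media']
    if narration_switch_on:
        plan += ['present_narration'] * max(1, repeat_count)
    return plan
```

With the switch off the plan contains only the media playback step (S502); with the switch on, the narration presentation is repeated as selected (S503).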
Further, when the player plays the digital visual media, different presentation modes can be adopted for the narrative side information. In one possible implementation, when the narration play switch is turned on, the playing, by the player, of the at least one narration bystandstill information together with the digital visual media may include:
if the type of the narrative side information is text type, displaying the narrative side information as a subtitle covered on the visual content of the digital visual media through the player, or displaying the narrative side information in a single text window through the player, or converting the narrative side information into an audio format through the player for playing;
and if the type of the narrative side information is an audio type, playing the narrative side information by using the player in an audio format as an independent audio signal, or converting the narrative side information into a text format by using the player for displaying.
In another possible embodiment, when the narration play switch is turned on, the playing, by the player, of the at least one narration bystander together with the digital visual media may include:
playing, by the player, the digital visual media in a frozen mode or a loop mode in the background while presenting the at least one narrative side information in the foreground.
In yet another possible embodiment, when the narration play switch is turned on, the method may further include:
if the type of the at least one piece of narrative side information comprises a text type and an audio type, the narrative side information corresponding to the text type and the narrative side information corresponding to the audio type can be simultaneously presented or presented separately.
It should be noted that the player may include a narrative play switch that turns on or off the narrative bypass information. When the narrative play switch is turned off, the digital visual media will be played (or "played back") without any modification. When the narration play switch is turned on, the decoder/player may select the exact narration presentation format. For example, for text narrative information, it may be displayed as subtitles overlaid on visual content, or in a separate space (e.g. in a separate text window), or even played as a separate audio signal after being synthesized by the decoder/player. On the other hand, the speech narrative information for an image or group of images may be played as a separate audio signal or may be displayed in text format after being transcribed by the player. For a group of images, playback of the audio narrative is independent of the image playback format specified in the original visual content. When a narrative side message contains both a text narrative portion and an audio narrative portion in a predetermined data set, the two portions may be presented simultaneously (e.g., text display in the text narrative portion and audio playback in the audio narrative portion) or separately. In the embodiment of the present application, this is not particularly limited.
It should also be noted that, in some embodiments, the method may further include: in the player, a drop-down menu is created. The drop-down menu is used for configuring presentation format options when the narrative side information is presented.
Thus, the embodiment of the present application may also construct a drop-down setting menu (referred to as a drop-down menu for short) in the player, where the drop-down menu includes options for configuring the presentation format of the narrative side information. In this way, after receiving an operation instruction on the drop-down menu, the player may determine the selected presentation format and then present the narrative side information according to it.
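The kind of options such a drop-down menu might expose can be sketched as a small configuration record. All field names and default values here are assumptions made for illustration; they only mirror the presentation choices described in this section (overlay vs. separate window, repeat count, scroll speed, background playback mode).

```python
from dataclasses import dataclass

@dataclass
class NarrationPresentation:
    """Illustrative player settings for presenting narration side
    information; field names and defaults are assumptions."""
    text_mode: str = 'overlay'       # 'overlay' subtitles or separate 'window'
    repeat: int = 1                  # present the narration once or several times
    scroll_speed: float = 1.0        # used when text is overlaid as subtitles
    background_mode: str = 'loop'    # 'loop' or 'frozen' playback behind narration
```

A selection in the menu would then simply produce, e.g., `NarrationPresentation(text_mode='window', repeat=2)`.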
It should also be noted that, in some embodiments, the method may further include: creating a narrative side information list in the player; where the list includes the at least one narrative side information and is used for the user to view and select narrative side information.
That is, embodiments of the present application also allow the narrative side information to be browsed. For this purpose, a narrative side information list may be created so that the user can selectively view the narrative side information.
It should be further noted that, in some embodiments, when the narrative play switch is turned on, the method may further include: and when the at least one narrative side information is voice narrative information, playing the at least one narrative side information through the player, and presenting the visual content of the digital visual media and the registration information associated with the at least one narrative side information through the player. Here, the registration information may be a name of the narrator and a creation date/time at which the narrative was created in text form, and the like.
In particular, for a player, the viewer may also configure the player to display text content as overlaid or in a separate text window. When a separate text window is used, if the visual content is video or a set of images, the original digital visual media can be played in a loop mode in the background while one or more narrative notes are played at a time. In this case, the user can scroll through the narrative in this separate text window. In addition, the player may allow the user to configure the speed of scrolling text in the case where narrative bystandstill information is overlaid as subtitles over visual content. The player may also display visual content as well as the name of the narrator and the date/time of creation of the added narrative in text form as the voice narrative is played.
Further, the at least one narrative side information may conform to a generic data structure of the digital visual media, or to an ISO-BMFF data structure. That is, in some embodiments, for S401, the parsing the code stream to obtain at least one narrative side information may include:
analyzing the code stream to obtain the at least one narrative bystander information which accords with a preset data structure; the preset data structure at least comprises one of the following: a generic data structure and an ISO-BMFF data structure.
In one possible implementation, the digital visual media and the narrative side information conform to a generic data structure. Here, the information processing method of the embodiment of the present application may be applied to different digital media formats, and its specific implementation depends on the syntax structure of the digital media content, as shown in table 1 above. In table 1, the syntax elements may include: audio_data_start_code, number_of_audio_occurrence, narrative_entry_time, number_of_narratives, text_encoding_standard_id, audio_author_name_length, narrative_author_name, narrative_creation_date, visual_content_ownership_flag, narrative_data_type, narrative_audio_code_id, and audio_data_length.
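To make the role of these syntax elements concrete, the following sketch packs one text-type narration entry into bytes. The field order, field widths, and the start-code value are assumptions made for illustration only; the normative layout is the one given by Table 1, which is not reproduced in this excerpt.

```python
import struct

def pack_text_narration(author: str, date: str, text: str) -> bytes:
    """Illustrative packing of one text narration entry (assumed layout,
    not the normative Table 1 syntax)."""
    a = author.encode('utf-8')                # narrative_author_name
    d = date.encode('utf-8')                  # narrative_creation_date
    t = text.encode('utf-8')                  # encoding chosen via text_encoding_standard_id
    out = struct.pack('>I', 0x000001B9)       # assumed narration start code
    out += struct.pack('>B', len(a)) + a      # author name length + name
    out += struct.pack('>B', len(d)) + d      # creation date length + date
    out += struct.pack('>B', 0)               # narrative_data_type: 0 = text (assumed)
    out += struct.pack('>I', len(t)) + t      # text length + text payload
    return out
```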
The syntax element text_encoding_standard_id is mainly used to indicate the text decoding standard used for the narrative side information. In the embodiment of the present application, as shown in table 2, the text data segment may be decoded using a preset text decoding standard, where the preset text decoding standard may be any of the text decoding standards such as UTF-8, UTF-16, GB2312-80, GBK, and Big5.
The syntax element narrative_audio_codec_id is mainly used to indicate the audio decoding standard used for the narrative side information. In the embodiment of the present application, as shown in table 3, the audio clip may be decoded using a preset audio decoding standard, where the preset audio decoding standard may be any of the audio decoding standards such as AVS audio, MP3, AAC, and WAV.
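The two ID fields can be thought of as lookups into small registries, as sketched below. The numeric ID assignments here are assumptions for illustration only; the normative values are those of Tables 2 and 3 in the document.

```python
# Assumed ID-to-name assignments, for illustration only.
TEXT_ENCODING_STANDARDS = {0: 'UTF-8', 1: 'UTF-16', 2: 'GB2312-80', 3: 'GBK', 4: 'Big5'}
AUDIO_CODECS = {0: 'AVS audio', 1: 'MP3', 2: 'AAC', 3: 'WAV'}

def lookup_standard(table: dict, codec_id: int) -> str:
    """Resolve a text_encoding_standard_id or narrative_audio_codec_id
    value to a decoding-standard name."""
    if codec_id not in table:
        raise ValueError(f'reserved or unknown id: {codec_id}')
    return table[codec_id]
```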
In another possible embodiment, the ISO-BMFF data structure may include at least a narrative bypass metadata box, and the narrative bypass metadata box may include a narrative bypass metadata process box and a narrative bypass application box;
correspondingly, the parsing the code stream to obtain at least one narrative bystander information may include:
decoding to obtain metadata of the current narrative side information through the narrative side metadata processing box;
decoding, by the narrative bypass application box, at least one of the following narrative information: the start position of the current narrative bypass information, the data length of the current narrative bypass information, and the total number of narrative bypass information.
Further, in some embodiments, the narrative bypass application box may include a narrative bypass description box, and the method may further include:
decoding, by the narrative bypass description box, at least one of the following narrative information: text encoding criteria, narrative name, date of creation, time of creation, ownership flag of the affiliated visual content, type of narrative bystander, encoding criteria of narrative bystander, and text length of narrative bystander.
That is, the narrative side metadata box is a metadata box, shown as "meta (for narration)" in fig. 2, which may be referred to simply as "meta". In the narrative side metadata box, the narrative side metadata processing box is a metadata handler box, which may be referred to simply as "hdlr". The narrative side application box is a narration application box, which may be referred to simply as "napp". The narrative side description box is a narration description box, which may be referred to simply as "nrtd".
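The box nesting can be sketched using the standard ISO-BMFF box serialization (a 32-bit size that includes the 8-byte header, followed by a four-character type). The nesting below follows the description above; the payload bytes are placeholders, since the actual field contents are defined by Tables 4-7 of the document and are not reproduced here.

```python
import struct

def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one ISO-BMFF box: 32-bit big-endian size (header
    included) followed by the four-character box type and the payload."""
    return struct.pack('>I', 8 + len(payload)) + box_type + payload

# Nesting as described: 'meta' holds 'hdlr' and 'napp'; 'napp' holds 'nrtd'.
nrtd = box(b'nrtd', b'\x00')        # narration description box (placeholder body)
napp = box(b'napp', nrtd)           # narration application box wraps 'nrtd'
hdlr = box(b'hdlr', b'\x00' * 4)    # metadata handler box (placeholder body)
meta = box(b'meta', hdlr + napp)    # narration metadata box wraps both
```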
Further, for narrative bypass metadata boxes, in some embodiments, the method may further comprise:
if the digital visual media does not have a metadata box at the file level, acquiring the narrative side metadata box at the file level, and obtaining the at least one narrative side information by decoding the narrative side metadata box;
if the digital visual media already has a metadata box at the file level, acquiring the narrative side metadata box from the meco container box, and obtaining the at least one narrative side information by decoding the narrative side metadata box.
Illustratively, fig. 3 shows a detailed structural example of the "meta (for narration)" box when the narrative side metadata box is placed at the file level. This structure also applies when the narrative side metadata box is placed within the "moov" box. Specifically, if the digital visual media does not have a metadata box at the file level, such as a video file or a group of image files, the metadata for the narrative side information may be added in a "meta" box as shown in fig. 3 (3 a). If the digital visual media already has a metadata box at the file level, e.g. the digital visual media is a picture, a new "meta (for narration)" box will be added to the meco container box according to the ISO-BMFF requirements, specifically as the "meta" box shown in fig. 3 (3 b). Here, all the data boxes contained in the narrative side metadata box "meta (for narration)" are specially defined for the digital visual media narration application format, to carry the narrative side information necessary to achieve the above functions. In the embodiments of the present application, the detailed syntax and semantics of these boxes are shown in tables 4, 5, 6 and 7 above.
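A decoder-side sketch of this placement rule: look for the narration "meta" box at the top (file) level first, then inside a "meco" container. This is a simplification for illustration; a real file may hold several "meta" boxes with different handlers, which would need to be distinguished via their "hdlr" box, and 64-bit box sizes are not handled here.

```python
import struct

def iter_boxes(data: bytes):
    """Iterate (four-character type, payload) over a flat run of
    ISO-BMFF boxes: 32-bit big-endian size (header included) + 4CC."""
    pos = 0
    while pos + 8 <= len(data):
        size, = struct.unpack_from('>I', data, pos)
        if size < 8:            # malformed (or 64-bit size, not handled here)
            break
        yield data[pos + 4:pos + 8], data[pos + 8:pos + size]
        pos += size

def find_narration_meta(data: bytes):
    """Return the payload of the narration 'meta' box, checking the
    file level first and then inside a 'meco' container box."""
    for btype, payload in iter_boxes(data):
        if btype == b'meta':
            return payload
        if btype == b'meco':
            for inner_type, inner_payload in iter_boxes(payload):
                if inner_type == b'meta':
                    return inner_payload
    return None
```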
Briefly, to enhance the viewing experience of digital visual media, embodiments of the present application provide a method by which narrative side information may be added to a digital video, image, or group of images. Here, the narrative side information may be in text format, audio format, or both; it may be written into the current media file or bitstream along with the original digital visual media, and may be displayed or played along with the digital visual media. In particular, in embodiments of the present application, the method may add narrative side information to a digital image or video and save it with the original visual content in a format that facilitates communication and storage. Moreover, the method allows users to express, share and communicate their emotions about a visual subject through narrative side information embedded in a target media file or bitstream, which can enhance the viewing experience of digital video and increase audience engagement.
In addition, for digital images, current metadata, such as EXIF [1], IPTC [2], are used only to describe and provide technical and administrative information about the image, such as technical features of the capturing process, or ownership and rights of the image, etc., and the techniques described in embodiments of the present application are used exclusively to record, share and exchange emotional reviews between the creator and viewer of the digital visual media. In particular, the technique also allows multiple users to share and exchange narratives while allowing users to record narrative information without changing the original visual content, and the user may also choose to view or listen to the narrative information while playing the digital video.
The embodiment provides an information processing method which is applied to a decoder. At least one narrative bystander information is obtained by analyzing the code stream; the at least one narrative bystander information is optionally presented one or more times while the digital visual media is being played by the player. Thus, for at least one narrative side information to be added, special video editing software is not needed at this time, the narrative side information can be embedded into a target media file or bit stream of the digital visual media under the condition that the original visual content of the digital visual media is not changed, and the narrative side information can be selectively presented once or a plurality of times when the digital visual media is played through a player, so that the operation of a user is simplified, the richness of the digital visual media is increased, and the viewing effect of the digital visual media is also improved.
In yet another embodiment of the present application, based on the same inventive concept as the previous embodiments, referring to fig. 6, a schematic diagram of the composition structure of an encoder 50 provided in the embodiment of the present application is shown. As shown in fig. 6, the encoder 50 may include: a determining unit 501 and an encoding unit 502; wherein,
A determining unit 501 configured to determine at least one narrative bystander information to be added;
an encoding unit 502 configured to embed the at least one narrative bystander information in a target media file or bitstream of the digital visual media in a preset manner without changing the original visual content of the digital visual media.
In some embodiments, the encoding unit 502 is further configured to store the at least one narrative bystander information at a starting position of the digital visual medium in a preset manner, and generate the target media file or the bit stream.
In some embodiments, the determining unit 501 is further configured to create narrative bypass information for at least one user of the digital visual media, and obtain the at least one narrative bypass information.
In some embodiments, the type of narrative bystander information includes at least one of: text type and audio type; the types of digital visual media include at least one of: video, images, and image groups, the image groups comprising at least two images.
In some embodiments, referring to fig. 7, the encoder 50 may further include a creating unit 503 configured to create a text data segment when the type of the current narration bystanding information is a text type;
The encoding unit 502 is further configured to embed the current narrative bystander information in a text data segment into a target media file or bitstream of the digital visual media.
In some embodiments, the creating unit 503 is further configured to create an audio clip when the type of the current narrative bye information is an audio type;
the encoding unit 502 is further configured to embed the current narrative bystander information in the form of audio clips into a target media file or bitstream of the digital visual media.
In some embodiments, the creating unit 503 is further configured to, when the type of the current narrative bypass information is a text type, convert the current narrative bypass information into narrative bypass information corresponding to an audio type, and create an audio clip;
the encoding unit 502 is further configured to embed the current narrative bystander information in the form of audio clips into a target media file or bitstream of the digital visual media.
In some embodiments, the creating unit 503 is further configured to, when the type of the current narrative bypass information is an audio type, convert the current narrative bypass information into narrative bypass information corresponding to a text type, and create a text data segment;
The encoding unit 502 is further configured to embed the current narrative bystander information in a text data segment into a target media file or bitstream of the digital visual media.
In some embodiments, the determining unit 501 is further configured to determine that the type of the at least one narrative bystander information is a text type and/or an audio type when the type of the digital visual media is an image or a group of images; and determining that the type of the at least one narrative bystander information is a text type when the type of the digital visual media is video.
In some embodiments, the determining unit 501 is further configured to determine that the narrative bypass information corresponding to the audio type is stored after the narrative bypass information corresponding to the text type if the type of the at least one narrative bypass information includes a text type and an audio type.
In some embodiments, the narrative bystander information includes: the narrative content and registration information associated with the narrative content.
In some embodiments, the registration information includes at least one of: the name of the narrator, the date and time of creation, ownership information of the affiliated visual content.
In some embodiments, referring to fig. 7, the encoder 50 may further include a writing unit 504 configured to combine and write narrative bypass information including the narrative content and the registration information into a preset data set without changing the original visual content of the digital visual media;
the encoding unit 502 is further configured to embed the preset data set in a preset manner in a target media file or bitstream of the digital visual media.
In some embodiments, the writing unit 504 is further configured to write the at least one narrative bystander information into the preset data set.
In some embodiments, the writing unit 504 is specifically configured to register at least one narrative entry based on a name, a creation date, and a time of a narrative corresponding to the at least one narrative bystander information; and writing the at least one narrative bypass information into the preset dataset based on the at least one narrative entry.
In some embodiments, the determining unit 501 is further configured to determine new narrative bystander information to be added;
the writing unit 504 is further configured to store the new narrative bypass information after the existing narrative bypass information.
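The append-only behavior of the writing unit can be sketched as follows. The dict layout is an illustrative stand-in for the preset data set; only the rule itself (register by narrator name and creation date/time, store after the existing entries, never touch the visual content) comes from the text above.

```python
def append_narration(dataset: list, author: str, created: str, content) -> list:
    """Append-only update: register the new entry by narrator name and
    creation date/time and store it after the existing narration side
    information, leaving earlier entries untouched."""
    entry = {'author': author, 'created': created, 'content': content}
    return dataset + [entry]
```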
In some embodiments, the determining unit 501 is further configured to determine the at least one narrative bystander information conforming to a preset data structure; the preset data structure at least comprises one of the following: a generic data structure and an International Organization for Standardization base media file format (ISO-BMFF) data structure.
In some embodiments, the ISO-BMFF data structure includes at least a narrative bypass metadata box including a narrative bypass metadata process box and a narrative bypass application box;
an encoding unit 502 further configured to process metadata of the current narrative bystander information through the narrative bystander metadata processing box; and describing, by the narrative bypass application box, at least one of the following narrative information: the start position of the current narrative bypass information, the data length of the current narrative bypass information, and the total number of narrative bypass information.
In some embodiments, the narrative bypass application box comprises a narrative bypass description box;
the encoding unit 502 is further configured to describe at least one of the following narrative information by means of the narrative bypass description box: text encoding criteria, narrative name, date of creation, time of creation, ownership flag of the affiliated visual content, type of narrative bystander, encoding criteria of narrative bystander, and text length of narrative bystander.
In some embodiments, the encoding unit 502 is further configured to create the narrative bypass metadata box at the file level if the digital visual media does not have a metadata box at the file level, and describe the at least one narrative bypass information through the narrative bypass metadata box; alternatively, if the digital visual media already has a metadata box at the file level, create the narrative bypass metadata box in the meco container box and describe the at least one narrative bypass information through the narrative bypass metadata box.
In some embodiments, the text data segment is encoded using a preset text encoding standard comprising at least one of: UTF-8, UTF-16, GB2312-80, GBK and Big 5.
In some embodiments, the audio clip is encoded using a preset audio encoding standard comprising at least one of: AVS audio, MP3, AAC, and WAV.
It will be appreciated that in the embodiments of the present application, a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., and may of course be a module, or may be non-modular. Furthermore, the components in the present embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional modules.
The integrated units, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present embodiment may be embodied essentially, or in part, in the form of a software product, which is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the method described in the present embodiment. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
Accordingly, embodiments of the present application provide a computer storage medium, for use in an encoder 50, storing a computer program that when executed by a first processor implements a method according to any of the preceding embodiments.
Based on the above-described composition of the encoder 50 and the computer storage medium, reference is made to fig. 8, which shows a schematic diagram of a specific hardware structure of the encoder 50 according to an embodiment of the present application. As shown in fig. 8, may include: a first communication interface 601, a first memory 602, and a first processor 603; the various components are coupled together by a first bus system 604. It is appreciated that the first bus system 604 is used to enable connected communications between these components. The first bus system 604 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration, the various buses are labeled as first bus system 604 in fig. 8. Wherein,
the first communication interface 601 is configured to receive and send signals during the process of receiving and sending information with other external network elements;
a first memory 602 for storing a computer program capable of running on the first processor 603;
the first processor 603 is configured to execute, when the computer program is executed:
determining at least one narrative bystander information to be added;
the at least one narrative bypass message is embedded in a target media file or bitstream of the digital visual media in a preset manner without altering the original visual content of the digital visual media.
It is understood that the first memory 602 in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The first memory 602 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The first processor 603 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in software form in the first processor 603. The first processor 603 described above may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the first memory 602, and the first processor 603 reads the information in the first memory 602 and performs the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP devices, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof. For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Optionally, as another embodiment, the first processor 603 is further configured to perform the method of any of the previous embodiments when running the computer program.
The present embodiment provides an encoder, which may include a determining unit and an encoding unit. Thus, for at least one piece of narration information to be added, no special video editing software is needed: the narration information can be embedded into a target media file or bitstream of the digital visual media without altering the original visual content of the digital visual media, and can be presented one or more times when the digital visual media is played by a player. This simplifies the user's operation, enriches the digital visual media, and improves its viewing experience.
In still another embodiment of the present application, based on the same inventive concept as the previous embodiments, refer to fig. 9, which shows a schematic diagram of the composition of a decoder 70 provided in an embodiment of the present application. As shown in fig. 9, the decoder 70 may include a decoding unit 701 and a playing unit 702; wherein,
the decoding unit 701 is configured to parse the code stream to obtain at least one piece of narration information;
the playing unit 702 is configured to selectively present the at least one piece of narration information one or more times when the digital visual media is played by the player.
In some embodiments, referring to fig. 10, the decoder 70 may further comprise a determining unit 703 configured to determine whether to present the at least one piece of narration information;
the playing unit 702 is further configured to play only the digital visual media through the player if the at least one piece of narration information is not to be presented; or, if the at least one piece of narration information is to be presented, to perform the step of selectively presenting the at least one piece of narration information one or more times while the digital visual media is played through the player.
In some embodiments, the decoding unit 701 is further configured to parse the code stream to obtain a target media file or bitstream sent by the encoder, and to obtain the at least one piece of narration information from the target media file or bitstream.
In some embodiments, the playing unit 702 is further configured to selectively present the at least one piece of narration information one or more times at the start time of playing the digital visual media by the player.
In some embodiments, the type of the narration information includes at least one of: a text type and an audio type; and the type of the digital visual media includes at least one of: video, an image, and an image group, where an image group comprises at least two images.
In some embodiments, the playing unit 702 is further configured to, when the type of the current narration information is the text type, decode the current narration information as a text data segment and then display it in text format through the player.
In some embodiments, the playing unit 702 is further configured to, when the type of the current narration information is the audio type, decode the current narration information as an audio clip and then play it in audio format through the player.
In some embodiments, the playing unit 702 is further configured to, when the type of the current narration information is the text type, decode the current narration information as a text data segment, convert the decoded narration information into audio format, and play it through the player.
In some embodiments, the playing unit 702 is further configured to, when the type of the current narration information is the audio type, decode the current narration information as an audio clip, convert the decoded narration information into text format, and display it through the player.
In some embodiments, the determining unit 703 is further configured to determine that the type of the at least one piece of narration information is the text type and/or the audio type when the type of the digital visual media is an image or an image group, and to determine that the type of the at least one piece of narration information is the text type when the type of the digital visual media is video.
In some embodiments, the determining unit 703 is further configured to determine that at least two audio decoders are provided in the decoder when the type of the digital visual media is video and the type of the narration information is the audio type; the at least two audio decoders include a first audio decoder for decoding and playing the digital visual media, and a second audio decoder for decoding and playing the narration information.
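As a compact illustration of the type rules in the two embodiments above, the following Python sketch maps a media type to the narration types it admits. It follows the embodiment in which video carries text narration only (audio narration for video instead requiring a second audio decoder); the function and type names are invented for the example, not part of the patent.

```python
def allowed_narration_types(media_type: str) -> set:
    """Narration types admitted for each digital-visual-media type.

    Images and image groups may carry text and/or audio narration;
    video carries text narration in this embodiment, since its own
    audio track already occupies the primary audio decoder.
    """
    if media_type in ("image", "image_group"):
        return {"text", "audio"}
    if media_type == "video":
        return {"text"}
    raise ValueError(f"unknown media type: {media_type}")
```
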
In some embodiments, the decoding unit 701 is further configured to parse the code stream to obtain a preset data set, and to acquire the at least one piece of narration information from the preset data set.
In some embodiments, the decoding unit 701 is further configured to parse the code stream to obtain the at least one piece of narration information created by at least one user of the digital visual media.
In some embodiments, the narration information includes narration content and registration information associated with the narration content.
In some embodiments, the registration information includes at least one of: the narrator's name, the creation date and time, and ownership information of the associated visual content.
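The registration fields listed above can be grouped with the narration content into a single record. A minimal Python sketch follows; all field names are illustrative assumptions, since the patent specifies only which items are registered, not a concrete layout.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class NarrationEntry:
    """One narration entry: narration content plus its registration info."""
    content: bytes            # encoded text data segment or audio clip
    content_type: str         # "text" or "audio"
    narrator_name: str        # who created the narration
    created_at: datetime      # creation date and time
    owns_visual_content: bool # ownership flag for the visual content

# Example entry created by a viewer of an image:
entry = NarrationEntry(
    content="What a sunset!".encode("utf-8"),
    content_type="text",
    narrator_name="Alice",
    created_at=datetime(2021, 6, 1, 18, 30),
    owns_visual_content=False,
)
```
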
In some embodiments, referring to fig. 10, the determining unit 703 may include a narration play switch specifically configured to determine, via the narration play switch, whether to present the at least one piece of narration information; the narration play switch turns the presentation of the narration information on or off.
In some embodiments, the playing unit 702 is further configured to play the at least one piece of narration information together with the digital visual media through the player when the narration play switch is turned on, and to play only the digital visual media through the player when the narration play switch is turned off.
In some embodiments, the playing unit 702 is further configured to, when the narration play switch is turned on: if the type of the narration information is the text type, display the narration information through the player as a subtitle overlaid on the visual content of the digital visual media, or display it in a separate text window, or convert it into audio format for playback; and, if the type of the narration information is the audio type, play the narration information through the player as an independent audio signal, or convert it into text format for display.
In some embodiments, the playing unit 702 is further configured to, when the narration play switch is turned on and the at least one piece of narration information is presented in the foreground by the player, play the digital visual media in the background in a frozen mode or a loop mode.
In some embodiments, the playing unit 702 is further configured to, when the narration play switch is turned on and the at least one piece of narration information is a speech narration, play the at least one piece of narration information through the player while presenting the visual content of the digital visual media and the registration information associated with the at least one piece of narration information.
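The narration play switch behaviour in the embodiments above can be sketched as a small dispatch function. This is a hedged illustration only: the action names and the `text_mode` parameter are invented for the example.

```python
def present(media, narrations, switch_on, text_mode="subtitle"):
    """Return the list of actions the player would take.

    switch off -> only the original visual content, unchanged.
    switch on  -> the media plus one action per narration entry,
                  chosen by entry type and the text presentation mode.
    """
    if not switch_on:
        return [("play", media)]
    actions = [("play", media)]
    for n in narrations:
        if n["type"] == "text":
            if text_mode == "subtitle":
                actions.append(("overlay_subtitle", n["data"]))
            elif text_mode == "window":
                actions.append(("text_window", n["data"]))
            else:  # synthesize the text into speech and play it
                actions.append(("tts_play", n["data"]))
        else:      # audio narration: play as an independent audio signal
            actions.append(("audio_play", n["data"]))
    return actions
```
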
In some embodiments, the determining unit 703 is further configured to determine new narration information to be added while the digital visual media is played by the player, and to embed the new narration information into a target media file of the digital visual media.
In some embodiments, the decoding unit 701 is further configured to parse the code stream to obtain the at least one piece of narration information conforming to a preset data structure; the preset data structure includes at least one of: a generic data structure and an ISO-BMFF data structure.
In some embodiments, the ISO-BMFF data structure includes at least a narration metadata box, the narration metadata box including a narration metadata processing box and a narration application box;
the decoding unit 701 is further configured to decode, via the narration metadata processing box, the metadata of the current narration information, and to decode, via the narration application box, at least one of the following: the start position of the current narration information, the data length of the current narration information, and the total number of pieces of narration information.
In some embodiments, the narration application box comprises a narration description box;
the decoding unit 701 is further configured to obtain, via the narration description box, at least one of the following: the text encoding standard, the narrator's name, the creation date, the creation time, the ownership flag of the associated visual content, the type of the narration information, the encoding standard of the narration information, and the text length of the narration information.
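The boxes above follow the ordinary ISO-BMFF layout of a 32-bit big-endian size followed by a four-character type code, with container boxes holding child boxes in their payload. A minimal box walker in Python is sketched below; the four-character codes `nrmb` and `nrds` are made up for the example, as the patent does not assign concrete codes.

```python
import struct

def parse_boxes(buf, offset=0, end=None):
    """Walk a run of ISO-BMFF boxes in `buf`.

    Each box is: 4-byte big-endian size (header included), a 4-char
    ASCII type, then the payload. Returns (type, payload) pairs;
    payloads of container boxes can be walked recursively.
    """
    end = len(buf) if end is None else end
    boxes = []
    while offset + 8 <= end:
        size, btype = struct.unpack_from(">I4s", buf, offset)
        boxes.append((btype.decode("ascii"), buf[offset + 8:offset + size]))
        offset += size
    return boxes

# A hypothetical narration metadata box ('nrmb') containing one
# narration description box ('nrds') whose payload is a narrator name:
child = struct.pack(">I4s", 8 + 5, b"nrds") + b"Alice"
meta = struct.pack(">I4s", 8 + len(child), b"nrmb") + child
top = parse_boxes(meta)
inner = parse_boxes(top[0][1])   # walk the container's payload
```
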
In some embodiments, the decoding unit 701 is further configured to: if the digital visual media does not have a narration metadata box at the file level, obtain the narration metadata box and decode it to obtain the at least one piece of narration information; or, if the digital visual media has a narration metadata box at the file level, obtain the narration metadata box from the meco container box and decode it to obtain the at least one piece of narration information.
In some embodiments, the text data segment is decoded using a preset text coding standard, the preset text coding standard including at least one of: UTF-8, UTF-16, GB2312-80, GBK, and Big5.
In some embodiments, the audio clip is decoded using a preset audio coding standard, the preset audio coding standard including at least one of: AVS audio, MP3, AAC, and WAV.
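Decoding a text data segment with the listed standards maps directly onto Python's built-in codecs. A small sketch follows; the codec-name mapping is Python's, and in practice the encoding would be read from the narration description box rather than passed in by hand.

```python
def decode_text_segment(data: bytes, encoding: str = "UTF-8") -> str:
    """Decode a text narration segment using one of the supported
    text coding standards. Mapping from the standard names used in
    the text to Python codec names:
      UTF-8 -> utf-8, UTF-16 -> utf-16, GB2312-80 -> gb2312,
      GBK -> gbk, Big5 -> big5
    """
    codec = {
        "UTF-8": "utf-8",
        "UTF-16": "utf-16",
        "GB2312-80": "gb2312",
        "GBK": "gbk",
        "Big5": "big5",
    }.get(encoding, encoding)
    return data.decode(codec)
```
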
It will be appreciated that in this embodiment, the "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., and may of course be a module, or may be non-modular. Furthermore, the components in the present embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional modules.
The integrated units may be stored in a computer readable storage medium if implemented in the form of software functional modules, and not sold or used as stand-alone products. Based on such understanding, the present embodiment provides a computer storage medium, applied to the decoder 70, storing a computer program that when executed by the second processor implements the method of any of the preceding embodiments.
Based on the above composition of the decoder 70 and the computer storage medium, refer to fig. 11, which shows a schematic diagram of a specific hardware structure of the decoder 70 provided in an embodiment of the present application. As shown in fig. 11, the decoder 70 may include: a second communication interface 801, a second memory 802, and a second processor 803, the components being coupled together by a second bus system 804. It is appreciated that the second bus system 804 is used to enable connection and communication between these components. In addition to a data bus, the second bus system 804 includes a power bus, a control bus, and a status signal bus; however, for clarity of illustration, the various buses are all labeled as the second bus system 804 in fig. 11. Wherein,
a second communication interface 801, configured to receive and send signals during information transceiving with other external network elements;
a second memory 802 for storing a computer program capable of running on the second processor 803;
a second processor 803 for executing, when running the computer program:
parsing the code stream to obtain at least one piece of narration information;
selectively presenting the at least one piece of narration information one or more times while the digital visual media is played by the player.
It will be appreciated that the second memory 802 is similar in hardware function to the first memory 602 and the second processor 803 is similar in hardware function to the first processor 603; and will not be described in detail herein.
The present embodiment provides a decoder, which may include a decoding unit and a playing unit. Thus, for at least one piece of narration information to be added, no special video editing software is needed: the narration information can be embedded into a target media file or bitstream of the digital visual media without altering the original visual content of the digital visual media, and can be presented one or more times when the digital visual media is played by a player. This simplifies the user's operation, enriches the digital visual media, and improves its viewing experience.
In yet another embodiment of the present application, refer to fig. 12, which shows a schematic diagram of the composition of a narration system according to an embodiment of the present application. As shown in fig. 12, the narration system 90 may be composed of two parts, an encoder 901 and a decoder 902, and may also be implemented by running a computer program or software on an electronic device. Such electronic devices are capable of capturing and displaying digital images, or of recording and playing digital video; for example, a smart phone, a tablet, a laptop or notebook computer, a television, and so on.
It should be noted that the encoder 901 may be the encoder 50 of any of the foregoing embodiments, and the decoder 902 may be the decoder 70 of any of the foregoing embodiments.
It should also be noted that, on the encoder 901 side, the electronic device may obtain the narration information from the user and then embed it, as a data segment in a specific format, into a target media file or bitstream of the digital visual media. The digital visual media may be images or video, and the encoding process does not alter the original visual content of the digital visual media or its associated data. On the decoder 902 side, when an image, image group, or video is played, the electronic device extracts the narration information from the target media file or bitstream and presents it to the viewer along with the visual content of the digital visual media.
In the narration system, when the digital visual media is an image or a series of images, the narration information may be in text format or audio format. Text narration information is represented by text data segments, and speech narration information is stored in the form of audio clips. The system can support various standard encodings of text and audio, such as UTF-8, UTF-16, GB2312-80, GBK, Big5, AAC, MP3, AVS audio, WAV, and so on. Since the preset data set is mainly used for recording narration information, it may also be called a "narration data set". In the narration data set, the registration information recorded in the data segment includes the narrator's name, the creation date and time, and ownership of the visual content. The technique supports multiple pieces of narration information, even from different users. For example, a narration may be created by a photographer when capturing the visual content, by a viewer during playback of the visual content, or by an organizer who owns the visual content and wishes to add comments while editing it. In the system, each narration entry is registered by narrator name in a specific data structure, and the ownership flag indicates whether that user owns the visual content. Embodiments of the present application also allow a user to add new narration information after the existing narration information; when this occurs, the new narration information is appended after the existing narration information. Embodiments of the present application also support narration information that contains both a text format part and an audio format part; in this case, the audio narration part is stored after the text narration part.
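The storage order described above (text part first, then audio part, with new entries appended after existing ones) can be sketched as a toy serializer. The one-byte tags and 4-byte length prefixes are invented for the illustration; the patent does not fix a byte layout.

```python
def serialize_entry(entry: dict) -> bytes:
    """Lay out one narration entry: text part first, then audio part.

    Each part is tagged ('T' for text, 'A' for audio) and prefixed
    with a 4-byte big-endian length. Entries with only one part emit
    only that part.
    """
    parts = []
    if entry.get("text") is not None:
        parts.append(b"T" + len(entry["text"]).to_bytes(4, "big") + entry["text"])
    if entry.get("audio") is not None:
        parts.append(b"A" + len(entry["audio"]).to_bytes(4, "big") + entry["audio"])
    return b"".join(parts)

def append_entry(dataset: list, entry: dict) -> list:
    """New narration entries go after the existing ones."""
    dataset.append(entry)
    return dataset
```
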
In the narration system, the basic function of the decoder/player is to first parse and decode the narration information and then present it together with the original visual content. Playback of the narration information may be controlled by a narration play switch. When the narration play switch is turned off, the original visual content is played back without any modification. When the narration play switch is turned on, the decoder/player may select the exact narration presentation format. For example, text narration information may be displayed as subtitles overlaid on top of the visual content, or in a separate text window, or it may even be played as an audio signal after being synthesized by the decoder/player. Likewise, the speech narration of an image or image group may be played as a separate audio signal, or displayed as text after being transcribed by the player. For an image group, playback of the speech narration is independent of the image playback format specified in the original digital visual media. When a piece of narration information in the preset data set contains both a text narration part and an audio narration part, the two parts may be presented simultaneously or separately.
Further, the embodiments of the present application may also provide several built-in player configurations to give the user maximum viewing experience and flexibility. For example, a drop-down setup menu may be built into the player containing options for configuring the presentation format of the narration, and additional information, such as the narrator's name and the creation date of the narration, may be displayed on the player's display according to the viewer's interest. To help the viewer browse narration information, embodiments of the present application may also create a narration list that allows the user to selectively review narration entries. The viewer may also configure the player to display text content as an overlay or in a separate text window. When a separate text window is used, if the visual content is video or an image group, the original digital visual media can be played in loop mode in the background while one or more narrations are played at a time; in this case, the user can scroll through the narration in the separate text window. In addition, when narration information is overlaid on the visual content as subtitles, the player may allow the user to configure the scrolling speed of the text. While a voice narration is played, the player may also display the original visual content together with the narrator's name and the creation date/time of the narration in text form. Finally, the player may provide a narration editor and encoder that allows the user to add new narration information to the target media file of the digital visual media.
It will also be appreciated that embodiments of the present application provide methods for adding narration information to digital visual media, such as images, image groups, or video, specifically as follows:
in some embodiments, the present embodiments provide a narrative encoder that may be implemented by running a computer program or software on an electronic device capable of capturing and recording digital images and video files.
In some embodiments, the embodiments of the present application provide a narrative decoder that may be implemented by running a computer program or software on an electronic device capable of displaying images and playing video.
In some embodiments, embodiments of the present application may create narration information for multiple users of the same digital visual media.
In some embodiments, embodiments of the present application may create a text data segment to represent text narration information.
In some embodiments, embodiments of the present application may produce an audio clip to represent a speech narration.
In some embodiments, embodiments of the present application may create narration segments using both the text format and the audio format.
In some embodiments, embodiments of the present application may combine narration information, including narration content and associated registration information, into a preset data set having a specific data structure, and embed it in a target media file or bitstream of the digital visual media without altering the original visual content of the digital visual media.
In some embodiments, text narration information may be decoded and displayed as narration text, or played as narration audio by a speech synthesizer at the decoder.
In some embodiments, speech narration information may be decoded and played as narration audio, or transcribed and presented as narration text.
In some embodiments, the registration information of each narration includes, but is not limited to: the narrator's name, the creation date and time, and ownership of the associated visual content.
In some embodiments, the text data segment may be encoded using any text encoding standard, such as UTF-8, UTF-16, GB2312-80, GBK, Big5, and so on.
In some embodiments, the audio clip may be encoded using any audio encoding standard, such as AVS audio, MP3, AAC, WAV, and so on.
In some embodiments, the player has a switch, namely the narration play switch, that can turn the playback of narration information on or off.
In some embodiments, when the narration play switch is turned on, the player plays the narration information along with the original visual content; when the narration play switch is turned off, the player plays only the original visual content without any change; and the player may play the narration information in the foreground while the original visual content plays back in the background in frozen mode or loop mode.
In some embodiments, the data segment is in the ISO-BMFF narration metadata box format, comprising:
a narration metadata processing box for processing the metadata of the narration information; and
a narration application box carrying general information such as the start position of the current narration information, the data length of the current narration information, and the total number of pieces of narration information, and comprising:
a narration description box containing the narration information.
In some embodiments, embodiments of the present application provide a computer system for operating one or more steps of the method.
In some embodiments, embodiments of the present application provide a computer-readable medium storing instructions that, when executed by a processor in a computer system, cause the processor to perform one or more of the steps of the above-described methods.
Thus, to enhance the viewing experience of digital visual media, embodiments of the present application provide a method by which narration information may be added to a digital video, image, or image group. The narration information may be in text format, audio format, or both; it may be written into the current media file or bitstream along with the original digital visual media and may be displayed or played along with the digital visual media. With the method of the embodiments of the present application, users such as photographers or viewers can record their own feelings when shooting or watching videos or images. The method supports narration information from multiple users and registers each narration entry by narrator name and creation date and time. The narration data and the associated registration information are stored in a specific data structure in a data segment of the target media file or bitstream, without altering the original visual and audio content.
The present embodiment provides a narration system, which may include an encoder and a decoder. Thus, for the narration system, no special video editing software is needed: the narration information can be embedded into a target media file or bitstream of the digital visual media without altering the original visual content of the digital visual media, and can be presented one or more times when the digital visual media is played by a player. This simplifies the user's operation, enriches the digital visual media, and improves its viewing experience.
It should be noted that, in this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.
The methods disclosed in the several method embodiments provided in the present application may be arbitrarily combined without collision to obtain a new method embodiment.
The features disclosed in the several product embodiments provided in the present application may be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the several method or apparatus embodiments provided in the present application may be arbitrarily combined without conflict to obtain new method embodiments or apparatus embodiments.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial applicability
In the embodiments of the present application, on the encoder side, at least one piece of narration information to be added is determined, and the at least one piece of narration information is embedded, in a preset manner, into a target media file or bitstream of the digital visual media without altering the original visual content of the digital visual media. On the decoder side, at least one piece of narration information is obtained by parsing the code stream, and the at least one piece of narration information is selectively presented one or more times while the digital visual media is played by the player. Thus, for the at least one piece of narration information to be added, no special video editing software is needed: the narration information can be embedded into the target media file or bitstream without altering the original visual content, and can be selectively presented one or more times during playback, which simplifies the user's operation, enriches the digital visual media, and improves its viewing experience.

Claims (55)

  1. An information processing method applied to a decoder, the method comprising:
    parsing the code stream to obtain at least one piece of narration information;
    selectively presenting the at least one piece of narration information one or more times while the digital visual media is played by the player.
  2. The method of claim 1, wherein the method further comprises:
    determining whether to present the at least one piece of narration information;
    if the at least one piece of narration information is not to be presented, playing only the digital visual media through the player;
    if the at least one piece of narration information is to be presented, performing the step of selectively presenting the at least one piece of narration information one or more times while the digital visual media is played through the player.
  3. The method of claim 1, wherein parsing the code stream to obtain at least one piece of narration information comprises:
    parsing the code stream to obtain a target media file or bitstream sent by an encoder;
    obtaining the at least one piece of narration information from the target media file or the bitstream.
  4. The method of claim 1, wherein selectively presenting the at least one piece of narration information one or more times while the digital visual media is played by the player comprises:
    selectively presenting the at least one piece of narration information one or more times at a start time of playing the digital visual media by the player.
  5. The method of claim 1, wherein,
    the type of the narration information includes at least one of: a text type and an audio type;
    the type of the digital visual media includes at least one of: video, an image, and an image group, the image group comprising at least two images.
  6. The method of claim 5, wherein when the type of the current narration information is the text type, the method further comprises:
    after decoding the current narration information as a text data segment, displaying the current narration information in a text format through the player.
  7. The method of claim 5, wherein when the type of the current narration information is the audio type, the method further comprises:
    after decoding the current narration information as an audio clip, playing the current narration information in an audio format through the player.
  8. The method of claim 5, wherein when the type of the current narration information is the text type, the method further comprises:
    after decoding the current narration information as a text data segment, converting the decoded narration information into an audio format and playing it through the player.
  9. The method of claim 5, wherein when the type of the current narration information is the audio type, the method further comprises:
    after decoding the current narration information as an audio clip, converting the decoded narration information into a text format and displaying it through the player.
  10. The method of claim 5, wherein the method further comprises:
    when the type of the digital visual media is an image or an image group, determining that the type of the at least one piece of narration information is the text type and/or the audio type;
    when the type of the digital visual media is a video, determining that the type of the at least one piece of narration information is the text type.
  11. The method of claim 10, wherein the method further comprises:
    if the type of the digital visual media is a video and the type of the narration information is the audio type, determining that at least two audio decoders are provided in the decoder; wherein the at least two audio decoders comprise a first audio decoder for decoding and playing the digital visual media, and a second audio decoder for decoding and playing the narration information.
  12. The method of claim 1, wherein the parsing the code stream to obtain at least one piece of narration information comprises:
    parsing the code stream to obtain a preset data set;
    obtaining the at least one piece of narration information from the preset data set.
  13. The method of claim 1, wherein the parsing the code stream to obtain at least one piece of narration information comprises:
    parsing the code stream to obtain the at least one piece of narration information created by at least one user of the digital visual media.
  14. The method of claim 1, wherein the narration information comprises: narration content and registration information associated with the narration content.
  15. The method of claim 14, wherein the registration information comprises at least one of the following: a name of the narrator, a creation date and time, and ownership information of the associated visual content.
  16. The method of claim 1, wherein the decoder comprises a narration play switch for turning on or off the presentation of the narration information.
  17. The method of claim 16, wherein the method further comprises:
    when the narration play switch is turned on, playing the at least one piece of narration information together with the digital visual media through the player;
    when the narration play switch is turned off, playing only the digital visual media through the player.
  18. The method of claim 17, wherein the playing, through the player, the at least one piece of narration information together with the digital visual media when the narration play switch is turned on comprises:
    if the type of the narration information is the text type, displaying the narration information through the player as a subtitle overlaid on the visual content of the digital visual media, or displaying the narration information through the player in a separate text window, or converting the narration information into an audio format through the player for playing;
    if the type of the narration information is the audio type, playing the narration information through the player in an audio format as an independent audio signal, or converting the narration information into a text format through the player for display.
  19. The method of claim 17, wherein the playing, through the player, the at least one piece of narration information together with the digital visual media when the narration play switch is turned on comprises:
    playing, through the player, the digital visual media in a frozen mode or a looped mode in the background while presenting the at least one piece of narration information in the foreground.
  20. The method of claim 17, wherein when the narration play switch is turned on, the method further comprises:
    when the at least one piece of narration information is voice narration information, playing the at least one piece of narration information through the player, and presenting, through the player, the visual content of the digital visual media and the registration information associated with the at least one piece of narration information.
  21. The method of claim 1, wherein the method further comprises:
    when the digital visual media is played through the player, determining new narration information to be added, and embedding the new narration information into a target media file of the digital visual media.
  22. The method of any one of claims 1 to 21, wherein the parsing the code stream to obtain at least one piece of narration information comprises:
    parsing the code stream to obtain the at least one piece of narration information conforming to a preset data structure; wherein the preset data structure comprises at least one of the following: a generic data structure and an ISO Base Media File Format (ISO-BMFF) data structure.
  23. The method of claim 22, wherein the ISO-BMFF data structure comprises at least a narration metadata box, the narration metadata box comprising a narration metadata handler box and a narration application box;
    correspondingly, the parsing the code stream to obtain at least one piece of narration information comprises:
    decoding, through the narration metadata handler box, metadata of the current narration information;
    decoding, through the narration application box, at least one of the following: a start position of the current narration information, a data length of the current narration information, and a total number of pieces of narration information.
  24. The method of claim 23, wherein the narration application box comprises a narration description box, and the method further comprises:
    decoding, through the narration description box, at least one of the following: a text encoding standard, a narrator name, a creation date, a creation time, an ownership flag of the associated visual content, a type of the narration information, an encoding standard of the narration information, and a text length of the narration information.
  25. The method of claim 23, wherein the method further comprises:
    if the digital visual media does not have the narration metadata box at the file level, obtaining the narration metadata box, and decoding the narration metadata box to obtain the at least one piece of narration information;
    if the digital visual media has the narration metadata box at the file level, obtaining the narration metadata box from a meco container box, and decoding the narration metadata box to obtain the at least one piece of narration information.
  26. The method of claim 6 or 8, wherein the text data segment is decoded using a preset text decoding standard, the preset text decoding standard comprising at least one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5.
  27. The method of claim 7 or 9, wherein the audio clip is decoded using a preset audio decoding standard, the preset audio decoding standard comprising at least one of the following: AVS audio, MP3, AAC, and WAV.
  28. An information processing method, applied to an encoder, the method comprising:
    determining at least one piece of narration information to be added;
    embedding the at least one piece of narration information into a target media file or bitstream of digital visual media in a preset manner without altering the original visual content of the digital visual media.
  29. The method of claim 28, wherein the embedding the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner comprises:
    storing the at least one piece of narration information at a start position of the digital visual media in the preset manner to generate the target media file or the bitstream.
  30. The method of claim 28, wherein the determining at least one piece of narration information to be added comprises:
    creating, by at least one user of the digital visual media, narration information to obtain the at least one piece of narration information.
  31. The method of claim 28, wherein,
    the type of the narration information comprises at least one of the following: a text type and an audio type;
    the type of the digital visual media comprises at least one of the following: a video, an image, and an image group, the image group comprising at least two images.
  32. The method of claim 31, wherein when the type of the current narration information is the text type, the method further comprises:
    creating a text data segment;
    correspondingly, the embedding the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner comprises:
    embedding the current narration information into the target media file or bitstream of the digital visual media as a text data segment.
  33. The method of claim 31, wherein when the type of the current narration information is the audio type, the method further comprises:
    creating an audio clip;
    correspondingly, the embedding the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner comprises:
    embedding the current narration information into the target media file or bitstream of the digital visual media as an audio clip.
  34. The method of claim 31, wherein when the type of the current narration information is the text type, the method further comprises:
    converting the current narration information into narration information corresponding to the audio type, and creating an audio clip;
    correspondingly, the embedding the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner comprises:
    embedding the current narration information into the target media file or bitstream of the digital visual media as an audio clip.
  35. The method of claim 31, wherein when the type of the current narration information is the audio type, the method further comprises:
    converting the current narration information into narration information corresponding to the text type, and creating a text data segment;
    correspondingly, the embedding the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner comprises:
    embedding the current narration information into the target media file or bitstream of the digital visual media as a text data segment.
  36. The method of claim 31, wherein the method further comprises:
    when the type of the digital visual media is an image or an image group, determining that the type of the at least one piece of narration information is the text type and/or the audio type;
    when the type of the digital visual media is a video, determining that the type of the at least one piece of narration information is the text type.
  37. The method of claim 28, wherein the method further comprises:
    if the type of the at least one piece of narration information comprises both the text type and the audio type, determining that the narration information corresponding to the audio type is stored after the narration information corresponding to the text type.
  38. The method of claim 28, wherein the narration information comprises: narration content and registration information associated with the narration content.
  39. The method of claim 38, wherein the registration information comprises at least one of the following: a name of the narrator, a creation date and time, and ownership information of the associated visual content.
  40. The method of claim 38, wherein the embedding the at least one piece of narration information into a target media file or bitstream of the digital visual media in a preset manner without altering the original visual content of the digital visual media comprises:
    combining the narration information comprising the narration content and the registration information, and writing the narration information into a preset data set;
    embedding the preset data set into the target media file or bitstream of the digital visual media in the preset manner.
  41. The method of claim 40, wherein the method further comprises:
    writing the at least one piece of narration information into the preset data set.
  42. The method of claim 41, wherein the writing the at least one piece of narration information into the preset data set comprises:
    registering at least one narration entry based on the narrator name, creation date, and creation time corresponding to the at least one piece of narration information;
    writing the at least one piece of narration information into the preset data set based on the at least one narration entry.
  43. The method of claim 28, wherein the method further comprises:
    determining new narration information to be added;
    storing the new narration information after the existing narration information.
  44. The method of any one of claims 28 to 43, wherein the determining at least one piece of narration information to be added comprises:
    determining the at least one piece of narration information conforming to a preset data structure; wherein the preset data structure comprises at least one of the following: a generic data structure and an ISO Base Media File Format (ISO-BMFF) data structure.
  45. The method of claim 44, wherein the ISO-BMFF data structure comprises at least a narration metadata box, the narration metadata box comprising a narration metadata handler box and a narration application box;
    correspondingly, the method further comprises:
    processing, through the narration metadata handler box, metadata of the current narration information;
    describing, through the narration application box, at least one of the following: a start position of the current narration information, a data length of the current narration information, and a total number of pieces of narration information.
  46. The method of claim 45, wherein the narration application box comprises a narration description box, and the method further comprises:
    describing, through the narration description box, at least one of the following: a text encoding standard, a narrator name, a creation date, a creation time, an ownership flag of the associated visual content, a type of the narration information, an encoding standard of the narration information, and a text length of the narration information.
  47. The method of claim 45, wherein the method further comprises:
    if the digital visual media does not have the narration metadata box at the file level, creating the narration metadata box and describing the at least one piece of narration information through the narration metadata box;
    if the digital visual media has the narration metadata box at the file level, creating the narration metadata box in a meco container box and describing the at least one piece of narration information through the narration metadata box.
  48. The method of claim 32 or 35, wherein the text data segment is encoded using a preset text encoding standard, the preset text encoding standard comprising at least one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5.
  49. The method of claim 33 or 34, wherein the audio clip is encoded using a preset audio encoding standard, the preset audio encoding standard comprising at least one of the following: AVS audio, MP3, AAC, and WAV.
  50. An encoder, comprising a determining unit and an encoding unit; wherein,
    the determining unit is configured to determine at least one piece of narration information to be added;
    the encoding unit is configured to embed the at least one piece of narration information into a target media file or bitstream of digital visual media in a preset manner without altering the original visual content of the digital visual media.
  51. An encoder, comprising a first memory and a first processor; wherein,
    the first memory is configured to store a computer program runnable on the first processor;
    the first processor is configured to perform the method of any one of claims 28 to 49 when the computer program is run.
  52. A decoder, comprising a decoding unit and a playing unit; wherein,
    the decoding unit is configured to parse a code stream to obtain at least one piece of narration information;
    the playing unit is configured to present the at least one piece of narration information, selectably one or more times, while digital visual media is played through a player.
  53. A decoder, comprising a second memory and a second processor; wherein,
    the second memory is configured to store a computer program runnable on the second processor;
    the second processor is configured to perform the method of any one of claims 1 to 27 when the computer program is run.
  54. A computer storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 27 or the method of any one of claims 28 to 49.
  55. An electronic device, comprising at least the encoder of claim 50 or 51 and the decoder of claim 52 or 53.
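The claims above describe carrying narration information in ISO-BMFF-style boxes (a size/type header followed by a payload holding the narrator name, text length, and narration text). As a rough illustration only — the `nrtb` box type, the field widths, and the payload layout below are hypothetical choices for this sketch, not taken from the patent or from the ISO-BMFF standard — a single text-type narration entry could be serialized and parsed back like this:

```python
import struct

def pack_box(box_type: bytes, payload: bytes) -> bytes:
    """Pack an ISO-BMFF-style box: 32-bit big-endian size, 4-byte type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def embed_narration(narrator: str, text: str) -> bytes:
    """Serialize one text-type narration entry into a hypothetical 'nrtb' box.

    Payload layout (illustrative): 16-bit narrator-name length, UTF-8 name,
    32-bit text length, UTF-8 narration text.
    """
    name = narrator.encode("utf-8")
    body = text.encode("utf-8")
    payload = (struct.pack(">H", len(name)) + name +
               struct.pack(">I", len(body)) + body)
    return pack_box(b"nrtb", payload)

def parse_narration(box: bytes) -> tuple[str, str]:
    """Parse the hypothetical 'nrtb' box back into (narrator, text)."""
    size, btype = struct.unpack(">I4s", box[:8])
    if btype != b"nrtb" or size != len(box):
        raise ValueError("not a well-formed narration box")
    off = 8
    (nlen,) = struct.unpack(">H", box[off:off + 2]); off += 2
    narrator = box[off:off + nlen].decode("utf-8"); off += nlen
    (tlen,) = struct.unpack(">I", box[off:off + 4]); off += 4
    text = box[off:off + tlen].decode("utf-8")
    return narrator, text
```

Because the box carries its own size and type, it can be appended after the existing media data without altering the original visual content, which is the property claims 28 and 50 rely on; a real implementation would follow the actual box syntax defined in the patent's description and in ISO/IEC 14496-12.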
CN202180055507.5A 2020-08-21 2021-02-05 Information processing method, encoder, decoder, storage medium and apparatus Pending CN116194913A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063068527P 2020-08-21 2020-08-21
US63/068,527 2020-08-21
PCT/CN2021/075483 WO2022037026A1 (en) 2020-08-21 2021-02-05 Information processing method, encoder, decoder, storage medium, and device

Publications (1)

Publication Number Publication Date
CN116194913A true CN116194913A (en) 2023-05-30

Family

ID=80322534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180055507.5A Pending CN116194913A (en) 2020-08-21 2021-02-05 Information processing method, encoder, decoder, storage medium and apparatus

Country Status (2)

Country Link
CN (1) CN116194913A (en)
WO (1) WO2022037026A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833490A (en) * 2011-06-15 2012-12-19 新诺亚舟科技(深圳)有限公司 Method and system for editing and playing interactive video, and electronic learning device
CN103428575B (en) * 2013-08-19 2017-04-12 符晓友 Encoding and decoding control method of information overlaid in media stream or media file images
US20190355372A1 (en) * 2018-05-17 2019-11-21 Spotify Ab Automated voiceover mixing and components therefor
CN108924599A (en) * 2018-06-29 2018-11-30 北京优酷科技有限公司 Video caption display methods and device
CN110390927B (en) * 2019-06-28 2021-11-23 北京奇艺世纪科技有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN111046199B (en) * 2019-11-29 2024-03-19 鹏城实验室 Method for adding white-out to image and electronic equipment

Also Published As

Publication number Publication date
WO2022037026A1 (en) 2022-02-24

Similar Documents

Publication Publication Date Title
US8977107B2 (en) Storage device and method for resuming playback of content
US8923654B2 (en) Information processing apparatus and method, and storage medium storing program for displaying images that are divided into groups
US9071815B2 (en) Method, apparatus and computer program product for subtitle synchronization in multimedia content
US20090157750A1 (en) Integrated multimedia file format structure, and multimedia service system and method based on the intergrated multimedia format structure
US8863182B1 (en) In-stream video stitching
CN111083396B (en) Video synthesis method and device, electronic equipment and computer-readable storage medium
KR20090067220A (en) Encoding method and apparatus and decoding method and apparatus
US20140147100A1 (en) Methods and systems of editing and decoding a video file
EP2041974A1 (en) Method and apparatus for encoding/decoding signal
CN101316292A (en) Electronic apparatus of playing and editing multimedia data
CN106489270A (en) Information processor and method
CN104065908A (en) Apparatus And Method For Creating And Reproducing Live Picture File
CN103646048A (en) Method and device for achieving multimedia pictures
Bulterman et al. Socially-aware multimedia authoring: Past, present, and future
CN116194913A (en) Information processing method, encoder, decoder, storage medium and apparatus
US8352500B2 (en) Centralized multimedia access
KR101295377B1 (en) Method for constructing of file format and apparatus and method for processing broadcast signal with file which has file format
WO2021227580A1 (en) Information processing method, encoder, decoder, and storage medium device
US20140297285A1 (en) Automatic page content reading-aloud method and device thereof
CN112312219A (en) Streaming media video playing and generating method and equipment
Rome et al. Multimedia on symbian OS: Inside the convergence device
CN1767639B (en) Method for parsing the size of contents to be reproduced in a mobile communication device
Subhash FFmpeg Tips and Tricks
Kleij Theatricality and the Public: Theatre and the English Public from Reformation to Revolution. By Katrin Beushausen
GB2421394A (en) Providing Audio-Visual Content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination