WO2021227580A1 - Information processing method, encoder, decoder, and storage medium device - Google Patents

Information processing method, encoder, decoder, and storage medium device

Info

Publication number
WO2021227580A1
Authority
WO
WIPO (PCT)
Prior art keywords
narration
information
media content
audio
visual media
Application number
PCT/CN2021/075622
Other languages
English (en)
French (fr)
Inventor
于浩平 (Haoping Yu)
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority to CN202180035459.3A (published as CN115552904A)
Publication of WO2021227580A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/46: Embedding additional information in the video signal during the compression process
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 5/00: Details of television systems
    • H04N 5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/278: Subtitling

Definitions

  • The embodiments of the present application relate to multimedia technology, and in particular, but not exclusively, to information processing methods, encoders, decoders, and storage medium devices.
  • Because of their usability and affordability, smartphones have become the most popular electronic devices; owning one today is not just necessary but the norm. Smartphones have therefore had many significant impacts on society and culture.
  • One resulting change in lifestyle is that consumers use smartphones to take photos or shoot videos to record their daily activities, a trend that has become universal worldwide.
  • The information processing methods, encoders, decoders, and storage medium devices provided by the embodiments of the present application allow users to embed their emotional expression about visual media content (that is, narration information) into the media file or bitstream of that content, so that when the visual media content is played back on an electronic device, the narration data can be viewed.
  • The information processing method, encoder, decoder, and storage medium device provided in the embodiments of the present application are implemented as follows:
  • An embodiment of the present application provides an information processing method.
  • The method includes: parsing a bitstream to obtain at least one piece of narration information of visual media content and the corresponding presentation time; and, when the visual media content is played, presenting the at least one piece of narration information according to the presentation time.
  • An embodiment of the present application provides an information processing method, the method including: determining at least one piece of narration information to be added and the corresponding presentation time; without changing the visual media content corresponding to the at least one piece of narration information, embedding the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in a preset manner to obtain a new media file or a new bitstream; and encoding the new media file or the new bitstream to obtain a bitstream.
  • An embodiment of the present application provides a decoder that includes a decoding module and a playback module. The decoding module is configured to parse the bitstream to obtain at least one piece of narration information of the visual media content and the corresponding presentation time; the playback module is configured to present the at least one piece of narration information according to the presentation time when the visual media content is played.
  • An embodiment of the present application provides a decoder that includes a memory and a processor. The memory is configured to store a computer program that can run on the processor; the processor is configured to execute the decoding-side information processing method described in the embodiments of the present application when the computer program runs.
  • An embodiment of the present application provides an encoder that includes a determining module, an embedding module, and an encoding module. The determining module is configured to determine at least one piece of narration information to be added and the corresponding presentation time; the embedding module is configured to embed, without changing the visual media content corresponding to the at least one piece of narration information, the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in a preset manner to obtain a new media file or a new bitstream; the encoding module is configured to encode the new media file or the new bitstream to obtain a bitstream.
  • An embodiment of the present application provides an encoder that includes a memory and a processor. The memory is configured to store a computer program that can run on the processor; the processor is configured to execute the encoding-side information processing method described in the embodiments of the present application when the computer program runs.
  • An embodiment of the present application provides a computer storage medium that stores a computer program; when the computer program is executed by a processor, the method described in the embodiments of the present application is implemented.
  • An embodiment of the present application provides an electronic device that includes at least the encoder and/or the decoder described in the embodiments of the present application.
  • With these embodiments, a user is allowed to embed the emotional expression about the visual media content (that is, the narration information) into the media file or bitstream of the visual media content, so that when the user plays back the visual media content on an electronic device, the associated narration information can be viewed.
  • FIG. 1 is a schematic diagram of an implementation process of an information processing method on the encoding end in an embodiment of this application;
  • FIG. 2 is a schematic diagram of an implementation process of an information processing method on the decoding end in an embodiment of this application;
  • FIG. 3 is a schematic diagram of the general data structure of an embodiment of this application and the structure of an International Organization for Standardization Base Media File Format (ISO-BMFF) file;
  • FIG. 4 is a schematic diagram of the structure of an ISO-BMFF file according to an embodiment of this application;
  • FIG. 5A is a schematic diagram of the structure of an ISO-BMFF file according to an embodiment of this application;
  • FIG. 5B is a schematic diagram of the structure of the meta box 502 according to an embodiment of this application;
  • FIG. 6 is a schematic structural diagram of a decoder according to an embodiment of this application;
  • FIG. 7 is a schematic structural diagram of an encoder according to an embodiment of this application;
  • FIG. 8 is a schematic diagram of the hardware entity of an encoder according to an embodiment of this application;
  • FIG. 9 is a schematic diagram of the hardware entity of a decoder according to an embodiment of this application.
  • The terms "first/second/third" in the embodiments of this application are used to distinguish similar or different objects and do not denote a particular order. Understandably, "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
  • The embodiment of the application provides an information processing method, which can be applied to the encoding end and the electronic device corresponding to the encoding end.
  • The device can be any electronic device with encoding capability.
  • The electronic device can be a mobile phone, a personal computer, a laptop, a television, a server, etc.
  • The functions implemented by the information processing method can be realized by a processor in the electronic device calling program code, and the program code can of course be stored in a computer storage medium. It can be seen that the electronic device includes at least a processor and a storage medium.
  • FIG. 1 is a schematic diagram of the implementation process of the information processing method according to an embodiment of the application. As shown in FIG. 1, the method may include the following steps 101 to 103 (a sketch of the flow follows the step list):
  • Step 101: Determine at least one piece of narration information to be added and the corresponding presentation time;
  • Step 102: Without changing the visual media content corresponding to the at least one piece of narration information, embed the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in a preset manner to obtain a new media file or a new bitstream;
  • Step 103: Encode the new media file or the new bitstream to obtain a bitstream.
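  • As an informative illustration only, steps 101 to 103 can be sketched in Python as follows; the byte layout and helper names are invented for the sketch and are not part of this application:

```python
from dataclasses import dataclass

@dataclass
class Narration:
    text: bytes        # encoded narration text (empty if audio-only)
    audio: bytes       # encoded narration audio clip (empty if text-only)
    start_frame: int   # presentation time: marked start frame
    duration: int      # presentation time: number of frames it lasts

def embed_narrations(media: bytes, narrations: list) -> bytes:
    """Step 102: append a narration data segment after the original
    media payload; the original media bytes are left untouched."""
    segment = bytearray()
    for n in narrations:
        segment += n.start_frame.to_bytes(4, "big")
        segment += n.duration.to_bytes(4, "big")
        segment += len(n.text).to_bytes(4, "big") + n.text
        segment += len(n.audio).to_bytes(4, "big") + n.audio
    return media + bytes(segment)

# Step 101: the narration to add and its presentation time
note = Narration(text="What a sunset!".encode("utf-8"), audio=b"",
                 start_frame=120, duration=75)
# Steps 102 and 103: embed, then write/encode the new file
original = b"\x00" * 1024            # stand-in for the original media bytes
new_file = embed_narrations(original, [note])
```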
  • In some embodiments, the visual media content is a video or a set of images; accordingly, when the narration information is original text, text converted from audio, or combined audio and text, the presentation time of the text is expressed as a marked start frame and at least one consecutive frame of the visual media content.
  • In some embodiments, the visual media content is a video clip or a set of images; correspondingly, when the narration information is original audio, audio converted from text, or combined audio and text, the presentation time of the audio is expressed as a marked start frame and a duration of the visual media content.
  • In some embodiments, the number of consecutive frames in the duration of the audio converted from text is less than the number of consecutive frames of the corresponding text.
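  • A minimal sketch of the two presentation-time forms described above; the field names are illustrative, not taken from this application:

```python
from dataclasses import dataclass

@dataclass
class TextPresentationTime:
    start_frame: int      # marked start frame of the visual media content
    num_frames: int       # at least one consecutive frame showing the text

@dataclass
class AudioPresentationTime:
    start_frame: int      # marked start frame
    duration_frames: int  # duration of the audio, expressed in frames

def presented_frames(start_frame: int, span: int) -> range:
    """Frame numbers during which a narration is presented."""
    return range(start_frame, start_frame + span)

# e.g. text shown on frames 120..194 while the converted audio,
# which may be shorter, covers only frames 120..179
text_window = presented_frames(120, 75)
audio_window = presented_frames(120, 60)
```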
  • In some embodiments, the method further includes: embedding registration information of the narration information into the media file or bitstream of the visual media content in a preset manner.
  • The registration information of the narration information includes at least one of the following: the name of the narrator, the creation date and time, and ownership information of the visual media content.
  • In some embodiments, embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner includes: storing the at least one piece of narration information, in a preset manner, at the starting position of the visual media content.
  • In some embodiments, determining the at least one piece of narration information to be added includes: creating narration information for at least one user of the visual media content to obtain the at least one piece of narration information.
  • The type of the narration information includes at least one of the following: a text type and an audio type.
  • The type of the visual media content includes at least one of the following: a video, an image, and an image group, where an image group includes at least two images.
  • In some embodiments, when the type of the current narration information is a text type, the method further includes: creating a text data segment; accordingly, embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner includes: embedding the current narration information into the media file or bitstream of the visual media content in the form of a text data segment.
  • In some embodiments, when the type of the current narration information is an audio type, the method further includes: creating an audio segment; accordingly, embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner includes: embedding the current narration information into the media file or bitstream of the visual media content in the form of an audio segment.
  • In some embodiments, when the type of the current narration information is a text type, the method further includes: converting the current narration information into narration information of the audio type and creating an audio segment; accordingly, embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner includes: embedding the current narration information into the media file or bitstream of the visual media content in the form of an audio segment.
  • In some embodiments, when the type of the current narration information is an audio type, the method further includes: converting the current narration information into narration information of the text type and creating a text data segment; accordingly, embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner includes: embedding the current narration information into the media file or bitstream of the visual media content in the form of a text data segment.
  • In some embodiments, the method further includes: when the type of the visual media content is an image or an image group, determining that the type of the at least one piece of narration information is a text type and/or an audio type; and when the type of the visual media content is a video, determining that the type of the at least one piece of narration information is a text type.
  • In some embodiments, the method further includes: if the type of the narration information includes both a text type and an audio type, storing the narration information of the audio type after the narration information of the text type.
  • In some embodiments, the method further includes: determining new narration information to be added, and storing the new narration information after the existing narration information.
  • In some embodiments, the media file or bitstream conforms to a preset data structure, where the preset data structure includes at least one of the following: a general data structure and an ISO-BMFF data structure; accordingly, embedding the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in a preset manner includes: embedding the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in the form of the preset data structure.
  • In some embodiments, the ISO-BMFF data structure includes at least a narration metadata box.
  • The narration metadata box includes a metadata processing box and a narration application box.
  • The metadata processing box includes the metadata of the current narration information.
  • The narration application box includes at least one of the following: the starting position of the current narration information, the length of the current narration information, and the total number of pieces of narration information.
  • In some embodiments, the narration application box includes a narration description box, and the method further includes: decoding the narration description box to obtain at least one of the following: the text encoding standard, the narrator name, the creation date, the creation time, the ownership flag of the associated visual content, the type of the narration information, the coding standard of the narration information, and the text length of the narration information.
  • In some embodiments, the method further includes: if the visual media content does not have a file-level narration metadata box, obtaining the narration metadata box and decoding it to obtain the at least one piece of narration information; if the visual media content has a file-level narration metadata box, obtaining the narration metadata box from the meco container box and decoding it to obtain the at least one piece of narration information.
  • In some embodiments, the text data segment is encoded using a preset text encoding standard.
  • The preset text encoding standard includes at least one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5.
  • The preset text encoding standard can also be any other predefined standard.
  • In some embodiments, the audio segment is encoded using a preset audio coding standard.
  • The preset audio coding standard includes at least one of the following: AVS audio, MP3, AAC, and WAV.
  • The preset audio coding standard may also be any other predefined standard.
  • The embodiment of the application provides an information processing method.
  • The method can be applied to the decoding end and the electronic device corresponding to the decoding end.
  • The device can be any electronic device with decoding and playback capabilities.
  • The electronic device can be a mobile phone, a personal computer, a laptop, a television, a server, etc.
  • The functions implemented by the information processing method can be realized by a processor in the electronic device calling program code, and the program code can of course be stored in a computer storage medium. It can be seen that the electronic device includes at least a processor and a storage medium.
  • FIG. 2 is a schematic diagram of the implementation flow of the information processing method according to an embodiment of the application. As shown in FIG. 2, the method may include the following steps 201 and 202:
  • Step 201: Parse the bitstream to obtain at least one piece of narration information of the visual media content and the corresponding presentation time.
  • When the narration information is added, the corresponding designated presentation time is added as well.
  • The narration information is presented only within the corresponding presentation time; no narration information is presented at times other than the presentation time.
  • The presentation times of different pieces of narration information can be the same or different, which is not limited in this application. Two or more pieces of narration information can be presented at the same time, or different pieces of narration information can be presented in sequence.
  • The types of visual media content can be diverse.
  • For example, the visual media content is an image, a group of images (that is, two or more images), or a video.
  • The presentation format of the presentation time corresponding to different narration formats can be the same or different.
  • An overview of narration formats and possible presentation formats is given in Table 1 below.
  • Table 1 describes the narration format versus the presentation format.
  • Note 1: For a text narration or a converted text narration, the entire narration should be displayed together for each frame or image marked by the "start" and "duration" of the video or group of images in the window.
  • An audio narration should start from the "start" frame marked for the video or group of images and continue for the entire period marked by the "duration" of the video, which equals the playback length of the audio signal. However, an audio narration whose playing time exceeds the playing duration of the video is allowed; if this happens, the playback device can freeze the video at the end of the video playback time, or continue the video playback in loop mode.
  • The duration of a synthesized-audio narration should be less than the marked duration; if this is not the case, it should be considered that there is more than one narration at the given time.
  • The complete audio (original or synthesized) narration is associated with each frame within the duration.
  • The playback of an audio narration is independent of the presentation of the image.
  • The player can decide whether to play the same synthesized audio for each frame or only for the start frame. For example, if the player shows an image as a still image, the player can repeat the audio narration for each frame; on the other hand, if the images are played back at a certain frame rate, the synthesized audio can be played asynchronously. If the playback exceeds the duration, the player can freeze the video playback or continue playback (for example, in loop mode).
  • Step 202: When playing the visual media content, present the at least one piece of narration information according to the presentation time (a decoder-side sketch follows below).
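  • A minimal decoder-side sketch of steps 201 and 202, assuming the narration records have already been parsed out of the bitstream; all names are illustrative:

```python
def present_narrations(frames, narrations):
    """Step 202: while playing back, present each narration only within
    its presentation time [start, start + duration)."""
    for frame_no, frame in enumerate(frames):
        render(frame)                          # play the visual media content
        for n in narrations:                   # several may be active at once
            if n["start"] <= frame_no < n["start"] + n["duration"]:
                show(n["payload"])             # text overlay or audio mix

def render(frame):
    pass                                       # stand-in for the video player

def show(payload):
    pass                                       # stand-in for the presenter

present_narrations(frames=[None] * 300,
                   narrations=[{"start": 120, "duration": 75,
                                "payload": "What a sunset!"}])
```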
  • In some embodiments, parsing the bitstream to obtain the at least one piece of narration information of the visual media content and the corresponding presentation time includes: parsing the bitstream to obtain the media file or bitstream sent by the encoder, and obtaining, from the media file or bitstream, the visual media content, the at least one piece of narration information, and the corresponding presentation time.
  • In some embodiments, the visual media content is a video or a set of images; accordingly, when the narration information is original text, text converted from audio, or combined audio and text, the presentation time of the text is expressed as a marked start frame and at least one consecutive frame of the visual media content. Presenting the at least one piece of narration information according to the presentation time when the visual media content is played includes: starting from the playback of the start frame, continuing to present the corresponding text until the at least one consecutive frame has been played.
  • In some embodiments, the visual media content is a video clip or a set of images; correspondingly, when the narration information is original audio, audio converted from text, or combined audio and text, the presentation time of the audio is expressed as a marked start frame and a duration of the visual media content. Presenting the at least one piece of narration information according to the presentation time when the visual media content is played includes: starting to play the audio from the playback of the start frame until the images or video frames within the duration have finished playing.
  • In some embodiments, the number of consecutive frames in the duration of the audio converted from text is less than the number of consecutive frames of the corresponding text.
  • In some embodiments, the visual media content is an image, a group of images, or a video, and the narration information is original audio, audio converted from text, or combined audio and text; accordingly, presenting the at least one piece of narration information according to the presentation time when the visual media content is played includes: playing the audio repeatedly when the single image is displayed or the group of images is displayed statically; and playing the audio asynchronously when the group of images or the video is played at a certain frame rate.
  • In some embodiments, presenting the at least one piece of narration information according to the presentation time includes: when the narration switch is on, playing the visual media content and presenting the at least one piece of narration information according to the presentation time.
  • In some embodiments, the method further includes: when the narration switch is off, playing the visual media content without presenting the at least one piece of narration information.
  • In some embodiments, the method further includes: when the narration switch is off, playing the visual media content, turning off the narration information corresponding to some attributes among the at least one piece of narration information, and presenting the remaining narration information.
  • In some embodiments, presenting the narration information includes: when the narration information is original text or text converted from audio, presenting the text superimposed on the playback screen of the visual media content, or presenting the text in another window independent of the playback window of the visual media content, or converting the original text into audio for playback.
  • In some embodiments, converting the original text into audio for playback includes: when the visual media content has audio, mixing the audio belonging to the narration with the audio belonging to the visual media content for playback, or stopping the playback of the audio belonging to the visual media content and playing the audio belonging to the narration separately.
  • In some embodiments, presenting the narration information includes: when the narration information is original audio or audio converted from original text and the visual media content has audio, mixing the audio belonging to the narration with the audio belonging to the visual media content for playback, or stopping the playback of the audio belonging to the visual media content and playing the audio belonging to the narration separately, or converting the original audio into text before presenting it.
  • In some embodiments, presenting the narration information includes: when the narration information is combined text and audio, presenting the text and the audio simultaneously or separately.
  • In some embodiments, presenting the narration information includes: when the visual media content has not finished playing and the presentation time of the next piece of narration information is reached, providing a first option unit for the user to select the playback state of the narration information; when the visual media content has finished playing and the narration information has not finished playing, providing a second option unit for the user to select the playback state of the narration information; and presenting the narration information according to the selected option.
  • In some embodiments, presenting the narration information according to the selected option includes: when the first option of the first option unit is selected, freezing the playback of the visual media content until the narration information finishes playing, and then continuing to play the next piece of narration information and the visual media content; when the second option of the first option unit is selected, ending the playback of the current narration information and playing the next piece of narration information; and when the third option of the second option unit is selected, playing the visual media content in a loop.
  • Freezing the playback of the visual media content means stopping at the current frame of the visual media content rather than letting it disappear from the display interface.
  • Loop playback of the visual media content includes loop playback of the entire visual media content, or loop playback of the marked frame images in the visual media content.
  • In some embodiments, the method further includes: obtaining registration information of the at least one piece of narration information from the media file or bitstream, and presenting the corresponding registration information when the narration information is presented.
  • In some embodiments, presenting the corresponding registration information when the narration information is presented includes: displaying a trigger button of a drop-down menu when the narration information is presented; displaying an option of whether to play the registration information when the trigger button receives a trigger operation; and presenting the registration information when the option instructing playback of the registration information receives an operation.
  • The registration information of the narration information includes at least one of the following: the name of the narrator, the creation date and time, and ownership information of the visual media content.
  • In some embodiments, presenting the at least one piece of narration information according to the presentation time includes: playing the visual media content in the background and presenting the at least one piece of narration information in the foreground according to the presentation time.
  • In some embodiments, the method further includes: receiving a new bitstream, obtaining new narration information of the visual media content from the new bitstream, and presenting the new narration information.
  • In some embodiments, presenting the new narration information includes: displaying an option of whether to play the new narration information, and presenting the new narration information when an operation indicating the option to play the new narration information is received.
  • In some embodiments, parsing the bitstream to obtain the media file or bitstream sent by the encoder includes: parsing the bitstream to obtain a media file or bitstream conforming to a preset data structure, where the preset data structure includes at least one of the following: a general data structure and an ISO-BMFF data structure.
  • In some embodiments, the ISO-BMFF data structure includes at least a narration metadata box, and the narration metadata box includes a narration metadata processing box and a narration application box.
  • The method further includes: processing the metadata of the current narration information through the narration metadata processing box, and describing at least one of the following through the narration application box: the starting position of the current narration information, the data length of the current narration information, and the total number of pieces of narration information.
  • In some embodiments, the narration application box includes a narration description box, and the method further includes: describing, through the narration description box, at least one of the following: the text encoding standard, the narrator name, the creation date, the creation time, the ownership flag of the associated visual content, the type of the narration information, the coding standard of the narration information, and the text length of the narration information.
  • In some embodiments, the method further includes: if a narration metadata box does not exist at the file level of the visual media content, creating the narration metadata box and describing the at least one piece of narration information through it; if the visual media content already has a file-level narration metadata box, creating the narration metadata box in the meco container box and describing the at least one piece of narration information through it.
  • In some embodiments, obtaining the narration information from the media file or bitstream includes: when the narration information is of a text type, decoding the narration information from the media file or bitstream according to a preset text decoding standard, where the preset text decoding standard is one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5.
  • The preset text decoding standard may also be any other predefined standard.
  • In some embodiments, obtaining the narration information from the media file or bitstream includes: when the narration information is of an audio type, decoding the narration information from the media file or bitstream according to a preset audio decoding standard, where the preset audio decoding standard is one of the following: AVS audio, MP3, AAC, and WAV.
  • The preset audio decoding standard may also be any other predefined standard.
  • In the embodiments of this application, the narration information can be text, audio, or both; it is written into the digital media file or bitstream together with the original visual media content (including audio data) and can be displayed or played back together with the digital media.
  • Users such as photographers or spectators can record their emotions while capturing or watching videos or images.
  • The technology supports narration from multiple users and registers each narration entry by user name (namely, the narrator), creation date, and creation time.
  • The narration information, together with the related registration information, is stored as a data set in the digital media file or bitstream without changing the specific data structure of the original visual and audio content.
  • The embodiments of this application provide a technique that can add narration information to visual media content such as digital images or videos and save it in a format that facilitates communication and storage.
  • The narration information can be in text or audio format, or both. It can be displayed or played back together with the visual media content; the user can also choose to turn off the narration and display only the original visual media content.
  • This technology allows users to express, share, and exchange their emotions about visual subjects by embedding narration information in digital media files or bitstreams, thereby enhancing the user's viewing experience of digital media and promoting the participation of generations of viewers.
  • The technology described in the embodiments of the present application is specifically used to record, share, and exchange emotional comments between the creator of visual media content and the audience.
  • This technology allows users to record narration information without changing the original visual media content.
  • Users can choose to watch or listen to the narration during the playback of the visual media content.
  • The narration system described in the embodiments of this application consists of two parts: an encoder and a decoder.
  • The entire system can be implemented by applications or software running on electronic devices that can capture and display digital images, or record and play digital videos.
  • Such a device can be a smartphone, a tablet computer, a computer or laptop, or a television.
  • On the encoder side, the device obtains the narration information from the narrator and embeds it, as a data segment of a specific format, into the media files or bitstreams used for images and videos.
  • The encoding process does not change the original visual media content and its related data.
  • On the decoder side, when the image is displayed or the video is played back, the device extracts the narration information from the media file or bitstream and then presents it to the audience.
  • The narration information can be in the form of text, audio, or both.
  • A text-type narration is represented by text data, while an oral narration is saved as an audio clip.
  • The system supports UTF-8, UTF-16, GB2312-80, GBK, Big5, AAC, MP3, AVS-audio, WAV, and other standard text and audio codings.
  • When narration information is added, the time at which the player should present it, relative to the video playback time, is also written into the narration information set.
  • The time information can be represented by the video frame number, in presentation order, relative to the start of the video.
  • For a group of images, the frame number of the corresponding image is likewise recorded in the narration information set.
  • The narration information recorded in the data segment also includes the name of the creator of the narration, the creation date and time, and/or the ownership of the visual media content.
  • The system supports a variety of narrations, which can come from the creators of the visual media content (such as photographers), from viewers, or from organizations that own the content and want to add comments. Users can therefore generate and add narration during the collection, editing, and presentation of visual media content. Each narration entry is recorded in a specific data structure with the name of its creator.
  • The ownership flag is used to indicate whether the user owns the video.
  • The system also allows users to add new narrations after existing narrations associated with the same narration time. In this case, the new narration is added after the existing narrations.
  • The technology also supports narration with both a text part and an audio part; in this case, the audio data is saved after the text part.
  • The basic function of the decoder/player in the system is first to parse and decode the narration information set from the visual media file or bitstream, and then to present the narration at the time specified by the narration information set.
  • The player can turn the narration on or off with a switch.
  • When the narration is turned off, the original visual media content is played without any changes.
  • The precise narration display format is the choice of the decoder/player software.
  • A text-based narration can be displayed as subtitles superimposed on the visual media content, displayed in a separate text area, or even played as an audio signal after synthesis by the software decoder/player.
  • The audio signal can be mixed with the original audio track for playback, or played separately with the original audio track turned off.
  • A voice narration can be played as an audio signal independent of the image or video and can be mixed with the original audio track of the video.
  • A voice narration can also be displayed as text after conversion by the software player. When a narration has both a text part and an audio part in the data set, the two parts should be presented together or separately.
  • When a narration is too long to be presented before the next narration time or before the end of the video, the decoder can be built with multiple options to give users maximum flexibility and enhance the viewing experience.
  • The decoder can freeze the video playback at the next narration time until all narrations corresponding to the current narration time have finished playing, and then continue the video playback and the presentation of the next narration. If a narration is too long, the decoder can choose to skip the rest of it so that the video plays smoothly without freezing. The decoder can also play the original video in loop mode while the narration is playing. A sketch of these three policies follows below.
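  • The freeze/skip/loop policies just described can be sketched as follows; the policy names and the function are illustrative only:

```python
from enum import Enum

class LongNarrationPolicy(Enum):
    FREEZE = 1   # hold the current frame until the narration finishes
    SKIP = 2     # drop the rest of the narration; video keeps playing
    LOOP = 3     # replay the original video while the narration continues

def next_frame(policy, frame_no, last_frame, narration_still_playing):
    """Pick the next video frame while a long narration is playing."""
    if not narration_still_playing:
        return frame_no + 1
    if policy is LongNarrationPolicy.FREEZE:
        return frame_no                       # freeze on the current frame
    if policy is LongNarrationPolicy.LOOP and frame_no + 1 > last_frame:
        return 0                              # loop back to the start
    return frame_no + 1                       # SKIP (or LOOP mid-clip)

assert next_frame(LongNarrationPolicy.FREEZE, 42, 100, True) == 42
assert next_frame(LongNarrationPolicy.LOOP, 100, 100, True) == 0
```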
  • A narration prompt function can be built into the player. When there is a narration associated with the visual media content being played, the prompt function displays a narration prompt message instead of the actual narration; the viewer can then decide whether to turn on narration playback and view or listen to the actual narration.
  • The player may provide a drop-down menu with options for displaying additional information (such as the name of the narrator and the creation date of the narration) at the viewer's request.
  • In narration playback mode, the player can play the narration in the foreground and the original media (that is, the visual media content) in the background.
  • The viewer can freeze the video, simply repeat the video clip marked with "duration", or delete the entire narration when reviewing it.
  • In this mode, the video loops as the background.
  • For a new narration, the player can have several options. For example, the player can display a new-narration prompt message and let the viewer control the presentation of the new narration. In this case, the viewer can choose to switch to narration mode again and let the media play as the background, or freeze the media.
  • ANL equals narrative_author_name_length, in bytes.
  • f(n): a bit string with a fixed pattern, using n bits written from left to right, with the left bit first;
  • b(8): a byte, which can have any bit-string pattern (8 bits);
  • u(n): an unsigned integer using n bits.
  • narrative_data_start_code specifies a four-byte bit string with a fixed pattern that marks the starting position of the narration information in the bitstream. It consists of a three-byte start-code prefix with a unique sequence, followed by one byte dedicated to identifying the narration information.
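  • A sketch of an MSB-first bit reader for the f(n), b(8), and u(n) descriptors, shown reading a start code; the concrete byte values are placeholders, since the application's actual code values are not given in this excerpt:

```python
class BitReader:
    """MSB-first reader for the f(n), b(8) and u(n) descriptors."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0          # position in bits

    def u(self, n: int) -> int:
        """u(n): unsigned integer of n bits, left bit first."""
        val = 0
        for _ in range(n):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            val = (val << 1) | bit
            self.pos += 1
        return val

    def f(self, n: int, pattern: int) -> int:
        """f(n): n-bit string that must match a fixed pattern."""
        val = self.u(n)
        if val != pattern:
            raise ValueError("fixed-pattern mismatch")
        return val

    def b8(self) -> int:
        """b(8): one byte, any bit pattern."""
        return self.u(8)

# Reading a start code: 3-byte prefix, then the identifying byte.
r = BitReader(bytes([0x00, 0x00, 0x01, 0xB9]))
r.f(24, 0x000001)
narration_code = r.b8()
```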
  • number_of_narrative_point specifies the total number of positions or frames in a video or a group of images at which narration has been added.
  • For a video, the position or frame is designated by narrative_entry_time; for a group of images, it is designated by narrative_starting_frame_number.
  • Whenever a new narrative_entry_time or a new narrative_starting_frame_number is added, the value shall be increased by 1. If the original media file has only one image, number_of_narrative_point is set to 1.
  • narrative_entry_time specifies the time at which the narration is added and should be presented. The value is expressed as the relevant frame number of the video in presentation order, with the first frame of the video having frame number zero. This syntax element exists only when the original media is a video. If the duration of the narration is greater than 1, the frame number shall be regarded as the starting frame number.
  • narrative_duration specifies the number of frames that the narration lasts when the original media is a video or a set of images. If the narration is an audio clip, narrative_duration equals the playback length of the audio signal. When a text narration is synthesized and played as an audio signal, the playback of the audio signal should be completed within narrative_duration. When an audio narration is converted to a text narration, the narration should be presented as a whole for each frame over the entire duration of the audio playback time.
  • narrative_starting_frame_number specifies the frame number of the narrated picture in the picture group. This syntax element exists only when the original media is a set of images. If the duration of the narration is greater than 1, the frame number shall be regarded as the starting frame number.
  • number_of_narratives specifies the total number of narration information items.
  • Whenever new narration information is added, it shall be appended to the bottom of the list, immediately after all previous narration information, and the value of number_of_narratives shall be increased by one.
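  • A sketch of this append rule; the dictionary layout is illustrative:

```python
def add_narrative(point: dict, new_entry: dict) -> None:
    """Append the new narration after all previous ones and bump the
    number_of_narratives counter, as required above."""
    point["narratives"].append(new_entry)
    point["number_of_narratives"] += 1

point = {"narrative_entry_time": 120,
         "number_of_narratives": 1,
         "narratives": [{"narrative_data_type": 0, "text": b"First note"}]}
add_narrative(point, {"narrative_data_type": 1, "audio": b"\x00\x01"})
assert point["number_of_narratives"] == 2
```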
  • text_encoding_standard_id specifies the text coding standard used for the narrator's name and for text narration information in the narration data segment.
  • Table 2 shows an example of code values for common text encoding standards provided by an embodiment of the present application.
  • The first column lists the text coding standards, and the second column lists example code values (text_encoding_standard_id values).
  • The text data segment is encoded using a preset text encoding standard, which includes at least one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5.
  • The preset text encoding standard can also be any other predefined standard; any standard text encoding can be used here.
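  • Since Table 2 itself is not reproduced in this excerpt, the following sketch uses hypothetical code values purely to illustrate how text_encoding_standard_id selects the text decoder:

```python
# Hypothetical text_encoding_standard_id values; Table 2 is not shown
# in this excerpt, so the numbers below are placeholders.
TEXT_ENCODING_IDS = {
    0: "utf-8",
    1: "utf-16",
    2: "gb2312",
    3: "gbk",
    4: "big5",
}

def decode_narrative_text(text_encoding_standard_id: int, data: bytes) -> str:
    return data.decode(TEXT_ENCODING_IDS[text_encoding_standard_id])

print(decode_narrative_text(0, "What a sunset!".encode("utf-8")))
```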
  • narrative_author_name_length specifies the length of narrative_author_name, in bytes.
  • narrative_author_name specifies the name of the narrator, where the narrator can be an individual or an organization.
  • narrative_creation_date specifies the date on which the narration information is added. Any standard date expression can be used here. For example, a date can be represented in a numeric format that uses 4 digits for the year, then 2 digits for the month, then 2 digits for the day: September 21, 2019 is 20190921, and October 30, 2019 is 20191030. In this expression, one byte is used for every two digits.
  • narrative_creation_time specifies the time at which the narration information is added. Any standard time expression can be used here. For example, time can be expressed as hh:mm:ss.TZ, where each digit of hh (hour), mm (minute), and ss (second) uses one byte, and TZ (time zone) uses an eight-bit code.
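  • A sketch of the example date and time layouts above. The date packs one byte per two decimal digits; for the time, one ASCII byte per digit is assumed here, since the excerpt does not pin the byte values down:

```python
def encode_date(year: int, month: int, day: int) -> bytes:
    """20190921 -> the 4 bytes [20, 19, 9, 21]: one byte per two decimal
    digits, following the 'one byte for every two digits' rule above."""
    digits = f"{year:04d}{month:02d}{day:02d}"
    return bytes(int(digits[i:i + 2]) for i in range(0, 8, 2))

def encode_time(hh: int, mm: int, ss: int, tz_code: int) -> bytes:
    """hh:mm:ss.TZ with one byte per digit (ASCII digits are assumed
    here) plus an eight-bit time-zone code."""
    return f"{hh:02d}{mm:02d}{ss:02d}".encode("ascii") + bytes([tz_code & 0xFF])

assert encode_date(2019, 9, 21) == bytes([20, 19, 9, 21])
assert len(encode_time(14, 30, 5, 8)) == 7
```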
  • visual_content_ownership_flag equal to 1 means that the narrator owns the visual media content.
  • visual_content_ownership_flag equal to 0 means that the narrator of the narration item does not own the visual media content.
  • narrative_data_type specifies the type (that is, the data format) of the narration information: narrative_data_type equal to 0 means the narration is in text format, equal to 1 means the narration is in audio format, and equal to 2 means the narration is in combined text and audio format.
  • text_narrative_data_length, in bytes, specifies the length of the text narration information.
  • narrative_audio_codec_id specifies the audio codec used to encode the audio narration.
  • Table 4 shows an example of code values for common audio codecs provided by an embodiment of the present application.
  • The first column lists the audio codecs, and the second column lists example code values (narrative_audio_codec_id values).
  • The audio clips may be encoded using one of the following general audio standards: AVS audio, MP3, AAC, and WAV.
  • audio_narrative_data_length specifies the length of the audio narration information, in bytes; the default value is 0.
  • text_narrative_data carries the actual narration information in text format.
  • audio_narrative_data carries the actual audio narration information.
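  • A hedged sketch of serializing one narration entry from these syntax elements, with the text part written before the audio part as required above; field widths are illustrative where the excerpt does not specify them:

```python
import struct

def write_narrative_entry(author: bytes, narrative_data_type: int,
                          text: bytes = b"", audio: bytes = b"") -> bytes:
    """Serialize one narration entry; the text part is written before
    the audio part, as the syntax above requires."""
    out = bytearray()
    out += bytes([len(author)]) + author       # narrative_author_name_length/_name
    out += bytes([narrative_data_type])        # 0 text, 1 audio, 2 text+audio
    if narrative_data_type in (0, 2):
        out += struct.pack(">I", len(text)) + text     # text_narrative_data
    if narrative_data_type in (1, 2):
        out += struct.pack(">I", len(audio)) + audio   # audio_narrative_data
    return bytes(out)

entry = write_narrative_entry(b"Alice", 2,
                              text="Lovely day".encode("utf-8"),
                              audio=b"\x00\x01")       # stand-in audio bytes
```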
  • ISO-BMFF is widely used in the industry as a container format for visual media content such as videos, still images, and image groups.
  • The most popular video streaming and storage format today is the MP4 format, which fully complies with ISO-BMFF.
  • The narration data structure described here is suitable for original visual media content encapsulated in the ISO-BMFF file format.
  • This data structure fully complies with the metadata format in ISO-BMFF, and it can be embedded at the file level of the ISO-BMFF file format or in the "moov" box at the movie level.
  • The narration information is organized in three hierarchical layers, which makes it easy for software implementations to create, edit, and play narration for existing media files.
  • The figure shows the overall structure of an ISO-BMFF file with the suggested narration metadata section.
  • A standard ISO-BMFF file has an "ftyp" box (that is, the file type box 401 shown in FIG. 4) and a "moov" box (that is, part of the narration metadata box 402 shown in FIG. 4).
  • The actual text content (text_narrative_data) or audio content (audio_narrative_data) representing the narration content is stored in the narration data segment in the "mdat" box.
  • This data segment immediately follows the original visual data segment.
  • The "meta (for narration)" box can also be placed in a "moov" box with exactly the same structure.
  • FIG. 4 is a schematic diagram of the structure of an ISO-BMFF file according to an embodiment of the application.
  • The structure 400 includes: a file type box (that is, the "ftyp" box) 401, a narration metadata box (denoted meta) 402, and a media data box (that is, the "mdat" box) 403; among them,
  • the file type box 401 is used to contain information indicating the type of the ISO-BMFF file;
  • the narration metadata box 402 is used to store the metadata of the media data (that is, the visual media content) and the metadata of the narration information of the visual media content, where the narration information is the user's emotional expression about the subject content of the visual media content.
  • The visual media content can be a video, an image group, or an image.
  • The type of the narration information is likewise not limited.
  • The narration information can be text, audio, or a combination of text and audio data.
  • The structure thus supports users expressing their emotions about visual media content in the form of text, speech, or a mixture of text and speech.
  • The media data box 403 is used to contain the visual media content and the narration information.
  • The order of the visual media content and the narration information is not limited.
  • For example, the narration information may be located after the visual media content.
  • The media data box 403 therefore contains not only the visual media content but also the narration information of the visual media content.
  • In this way, the user's emotional expression about the visual media content (that is, the narration information) is always in the same file as the visual media content, that is, in the ISO-BMFF file; as long as the user can obtain the visual media content, the recorded emotional response is immediately available.
  • The structure therefore makes it easier for users to add narration, without needing a separate application to attach it; moreover, users only need to download the ISO-BMFF file to obtain both the visual media content and the narration.
  • FIG. 5A is a schematic diagram of the structure of an ISO-BMFF file according to an embodiment of the application.
  • The structure 500 includes: a file type box 401, a narration metadata box 402, and a media data box 403; among them, the narration metadata box 402 includes:
  • the moov box 501, which is used to contain the metadata of the visual media content; and
  • the meta box 502, which is used to contain the metadata of the narration information of the visual media content.
  • The meta box 502 exists either at the file level or, at the movie level, inside the moov box 501.
  • The structure of the meta box 502 is shown in FIG. 5B.
  • The syntax of the meta box 502 is shown in Table 5 below, and the meta box 502 is used to contain at least one of the items of information shown in Table 5.
  • The metadata structure can be added at the file level or in the "moov" box at the movie level;
  • box_size is the total size of the meta box 502, in bytes;
  • box_type is set to "meta" (4 lowercase characters), indicating that this is a narration metadata box;
  • narration_application_box() represents the main box of the narration application format, which is contained in the narration metadata box. Its detailed description is given in Table 7 below.
  • box_size is the total size of the box, in bytes;
  • box_type is designated as "hdlr" (4 lowercase characters) to indicate that this is a narration metadata processing box;
  • handler_type is designated as "napp" (4 lowercase characters), indicating that the metadata processing box "napp" will be used to define the media narration application format;
  • the version, flags, predefined, reserved, and name fields can be set according to ISO-BMFF requirements. A sketch of emitting this handler box follows below.
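  • A sketch of emitting this handler box with the version, flags, pre_defined, reserved, and name fields zeroed or set to a simple default, as ISO-BMFF permits:

```python
import struct

def narration_hdlr_box(name: bytes = b"narration\x00") -> bytes:
    """Emit the narration metadata processing box: an ISO-BMFF 'hdlr'
    box whose handler_type is 'napp'."""
    payload = bytes(4)          # version (1 byte) + flags (3 bytes), zeroed
    payload += bytes(4)         # pre_defined
    payload += b"napp"          # handler_type for the narration application
    payload += bytes(12)        # reserved
    payload += name             # null-terminated readable name
    return struct.pack(">I", 8 + len(payload)) + b"hdlr" + payload

box = narration_hdlr_box()
assert box[4:8] == b"hdlr" and box[16:20] == b"napp"
```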
  • Table 7 describes the syntax of the narration application box:
  • box_size is the total size of the box, in bytes;
  • box_type is designated as "napp" (4 lowercase characters), indicating that this is the metadata box format defined for the narration application;
  • media_type indicates the format of the visual media content.
  • Example definitions are as follows (note: the media types defined by ISO-BMFF for still images and image groups can also be used here):
  • video: "vide" (4 lowercase characters);
  • image group: "picg" (4 lowercase characters).
  • narrative_data_starting_location indicates the starting position, in bytes, of the current narration information in the "mdat" box associated with the original visual media content file;
  • narrative_data_total_length indicates the total amount of narration information in the "mdat" box. This value shall be updated whenever new narration information is added to the ISO-BMFF file. Adding this value to narrative_data_starting_location gives the starting position of the next piece of narration information, which simplifies the software implementation of the narration process;
  • number_of_narration_points specifies the total number of positions or frames that have been designated as narration points in a video or a group of images, that is, the total number of image frames in the visual media content to which narration has been assigned. Whenever new narration information is added to an image frame that does not yet have any narration information, the value shall be updated, for example increased by 1. If the visual media content has only one image (for example, a still image), the value is set to 1;
  • narration_point is defined as the frame number of the narrated frame in the video or group of images, that is, the frame number of an image frame in the visual media content to which narration has been assigned. If the duration of the narration is greater than 1, the frame number shall be regarded as the starting frame number. If number_of_narration_points is greater than 1, the narration_points shall be arranged in ascending order (see the sketch below). Note: these syntax elements are similar to narrative_entry_time and narrative_starting_frame_number above;
• narration_point_description() is a box containing information about the narration_point, i.e., the narration point description box, which can contain at least one of the items of information shown in Table 8 below.
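To make the bookkeeping above concrete, the following Python sketch appends one narration payload to the "mdat" box and updates the counters of the narration application box. It operates on hypothetical in-memory structures whose keys simply mirror the Table 7 syntax element names; it is a sketch, not a normative implementation.

```python
def append_narration(app_box: dict, mdat: bytearray,
                     payload: bytes, frame_number: int) -> dict:
    # The next payload starts right after all existing narration data:
    offset = (app_box["narrative_data_starting_location"]
              + app_box["narrative_data_total_length"])
    mdat[offset:offset] = payload          # earlier data is left untouched
    app_box["narrative_data_total_length"] += len(payload)

    # The first narration for a frame creates a new narration point,
    # and narration points are kept in ascending order:
    if frame_number not in app_box["narration_points"]:
        app_box["narration_points"].append(frame_number)
        app_box["narration_points"].sort()
        app_box["number_of_narration_points"] += 1

    # The location is recorded relative to narrative_data_starting_location:
    return {"narrative_data_location":
                offset - app_box["narrative_data_starting_location"],
            "narrative_data_length": len(payload)}
```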
• Table 8 describes the syntax of the narration point description box:
• box_size is the total size of the box, in bytes;
• number_of_narratives specifies the total number of narration entries;
• when a new narration is added, it should be appended to the bottom of the list, immediately after all previous narration entries, and the value of number_of_narratives should be increased by one.
• narrative_duration specifies the number of frames the current narration lasts when the original media is a video or a group of images. If the media is a still image and media_type is equal to "imag", narrative_duration should be set to 1. If the narration is an audio clip, narrative_duration equals the playback length of the audio signal. When a text narration is synthesized and played as an audio signal, playback of the audio signal should be completed within narrative_duration. When an audio narration is converted to text, the narration should be presented as a whole for each frame during the entire duration of the audio playback time;
• narrative_data_location indicates the starting position of the narration payload in the "mdat" box relative to the narrative_data_starting_location specified in the narration application box, that is, the position of the current narration relative to that starting position;
• narrative_data_length indicates the length of the current narration data, in bytes;
• narrative_description() is a box containing information about the narration_point, i.e., the description box of the current narration.
• This box can contain at least one of the items of information shown in Table 9 below.
• Table 9 describes the syntax of the description box of the current narration:
• box_size is the total size of the box, in bytes;
• text_encoding_standard_type describes the text encoding standard used for the narrator's name. Its definition is the same as that of "text_encoding_standard_id" in Tables 7 and 8. If the narration is in text format, the text encoding standard specified here also applies to the encoding of the narration content;
• narrative_author_name_length specifies the length of narrative_author_name, in bytes;
• narrative_author_name specifies the name of the person or entity who created the current narration. Note that n in the table equals narrative_author_name_length;
• narrative_creation_date specifies the date on which the narration was added. Its definition is the same as in Table 2;
• narrative_creation_time specifies the time at which the narration was added. Its definition is the same as in Table 2;
• media_ownership_flag equal to 1 indicates that the narrator owns the visual media content; media_content_ownership_flag equal to 0 indicates that the narrator of this narration entry does not own the visual media content;
• audio_encoding_type specifies the encoding standard for an audio-format narration; any coding standard can be used here. For example, audio_encoding_type can be defined as follows: for a text narration, follow the coding standards of Table 3; for an audio narration, follow the coding standards of Table 4;
• text_narrative_data_length is the length of the text part of a narration that has both a text part and an audio part. When a narration has both parts, the text part is saved first in "mdat", followed by the audio data. The length of the audio data equals narrative_data_length in the description box of the narrated image frame minus text_narrative_data_length.
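A combined text-and-audio payload can therefore be split back into its two parts with nothing more than text_narrative_data_length, as in this small Python sketch (assuming the payload has already been read from "mdat" using narrative_data_location and narrative_data_length):

```python
def split_combined_narration(payload: bytes,
                             text_narrative_data_length: int):
    """Split a combined text+audio narration payload: the text part
    is stored first, the audio part is the remainder."""
    text_part = payload[:text_narrative_data_length]
    audio_part = payload[text_narrative_data_length:]
    return text_part, audio_part
```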
• Based on the foregoing embodiments, the decoder provided by the embodiments of the present application includes its constituent modules and the units included in each module.
• FIG. 6 is a schematic structural diagram of the decoder according to an embodiment of the present application. As shown in FIG. 6,
• the decoder 600 includes: a decoding module 601 and a playback module 602; among them,
• the decoding module 601 is used to parse the code stream to obtain at least one piece of narration information of the visual media content and the corresponding presentation time;
• the playback module 602 is configured to present the at least one piece of narration information according to the presentation time when the visual media content is played.
• In some embodiments, the decoding module 601 is configured to: parse the code stream to obtain the media file or bitstream sent by the encoder; and, from the media file or the bitstream, obtain the visual media content, the at least one piece of narration information, and the corresponding presentation time.
• In some embodiments, the visual media content is a video or a group of images; correspondingly, when the narration information is original text, text converted from audio, or combined audio and text, the presentation time of the text is expressed in the form of a marked start frame and at least one continuation frame of the visual media content. The playback module 602 is configured to: starting from playback of the start frame, keep presenting the corresponding text until the at least one continuation frame has finished playing.
• In some embodiments, the visual media content is a video clip or a group of images; correspondingly, when the narration information is original audio, audio converted from text, or combined audio and text, the presentation time of the audio is expressed in the form of a marked start frame and a duration of the visual media content. The playback module 602 is configured to: play the audio from the start frame until the images or video frames within the duration have finished playing.
• In some embodiments, the number of frames spanned by the duration of audio converted from text is less than the number of frames spanned by the corresponding text.
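One way the playback module could apply these presentation-time rules is sketched below in Python. The entry fields are hypothetical stand-ins for the decoded syntax elements: a text narration stays visible from its start frame through its continuation frames, while an audio narration is only triggered at its start frame.

```python
def narrations_for_frame(frame_number: int, entries: list) -> list:
    """Return the narrations to present while `frame_number` is shown
    (entry fields are hypothetical mirrors of the decoded elements)."""
    active = []
    for entry in entries:
        start = entry["narration_point"]
        if entry["kind"] == "text":
            # Text stays on screen for the start frame plus its
            # continuation frames (narrative_duration frames in total).
            if start <= frame_number < start + entry["narrative_duration"]:
                active.append(entry)
        elif frame_number == start:
            # Audio is triggered once, at the marked start frame.
            active.append(entry)
    return active
```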
• In some embodiments, the visual media content is an image, a group of images, or a video, and the narration information is original audio, audio converted from text, or combined audio and text; correspondingly, the playback module 602 is configured to: repeatedly play the audio when displaying the single image or statically displaying the group of images; and play the audio in a non-synchronized manner when playing the group of images or the video at a certain frame rate.
• In some embodiments, the playback module 602 is configured to: when the narration switch is turned on, play the visual media content and present the at least one piece of narration information according to the presentation time.
• In some embodiments, the playback module 602 is configured to: when the narration switch is in the off state, play the visual media content without presenting the at least one piece of narration information.
• In some embodiments, the playback module 602 is configured to: when the narration information is original text or text converted from audio, superimpose the text on the playback picture of the visual media content for presentation; or present the text in another window independent of the playback window of the visual media content; or convert the original text into audio for playback.
• In some embodiments, the playback module 602 is configured to: when the visual media content has its own audio, mix the audio belonging to the narration with the audio belonging to the visual media content for playback; or stop playing the audio belonging to the visual media content and play the narration audio alone.
• In some embodiments, the playback module 602 is configured to: when the narration information is original audio or audio converted from original text and the visual media content has its own audio, mix the audio belonging to the narration with the audio belonging to the visual media content for playback; or stop playing the audio belonging to the visual media content and play the narration audio alone; or convert the original audio into text and then present it.
• In some embodiments, the playback module 602 is configured to: when the narration information is combined text and audio, present the text and the audio simultaneously or separately.
• In some embodiments, the playback module 602 is configured to: when the visual media content has not finished playing and the presentation time of the next narration arrives, provide a first option unit for the user to select the playback state of the narration information; when the visual media content has finished playing but the narration has not, provide a second option unit for the user to select the playback state of the narration information; and present the narration information according to the selected option.
• In some embodiments, the playback module 602 is configured to: when the first option of the first option unit is selected, freeze the playback of the visual media content until the narration finishes playing, and then continue playing the next narration and the visual media content; when the second option of the first option unit is selected, stop playing the current narration and start playing the next narration; and when the third option of the second option unit is selected, play the visual media content in a loop.
• In some embodiments, the playback module 602 is configured to: loop the entire visual media content, or loop the marked frame images in the visual media content.
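The three user options described above amount to a small decision routine; a sketch follows, where `player` is a hypothetical playback controller whose method names are illustrative, not part of the described decoder.

```python
def resolve_narration_overrun(choice: str, player) -> None:
    """Apply the user's selected option when a narration cannot finish
    before the next narration time or the end of the content."""
    if choice == "freeze":      # first option: hold the current frame
        player.freeze_until_narration_done()
    elif choice == "skip":      # second option: cut to the next narration
        player.stop_current_narration()
        player.start_next_narration()
    elif choice == "loop":      # third option: loop the whole content,
        player.loop_marked_frames()  # or just the marked frame images
```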
• In some embodiments, the playback module 602 is configured to: obtain the registration information of the at least one piece of narration information from the media file or the bitstream; and, when presenting the narration information, present the corresponding registration information.
• In some embodiments, the playback module 602 is configured to: display a trigger button for a drop-down menu when the narration information is presented; when the trigger button receives a trigger operation, display an option of whether to play the registration information; and, when the option indicating playback of the registration information receives an operation, present the registration information.
• In some embodiments, the registration information of the narration information includes at least one of the following: the narrator's name, the creation date and time, and the ownership information of the visual media content.
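For illustration, the registration fields could be grouped in a small record such as the following Python dataclass; the field names and the example date/time formats are assumptions that mirror the description above.

```python
from dataclasses import dataclass

@dataclass
class NarrationRegistration:
    """Registration information carried with each narration entry."""
    narrator_name: str
    creation_date: str        # e.g. "20190921" (YYYYMMDD)
    creation_time: str        # e.g. "10:30:00" plus a time zone code
    owns_media_content: bool  # ownership flag for the visual media content
```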
• In some embodiments, the playback module 602 is configured to: play the visual media content in the background and present the at least one piece of narration information in the foreground according to the presentation time.
• In some embodiments, the decoding module 601 is further configured to: receive a new code stream and obtain new narration information of the visual media content from the new code stream; the playback module 602 is further configured to present the new narration information.
• In some embodiments, the playback module 602 is configured to: display an option of whether to play the new narration information, and present the new narration information when the option indicating playback of the new narration information receives an operation.
• In some embodiments, the decoding module 601 is configured to: parse the code stream to obtain a media file or bitstream conforming to a preset data structure, where the preset data structure includes at least one of the following: a general data structure and the data structure of the ISO base media file format (ISO-BMFF).
• In some embodiments, the ISO-BMFF data structure includes at least a narration metadata box,
• and the narration metadata box includes a metadata handler box and a narration application box;
• the decoding module 601 is used to: obtain the metadata of the current narration information from the metadata handler box of the media file or the bitstream; and obtain, from the narration application box of the media file or the bitstream, at least one of the following: the starting position of the current narration information, the length of the current narration information, and the total number of narration entries.
• In some embodiments, the narration application box includes a narration description box,
• and the method further includes: decoding, through the narration description box, at least one of the following items of narration information: the text encoding standard, the narrator's name, the creation date, the creation time, the ownership flag of the associated visual content, the type of the narration information, the coding standard of the narration information, and the text length of the narration information.
• In some embodiments, the decoding module 601 is further configured to: if no narration metadata box exists at the file level of the visual media content, obtain the narration metadata box and decode it to obtain the at least one piece of narration information; if a narration metadata box already exists at the file level of the visual media content, obtain the narration metadata box from the meco container box and decode it to obtain the at least one piece of narration information.
• In some embodiments, the decoding module 601 is configured to: when the narration information is of the text type, decode the narration information from the media file or the bitstream according to a preset text decoding standard,
• where the preset text decoding standard is one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big 5.
• Of course, the preset text decoding standard may also be any other predefined standard.
• In some embodiments, the decoding module 601 is configured to: when the narration information is of the audio type, decode the narration information from the media file or the bitstream according to a preset audio decoding standard,
• where the preset audio decoding standard is one of the following: AVS audio, MP3, AAC, and WAV.
• Of course, the preset audio decoding standard may also be any other predefined standard.
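A decoder could map the signalled standard identifiers to concrete codecs with simple lookup tables, as in the sketch below. The audio identifier values follow the example audio codec table of this application (MP3 = 0, AAC = 1, AVS audio = 2, WAV = 3); the text identifier values are illustrative assumptions, since any predefined assignment may be used.

```python
# Illustrative lookup tables; the text IDs are assumptions, the audio IDs
# mirror the example audio codec table of this application.
TEXT_DECODERS = {0: "utf-8", 1: "utf-16", 2: "gb2312", 3: "gbk", 4: "big5"}
AUDIO_DECODERS = {0: "mp3", 1: "aac", 2: "avs-audio", 3: "wav"}

def decode_text_narration(data: bytes, text_encoding_standard_id: int) -> str:
    """Decode a text-type narration with the signalled text standard."""
    return data.decode(TEXT_DECODERS[text_encoding_standard_id])
```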
• Based on the foregoing embodiments, the encoder provided by the embodiments of the present application includes its constituent modules and the units included in each module.
• FIG. 7 is a schematic structural diagram of the encoder according to an embodiment of the present application. As shown in FIG. 7,
• the encoder 700 includes: a determining module 701, an embedding module 702, and an encoding module 703; among them,
• the determining module 701 is configured to determine at least one piece of narration information to be added and the corresponding presentation time;
• the embedding module 702 is configured to embed, in a preset manner and without changing the visual media content corresponding to the at least one piece of narration information, the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content, obtaining a new media file or a new bitstream;
• the encoding module 703 is configured to encode the new media file or the new bitstream to obtain a code stream.
• In some embodiments, the visual media content is a video or a group of images; correspondingly, when the narration information is original text, text converted from audio, or combined audio and text, the presentation time of the text is expressed in the form of a marked start frame and at least one continuation frame of the visual media content.
• In some embodiments, the visual media content is a video clip or a group of images; correspondingly, when the narration information is original audio, audio converted from text, or combined audio and text, the presentation time of the audio is expressed in the form of a marked start frame and a duration of the visual media content.
• In some embodiments, the number of frames spanned by the duration of audio converted from text is less than the number of frames spanned by the corresponding text.
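On the encoder side, marking a presentation time then reduces to recording a start frame plus a window, with the constraint above enforced for audio synthesized from text. A minimal sketch, with hypothetical field names:

```python
from typing import Optional

def mark_presentation_time(entry: dict, start_frame: int, duration: int,
                           converted_from_text: bool = False,
                           text_duration: Optional[int] = None) -> None:
    """Record a narration's presentation window (hypothetical fields)."""
    if converted_from_text and text_duration is not None:
        # Audio converted from text must span fewer frames than the text.
        assert duration < text_duration
    entry["narration_point"] = start_frame   # marked start frame
    entry["narrative_duration"] = duration   # continuation frames / length
```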
• In some embodiments, the embedding module 702 is further configured to embed the registration information of the narration information into the media file or bitstream of the visual media content in a preset manner.
• In some embodiments, the registration information of the narration information includes at least one of the following: the narrator's name, the creation date and time, and the ownership information of the visual media content.
• In some embodiments, the embedding module 702 is configured to store the at least one piece of narration information, in a preset manner, at the starting position of the visual media content.
• In some embodiments, the determining module 701 is configured to: create narration information for at least one user of the visual media content, obtaining the at least one piece of narration information.
• In some embodiments, the type of the narration information includes at least one of the following: the text type and the audio type;
• and the type of the visual media content includes at least one of the following: a video, an image, and an image group, where the image group includes at least two images.
• In some embodiments, when the type of the current narration information is the text type, the embedding module 702 is configured to: create a text data segment, and embed the current narration information, in the form of the text data segment, into the media file or bitstream of the visual media content.
• In some embodiments, when the type of the current narration information is the audio type, the embedding module 702 is configured to: create an audio clip, and embed the current narration information, in the form of the audio clip, into the media file or bitstream of the visual media content.
• In some embodiments, when the type of the current narration information is the text type, the embedding module 702 is configured to: convert the current narration information into narration information of the audio type, create an audio clip, and embed the current narration information, in the form of the audio clip, into the media file or bitstream of the visual media content.
• In some embodiments, when the type of the current narration information is the audio type, the embedding module 702 is configured to: convert the current narration information into narration information of the text type, create a text data segment, and embed the current narration information, in the form of the text data segment, into the media file or bitstream of the visual media content.
• In some embodiments, the determining module 701 is configured to: determine that the type of the at least one piece of narration information is the text type and/or the audio type when the type of the visual media content is an image or an image group; and determine that the type of the at least one piece of narration information is the text type when the type of the visual media content is a video.
• In some embodiments, the embedding module 702 is configured to: if the types of the narration information include both the text type and the audio type, store the narration information of the audio type after the narration information of the text type.
• In some embodiments, the determining module 701 is used to determine new narration information to be added, and the embedding module 702 is used to store the new narration information after the existing narration information.
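The ordering rules above (the newest narration goes last; audio is stored after text within one entry) could be implemented as in this sketch over a hypothetical narration point description record:

```python
def add_narration_entry(point_desc: dict, text: bytes = b"",
                        audio: bytes = b"") -> None:
    """Append a narration to a narration point, newest entry last.
    For a combined entry the text part precedes the audio part, and
    text_narrative_data_length marks where the audio begins."""
    entry = {"payload": text + audio,             # text first, then audio
             "text_narrative_data_length": len(text)}
    point_desc["narratives"].append(entry)        # bottom of the list
    point_desc["number_of_narratives"] += 1
```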
• In some embodiments, the media file or bitstream conforms to a preset data structure, where the preset data structure includes at least one of the following: a general data structure and the ISO-BMFF data structure;
• correspondingly, the embedding module 702 is configured to embed the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in the form of the preset data structure.
• In some embodiments, the ISO-BMFF data structure includes at least a narration metadata box,
• and the narration metadata box includes a narration metadata handler box and a narration application box; correspondingly, the embedding module 702 is further used to: process the metadata of the current narration information through the narration metadata handler box; and describe, through the narration application box, at least one of the following: the starting position of the current narration information, the data length of the current narration information, and the total number of narration entries.
• In some embodiments, the narration application box includes a narration description box,
• and the embedding module 702 is further used to describe, through the narration description box, at least one of the following items of narration information: the text encoding standard, the narrator's name, the creation date, the creation time, the ownership flag of the associated visual content, the type of the narration information, the coding standard of the narration information, and the text length of the narration information.
• In some embodiments, the embedding module 702 is further configured to: if no narration metadata box exists at the file level of the visual media content, create the narration metadata box and describe the at least one piece of narration information through it; if a narration metadata box already exists at the file level of the visual media content, create the narration metadata box in the meco container box and describe the at least one piece of narration information through it.
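The file-level-versus-meco placement rule could look like the following sketch over a hypothetical in-memory box tree; an actual implementation would of course operate on serialized ISO-BMFF boxes instead.

```python
def place_narration_meta_box(iso_file: dict) -> dict:
    """Create the narration 'meta' box at the file level if none exists
    yet; otherwise nest a new one inside a 'meco' container so that the
    existing file-level 'meta' box and the narration box can coexist."""
    if "meta" not in iso_file:
        iso_file["meta"] = {"hdlr": "napp"}   # file-level narration box
        return iso_file["meta"]
    meco = iso_file.setdefault("meco", [])    # container for extra metas
    box = {"hdlr": "napp"}
    meco.append(box)
    return box
```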
• In some embodiments, the text data segment is encoded using a preset text encoding standard,
• where the preset text encoding standard includes at least one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big 5.
• Of course, the preset text encoding standard may also be any other predefined standard.
• In some embodiments, the audio clip is encoded using a preset audio coding standard,
• where the preset audio coding standard includes at least one of the following: AVS audio, MP3, AAC, and WAV.
• Of course, the preset audio coding standard may also be any other predefined standard.
• In the embodiments of the present application, if the above methods are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
• Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the parts thereof that contribute to the related art, can be embodied in the form of a software product.
• The computer software product is stored in a storage medium and includes several instructions for enabling an electronic device to execute all or part of the methods described in the embodiments of the present application.
• The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, and other media that can store program code. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
• An embodiment of the present application provides a computer storage medium applied to the encoder 700; the computer storage medium stores a computer program that, when executed by a processor, implements the method described in any one of the foregoing embodiments.
• FIG. 8 shows a schematic diagram of a specific hardware structure of the encoder 700 provided in an embodiment of the present application.
• The encoder 700 may include: a first communication interface 801, a memory 802, and a processor 803; the various components are coupled together through a first bus system 804.
• It can be understood that the first bus system 804 is used to implement connection and communication between these components.
• In addition to a data bus, the first bus system 804 also includes a power bus, a control bus, and a status signal bus.
• However, for the sake of clarity, the various buses are all marked as the first bus system 804 in FIG. 8. Among them,
• the first communication interface 801 is used for receiving and sending signals in the process of exchanging information with other external network elements;
• the memory 802 is configured to store a computer program that can run on the processor 803;
• the processor 803 is configured to execute, when the computer program is running, the steps of the information processing method on the encoding side described above.
• It can be understood that the memory 802 in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
• The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
• The volatile memory may be a random access memory (RAM), which is used as an external cache.
• By way of illustration and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM).
• The processor 803 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 803 or by instructions in the form of software.
• The aforementioned processor 803 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
• The methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed by such a processor.
• The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
• The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor.
• The software module may be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register.
• The storage medium is located in the memory 802, and the processor 803 reads the information in the memory 802 and completes the steps of the foregoing methods in combination with its hardware.
• It can be understood that the embodiments described in this application can be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof.
• For a hardware implementation, the processing unit can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, or other electronic units for performing the functions described in this application, or a combination thereof.
• For a software implementation, the technology described in this application can be implemented through modules (for example, procedures, functions, and so on) that perform the functions described in this application.
• The software codes can be stored in the memory and executed by the processor.
  • the processor 803 is further configured to execute the method described in any one of the foregoing embodiments when the computer program is running.
• FIG. 9 shows a schematic diagram of a specific hardware structure of the decoder 900 provided in an embodiment of the present application.
• The decoder 900 may include: a second communication interface 901, a memory 902, and a processor 903; the various components are coupled together through a second bus system 904.
• It can be understood that the second bus system 904 is used to implement connection and communication between these components.
• In addition to a data bus, the second bus system 904 also includes a power bus, a control bus, and a status signal bus.
• However, for the sake of clarity, the various buses are all marked as the second bus system 904 in FIG. 9. Among them,
• the second communication interface 901 is used for receiving and sending signals in the process of exchanging information with other external network elements;
• the memory 902 is configured to store a computer program that can run on the processor 903;
• the processor 903 is configured to execute, when the computer program is running, the following:
• parse the code stream to obtain at least one piece of narration information of the visual media content and the corresponding presentation time; and
• present the at least one piece of narration information according to the presentation time.
• An embodiment of the present application provides a computer storage medium, where the computer storage medium stores a computer program; when the computer program is executed by a processor, it implements the information processing method described for the encoding side, or the information processing method described for the decoding side, of the embodiments of the present application.
• An embodiment of the present application provides an electronic device, where the electronic device includes at least the encoder described in the embodiments of the present application and/or the decoder described in the embodiments of the present application.
• It should be noted that the disclosed devices and methods may be implemented in other ways.
• The device embodiments described above are merely illustrative.
• For example, the division into modules is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented.
• In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
• The modules described above as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules; they may be located in one place or distributed over multiple network units; some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
• In addition, the functional modules in the embodiments of the present application may all be integrated into one processing unit, or each module may serve as a separate unit, or two or more modules may be integrated into one unit; the above integrated modules can be implemented in the form of hardware, or in the form of hardware plus software functional units.
• Those of ordinary skill in the art will understand that all or part of the steps of the foregoing method embodiments can be completed by program instructions running on relevant hardware; the foregoing program can be stored in a computer-readable storage medium.
• When the program is executed, it performs the steps of the foregoing method embodiments; and the aforementioned storage media include: a removable storage device, a read-only memory (ROM), a magnetic disk, an optical disk, and other media that can store program code.
• Alternatively, if the aforementioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
• Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the parts thereof that contribute to the related art, can be embodied in the form of a software product.
• The computer software product is stored in a storage medium and includes several instructions for enabling an electronic device to execute all or part of the methods described in the embodiments of the present application.
• The aforementioned storage media include: removable storage devices, ROMs, magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

An information processing method, an encoder, a decoder, and a storage medium device are provided. The information processing method includes: parsing a code stream to obtain at least one piece of narration information of visual media content and a corresponding presentation time (201); and, when playing the visual media content, presenting the at least one piece of narration information according to the presentation time (202).

Description

信息处理方法及编码器、解码器、存储介质设备
相关申请的交叉引用
本申请要求以Haoping Yu的名义于2020年05月15日提交的、申请号为63/025,742的题为“A technology for narrating digital video”的在先美国临时专利申请的优先权,其全部内容通过引用结合在本申请中;以及
本申请要求以Haoping Yu的名义于2020年06月03日提交的、申请号为63/034,295的题为“A technology for narrating digital visual media”的在先美国临时专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及多媒体技术,涉及但不限于信息处理方法及编码器、解码器、存储介质设备。
背景技术
因为可用性和可承受性,智能手机已经成为最受欢迎的电子设备,拥有一部智能手机在今天不仅仅是必要的,而且已是一种常态。因此,智能手机对整个社会和文化产生了许多重大影响。人们生活方式的变化之一是,消费者使用智能手机拍照或拍摄视频,作为记录日常活动的一种方式,这已成为全球范围内的一个普遍趋势。
如今,人们觉得有必要捕捉生活中的每一个瞬间。消费者不仅用智能手机拍照或录制著名地标的视频,还通过“自拍”拍摄自己。随着社交媒体应用变得越来越流行,除了面对面或电话交谈外,当代人也开始学习通过照片和视频进行交流。他们会将拍摄的内容立即发送给朋友们,让他们看看自己在做什么。可见,图像和视频等视觉媒体内容已经成为一种表达信息和情感的方式。
然而,仅仅依赖于图像、图像组或者视频等视觉媒体内容,却不足以表达人们当时的情感。
发明内容
有鉴于此,本申请实施例提供的信息处理方法及编码器、解码器、存储介质设备,能够允许用户将对视觉媒体内容的情感表达(即叙述旁白信息)嵌入至视觉媒体内容的媒体文件(media file)或者比特流中,从而当用户想要在电子设备上回放该视觉媒体内容时,能够查看该视觉媒体内容的叙述旁白信息(narrative data);本申请实施例提供的信息处理方法及编码器、解码器、存储介质设备,是这样实现的:
第一方面,本申请实施例提供一种信息处理方法,所述方法包括:解析码流,获得视觉媒体内容的至少一个叙述旁白信息和对应的呈现时间;在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息。
第二方面,本申请实施例提供一种信息处理方法,所述方法包括:确定待添加的至少一个叙述旁白信息和对应的呈现时间;在不改变所述至少一个叙述旁白信息对应的视觉媒体内容的情况下,将所述至少一个叙述旁白信息和对应的呈现时间以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,得到新的媒体文件或新的比特流;对所述新的媒体文件或所述新的比特流进行编码,得到码流。
第三方面,本申请实施例提供一种解码器,所述解码器包括解码模块和播放模块;其中,所述解码模块,用于解析码流,获得视觉媒体内容的至少一个叙述旁白信息和对应的呈现时间;所述播放模块,用于在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息。
第四方面,本申请实施例提供一种解码器,所述解码器包括存储器和处理器;其中,所述存储器,用于存储能够在所述处理器上运行的计算机程序;所述处理器,用于在运行所述计算机程序时,执行本申请实施例所述的解码端的信息处理方法。
第五方面,本申请实施例提供一种编码器,所述编码器包括确定模块、嵌入模块和编码模块;其中,所述确定模块,用于确定待添加的至少一个叙述旁白信息和对应的呈现时间;所述嵌入模块,用于在不改变所述至少一个叙述旁白信息对应的视觉媒体内容的情况下,将所述至少一个叙述旁白信息和对应的呈现时间以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,得到新的媒体文件或新的比特流;所述编码模块,用于对所述新的媒体文件或所述新的比特流进行编码,得到码流。
第六方面,本申请实施例提供一种编码器,所述编码器包括存储器和处理器;其中,所述存储器,用于存储能够在所述处理器上运行的计算机程序;所述处理器,用于在运行所述计算机程序时, 执行本申请实施例所述的编码端的信息处理方法。
第七方面,本申请实施例提供一种计算机存储介质,其中,所述计算机存储介质存储有计算机程序,所述计算机程序被处理器执行时实现本申请实施例所述的方法。
第八方面,一种电子设备,其中,所述电子设备至少包括本申请实施例所述的编码器和/或本申请实施例所述的解码器。
在本申请实施例提供的信息处理方法中,能够允许用户将对视觉媒体内容的情感表达(即叙述旁白信息)嵌入至视觉媒体内容的媒体文件或比特流中,从而当用户想要在电子设备上回放该媒体视觉媒体内容时,能够查看关联的叙述旁白信息。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本申请的实施例,并与说明书一起用于说明本申请的技术方案。
图1为本申请实施例编码端的信息处理方法的实现流程示意图;
图2为本申请实施例解码端的信息处理方法的实现流程示意图;
图3为本申请实施例通用数据结构和国际标准化组织-基于媒体文件格式(International Organization for Standardization Base Media File Format,ISO-BMFF)的文件的结构的示意图;
图4为本申请实施例ISO-BMFF文件的结构的示意图;
图5A为本申请实施例ISO-BMFF文件的结构的示意图;
图5B为本申请实施例meta盒502的结构的示意图;
图6为本申请实施例解码器的结构示意图;
图7为本申请实施例编码器的结构示意图;
图8为本申请实施例的编码器的硬件实体示意图;
图9为本申请实施例的解码器的硬件实体示意图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面结合本申请实施例中的附图,对本申请的具体技术方案做进一步详细描述。以下实施例用于说明本申请,但不用来限制本申请的范围。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。
需要指出,本申请实施例所涉及的术语“第一\第二\第三”是为了区别类似或不同的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。
本申请实施例提供一种信息处理方法,所述方法可以应用于编码端,编码端对应的电子设备,该设备可以是任何具有编码能力的电子设备,电子设备可以是手机、个人计算机、笔记本电脑、电视机、服务器等。所述信息处理方法所实现的功能可以通过所述电子设备中的处理器调用程序代码来实现,当然程序代码可以保存在计算机存储介质中。可见,所述电子设备至少包括处理器和存储介质。
图1为本申请实施例信息处理方法的实现流程示意图,如图1所示,所述方法可以包括以下步骤101至步骤103:
步骤101,确定待添加的至少一个叙述旁白信息和对应的呈现时间;
步骤102,在不改变所述至少一个叙述旁白信息对应的视觉媒体内容的情况下,将所述至少一个叙述旁白信息和对应的呈现时间以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,得到新的媒体文件或新的比特流;
步骤103,对所述新的媒体文件或所述新的比特流进行编码,得到码流。
在一些实施例中,所述视觉媒体内容为视频或一组图像;相应地,在所述叙述旁白信息为原始文本、由音频转换的文本或组合的音频和文本时,所述文本的呈现时间以所述视觉媒体内容的被标记的开始帧和至少一个持续帧的形式表示。
在一些实施例中,所述视觉媒体内容为视频片段或一组图像;相应地,在所述叙述旁白信息为原始音频、由文本转换的音频或组合的音频和文本时,所述音频的呈现时间以所述视觉媒体内容的被标记的开始帧和持续时长的形式表示。
在一些实施例中,所述由文本转换的音频的持续时长内的持续帧数少于对应文本的持续帧数。
在一些实施例中,所述方法还包括:将所述叙述旁白信息的注册信息以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中。
在一些实施例中,所述叙述旁白信息的注册信息包括以下至少之一:叙述者姓名、创建日期和时间、所述视觉媒体内容的所有权信息。
在一些实施例中,将所述至少一个叙述旁白信息以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,包括:将所述至少一个叙述旁白信息以预设方式存储在所述视觉媒体内容的起始位置。
在一些实施例中,所述确定待添加的至少一个叙述旁白信息,包括:为所述视觉媒体内容的至少一个用户创建叙述旁白信息,获得所述至少一个叙述旁白信息。
在一些实施例中,所述叙述旁白信息的类型包括下述至少一项:文本类型和音频类型;所述视觉媒体内容的类型包括下述至少一项:视频、图像和图像组,所述图像组包括至少两张图像。
在一些实施例中,在当前叙述旁白信息的类型为文本类型时,所述方法还包括:创建文本数据段;相应地,将所述至少一个叙述旁白信息以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,包括:将所述当前叙述旁白信息以文本数据段的方式嵌入到所述视觉媒体内容的媒体文件或者比特流中。
在一些实施例中,在当前叙述旁白信息的类型为音频类型时,所述方法还包括:创建音频片段;相应地,将所述至少一个叙述旁白信息以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,包括:将所述当前叙述旁白信息以音频片段的方式嵌入到所述视觉媒体内容的媒体文件或者比特流中。
在一些实施例中,在当前叙述旁白信息的类型为文本类型时,所述方法还包括:将所述当前叙述旁白信息转换为音频类型对应的叙述旁白信息,并创建音频片段;相应地,将所述至少一个叙述旁白信息以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,包括:将所述当前叙述旁白信息以音频片段的方式嵌入到所述视觉媒体内容的媒体文件或者比特流中。
在一些实施例中,在当前叙述旁白信息的类型为音频类型时,所述方法还包括:将所述当前叙述旁白信息转换为文本类型对应的叙述旁白信息,并创建文本数据段;相应地,将所述至少一个叙述旁白信息以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,包括:将所述当前叙述旁白信息以文本数据段的方式嵌入到所述视觉媒体内容的媒体文件或者比特流中。
在一些实施例中,所述方法还包括:当所述视觉媒体内容的类型为图像或图像组时,确定所述至少一个叙述旁白信息的类型为文本类型和/或音频类型;当所述视觉媒体内容的类型为视频时,确定所述至少一个叙述旁白信息的类型为文本类型。
在一些实施例中,所述方法还包括:若所述叙述旁白信息的类型包括文本类型和音频类型,则将所述音频类型对应的叙述旁白信息存储在所述文本类型对应的叙述旁白信息之后。
在一些实施例中,所述方法还包括:确定待添加的新叙述旁白信息;将所述新叙述旁白信息存储在已有叙述旁白信息之后。
在一些实施例中,媒体文件或者比特流符合预设数据结构;其中,所述预设数据结构至少包括下述其中一项:通用数据结构和ISO-BMFF数据结构;相应地,所述将所述至少一个叙述旁白信息和对应的呈现时间以预设方式嵌入到所述视觉媒体内容的媒体文件或者比特流中,包括:将所述至少一个叙述旁白信息和对应的呈现时间以所述预设数据结构的形式嵌入到所述视觉媒体内容的媒体文件或者比特流中。
在一些实施例中,所述ISO-BMFF数据结构至少包括叙述旁白元数据盒,所述叙述旁白元数据盒包括元数据处理盒和叙述旁白应用盒;其中,所述元数据处理盒包括当前叙述旁白信息的元数据;所述叙述旁白应用盒包括以下叙述信息的至少之一:当前叙述旁白信息的起始位置、当前叙述旁白信息的长度和叙述旁白信息的总数量。
在一些实施例中,所述叙述旁白应用盒包括叙述旁白描述盒,所述方法还包括:通过所述叙述旁白描述盒,解码获得以下叙述信息的至少之一:文本编码标准、叙述者姓名、创建日期、创建时间、附属视觉内容的所有权标志、叙述旁白信息的类型、叙述旁白信息的编码标准和叙述旁白信息的文本长度。
在一些实施例中,所述方法还包括:若所述视觉媒体内容在文件级别不存在叙述旁白元数据盒,则获取所述叙述旁白元数据盒,并通过所述叙述旁白元数据盒解码获得所述至少一个叙述旁白信息;若所述视觉媒体内容在文件级别存在叙述旁白元数据盒,则从meco容器盒中获取所述叙述旁白元数据盒,并通过所述叙述旁白元数据盒解码获得所述至少一个叙述旁白信息。
在一些实施例中,所述文本数据段采用预设文本编码标准进行编码,所述预设文本编码标准至少包括下述其中之一:UTF-8、UTF-16、GB2312-80、GBK和Big 5。当然,所述预设文本编码标准还可以是其他任何预定义的标准。
在一些实施例中,所述音频片段采用预设音频编码标准进行编码,所述预设音频编码标准至少包括下述其中之一:AVS audio、MP3、AAC和WAV。当然,所述预设音频编码标准还可以是其他任何预定义的标准。
本申请实施例再提供一种信息处理方法,所述方法可以应用于解码端,解码端对应的电子设备,该设备可以是任何具有解码能力和播放能力的电子设备,电子设备可以是手机、个人计算机、笔记本电脑、电视机、服务器等。所述信息处理方法所实现的功能可以通过所述电子设备中的处理器调用程序代码来实现,当然程序代码可以保存在计算机存储介质中。可见,所述电子设备至少包括处理器和存储介质。
图2为本申请实施例信息处理方法的实现流程示意图,如图2所示,所述方法可以包括以下步骤201至步骤202:
步骤201,解析码流,获得视觉媒体内容的至少一个叙述旁白信息和对应的呈现时间。
需要说明的是,在编码端,添加叙述旁白信息至视觉媒体内容的媒体文件或比特流中时,相应的指定的呈现时间也被添加进来。这样,在解码端进行叙述旁白信息的播放时,仅在相应的呈现时间内进行叙述旁白信息的呈现,而不会在呈现时间以外的其他时间进行该叙述旁白信息的呈现。不同的叙述旁白信息的呈现时间可以相同,也可以不同,在本申请中对此不作限定。可以同时呈现两个或两个以上的叙述旁白信息,也可以依次呈现不同的叙述旁白信息。
视觉媒体内容的类型可以是多种多样的,例如视觉媒体内容为一张图像、一组图像(即包括两张或两张以上的图像)或视频。对于不同类型的视觉媒体内容,不同的叙述旁白格式对应的呈现时间的呈现格式可以相同,也可以不同。举例来说,叙述旁白格式和可能的呈现格式的概述在下面的表1中给出。
表1叙述旁白格式v.s.呈现格式
Figure PCTCN2021075622-appb-000001
注释1:对于文本叙述旁白或转换文本叙述旁白,窗口内由视频或一组图像的“开始”和“持续时间”标记的每一帧或图像,整个叙述旁白应一起显示。
注释2:对于音频叙述旁白,叙述旁白应从为视频或一组图像标记的“开始”帧开始播放,并持续以视频的“持续时间”标记的整个时间段,该时间段等于音频信号的播放长度。然而,允许存在其播放时间超过了视频的播放持续时间的音频叙述旁白。如果发生这种情况,则播放设备可以在视频播放时间结束时冻结视频播放,或以循环模式继续视频播放。
注释3:合成音频的叙述旁白的持续应少于持续时间。如果不是这样,则应视为在特定时间存在有一个以上叙述旁白的情况。
注释4:如同常规的文本叙述旁白,转换文本对持续时间内的所有帧都有用。
注释5:对于一组图像,完整的音频(原始或合成)叙述旁白与持续时间内的每个帧相关联。音频叙述旁白的播放与图像的演示无关。播放器可以决定是对每个帧播放相同的合成音频,还是仅对开始帧播放相同的合成音频。例如,如果播放器将图像作为静止图像播放,则播放器可对每一帧重复音频叙述旁白。另一方面,如果图像的播放具有一定的帧率,则可以以非同步的方式播放合成音频。如果播放超过持续时间,则播放器可以冻结视频播放或继续播放(例如在循环模式下)。
步骤202,在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息。
在一些实施例中,所述解析码流,获得视觉媒体内容的至少一个叙述旁白信息和对应的呈现时间,包括:解析码流,得到编码器发送的媒体文件或者比特流;从所述媒体文件或者所述比特流中,获得视觉媒体内容、至少一个叙述旁白信息和对应的呈现时间。
在一些实施例中,所述视觉媒体内容为视频或一组图像;相应地,在所述叙述旁白信息为原始文本、由音频转换的文本或组合的音频和文本时,所述文本的呈现时间以所述视觉媒体内容的被标记的开始帧和至少一个持续帧的形式表示;所述在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息,包括:从播放所述开始帧开始,持续呈现对应的文本,直到所述至少一个持续帧播放完毕。
在一些实施例中,所述视觉媒体内容为视频片段或一组图像;相应地,在所述叙述旁白信息为原始音频、由文本转换的音频或组合的音频和文本时,所述音频的呈现时间以所述视觉媒体内容的被标记的开始帧和持续时长的形式表示;所述在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息,包括:从播放所述开始帧开始播放所述音频,直到所述持续时长内的图像或视频帧播放完毕。
在一些实施例中,所述由文本转换的音频的持续时长内的持续帧数少于对应文本的持续帧数。
在一些实施例中,所述视觉媒体内容为一张图像、一组图像或者视频,所述叙述旁白信息为原始音频、由文本转换的音频或组合的音频和文本;相应地,所述在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息,包括:在播放所述一张图像或者将所述一组图像进行静态展示时,重复播放所述音频;在按照一定帧率播放所述一组图像或者所述视频时,以非同步的方式播放所述音频。
在一些实施例中,所述在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息,包括:在叙述旁白开关处于打开状态时,播放所述视觉媒体内容,并按照所述呈现时间,呈现所述至少一个叙述旁白信息。
在一些实施例中,所述方法还包括:在所述叙述旁白开关处于关闭状态时,播放所述视觉媒体内容而不呈现所述至少一个叙述旁白信息。
在另一些实施例中,所述方法还包括:在所述叙述旁白开关处于关闭状态时,播放所述视觉媒体内容,并关闭所述至少一个叙述旁白信息中的部分属性对应的叙述旁白信息,呈现剩余部分的叙述旁白信息。
例如,呈现所述至少一个叙述旁白信息中表达的是同一含义或类似含义的多个叙述旁白信息中的一个即可。又如,仅呈现具有所述视觉媒体内容的所有权的叙述旁白信息。
在一些实施例中,所述呈现所述叙述旁白信息,包括:在所述叙述旁白信息为原始文本或由音频转换的文本的情况下,将所述文本叠加在所述视觉媒体内容的播放画面之上进行呈现,或者,在独立于所述视觉媒体内容的播放窗口的其他窗口呈现所述文本,或者,将所述原始文本转换为音频进行播放。
在一些实施例中,所述将所述原始文本转换为音频进行播放,包括:在所述视觉媒体内容具有音频的情况下,将属于叙述旁白的音频与属于所述视觉媒体内容的音频混合播放,或者,停止播放属于所述视觉媒体内容的音频而单独播放属于叙述旁白的音频。
在一些实施例中,所述呈现所述叙述旁白信息,包括:在所述叙述旁白信息为原始音频或由原始文本转换的音频以及所述视觉媒体内容具有音频的情况下,将属于叙述旁白的音频与属于所述视觉媒体内容的音频混合播放,或者,停止播放属于所述视觉媒体内容的音频而单独播放属于叙述旁白的音频,或者,将所述原始音频转换为文本之后进行呈现。
在一些实施例中,所述呈现所述叙述旁白信息,包括:在所述叙述旁白信息为组合的文本和音频的情况下,将所述文本和所述音频同时呈现或者分别呈现。
在一些实施例中,所述呈现所述叙述旁白信息,包括:在未播放完所述视觉媒体内容且到达下一叙述旁白信息的呈现时间时,提供第一选项单元供用户选择所述叙述旁白信息的播放状态;以及 在播放完所述视觉媒体内容且未播放完所述叙述旁白信息时,提供第二选项单元供用户选择所述叙述旁白信息的播放状态;以及根据被选择的选项,呈现所述叙述旁白信息。
在一些实施例中,所述根据被选择的选项,呈现所述叙述旁白信息,包括:在所述第一选项单元的第一选项被选择时,冻结(freeze)所述视觉媒体内容的播放,直至所述叙述旁白信息播放完毕,继续播放下一叙述旁白信息和所述视觉媒体内容;在所述第一选项单元的第二选项被选择时,结束播放所述叙述旁白信息而开始播放下一叙述旁白信息;在所述第二选项单元的第三选项被选择时,循环播放所述视觉媒体内容。
需要说明的是,所谓冻结视觉媒体内容的播放,是指停顿在视觉媒体内容的当前帧,而不是从显示界面上消失。
在一些实施例中,所述循环播放所述视觉媒体内容,包括:循环播放所述视觉媒体内容的整体内容,或者,循环播放所述视觉媒体内容中被标记的帧图像。
在一些实施例中,所述方法还包括:从所述媒体文件或者所述比特流中,获取所述至少一个叙述旁白信息的注册信息;在呈现所述叙述旁白信息时,呈现对应的注册信息。
在一些实施例中,所述在呈现所述叙述旁白信息时,呈现对应的注册信息,包括:在呈现所述叙述旁白信息时,显示下拉菜单的触发按键;在所述触发按键接收到触发操作时,显示是否播放注册信息的选项;当指示播放注册信息的选项接收到操作时,呈现所述注册信息。
在一些实施例中,所述叙述旁白信息的注册信息包括以下至少之一:叙述者姓名、创建日期和时间、所述视觉媒体内容的所有权信息。
在一些实施例中,所述在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息,包括:在背景中播放所述视频媒体内容,并按照所述呈现时间,在前景中呈现所述至少一个叙述旁白信息。
在一些实施例中,所述方法还包括:接收新的码流,从所述新的码流中获得所述视觉媒体内容的新的叙述旁白信息;呈现所述新的叙述旁白信息。
在一些实施例中,所述呈现所述新的叙述旁白信息,包括:显示是否播放所述新的叙述旁白信息的选项;当指示播放所述新的叙述旁白信息的选项接收到操作时,呈现所述新的叙述旁白信息。
在一些实施例中,所述解析码流,得到编码器发送的媒体文件或者比特流,包括:解析码流,得到符合预设数据结构的媒体文件或者比特流;其中,所述预设数据结构至少包括下述其中一项:通用数据结构和ISO-BMFF数据结构。
在一些实施例中,所述ISO-BMFF数据结构至少包括叙述旁白元数据盒,所述叙述旁白元数据盒包括叙述旁白元数据处理盒和叙述旁白应用盒;
相应地,所述方法还包括:通过所述叙述旁白元数据处理盒,处理当前叙述旁白信息的元数据;通过所述叙述旁白应用盒,描述以下叙述信息的至少之一:当前叙述旁白信息的开始位置、当前叙述旁白信息的数据长度和叙述旁白信息的总数。
在一些实施例中,所述叙述旁白应用盒包括叙述旁白描述盒,所述方法还包括:通过所述叙述旁白描述盒,描述以下叙述信息的至少之一:文本编码标准、叙述者姓名、创建日期、创建时间、附属视觉内容的所有权标志、叙述旁白信息的类型、叙述旁白信息的编码标准和叙述旁白信息的文本长度。
在一些实施例中,所述方法还包括:若所述视觉媒体内容在文件级别不存在叙述旁白元数据盒,则创建所述叙述旁白元数据盒,并通过所述叙述旁白元数据盒描述所述至少一个叙述旁白信息;若所述视觉媒体内容在文件级别存在叙述旁白元数据盒,则在meco容器盒中创建所述叙述旁白元数据盒,并通过所述叙述旁白元数据盒描述所述至少一个叙述旁白信息。
在一些实施例中,从所述媒体文件或者所述比特流中,获取叙述旁白信息,包括:在所述叙述旁白信息为文本类型的情况下,按照预设文本解码标准,从所述媒体文件或者所述比特流中解码获得所述叙述旁白信息;其中,所述预设文本解码标准为以下其中之一:UTF-8、UTF-16、GB2312-80、GBK以及Big 5。当然,所述预设文本解码标准还可以是其他任何预定义的标准。
在一些实施例中,从所述媒体文件或者所述比特流中,获取叙述旁白信息,包括:在所述叙述旁白信息为音频类型的情况下,按照预设音频解码标准,从所述媒体文件或者所述比特流中解码获得所述叙述旁白信息;其中,所述预设音频解码标准为以下其中之一:AVS audio、MP3、AAC以及WAV。当然,所述预设音频解码标准还可以是其他任何预定义的标准。
如今,图像、视频和短信已经成为一种表达信息和情感的方式。图像或视频可以捕捉视觉信息,但仅仅依靠它们可能不能讲述一个完整的故事。人们通过与视觉媒体内容一起发送互补词来提供背 景信息,或在视觉媒体内容中表达和反映自己对主体的情感。从技术上讲,在当今的计算和通信平台上,数字媒体和这些评论词作为单独的实体被处理。然而,一旦图像或视频片段以特定的媒体格式保存在电子设备中,关于视觉媒体内容的情感和背景的所有文字都被省略了。因此,这些图像和视频很快就变得乏味,并在相对较短的时间内逐渐失去生命。
基于此,下面将说明本申请实施例在一个实际的应用场景中的示例性应用。
为了增强数字媒体的观看体验,在本申请实施例中,描述了一种将叙述旁白信息添加到数字视频、图像或一组图像中的技术。叙述旁白信息可以是文本,也可以是音频,或两者兼有,并且,该叙述旁白信息与原始的视觉媒体内容(包括音频数据)一起写入数字媒体文件或比特流,并可以与数字媒体一起显示或回放。利用这项技术,摄像师或观众等用户可以在捕获或观看视频或图像时记录他们的情绪。该技术支持来自多个用户的叙述旁白,并按用户名(即叙述者)、创建日期和创建时间注册每个叙述旁白条目。将叙述旁白信息连同相关的注册信息作为数据集,以不改变原始视觉和音频内容的特定的数据结构,被保存在数字媒体文件或比特流中。
本申请实施例中,描述了一种技术,该技术可以向数字图像或视频等视觉媒体内容添加叙述旁白信息,并将它们以便于通信和存储的格式保存在一起。叙述旁白信息可以是文本或音频格式,或两者都可以,它可以与视觉媒体内容一起显示或回放,也可以选择关闭不显示叙述旁白,而只显示原始视觉媒体内容。这项技术允许用户通过嵌入数字媒体文件或比特流的叙述旁白信息来表达、分享和交换他们对视觉主题的情感,从而增强用户对数字媒体的观看体验,并促进几代观众的参与度。
到目前为止,还没有开发出一种技术来记录、共享和交换作为数字媒体文件中可方便使用的叙述旁白信息。今天,当人们在视频剪辑中添加叙述旁白信息时,他们不得不使用以前编辑视频的方法,要么将文本嵌入视频原始像素层,要么将声音混合在音频轨道中。这个过程通常需要使用特殊的视频编辑软件,不方便用户使用。因为每次添加新的叙述旁白内容都会改变原有的视频内容,所以不可能让多个用户使用这种方法来分享和交换叙述旁白内容。对于数字图像而言,元数据,如EXIF、IPTC等,仅用于描述和提供图像的技术和管理信息,如捕获过程的技术特征,或图像的所有权和权利等。相比之下,本申请实施例所描述的技术专门用于在视觉媒体内容的创造者和观众之间记录、共享和交换情感评论。特别地,这种技术允许用户在不改变原始视觉媒体内容的情况下记录叙述旁白信息。有了这项技术,用户可以在视觉媒体内容回放时选择观看或收听叙述旁白信息。
本申请实施例描述的叙述旁白系统由编码器和解码器两部分组成。整个系统可以通过运行在电子设备上的应用程序或软件来实现,这些电子设备可以捕捉和显示数字图像,或者录制和播放数字视频。例如,这种装置可以是一个智能手机、平板电脑、电脑或笔记本电脑、或一台电视机。在编码器端,设备从叙述旁白的叙述者那里获取叙述旁白信息,以及将该叙述旁白信息作为特定格式的数据段嵌入到用于图像和视频的媒体文件或位流中。编码过程不会改变原始视觉媒体内容及其相关数据。在解码器端,当图像显示或视频回放时,设备从媒体文件或比特流中提取叙述旁白信息,然后将叙述旁白信息呈现给观众。
在这个系统中,叙述旁白信息可以是文本的形式,也可以是音频的形式,或者两者兼有。文字类的叙述旁白由文本数据表示,而口头叙述旁白则保存为音频片段。系统支持UTF-8、UTF-16、GB2312-80、GBK、Big 5、AAC、MP3、AVS-audio、WAV等多种标准的文本和音频编码。对于视频,除了叙述旁白信息本身,被玩家添加叙述旁白和呈现的时间信息也可以按照视频回放时间被写入叙述旁白信息集中。对于视频,除了叙述旁白信息本身写入叙述旁白信息集之外,叙述旁白信息被添加并应由播放器根据视频播放时间呈现的时间,也写入叙述旁白信息集。该时间信息可以由视频帧号按照相对于视频的开始的展示顺序表示。对于一组图像,可以给多个图像提供叙述旁白。在这种情况下,相应图像的帧号也记录在叙述旁白信息集中。此外,记录在数据段中的叙述旁白信息还包括叙述旁白创建者的姓名、创建日期和时间和/或视觉媒体内容的所有权等。该系统支持多种叙述旁白,这些叙述旁白可以来自视觉媒体内容创建者(例如摄影师),或者来自观看者,或者来自拥有内容并想要添加评论的组织。因此,用户可以在视觉媒体内容采集、编辑和呈现过程中生成和添加叙述旁白。每个叙述旁白条目均以创建者的姓名记录在特定数据结构中。所有权标志用于表示用户是否拥有视频。该系统还允许用户在与相同叙述旁白时间相关联的现有叙述旁白之后添加新的叙述旁白。如果发生这种情况,则在现有的先前叙述旁白之后添加新的叙述旁白。该技术还支持具有文本部分和音频部分的叙述旁白。在这种情况下,在保存文本部分之后保存音频数据。
该系统中的解码器/播放器的基本功能是首先从视觉媒体文件或比特流中解析和解码叙述旁白信息集,然后以叙述旁白信息集指定的时间呈现叙述旁白。播放器可以由一个开关来打开或关闭叙述旁白。当关闭叙述旁白时,原始视觉媒体内容将在不进行任何更改的情况下播放。当打开叙述旁 白时,精确的叙述旁白展示格式是解码器/播放器软件的选择。例如,文本类的叙述旁白可以显示为叠加在视觉媒体内容上的字幕,或者在单独的文本区域中显示,甚至可以在由软件解码器/播放器合成之后作为音频信号播放。如果媒体内容是视频,则该音频信号可以与原始音轨混合播放,或者可通过关闭原始音轨而单独播放。另一方面,语音叙述旁白可以作为独立于图像或视频的音频信号播放,以及可以与视频中的原始音轨混合播放。语音叙述旁白还可以在由软件播放器转换之后以文本显示。当叙述旁白在数据集中具有文本部分和音频部分时,这两个部分应同时一起呈现或单独呈现。当叙述旁白太长而无法在下一个叙述旁白时间出现新的叙述旁白之前或在视频结束之前呈现时,有多个选项可供解码器构建使用,从而给用户提供最大的灵活性并增强观看体验。例如,解码器可以在下一个叙述旁白时间冻结视频播放,同时使对应于当前叙述旁白时间的叙述旁白的播放全部结束,然后继续视频播放和下一个叙述旁白的呈现。如果叙述旁白太长,则解码器可以选择跳过叙述旁白的其余部分,以使视频播放平稳,而不进行冻结。解码器还可以在播放叙述旁白时以循环模式播放原始视频。
可以在播放器中内置叙述旁白提示功能,当存在与正在播放的视觉媒体内容相关联的叙述旁白时,叙述旁白提示功能将显示叙述旁白提示消息,而不是显示实际叙述旁白。然后,观看者可以决定是否要打开叙述旁白播放功能,并查看或聆听实际叙述旁白。播放器可构建有下拉菜单,该下拉菜单具有用于根据观看者的要求显示额外信息(例如叙述者姓名和叙述旁白创建日期)的选项。在叙述旁白播放模式下,播放器可以在前景中播放叙述旁白同时在背景中播放原始媒体(即视觉媒体内容)。如果原始媒体是视频,则在视频播放期间的特定时间存在有一个以上叙述旁白时,观看者可以冻结视频,或者简单地重复以“持续时间”标记的视频片段,或者在回顾叙述旁白时将整个视频作为背景来循环。在播放现有叙述旁白期间出现新的叙述旁白时,播放器可以有数个选项。例如,播放器可以显示新的叙述旁白提示消息,并让观看者控制新的叙述旁白的呈现。在这种情况下,观看者可以再次选择切换到叙述旁白模式,并让媒体作为背景播放或冻结媒体。
用于叙述数字媒体的通用数据结构:
所描述的技术可以应用于不同的数字媒体格式。精确的实现方式依赖于视觉媒体内容的语法结构。下面的表2展示了可以实现上述功能的典型数据集语法和关键数据成分。
表2用于叙述旁白信息集的通用语法
Figure PCTCN2021075622-appb-000002
Figure PCTCN2021075622-appb-000003
*注释:ANL等于narrative_author_name_length,以字节为单位。
本文中使用的约定如下给出:
f(n):表示一个固定模式的位字符串,使用从左到右写入的n位,其中左位在前;
b(8):表示一个字节,可以具有任意位串模式(8位);
u(n):表示使用n位的无符号整数。
语法元素的语义如下:
narrative_data_start_code用于指定叙述旁白信息在比特流中的起始位置的具有固定模式的四字节位串。其通常由一个具有唯一序列的三个字节的起始码前缀,后跟一个专用于指定叙述旁白信息的的字节来组成;
number_of_narrative_point用于指定视频或一组图像中的位置或帧的总数,对于视频,该位置或帧被指定为narrative_entry_time,或者对于一组图像,该位置或帧被指定为narrative_starting_frame_number。每当增加新的narrative_entry_time或新的narrative_starting_frame_number时,该值应增加1。如果原始媒体文件只有一个图像,则number_of_narrative_point设置为1。
narrative_entry_time用于指定叙述旁白被添加并应呈现的时间。该值由视频中的相关帧号按照呈现顺序表示。视频中的第一帧设置为其帧号等于零。仅当原始媒体是视频时,该语法元素才存在。如果叙述旁白的持续时间大于1,则应将该帧号视为起始帧号。
narrative_duration用于指定当原始媒体是视频或一组图像时,叙述旁白将持续的帧的数量。如果叙述旁白是音频剪辑,则narrative_duration等于音频信号的播放长度。当文本叙述旁白被合成并作为音频信号播放时,音频信号的播放应在narrative_duration内完成。当音频叙述旁白转换为文本叙述旁白时,该叙述旁白应在音频播放时间的整个持续时间内针对每个帧整体地呈现。
Narrative_starting_frame_number用于指定已在图像组中叙述的图像的帧号。仅当原始媒体是一组图像时,才存在该语法元素。如果叙述旁白的持续时间大于1,则应将该帧号视为起始帧号。
number_of_narratives用于指定叙述旁白信息条目的总数。当加入一个新的叙述旁白信息时,应将新的叙述旁白信息添加到列表的底部,紧随所有先前的叙述旁白信息之后,并且将number_of_narratives值增加一个。
Text_encoding_standard_id用于指定在叙述旁白数据段中用于叙述者姓名和文本叙述信息的文本编码标准(Text coding standards)。其中,表2示出了本申请实施例提供的一种适用于通用文本编码标准的代码值示例。这里,第一列为通用文本编码标准(Text coding standards),第二列为文本编码标准的代码值示例(text_encoding_standard_id value)。换言之,在本申请实施例中,文本数据段采用下预设文本编码标准进行编码,所述预设文本编码标准至少包括下述其中之一:UTF-8、UTF-16、GB2312-80、GBK和Big 5。当然,所述预设文本编码标准还可以是其他任何预定义的标准。这里可以使用文本编码标准中的任何标准编码。。
表3文本编码标准的代码的示例
Figure PCTCN2021075622-appb-000004
narrative_author_name_length用以指定narrative_author_name的长度,以字节为单位。
narrative_author_name用以指定旁白叙述者的名称,这里叙述者可以是个人或者团体组织。
narrative_creation_date用以指定添加叙述旁白信息的日期。日期的任何标准表达式都可以在此使用。例如,日期可以用数字格式表示,该数字格式使用4位数字表示年,然后是2位数字表示月,然后是2位数字表示日。例如,2019年9月21日为20190921,2019年10月30日为20191030。在该表达式中,每两位数字使用一个字节。
narrative_creation_time用以指定添加叙述旁白信息的时间。时间的任何标准表达式都可以在这里使用。例如,时间可以表示为:hh:mm:ss.TZ,其中,hh(小时)、mm(分钟)、ss(秒)的每位数字使用一个字节,而TZ(时区)使用八位编码。
visual_content_ownership_flag等于1表示叙述者拥有视觉媒体内容。visual_content_ownership_flag等于0表示该叙述旁白条目的叙述者不拥有视觉媒体内容。
narrative_data_type用于指定叙述旁白信息的类型(即数据格式),等于0表示叙述旁白处于文本格式,narrative_data_type等于1表示叙述旁白处于音频格式,narrative_data_type等于2表示叙述旁白处于文本和音频格式。
text_narrative_data_length以字节为单位,用于指定文本叙述旁白信息的长度。
narrative_audio_codec_id用于指定在对音频叙述旁白进行编码时使用的音频编解码器。表4示出了本申请实施例提供的一种适用于通用音频编解码器的代码值示例。这里,第一列为音频编解码器(audio codec),第二列为音频编解码器的代码值示例(narrative_audio_codec_id value)。换言之,在本申请实施例中,音频片段可以采用下述其中之一的通用音频标准进行编码:AVS audio、MP3、AAC和WAV。
表4音频编码标准的代码的示例
音频编解码器 narrative_audio_codec_id值
MP3 0
AAC 1
AVS-audio 2
WAV 3
(Reserved for any other audio codec…)
audio_narrative_data_length用于指定音频叙述旁白信息的长度,以字节为单位,默认值为0。
text_narrative_data用于表示实际为文本格式的叙述旁白信息。
audio_narrative_data用于表示实际音频叙述旁白信息。
基于ISO-BMFF的媒体文件的叙述旁白信息结构:
ISO-BMFF在行业中广泛用作视觉媒体内容(例如视频,静止图像和图像组)的容器格式。例如,当今最流行的视频流和存储格式是MP4格式,它完全符合ISO-BMFF。在本申请实施例中,叙述性数据结构描述为适用于由ISO-BMFF文件格式封装的原始视觉媒体内容。此数据结构完全符合ISO-BMFF中的元数据格式,它可以嵌入在ISO-BMFF文件格式中‘文件’级的‘File’盒或者以‘电影’级的的‘moov’盒内。为了方便理解本申请中描述的叙述旁白特征,描述的叙述旁白信息以三个分层次的层来组织,这使得软件实现易于给现有的媒体文件创建、编辑和播放叙述旁白。
如图3所示,该图示出了具有建议的叙述旁白元数据段的ISO-BMFF文件的总体结构。在图3中,标准ISO-BMFF文件具有“ftyp”盒(即图4所示的文件类型盒401),“moov”盒(即图4所示的叙述旁白元数据盒402的其中之一),“trak”盒,“mdat”盒(即图4所示的所述叙述旁白元数据盒402的其中之一)和“data”盒(即图4所示的媒体数据盒403),其中“ftyp”盒具有关于媒体文件的常规信息,“moov”盒包含有关原始视觉媒体内容的所有元信息的“trak”盒,以及“mdat”盒盒包括全部的原始视觉媒体内容。在该图中,当插入一个新的叙述旁白信息时,表2建议的叙述旁白信息包含在用于叙述旁白的元数据盒“meta(for narration)”和叙述旁白盒“narration box”内。这里叙述旁白信息不包括实际的叙述旁白内容。例如,代表叙述旁白内容的实际文本内容(text_narrative_data)或音频内容(audio_narrative_data)保存在“mdat”盒中的叙述旁白数据段中。如图4所示,此数据段是紧跟在原始视觉数据段后。如上面所述,这个“meta(for narration)”盒也可以放置在结构完全相同的“moov”盒中。
本申请实施例提供一种ISO-BMFF文件的结构,图4为本申请实施例ISO-BMFF文件的结构的示意图,如图4所示,该结构400包括:文件类型盒(即“ftyp box”)401、叙述旁白元数据盒(用meta表示)402和媒体数据盒(即“mdat box”)403;其中,
文件类型盒401,用于容纳指示所述ISO-BMFF文件类型的信息;
叙述旁白元数据盒402,用于容纳媒体数据(即视觉媒体内容)的元数据和所述视觉媒体内容的叙述旁白信息的元数据;其中,所述叙述旁白信息为用户对所述视觉媒体内容的主题内容的情感表达。
需要说明的是,视觉媒体内容可以是视频、图像组或者一张图像。对于叙述旁白信息的类型也不做限定,该叙述旁白信息可以是文本类的,也可以是音频类的,还可以是文本类与音频类相结合的数据。也就是说,该结构支持用户以文字或者语音或者文字与语音混合的形式表达对视觉媒体内容的情感。
媒体数据盒403,用于容纳所述视觉媒体内容和所述叙述旁白信息。
对于视觉媒体内容和叙述旁白信息的排列顺序不做限定,例如,图4所示,叙述旁白信息可以位于视觉媒体内容之后。
在本申请实施例中,媒体数据盒403中不仅容纳有视觉媒体内容,还容纳有视觉媒体内容的叙述旁白信息。用户对该视觉媒体内容的情感表达(即叙述旁白信息)始终和视觉媒体内容在一个文件里,即都在ISO-BMFF文件中,这样,只要用户能够获得该视觉媒体内容,就能够随即记录对该视觉媒体内容的情感,因此该结构使得用户添加叙述旁白变得更为简单,无需通过额外的特定应用来添加叙述旁白;并且,用户只需下载该ISO-BMFF文件即可获得视觉媒体内容的叙述旁白。
本申请实施例再提供一种ISO-BMFF文件的结构,图5A为本申请实施例ISO-BMFF文件的结构的示意图,如图5A所示,该结构500包括:文件类型盒401、叙述旁白元数据盒402和媒体数据盒403;其中,叙述旁白元数据盒402包括:
moov盒501,用于容纳所述视觉媒体内容的元数据;以及
meta盒502,用于容纳所述视觉媒体内容的叙述旁白信息的元数据。
在一些实施例中,meta盒502以文件级别或电影级别的形式存在于moov盒501中。
在一些实施例中,meta盒502的结构如图5B所示,其中meta盒502的语法如下表1所示,meta盒502用于容纳表5所示的信息中的至少之一:
表5 meta盒502的语法
Figure PCTCN2021075622-appb-000005
对表5中的内容解释如下:
(1)该元数据结构可以添加在文件级,也可以添加在Movie级的“moov”盒内;
(2)box_size是该盒502的总大小,以字节为单位;
(3)box_type设置为“meta”,即小写的4个字符,表示这是一个叙述旁白元数据盒;
(4)narrative_metadata_handler_box():该盒中包含的盒子结构,由如下所述的handler_type定义,如下表6所示。这里,ISO-BMFF要求包含该元数据处理盒;
(5)narration_application_box():表示叙述旁白应用格式的主盒,包含在叙述旁白元数据盒中。其详细描述如下表7所示。
表6元数据处理盒的语法
Figure PCTCN2021075622-appb-000006
对表6中的内容解释如下:
(1)box_size是该盒的总大小,以字节为单位;
(2)box_type指定为“hdlr”(即小写的4个字符),以指示这是用于叙述旁白元数据处理盒;
(3)handler_type指定为“napp”(即小写的4个字符),表示元数据处理盒“napp”将用于定义媒体叙述应用格式;
(4)version、flags、predefined、reserved以及name字段可根据ISO-BMFF要求来设置版本,标志,预定义,保留和名称字段。
表7叙述旁白应用盒的语法
Figure PCTCN2021075622-appb-000007
需要说明的是,该叙述旁白应用盒,定义为“完整盒”,将来可被更新。对表7中的内容解释如下:
(1)box_size是该盒的总大小,以字节为单位;
(2)box_type指定为“napp”(即小写的4个字符),用于标示这里是为叙述旁白应用定义的元数据盒格式;
(3)version、flags以及reserved供将来更新的版本,标志和保留字段;
(4)media_type指示视觉媒体内容的格式。示例定义如下所示(注:这里还可以使用ISO-BMFF针对静止图像和图像组定义的媒体类型);
i.视频:“vide”(即小写的4个字符);
ii.静止图像:“imag”(即小写的4个字符);
iii.图像组:“picg”(即小写的4个字符)。
(5)narrative_data_starting_location指示当前叙述旁白信息在原始视觉媒体内容文件关联的“mdat”盒中的开始位置,以字节为单位;
(6)narrative_data_total_length指示“mdat”盒中的叙述旁白信息的总量。每当新的叙述旁白信息添加到ISO-BMFF文件时,都应更新该值。将该值与narrative_data_starting_location相加即可得到下一个叙述旁白信息的起始位置,这将简化该叙述旁白过程的软件实现;
(7)number_of_narration_points指定视频或一组图像中已指定为叙事点的位置或帧的总数,也就是视觉媒体内容中已被指定有叙述旁白的图像帧的总数。每当新的叙述旁白信息添加到尚无任何叙述旁白信息的图像帧时,该值应更新,例如该值增加1。如果视觉媒体内容只有一个图像(例如在静止图像的情况下),则该值设置为1;
(8)narration_point定义为视频或一组图像中已叙述的帧的帧号,即视觉媒体内容中已被指定有叙述旁白的图像帧的帧号。如果叙述旁白信息的持续时间大于1,则应将该帧号视为起始帧号。如果number_of_narration_points大于1,则narration_point应以升序排列。注:语法元素类似于表7中的narrative_entry_time和narrative_starting_frame_number;
(9)narration_point_description():这是包含关于narration_point的信息的盒。即叙述点描述盒,该盒可以容纳如下表8所示的信息中的至少之一。
表8叙述点描述盒的语法
Figure PCTCN2021075622-appb-000008
需要说明的是,对于上述narration_point_description()定义为“完整盒”,将来可更新。对表8中的内容解释如下:(1)box_size是该盒的总大小,以字节为单位;
(2)box_type指定为“nptd”(小写的4个字符),以指示这是一个narration_point的描述盒“nptd”;
(3)version、flags以及reserved供将来更新的版本,标志和保留字段;
(4)number_of_narratives指定叙述旁白信息条目的总数。当加入一个新的叙述旁白信息时,应将新的叙述旁白信息添加到列表的底部,紧随所有先前的叙述旁白信息之后,并且将number_of_narratives值增加一个。
(5)narrative_duration指定当原始媒体是视频或一组图像时,当前叙述旁白信息将持续的帧数。如果媒体是静止图像,media_type等于“imag”,则应将narrative_duration设置为1。如果叙述旁白信息是音频剪辑,则narrative_duration等于音频信号的播放长度。当文本类的叙述旁白信息被合成并作为音频信号播放时,音频信号的播放应在narrative_duration内完成。当音频类的叙述旁白信息转换为文本类时,该叙述旁白信息应在音频播放时间的整个持续时间内针对每个帧整体地呈现;
(6)narrative_data_location指示相对于叙述旁白应用盒中指定的narrative_data_starting_location(即叙述旁白信息的起始位置),该叙述旁白信息在“mdat”盒中的起始位置,即当前叙述旁白信息相对于所述起始位置的位置;
(7)narrative_data_length指示当前叙事数据的长度,以字节为单位;
(8)narrative_description():这是包含关于narration_point的信息的盒,即当前叙述旁白信息的描述盒。该盒可以容纳如下表9所示的信息中的至少之一。
表9当前叙述旁白信息的描述盒的语法
Figure PCTCN2021075622-appb-000009
需要说明的是,当前叙述旁白信息的描述盒,该盒定义为“完整盒”,将来可更新。对于上述表9所示的内容的解释如下:
(1)box_size是该盒的总大小,以字节为单位;
(2)box_type指定为“nrtd”(即小写的4个字符),表示这是当前叙述旁白信息的描述盒“nrtd”;
(3)version、flags以及reserved供将来更新的版本,标志和保留字段;
(4)text_encoding_standard_type用于描述叙述者姓名的文本编码标准。其定义与表7和表8中的“text_encoding_standard_id”相同;如果叙述旁白是文本格式,这里指定的文本编码标准同样适用于对叙述旁白内容的编码;
(5)narrative_author_name_length用于指定narrative_author_name的长度,以字节为单位;
(6)narrative_author_name用于指定创建当前叙述旁白信息的人员或实体的名称。需要说明的是,表中的n等于narrative_author_name_length;
(7)narrative_creation_date用于指定添加叙述旁白信息的日期。其定义与表2相同;
(8)narrative_creation_time用于指定该叙述旁白信息的时间。其定义与表2相同;
(9)media_ownership_flag等于1表示叙述者拥有视觉媒体内容;media_content_ownership_flag等于0表示该叙述旁白条目的叙述者不拥有视觉媒体内容;
(10)narrative_data_type用于指定叙述旁白信息的数据格式,等于0表示叙述旁白处于文本格式,narrative_data_type等于1表示叙述旁白处于音频格式,narrative_data_type等于2表示叙述旁白处于文本和音频格式;如果视觉媒体内容的原始媒体类型为视频,即media_type=‘vide’,则narrative_data_type只能为0;
(11)audio_encoding_type用于指定对音频格式的叙述旁白信息的编码标准。这里可使用任何编码标准。例如,audio_encoding_type可以如下定义:
i.对于文本叙述旁白:遵循表3中的编码标准;
ii.对于音频叙述旁白:遵循表4中的编码标准。
(12)text_narrative_data_length是具有文本部分和音频部分的叙述旁白的文本部分。当叙述旁白具有文本部分和音频部分时,文本部分将首先保存在“mdat”中,然后保存音频数据。音频数据的长度等于叙述旁白信息的图像帧的描述盒中的narrative_data_length减去text_narrative_data_length。
基于前述的实施例,本申请实施例提供的解码器,包括所包括的各模块、以及各模块所包括的各单元,图6为本申请实施例解码器的结构示意图,如图6所示,解码器600包括:包括解码模块601和播放模块602;其中,
解码模块601,用于解析码流,获得视觉媒体内容的至少一个叙述旁白信息和对应的呈现时间;
播放模块602,用于在播放所述视觉媒体内容时,按照所述呈现时间,呈现所述至少一个叙述旁白信息。
In some embodiments, the decoding module 601 is configured to: parse the bitstream to obtain a media file or bitstream sent by an encoder; and obtain the visual media content, the at least one piece of narration information, and the corresponding presentation time from the media file or the bitstream.
In some embodiments, the visual media content is a video or a group of pictures; correspondingly, when the narration information is original text, text converted from audio, or combined audio and text, the presentation time of the text is expressed in the form of a marked start frame and at least one continuation frame of the visual media content; the playing module 602 is configured to: present the corresponding text continuously from playback of the start frame until the at least one continuation frame has finished playing.
In some embodiments, the visual media content is a video clip or a group of pictures; correspondingly, when the narration information is original audio, audio converted from text, or combined audio and text, the presentation time of the audio is expressed in the form of a marked start frame and a duration of the visual media content; the playing module 602 is configured to: play the audio from playback of the start frame until the images or video frames within the duration have finished playing.
In some embodiments, the number of continuation frames within the duration of the audio converted from text is less than the number of continuation frames of the corresponding text.
In some embodiments, the visual media content is a single image, a group of pictures, or a video, and the narration information is original audio, audio converted from text, or combined audio and text; correspondingly, the playing module 602 is configured to: play the audio repeatedly when the single image is played or the group of pictures is displayed statically; and play the audio asynchronously when the group of pictures or the video is played at a certain frame rate.
In some embodiments, the playing module 602 is configured to: when a narration switch is in an on state, play the visual media content and present the at least one piece of narration information according to the presentation time.
In some embodiments, the playing module 602 is configured to: when the narration switch is in an off state, play the visual media content without presenting the at least one piece of narration information.
In some embodiments, the playing module 602 is configured to: when the narration information is original text or text converted from audio, present the text superimposed on the playback picture of the visual media content, or present the text in a window separate from the playback window of the visual media content, or convert the original text into audio for playback.
In some embodiments, the playing module 602 is configured to: when the visual media content has audio, mix the audio belonging to the narration with the audio belonging to the visual media content for playback, or stop playing the audio belonging to the visual media content and play the audio belonging to the narration alone.
In some embodiments, the playing module 602 is configured to: when the narration information is original audio or audio converted from original text and the visual media content has audio, mix the audio belonging to the narration with the audio belonging to the visual media content for playback, or stop playing the audio belonging to the visual media content and play the audio belonging to the narration alone, or convert the original audio into text for presentation.
In some embodiments, the playing module 602 is configured to: when the narration information is combined text and audio, present the text and the audio simultaneously or separately.
In some embodiments, the playing module 602 is configured to: when the visual media content has not finished playing and the presentation time of the next piece of narration information arrives, provide a first option unit for a user to select a playback state of the narration information; when the visual media content has finished playing and the narration information has not finished playing, provide a second option unit for the user to select a playback state of the narration information; and present the narration information according to the selected option.
In some embodiments, the playing module 602 is configured to: when a first option of the first option unit is selected, freeze playback of the visual media content until the narration information finishes playing, and then continue playing the next piece of narration information and the visual media content; when a second option of the first option unit is selected, end playback of the narration information and start playing the next piece of narration information; and when a third option of the second option unit is selected, play the visual media content in a loop.
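The option handling above may be sketched as follows. This is a non-normative illustration; all Player methods and option names are hypothetical.

```python
# Non-normative sketch of the first and second option units described above.
# All Player methods and option names are hypothetical illustrations.

def handle_narration_due(player, narration):
    # Content still playing when the next narration's presentation time arrives.
    choice = player.show_options(["freeze", "skip"])     # first option unit
    if choice == "freeze":
        player.pause_content()
        player.play_narration(narration)                 # play to completion
        player.resume_content()                          # then continue content
    elif choice == "skip":
        player.stop_narration()                          # move to next narration

def handle_content_finished(player, narration):
    # Content finished but the narration has not finished playing.
    if player.show_options(["loop"]) == "loop":          # second option unit
        player.loop_content()    # whole content, or only the marked frames
```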
In some embodiments, the playing module 602 is configured to: play the entire visual media content in a loop, or play the marked frame images in the visual media content in a loop.
In some embodiments, the playing module 602 is configured to: obtain registration information of the at least one piece of narration information from the media file or the bitstream; and present the corresponding registration information when presenting the narration information.
In some embodiments, the playing module 602 is configured to: display a trigger button of a drop-down menu when presenting the narration information; display an option of whether to play the registration information when the trigger button receives a trigger operation; and present the registration information when the option indicating playback of the registration information receives an operation.
In some embodiments, the registration information of the narration information includes at least one of the following: a narrator name, a creation date and time, and ownership information of the visual media content.
In some embodiments, the playing module 602 is configured to: play the visual media content in the background, and present the at least one piece of narration information in the foreground according to the presentation time.
In some embodiments, the decoding module 601 is further configured to receive a new bitstream and obtain new narration information of the visual media content from the new bitstream; the playing module 602 is further configured to present the new narration information.
In some embodiments, the playing module 602 is configured to: display an option of whether to play the new narration information; and present the new narration information when the option indicating playback of the new narration information receives an operation.
In some embodiments, the decoding module 601 is configured to: parse the bitstream to obtain a media file or bitstream conforming to a preset data structure, where the preset data structure includes at least one of the following: a general-purpose data structure and an International Organization for Standardization base media file format (ISO-BMFF) data structure.
In some embodiments, the ISO-BMFF data structure includes at least a narration metadata box, and the narration metadata box includes a metadata handler box and a narration application box; the decoding module 601 is configured to: obtain metadata of the current narration information from the metadata handler box of the media file or the bitstream; and obtain at least one of the following from the narration application box of the media file or the bitstream: the starting position of the current narration information, the length of the current narration information, and the total number of pieces of narration information.
In some embodiments, the narration application box includes a narration description box, and the decoding module 601 is further configured to: decode, through the narration description box, at least one of the following pieces of narration information: the text encoding standard, the narrator name, the creation date, the creation time, the ownership flag of the associated visual content, the type of the narration information, the encoding standard of the narration information, and the text length of the narration information.
In some embodiments, the decoding module 601 is further configured to: if no narration metadata box exists at the file level for the visual media content, obtain the narration metadata box and decode it to obtain the at least one piece of narration information; and if a narration metadata box exists at the file level for the visual media content, obtain the narration metadata box from a meco container box and decode it to obtain the at least one piece of narration information.
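As a non-normative sketch of the file-level versus meco lookup described above, a reader might scan the top-level ISO-BMFF box hierarchy as follows. The parsing helpers are hypothetical; only the "meta" and "meco" box types follow the structure described here, and only plain 32-bit box headers are handled.

```python
import struct

# Non-normative sketch: scanning top-level ISO-BMFF boxes for a file-level
# metadata box ("meta"), falling back to a "meco" container. Largesize
# (64-bit) boxes are out of scope for this illustration.

def iter_boxes(buf: bytes, offset: int = 0, end: int | None = None):
    end = len(buf) if end is None else end
    while offset + 8 <= end:
        size, = struct.unpack_from(">I", buf, offset)
        if size < 8:                       # malformed or largesize box: stop
            break
        box_type = buf[offset + 4:offset + 8].decode("ascii", "replace")
        yield box_type, offset + 8, offset + size   # type, payload start, box end
        offset += size

def find_narration_meta(buf: bytes):
    """Return (payload_start, box_end) of the metadata box holding narration, if any."""
    for box_type, payload, box_end in iter_boxes(buf):
        if box_type == "meta":             # file-level metadata box
            return payload, box_end
        if box_type == "meco":             # additional-metadata container
            for t, p, e in iter_boxes(buf, payload, box_end):
                if t == "meta":
                    return p, e
    return None
```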
In some embodiments, the decoding module 601 is configured to: when the narration information is of the text type, decode the narration information from the media file or the bitstream according to a preset text decoding standard, where the preset text decoding standard is one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5. Of course, the preset text decoding standard may also be any other predefined standard.
In some embodiments, the decoding module 601 is configured to: when the narration information is of the audio type, decode the narration information from the media file or the bitstream according to a preset audio decoding standard, where the preset audio decoding standard is one of the following: AVS audio, MP3, AAC, and WAV. Of course, the preset audio decoding standard may also be any other predefined standard.
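As a non-normative illustration, a text narration payload could be decoded by mapping the signaled encoding standard to a codec. The numeric identifier values below are assumptions for illustration only; the actual mapping is defined by the text_encoding_standard_type (text_encoding_standard_id) tables referenced above.

```python
# Non-normative sketch: decoding a text narration payload according to a
# preset text decoding standard. The identifier-to-codec mapping below is
# an assumed example, not the normative table.
TEXT_CODECS = {
    0: "utf-8",
    1: "utf-16",
    2: "gb2312",   # GB2312-80
    3: "gbk",
    4: "big5",
}

def decode_text_narration(payload: bytes, text_encoding_standard_type: int) -> str:
    codec = TEXT_CODECS.get(text_encoding_standard_type, "utf-8")
    return payload.decode(codec)
```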
Based on the foregoing embodiments, an embodiment of the present application provides an encoder, including the modules it comprises and the units comprised by each module. FIG. 7 is a schematic structural diagram of an encoder according to an embodiment of the present application. As shown in FIG. 7, the encoder 700 includes a determining module 701, an embedding module 702, and an encoding module 703, where:
the determining module 701 is configured to determine at least one piece of narration information to be added and a corresponding presentation time;
the embedding module 702 is configured to embed, without changing the visual media content corresponding to the at least one piece of narration information, the at least one piece of narration information and the corresponding presentation time into a media file or bitstream of the visual media content in a preset manner, to obtain a new media file or a new bitstream;
the encoding module 703 is configured to encode the new media file or the new bitstream to obtain a bitstream.
In some embodiments, the visual media content is a video or a group of pictures; correspondingly, when the narration information is original text, text converted from audio, or combined audio and text, the presentation time of the text is expressed in the form of a marked start frame and at least one continuation frame of the visual media content.
In some embodiments, the visual media content is a video clip or a group of pictures; correspondingly, when the narration information is original audio, audio converted from text, or combined audio and text, the presentation time of the audio is expressed in the form of a marked start frame and a duration of the visual media content.
In some embodiments, the number of continuation frames within the duration of the audio converted from text is less than the number of continuation frames of the corresponding text.
In some embodiments, the embedding module 702 is further configured to embed registration information of the narration information into the media file or bitstream of the visual media content in a preset manner.
In some embodiments, the registration information of the narration information includes at least one of the following: a narrator name, a creation date and time, and ownership information of the visual media content.
In some embodiments, the embedding module 702 is configured to: store the at least one piece of narration information in a preset manner at the starting position of the visual media content.
In some embodiments, the determining module 701 is configured to: create narration information for at least one user of the visual media content to obtain the at least one piece of narration information.
In some embodiments, the type of the narration information includes at least one of the following: a text type and an audio type; the type of the visual media content includes at least one of the following: a video, an image, and a group of pictures, where a group of pictures includes at least two images.
In some embodiments, when the type of the current narration information is the text type, the embedding module 702 is configured to: create a text data segment; and embed the current narration information as a text data segment into the media file or bitstream of the visual media content.
In some embodiments, when the type of the current narration information is the audio type, the embedding module 702 is configured to: create an audio clip; and embed the current narration information as an audio clip into the media file or bitstream of the visual media content.
In some embodiments, when the type of the current narration information is the text type, the embedding module 702 is configured to: convert the current narration information into narration information of the audio type and create an audio clip; and embed the current narration information as an audio clip into the media file or bitstream of the visual media content.
In some embodiments, when the type of the current narration information is the audio type, the embedding module 702 is configured to: convert the current narration information into narration information of the text type and create a text data segment; and embed the current narration information as a text data segment into the media file or bitstream of the visual media content.
In some embodiments, the determining module 701 is configured to: determine that the type of the at least one piece of narration information is the text type and/or the audio type when the type of the visual media content is an image or a group of pictures; and determine that the type of the at least one piece of narration information is the text type when the type of the visual media content is a video.
In some embodiments, the embedding module 702 is configured to: if the types of the narration information include the text type and the audio type, store the narration information corresponding to the audio type after the narration information corresponding to the text type.
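The text-first storage order described above can be illustrated with a minimal, non-normative sketch. The helper name and its return shape are hypothetical; the length fields correspond to text_narrative_data_length and narrative_data_length in Table 9.

```python
# Non-normative sketch: assembling a combined text+audio narration payload
# in the order described above (text stored first, then the audio data).

def build_combined_payload(text: str, audio: bytes, codec: str = "utf-8"):
    text_bytes = text.encode(codec)
    payload = text_bytes + audio          # audio is stored after the text
    return payload, {
        "narrative_data_type": 2,         # combined text and audio
        "text_narrative_data_length": len(text_bytes),
        "narrative_data_length": len(payload),
    }
```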
In some embodiments, the determining module 701 is configured to determine new narration information to be added, and the embedding module 702 is configured to store the new narration information after the existing narration information.
In some embodiments, the media file or bitstream conforms to a preset data structure, where the preset data structure includes at least one of the following: a general-purpose data structure and an International Organization for Standardization base media file format (ISO-BMFF) data structure; the embedding module 702 is configured to: embed the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in the form of the preset data structure.
In some embodiments, the ISO-BMFF data structure includes at least a narration metadata box, and the narration metadata box includes a narration metadata handler box and a narration application box; correspondingly, the embedding module 702 is further configured to: handle metadata of the current narration information through the narration metadata handler box; and describe, through the narration application box, at least one of the following pieces of narration information: the starting position of the current narration information, the data length of the current narration information, and the total number of pieces of narration information.
In some embodiments, the narration application box includes a narration description box, and the embedding module 702 is further configured to: describe, through the narration description box, at least one of the following pieces of narration information: the text encoding standard, the narrator name, the creation date, the creation time, the ownership flag of the associated visual content, the type of the narration information, the encoding standard of the narration information, and the text length of the narration information.
In some embodiments, the embedding module 702 is further configured to: if no narration metadata box exists at the file level for the visual media content, create the narration metadata box and describe the at least one piece of narration information through it; and if a narration metadata box exists at the file level for the visual media content, create the narration metadata box in a meco container box and describe the at least one piece of narration information through it.
In some embodiments, the text data segment is encoded using a preset text encoding standard, where the preset text encoding standard includes at least one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5. Of course, the preset text encoding standard may also be any other predefined standard.
In some embodiments, the audio clip is encoded using a preset audio encoding standard, where the preset audio encoding standard includes at least one of the following: AVS audio, MP3, AAC, and WAV. Of course, the preset audio encoding standard may also be any other predefined standard.
The above descriptions of the encoder and decoder embodiments are similar to the descriptions of the method embodiments above and have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the encoder and decoder embodiments of the present application, please refer to the descriptions of the method embodiments of the present application.
It should be noted that, in the embodiments of the present application, if the above methods are implemented in the form of software functional modules and sold or used as stand-alone products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
An embodiment of the present application provides a computer storage medium applied to the encoder 700. The computer storage medium stores a computer program which, when executed by a processor, implements the method described in any one of the foregoing embodiments.
Based on the composition of the encoder 700 and the computer storage medium above, FIG. 8 shows a schematic diagram of a specific hardware structure of the encoder 700 provided by an embodiment of the present application. As shown in FIG. 8, the encoder may include a first communication interface 801, a memory 802, and a processor 803, with the components coupled together through a first bus system 804. It can be understood that the first bus system 804 is used to implement connection and communication between these components. In addition to a data bus, the first bus system 804 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the first bus system 804 in FIG. 8. Here,
the first communication interface 801 is used for receiving and sending signals while transmitting and receiving information to and from other external network elements;
the memory 802 is used for storing a computer program capable of running on the processor 803;
the processor 803 is used for performing the following when running the computer program:
determining at least one piece of narration information to be added and a corresponding presentation time;
without changing the visual media content corresponding to the at least one piece of narration information, embedding the at least one piece of narration information and the corresponding presentation time into a media file or bitstream of the visual media content in a preset manner to obtain a new media file or a new bitstream;
encoding the new media file or the new bitstream to obtain a bitstream.
It can be understood that the memory 802 in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 802 of the systems and methods described in the present application is intended to include, without being limited to, these and any other suitable types of memory.
The processor 803 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor 803 or by instructions in the form of software. The processor 803 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 802, and the processor 803 reads the information in the memory 802 and completes the steps of the above methods in combination with its hardware.
It can be understood that the embodiments described in the present application may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof. For hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in the present application, or combinations thereof. For software implementation, the techniques described in the present application may be implemented by modules (for example, processes or functions) that perform the functions described in the present application. The software code may be stored in a memory and executed by a processor. The memory may be implemented in the processor or outside the processor.
Optionally, as another embodiment, the processor 803 is further configured to, when running the computer program, perform the method described in any one of the foregoing embodiments.
Based on the composition of the decoder 600 and the computer storage medium above, FIG. 9 shows a schematic diagram of a specific hardware structure of a decoder 900 provided by an embodiment of the present application. As shown in FIG. 9, the decoder may include a second communication interface 901, a memory 902, and a processor 903, with the components coupled together through a second bus system 904. It can be understood that the second bus system 904 is used to implement connection and communication between these components. In addition to a data bus, the second bus system 904 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the second bus system 904 in FIG. 9. Here,
the second communication interface 901 is used for receiving and sending signals while transmitting and receiving information to and from other external network elements;
the memory 902 is used for storing a computer program capable of running on the processor 903;
the processor 903 is used for performing the following when running the computer program:
parsing a bitstream to obtain at least one piece of narration information of visual media content and a corresponding presentation time;
when playing the visual media content, presenting the at least one piece of narration information according to the presentation time.
It can be understood that the memory 902 is similar in hardware function to the memory 802, and the processor 903 is similar in hardware function to the processor 803; details are not repeated here.
Correspondingly, an embodiment of the present application provides a computer storage medium storing a computer program which, when executed by a processor, implements the information processing method described for the encoder side, or the method described for the decoder side in the embodiments of the present application.
An embodiment of the present application provides an electronic device, which includes at least the encoder described in the embodiments of the present application and/or the decoder described in the embodiments of the present application.
It should be pointed out here that the above descriptions of the decoder, encoder, storage medium, and device embodiments are similar to the descriptions of the method embodiments above and have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the decoder, encoder, storage medium, and device embodiments of the present application, please refer to the descriptions of the method embodiments of the present application.
It should be understood that references throughout the specification to "one embodiment", "an embodiment", or "some embodiments" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, the appearances of "in one embodiment", "in an embodiment", or "in some embodiments" in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The above sequence numbers of the embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
It should be noted that, in this document, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes the element.
In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. The embodiments described above are merely illustrative. For example, the division of the modules is only a logical function division, and there may be other division manners in actual implementation, for example: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the coupling, direct coupling, or communication connections between the components shown or discussed may be indirect coupling or communication connections through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described above as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present application may all be integrated in one processing unit, or each module may serve as a separate unit, or two or more modules may be integrated in one unit. The integrated modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art can understand that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated units of the present application are implemented in the form of software functional modules and sold or used as stand-alone products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the several product embodiments provided in the present application may be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the several method or device embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method or device embodiments.
The above are only implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art could readily conceive of changes or substitutions within the technical scope disclosed in the present application, and these should be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (55)

  1. An information processing method, the method comprising:
    parsing a bitstream to obtain at least one piece of narration information of visual media content and a corresponding presentation time;
    when playing the visual media content, presenting the at least one piece of narration information according to the presentation time.
  2. The method according to claim 1, wherein the parsing a bitstream to obtain at least one piece of narration information of visual media content and a corresponding presentation time comprises:
    parsing the bitstream to obtain a media file or bitstream sent by an encoder;
    obtaining the visual media content, the at least one piece of narration information, and the corresponding presentation time from the media file or the bitstream.
  3. The method according to claim 1, wherein the visual media content is a video or a group of pictures; when the narration information is original text, text converted from audio, or combined audio and text, the presentation time of the text is expressed in the form of a marked start frame and at least one continuation frame of the visual media content;
    the when playing the visual media content, presenting the at least one piece of narration information according to the presentation time comprises:
    presenting the corresponding text continuously from playback of the start frame until the at least one continuation frame has finished playing.
  4. The method according to claim 1 or 3, wherein the visual media content is a video clip or a group of pictures; correspondingly, when the narration information is original audio, audio converted from text, or combined audio and text, the presentation time of the audio is expressed in the form of a marked start frame and a duration of the visual media content;
    the when playing the visual media content, presenting the at least one piece of narration information according to the presentation time comprises:
    playing the audio from playback of the start frame until the images or video frames within the duration have finished playing.
  5. The method according to claim 4, wherein the number of continuation frames within the duration of the audio converted from text is less than the number of continuation frames of the corresponding text.
  6. The method according to claim 1, wherein the visual media content is a single image, a group of pictures, or a video, and the narration information is original audio, audio converted from text, or combined audio and text; correspondingly, the when playing the visual media content, presenting the at least one piece of narration information according to the presentation time comprises:
    playing the audio repeatedly when the single image is played or the group of pictures is displayed statically;
    playing the audio asynchronously when the group of pictures or the video is played at a certain frame rate.
  7. The method according to claim 1, wherein the when playing the visual media content, presenting the at least one piece of narration information according to the presentation time comprises:
    when a narration switch is in an on state, playing the visual media content and presenting the at least one piece of narration information according to the presentation time.
  8. The method according to claim 7, wherein the method further comprises: when the narration switch is in an off state, playing the visual media content without presenting the at least one piece of narration information.
  9. The method according to claim 1 or 7, wherein the presenting the narration information comprises:
    when the narration information is original text or text converted from audio, presenting the text superimposed on the playback picture of the visual media content, or presenting the text in a window separate from the playback window of the visual media content, or converting the original text into audio for playback.
  10. The method according to claim 9, wherein the converting the original text into audio for playback comprises:
    when the visual media content has audio, mixing the audio belonging to the narration with the audio belonging to the visual media content for playback, or stopping playback of the audio belonging to the visual media content and playing the audio belonging to the narration alone.
  11. The method according to claim 1, wherein the presenting the narration information comprises:
    when the narration information is original audio or audio converted from original text and the visual media content has audio, mixing the audio belonging to the narration with the audio belonging to the visual media content for playback, or stopping playback of the audio belonging to the visual media content and playing the audio belonging to the narration alone, or converting the original audio into text for presentation.
  12. The method according to claim 1, wherein the presenting the narration information comprises:
    when the narration information is combined text and audio, presenting the text and the audio simultaneously or separately.
  13. The method according to claim 1, wherein the presenting the narration information comprises:
    when the visual media content has not finished playing and the presentation time of the next piece of narration information arrives, providing a first option unit for a user to select a playback state of the narration information; and
    when the visual media content has finished playing and the narration information has not finished playing, providing a second option unit for the user to select a playback state of the narration information; and
    presenting the narration information according to the selected option.
  14. The method according to claim 13, wherein the presenting the narration information according to the selected option comprises:
    when a first option of the first option unit is selected, freezing playback of the visual media content until the narration information finishes playing, and then continuing to play the next piece of narration information and the visual media content;
    when a second option of the first option unit is selected, ending playback of the narration information and starting playback of the next piece of narration information;
    when a third option of the second option unit is selected, playing the visual media content in a loop.
  15. The method according to claim 14, wherein the playing the visual media content in a loop comprises:
    playing the entire visual media content in a loop, or playing the marked frame images in the visual media content in a loop.
  16. The method according to claim 1, wherein the method further comprises:
    obtaining registration information of the at least one piece of narration information from the media file or the bitstream;
    presenting the corresponding registration information when presenting the narration information.
  17. The method according to claim 16, wherein the presenting the corresponding registration information when presenting the narration information comprises:
    displaying a trigger button of a drop-down menu when presenting the narration information;
    displaying an option of whether to play the registration information when the trigger button receives a trigger operation;
    presenting the registration information when the option indicating playback of the registration information receives an operation.
  18. The method according to claim 16, wherein the registration information of the narration information comprises at least one of the following: a narrator name, a creation date and time, and ownership information of the visual media content.
  19. The method according to claim 1, wherein the when playing the visual media content, presenting the at least one piece of narration information according to the presentation time comprises:
    playing the visual media content in the background, and presenting the at least one piece of narration information in the foreground according to the presentation time.
  20. The method according to claim 1, wherein the method further comprises:
    receiving a new bitstream, and obtaining new narration information of the visual media content from the new bitstream;
    presenting the new narration information.
  21. The method according to claim 20, wherein the presenting the new narration information comprises:
    displaying an option of whether to play the new narration information;
    presenting the new narration information when the option indicating playback of the new narration information receives an operation.
  22. The method according to claim 2, wherein the parsing the bitstream to obtain a media file or bitstream sent by an encoder comprises:
    parsing the bitstream to obtain a media file or bitstream conforming to a preset data structure, wherein the preset data structure comprises at least one of the following: a general-purpose data structure and an International Organization for Standardization base media file format (ISO-BMFF) data structure.
  23. The method according to claim 22, wherein the ISO-BMFF data structure comprises at least a narration metadata box, and the narration metadata box comprises a metadata handler box and a narration application box;
    correspondingly, the method further comprises:
    obtaining metadata of the current narration information from the metadata handler box of the media file or the bitstream;
    obtaining at least one of the following from the narration application box of the media file or the bitstream: a starting position of the current narration information, a length of the current narration information, and a total number of pieces of narration information.
  24. The method according to claim 23, wherein the narration application box comprises a narration description box, and the method further comprises:
    decoding, through the narration description box, at least one of the following pieces of narration information: a text encoding standard, a narrator name, a creation date, a creation time, an ownership flag of the associated visual content, a type of the narration information, an encoding standard of the narration information, and a text length of the narration information.
  25. The method according to claim 23, wherein the method further comprises:
    if no narration metadata box exists at the file level for the visual media content, obtaining the narration metadata box, and decoding through the narration metadata box to obtain the at least one piece of narration information;
    if a narration metadata box exists at the file level for the visual media content, obtaining the narration metadata box from a meco container box, and decoding through the narration metadata box to obtain the at least one piece of narration information.
  26. The method according to claim 2, wherein obtaining the narration information from the media file or the bitstream comprises:
    when the narration information is of a text type, decoding the narration information from the media file or the bitstream according to a preset text decoding standard;
    wherein the preset text decoding standard is one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5.
  27. The method according to claim 2, wherein obtaining the narration information from the media file or the bitstream comprises:
    when the narration information is of an audio type, decoding the narration information from the media file or the bitstream according to a preset audio decoding standard;
    wherein the preset audio decoding standard is one of the following: AVS audio, MP3, AAC, and WAV.
  28. An information processing method, the method comprising:
    determining at least one piece of narration information to be added and a corresponding presentation time;
    without changing the visual media content corresponding to the at least one piece of narration information, embedding the at least one piece of narration information and the corresponding presentation time into a media file or bitstream of the visual media content in a preset manner, to obtain a new media file or a new bitstream;
    encoding the new media file or the new bitstream to obtain a bitstream.
  29. The method according to claim 28, wherein the visual media content is a video or a group of pictures; correspondingly, when the narration information is original text, text converted from audio, or combined audio and text, the presentation time of the text is expressed in the form of a marked start frame and at least one continuation frame of the visual media content.
  30. The method according to claim 28, wherein the visual media content is a video clip or a group of pictures; correspondingly, when the narration information is original audio, audio converted from text, or combined audio and text, the presentation time of the audio is expressed in the form of a marked start frame and a duration of the visual media content.
  31. The method according to claim 30, wherein the number of continuation frames within the duration of the audio converted from text is less than the number of continuation frames of the corresponding text.
  32. The method according to claim 28, wherein the method further comprises: embedding registration information of the narration information into the media file or bitstream of the visual media content in a preset manner.
  33. The method according to claim 32, wherein the registration information of the narration information comprises at least one of the following: a narrator name, a creation date and time, and ownership information of the visual media content.
  34. The method according to claim 28, wherein the embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner comprises:
    storing the at least one piece of narration information in a preset manner at a starting position of the visual media content.
  35. The method according to claim 28, wherein the determining at least one piece of narration information to be added comprises:
    creating narration information for at least one user of the visual media content, to obtain the at least one piece of narration information.
  36. The method according to claim 28, wherein
    the type of the narration information comprises at least one of the following: a text type and an audio type;
    the type of the visual media content comprises at least one of the following: a video, an image, and a group of pictures, wherein the group of pictures comprises at least two images.
  37. The method according to claim 36, wherein, when the type of the current narration information is the text type, the method further comprises:
    creating a text data segment;
    correspondingly, the embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner comprises:
    embedding the current narration information as a text data segment into the media file or bitstream of the visual media content.
  38. The method according to claim 36, wherein, when the type of the current narration information is the audio type, the method further comprises:
    creating an audio clip;
    correspondingly, the embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner comprises:
    embedding the current narration information as an audio clip into the media file or bitstream of the visual media content.
  39. The method according to claim 36, wherein, when the type of the current narration information is the text type, the method further comprises:
    converting the current narration information into narration information corresponding to the audio type, and creating an audio clip;
    correspondingly, the embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner comprises:
    embedding the current narration information as an audio clip into the media file or bitstream of the visual media content.
  40. The method according to claim 36, wherein, when the type of the current narration information is the audio type, the method further comprises:
    converting the current narration information into narration information corresponding to the text type, and creating a text data segment;
    correspondingly, the embedding the at least one piece of narration information into the media file or bitstream of the visual media content in a preset manner comprises:
    embedding the current narration information as a text data segment into the media file or bitstream of the visual media content.
  41. The method according to claim 36, wherein the method further comprises:
    when the type of the visual media content is an image or a group of pictures, determining that the type of the at least one piece of narration information is the text type and/or the audio type;
    when the type of the visual media content is a video, determining that the type of the at least one piece of narration information is the text type.
  42. The method according to claim 28, wherein the method further comprises:
    if the types of the narration information comprise the text type and the audio type, storing the narration information corresponding to the audio type after the narration information corresponding to the text type.
  43. The method according to claim 28, wherein the method further comprises:
    determining new narration information to be added;
    storing the new narration information after the existing narration information.
  44. The method according to any one of claims 28 to 43, wherein the media file or bitstream conforms to a preset data structure, and the preset data structure comprises at least one of the following: a general-purpose data structure and an International Organization for Standardization base media file format (ISO-BMFF) data structure; correspondingly, the embedding the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in a preset manner comprises:
    embedding the at least one piece of narration information and the corresponding presentation time into the media file or bitstream of the visual media content in the form of the preset data structure.
  45. The method according to claim 44, wherein the ISO-BMFF data structure comprises at least a narration metadata box, and the narration metadata box comprises a narration metadata handler box and a narration application box;
    correspondingly, the method further comprises:
    handling metadata of the current narration information through the narration metadata handler box;
    describing, through the narration application box, at least one of the following pieces of narration information: a starting position of the current narration information, a data length of the current narration information, and a total number of pieces of narration information.
  46. The method according to claim 45, wherein the narration application box comprises a narration description box, and the method further comprises:
    describing, through the narration description box, at least one of the following pieces of narration information: a text encoding standard, a narrator name, a creation date, a creation time, an ownership flag of the associated visual content, a type of the narration information, an encoding standard of the narration information, and a text length of the narration information.
  47. The method according to claim 45, wherein the method further comprises:
    if no narration metadata box exists at the file level for the visual media content, creating the narration metadata box, and describing the at least one piece of narration information through the narration metadata box;
    if a narration metadata box exists at the file level for the visual media content, creating the narration metadata box in a meco container box, and describing the at least one piece of narration information through the narration metadata box.
  48. The method according to claim 37 or 40, wherein the text data segment is encoded using a preset text encoding standard, and the preset text encoding standard comprises at least one of the following: UTF-8, UTF-16, GB2312-80, GBK, and Big5.
  49. The method according to claim 38 or 39, wherein the audio clip is encoded using a preset audio encoding standard, and the preset audio encoding standard comprises at least one of the following: AVS audio, MP3, AAC, and WAV.
  50. A decoder, comprising a decoding module and a playing module, wherein
    the decoding module is configured to parse a bitstream to obtain at least one piece of narration information of visual media content and a corresponding presentation time;
    the playing module is configured to present the at least one piece of narration information according to the presentation time when playing the visual media content.
  51. A decoder, comprising a memory and a processor, wherein
    the memory is configured to store a computer program capable of running on the processor;
    the processor is configured to perform the method according to any one of claims 1 to 27 when running the computer program.
  52. An encoder, comprising a determining module, an embedding module, and an encoding module, wherein
    the determining module is configured to determine at least one piece of narration information to be added and a corresponding presentation time;
    the embedding module is configured to embed, without changing the visual media content corresponding to the at least one piece of narration information, the at least one piece of narration information and the corresponding presentation time into a media file or bitstream of the visual media content in a preset manner, to obtain a new media file or a new bitstream;
    the encoding module is configured to encode the new media file or the new bitstream to obtain a bitstream.
  53. An encoder, comprising a memory and a processor, wherein
    the memory is configured to store a computer program capable of running on the processor;
    the processor is configured to perform the method according to any one of claims 28 to 49 when running the computer program.
  54. A computer storage medium, wherein the computer storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 27 or the method according to any one of claims 28 to 49.
  55. An electronic device, wherein the electronic device comprises at least the encoder according to claim 52 or 53 and/or the decoder according to claim 50 or 51.
PCT/CN2021/075622 2020-05-15 2021-02-05 Information processing method, encoder, decoder, and storage medium device WO2021227580A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180035459.3A 2020-05-15 2021-02-05 Information processing method, encoder, decoder, and storage medium device

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202063025742P 2020-05-15 2020-05-15
US63/025,742 2020-05-15
US202063034295P 2020-06-03 2020-06-03
US63/034,295 2020-06-03

Publications (1)

Publication Number Publication Date
WO2021227580A1 true WO2021227580A1 (zh) 2021-11-18

Family

ID=78526399

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075622 WO2021227580A1 (zh) Information processing method, encoder, decoder, and storage medium device

Country Status (2)

Country Link
CN (1) CN115552904A (zh)
WO (1) WO2021227580A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120089904A1 (en) * 2008-12-31 2012-04-12 Microsoft Corporation Conversion of declarative statements into a rich interactive narrative
US20160212487A1 (en) * 2015-01-19 2016-07-21 Srinivas Rao Method and system for creating seamless narrated videos using real time streaming media
CN107851425A (zh) * 2015-08-05 2018-03-27 Sony Corporation Information processing device, information processing method, and program
CN109300177A (zh) * 2017-07-24 2019-02-01 ZTE Corporation Picture processing method and apparatus
CN110475159A (zh) * 2018-05-10 2019-11-19 ZTE Corporation Multimedia information transmission method and apparatus, and terminal
CN111046199A (zh) * 2019-11-29 2020-04-21 Peng Cheng Laboratory Method for adding narration to an image, and electronic device

Also Published As

Publication number Publication date
CN115552904A (zh) 2022-12-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 21804709
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 21804709
Country of ref document: EP
Kind code of ref document: A1